Databricks MCP Server

By RafaelCartenet GitHub

Overview

What is Databricks MCP Server?

Databricks MCP Server is a Model Context Protocol (MCP) server that executes SQL queries against Databricks through the Statement Execution API. It lets AI assistants query Databricks SQL warehouses directly, inspect database schemas, and retrieve query results in a structured format, all within the permissions granted to the configured access token.

How to use Databricks MCP Server?

Install the required dependencies, set environment variables with your Databricks credentials, and then either run the server in standalone mode or register it with Cursor so the AI assistant can call it.

Key features of Databricks MCP Server

Execute SQL queries on Databricks.
List available schemas in a catalog.
List tables in a schema.
Describe table schemas.
Handle long-running queries with polling.

Use cases of Databricks MCP Server

Executing complex SQL queries on large datasets.
Analyzing database schemas for better data management.
Integrating with AI assistants for automated data retrieval.

FAQ for Databricks MCP Server

What are the system requirements?

Python 3.10 or later, plus the dependencies listed in requirements.txt.

How do I set up permissions?

Ensure that the user associated with the provided token has appropriate permissions for SQL warehouse access and data access.

Can I run long queries?

Yes. The server polls the Databricks API until the query completes or the timeout is reached. The default timeout is 10 minutes.

Content

Databricks MCP Server

This is a Model Context Protocol (MCP) server for executing SQL queries against Databricks using the Statement Execution API. When used in agent mode, an assistant can chain several requests to complete complex tasks. It works even better when Unity Catalog metadata is available.
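As a rough illustration of what a Statement Execution API call involves, the sketch below builds (without sending) the HTTP request that submits one SQL statement. The function name `build_statement_request` and the 30-second `wait_timeout` are illustrative choices and are not taken from this server's code. It also assumes the host is given without a scheme.

```python
import json
import urllib.request


def build_statement_request(
    host: str, token: str, warehouse_id: str, sql: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits one SQL statement."""
    payload = {
        "warehouse_id": warehouse_id,
        "statement": sql,
        # Ask the API to wait up to 30 seconds for the result before responding.
        "wait_timeout": "30s",
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Databricks personal access tokens are sent as a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The response to such a request carries a statement ID and a status, which is what makes the polling approach described later possible.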

Features

Execute SQL queries on Databricks
List available schemas in a catalog
List tables in a schema
Describe table schemas

Setup

System Requirements

Python 3.10+
uv (only if you plan to install the dependencies with it)

Installation

Install the required dependencies:

pip install -r requirements.txt

Or if using uv:

uv pip install -r requirements.txt

Set up your environment variables:

Option 1: Using a .env file (recommended)

Create a .env file with your Databricks credentials:

DATABRICKS_HOST=your-databricks-instance.cloud.databricks.com
DATABRICKS_TOKEN=your-databricks-access-token
DATABRICKS_SQL_WAREHOUSE_ID=your-sql-warehouse-id

Option 2: Setting environment variables directly

export DATABRICKS_HOST="your-databricks-instance.cloud.databricks.com"
export DATABRICKS_TOKEN="your-databricks-access-token"
export DATABRICKS_SQL_WAREHOUSE_ID="your-sql-warehouse-id"

You can find your SQL warehouse ID in the Databricks UI under SQL Warehouses.
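A minimal sketch of how the three variables above might be read and validated at startup. The helper name `load_databricks_config` is hypothetical, and this version reads only from the process environment; with a .env file, python-dotenv's `load_dotenv()` would typically be called first to populate it.

```python
import os
from typing import Mapping

REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
    "DATABRICKS_SQL_WAREHOUSE_ID",
)


def load_databricks_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing early with the names of the missing variables tends to be easier to debug than an authentication error on the first query.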

Permissions Requirements

Before using this MCP server, ensure that:

SQL Warehouse Permissions: The user associated with the provided token must have appropriate permissions to access the specified SQL warehouse. You can configure warehouse permissions in the Databricks UI under SQL Warehouses > [Your Warehouse] > Permissions.
Token Permissions: The personal access token used should have the minimum necessary permissions to perform the required operations. It is strongly recommended to:
- Create a dedicated token specifically for this application
- Grant read-only permissions where possible to limit security risks
- Avoid using tokens with workspace-wide admin privileges
Data Access Permissions: The user associated with the token must have appropriate permissions to access the catalogs, schemas, and tables that will be queried.

To check or update SQL warehouse permissions via the Databricks REST API, you can use:

GET /api/2.0/sql/permissions/warehouses/{warehouse_id} to check current permissions
PATCH /api/2.0/sql/permissions/warehouses/{warehouse_id} to update permissions

For security best practices, consider regularly rotating your access tokens and auditing query history to monitor usage.

Running the Server

Standalone Mode

To run the server in standalone mode:

python main.py

This will start the MCP server using stdio transport, which can be used with Agent Composer or other MCP clients.

Using with Cursor

To use this MCP server with Cursor, you need to configure it in your Cursor settings:

Create a .cursor directory in your home directory if it doesn't already exist
Create or edit the mcp.json file in that directory:

mkdir -p ~/.cursor
touch ~/.cursor/mcp.json

Add the following configuration to the mcp.json file, replacing the directory path with the actual path to where you've installed this server:

{
    "mcpServers": {
        "databricks": {
            "command": "uv",
            "args": [
                "--directory",
                "/path/to/your/mcp-databricks-server",
                "run",
                "main.py"
            ]
        }
    }
}

If you're not using uv, you can use python instead:

{
    "mcpServers": {
        "databricks": {
            "command": "python",
            "args": [
                "/path/to/your/mcp-databricks-server/main.py"
            ]
        }
    }
}

Restart Cursor to apply the changes.

Now you can use the Databricks MCP server directly within Cursor's AI assistant.
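If mcp.json already lists other servers, the entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that in Python; the helper name `add_databricks_server` is hypothetical, and the two entry shapes simply mirror the uv and python configurations shown above.

```python
import json


def add_databricks_server(config: dict, server_dir: str, use_uv: bool = True) -> dict:
    """Return a copy of config with a 'databricks' entry under 'mcpServers'."""
    if use_uv:
        entry = {"command": "uv", "args": ["--directory", server_dir, "run", "main.py"]}
    else:
        entry = {"command": "python", "args": [f"{server_dir}/main.py"]}
    merged = dict(config)
    # Copy the existing server map so other entries are preserved.
    servers = dict(merged.get("mcpServers", {}))
    servers["databricks"] = entry
    merged["mcpServers"] = servers
    return merged


def render_config(config: dict) -> str:
    """Serialize the configuration with the 4-space indentation used above."""
    return json.dumps(config, indent=4)
```

The output of `render_config` can then be written back to ~/.cursor/mcp.json.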

Available Tools

The server provides the following tools:

execute_sql_query: Execute a SQL query and return the results
```
execute_sql_query(sql: str) -> str
```
list_schemas: List all available schemas in a specific catalog
```
list_schemas(catalog: str) -> str
```
list_tables: List all tables in a specific schema
```
list_tables(schema: str) -> str
```
describe_table: Describe a table's schema
```
describe_table(table_name: str) -> str
```

Example Usage

In Agent Composer or other MCP clients, you can call these tools like this:

execute_sql_query("SELECT * FROM my_schema.my_table LIMIT 10")
list_schemas("my_catalog")
list_tables("my_catalog.my_schema")
describe_table("my_catalog.my_schema.my_table")
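One plausible way the three metadata tools could translate their arguments into Databricks SQL is sketched below. This is an assumption made for illustration: the function names and the identifier check are hypothetical, and the server's actual implementation may build its statements differently.

```python
import re

# Accept one to three dot-separated plain identifiers, e.g. catalog.schema.table.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}")


def _checked(name: str) -> str:
    """Reject anything that is not a plain, optionally dotted identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unsupported identifier: {name!r}")
    return name


def schemas_statement(catalog: str) -> str:
    return f"SHOW SCHEMAS IN {_checked(catalog)}"


def tables_statement(schema: str) -> str:
    return f"SHOW TABLES IN {_checked(schema)}"


def describe_statement(table_name: str) -> str:
    return f"DESCRIBE TABLE {_checked(table_name)}"
```

Validating identifiers before interpolating them keeps an assistant-supplied argument from smuggling extra SQL into the statement.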

Handling Long-Running Queries

The server is designed to handle long-running queries by polling the Databricks API until the query completes or times out. The default timeout is 10 minutes (60 retries with 10-second intervals), which can be adjusted in the dbapi.py file if needed.
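The polling behaviour described above can be sketched roughly as follows. This is not the code in dbapi.py: the function name `wait_for_statement` and the injectable `fetch_state` and `sleep` callables are assumptions made so the example stays self-contained. The state names are those reported by the Statement Execution API.

```python
import time
from typing import Callable

# States after which the statement will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "CLOSED"}


def wait_for_statement(
    fetch_state: Callable[[], str],
    max_retries: int = 60,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the statement reaches a terminal state or retries run out."""
    for _ in range(max_retries):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # Still PENDING or RUNNING: wait before asking again.
        sleep(interval_seconds)
    raise TimeoutError(
        f"Statement did not finish within {max_retries * interval_seconds:.0f} seconds"
    )
```

With the defaults of 60 retries and a 10-second interval, the loop gives up after about 600 seconds, which matches the 10-minute timeout.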

Dependencies

httpx: For making HTTP requests to the Databricks API
python-dotenv: For loading environment variables from .env file
mcp: The Model Context Protocol library
asyncio: For asynchronous operations (part of the Python standard library)
