What is mcp-server-datahub?
mcp-server-datahub is a Model Context Protocol (MCP) server that gives AI agents access to the metadata catalog in a DataHub instance.
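How do I use mcp-server-datahub?
Install uv and gather your DataHub URL and a personal access token. Alternatively, create ~/.datahubenv with datahub init. Then add the server to your MCP client's configuration so the client launches it with uvx mcp-server-datahub. For local development, run the server in development mode.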
Key features of mcp-server-datahub
- Simple setup: MCP clients launch the server with uvx, configured with your DataHub URL and token.
- A development mode for testing and debugging.
- Works with both DataHub Core and DataHub Cloud.
Use cases of mcp-server-datahub
- Letting an AI assistant such as Claude Desktop or Cursor answer questions about the datasets in your DataHub catalog.
- Tracing where a dataset's data comes from and what depends on it, using lineage.
- Finding the SQL queries associated with a dataset.
FAQ about mcp-server-datahub
- What programming language is mcp-server-datahub built with?
mcp-server-datahub is built using Python.
- How do I run the server?
For development, activate the virtual environment and run:
mcp dev mcp_server.py
In normal use, your MCP client launches the server itself via uvx, as described under Usage.
- Is there any documentation available?
Yes, you can find the documentation on the project's GitHub page.
mcp-server-datahub
A Model Context Protocol server implementation for DataHub. This enables AI agents to query DataHub for metadata and context about your data ecosystem.
Supports both DataHub Core and DataHub Cloud.
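An MCP client launches the server as a subprocess and exchanges JSON-RPC 2.0 messages with it over stdin/stdout, one message per line. The sketch below builds a single tools/call request. A real client first completes the initialize handshake and discovers the available tool names with tools/list. The tool name "search" and its "query" argument are hypothetical placeholders, not confirmed names from mcp-server-datahub.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Build one JSON-RPC 2.0 "tools/call" request as a single newline-terminated line.

    The MCP stdio transport separates messages with newlines, so the JSON
    itself must not contain embedded newlines.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without indent= produces a single line; newlines inside strings are escaped.
    return json.dumps(message) + "\n"


# Hypothetical tool name and arguments, for illustration only.
print(make_tool_call(1, "search", {"query": "orders"}), end="")
```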
Features
- Searching across all entity types and using arbitrary filters
- Fetching metadata for any entity
- Traversing the lineage graph, both upstream and downstream
- Listing SQL queries associated with a dataset
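Lineage traversal is a graph walk. The sketch below is not how mcp-server-datahub queries DataHub. It runs a breadth-first search over a small, hypothetical in-memory lineage map to show what "upstream" and "downstream" mean. The dataset names and edges are invented, and the hop limit is an illustrative parameter.

```python
from collections import deque


def invert(upstreams: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn an upstream map (dataset -> its sources) into a downstream map (dataset -> its consumers)."""
    downstreams: dict[str, list[str]] = {}
    for dataset, sources in upstreams.items():
        for source in sources:
            downstreams.setdefault(source, []).append(dataset)
    return downstreams


def traverse(start: str, edges: dict[str, list[str]], max_hops: int) -> dict[str, int]:
    """Breadth-first walk from start; return each reachable dataset with its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in edges.get(node, []):
            if neighbor not in distances:  # visit each dataset once, which also guards against cycles
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


# Hypothetical lineage: each dataset maps to the datasets it reads from.
upstreams = {
    "analytics.orders_daily": ["raw.orders", "raw.customers"],
    "raw.orders": ["kafka.orders_topic"],
    "dashboards.revenue": ["analytics.orders_daily"],
}
downstreams = invert(upstreams)

print("upstream:", traverse("analytics.orders_daily", upstreams, max_hops=2))
print("downstream:", traverse("analytics.orders_daily", downstreams, max_hops=2))
```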
Demo
Check out the demo video, done in collaboration with the team at Block.
Usage
1. Install uv
# On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
2. Locate your authentication details
For authentication, you'll need the following:
- The URL of your DataHub instance, e.g. https://tenant.acryl.io/gms
- A personal access token
Alternative: using ~/.datahubenv for authentication
You can also use a ~/.datahubenv file to configure your authentication. The easiest way to create this file is to run datahub init and follow the prompts:
uvx --from acryl-datahub datahub init
3. Configure your MCP client. The configuration varies by agent; see the client-specific instructions below.
Claude Desktop
Run which uvx to find the full path to the uvx command (for example, /Users/hsheth/.local/bin/uvx).
In your claude_desktop_config.json file, add the following, replacing <full-path-to-uvx> with that path:
{
"mcpServers": {
"datahub": {
"command": "<full-path-to-uvx>",
"args": ["mcp-server-datahub"],
"env": {
"DATAHUB_GMS_URL": "<your-datahub-url>",
"DATAHUB_GMS_TOKEN": "<your-datahub-token>"
}
}
}
}
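Claude Desktop does not necessarily launch servers with the same PATH as your shell, which is why this configuration uses an absolute path to uvx. The sketch below looks up that path with shutil.which, the standard-library counterpart of which uvx, and prints the mcpServers entry. The function name claude_desktop_entry is hypothetical, and the URL and token remain placeholders for your own values.

```python
import json
import shutil


def claude_desktop_entry(uvx_path: str, gms_url: str, gms_token: str) -> dict:
    """Return the "mcpServers" block for mcp-server-datahub using an absolute uvx path."""
    return {
        "mcpServers": {
            "datahub": {
                "command": uvx_path,
                "args": ["mcp-server-datahub"],
                "env": {
                    "DATAHUB_GMS_URL": gms_url,
                    "DATAHUB_GMS_TOKEN": gms_token,
                },
            }
        }
    }


# shutil.which searches this process's PATH and returns None if uvx is not found.
uvx = shutil.which("uvx")
if uvx is None:
    print("uvx not found on PATH; install uv first")
else:
    print(json.dumps(claude_desktop_entry(uvx, "<your-datahub-url>", "<your-datahub-token>"), indent=2))
```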
Cursor
In .cursor/mcp.json, add the following:
{
"mcpServers": {
"datahub": {
"command": "uvx",
"args": ["mcp-server-datahub"],
"env": {
"DATAHUB_GMS_URL": "<your-datahub-url>",
"DATAHUB_GMS_TOKEN": "<your-datahub-token>"
}
}
}
}
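If .cursor/mcp.json already lists other servers, add the datahub entry instead of overwriting the file. Here is a minimal sketch, assuming the file is plain JSON (no comments) with a top-level mcpServers object as shown above. The helpers merge_datahub_server and update_config_file are hypothetical.

```python
import copy
import json
from pathlib import Path


def merge_datahub_server(existing: dict, gms_url: str, gms_token: str) -> dict:
    """Return a copy of an MCP config with a "datahub" server added or replaced.

    Other entries under "mcpServers" and any other top-level keys are kept unchanged.
    """
    merged = copy.deepcopy(existing)  # never mutate the caller's config
    servers = merged.setdefault("mcpServers", {})
    servers["datahub"] = {
        "command": "uvx",
        "args": ["mcp-server-datahub"],
        "env": {"DATAHUB_GMS_URL": gms_url, "DATAHUB_GMS_TOKEN": gms_token},
    }
    return merged


def update_config_file(path: Path, gms_url: str, gms_token: str) -> None:
    """Read the config at path (or start empty), merge the datahub entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    merged = merge_datahub_server(existing, gms_url, gms_token)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

For example, update_config_file(Path(".cursor/mcp.json"), "<your-datahub-url>", "<your-datahub-token>") run from the project root would create or update the file.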
Other MCP Clients
command: uvx
args:
  - mcp-server-datahub
env:
  DATAHUB_GMS_URL: <your-datahub-url>
  DATAHUB_GMS_TOKEN: <your-datahub-token>
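Whichever client you use, the configurations above pass the connection details as the DATAHUB_GMS_URL and DATAHUB_GMS_TOKEN environment variables. The sketch below is a hypothetical pre-flight check, not part of mcp-server-datahub. It reports missing variables and a URL that does not look like http(s), and it treats the token as required, which is an assumption. It ignores ~/.datahubenv, so missing variables are not a problem if you configured that file instead.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlparse


def check_datahub_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the variables look usable."""
    problems = []
    url = env.get("DATAHUB_GMS_URL", "").strip()
    token = env.get("DATAHUB_GMS_TOKEN", "").strip()
    if not url:
        problems.append("DATAHUB_GMS_URL is not set")
    else:
        parsed = urlparse(url)
        # A usable GMS URL needs an http(s) scheme and a host, e.g. https://tenant.acryl.io/gms
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"DATAHUB_GMS_URL does not look like an http(s) URL: {url!r}")
    if not token:
        problems.append("DATAHUB_GMS_TOKEN is not set")
    return problems


# Check the current process environment and print any problems found.
for problem in check_datahub_env(os.environ):
    print(problem)
```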
Troubleshooting
spawn uvx ENOENT
The full stack trace might look like this:
2025-04-08T19:58:16.593Z [datahub] [error] spawn uvx ENOENT {"stack":"Error: spawn uvx ENOENT\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\n at onErrorNT (node:internal/child_process:483:16)\n at process.processTicksAndRejections (node:internal/process/task_queues:82:21)"}
Solution: This error means the MCP client could not find the uvx executable on its PATH. Replace uvx in the command with the full path printed by which uvx.
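You can reproduce the same failure outside the MCP client. In Python, starting a program that cannot be found raises FileNotFoundError with errno ENOENT, the counterpart of Node's spawn uvx ENOENT. The sketch below uses a deliberately nonexistent, hypothetical command name to trigger the error. It then uses shutil.which to print the absolute uvx path that the fix calls for.

```python
import errno
import shutil
import subprocess
from typing import Optional


def launch_error(command: str) -> Optional[int]:
    """Try to run `command --version`; return the errno if the executable is missing, else None."""
    try:
        subprocess.run([command, "--version"], capture_output=True, check=False)
    except FileNotFoundError as exc:
        # The OS reports a missing executable as ENOENT, the same code Node's spawn surfaces.
        return exc.errno
    return None


# Hypothetical command name chosen so that it does not exist on PATH.
print(launch_error("uvx-missing-example") == errno.ENOENT)
# Like `which uvx`: the absolute path to use in your client config, or None if uvx is not on PATH.
print(shutil.which("uvx"))
```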
Developing
See DEVELOPING.md.