MCP Iceberg Catalog

By ahodroj (GitHub)

An MCP server for interacting with an Apache Iceberg catalog from Claude, enabling data lake discovery and metadata search through an LLM prompt.


Overview

What is MCP Iceberg Catalog?

MCP Iceberg Catalog is an MCP server implementation for interacting with Apache Iceberg, enabling data lake discovery and metadata search through an LLM prompt.

How do I use MCP Iceberg Catalog?

Install it in Claude Desktop either automatically with the Smithery CLI or manually by adding a server entry to the claude_desktop_config.json file. The Iceberg catalog URI, warehouse name, and S3 credentials are supplied as environment variables in the server's configuration entry.

What are the key features of MCP Iceberg Catalog?

SQL interface for querying and managing Iceberg tables.
Integration with Claude Desktop, so tables can be explored and queried from a chat prompt.
Support for the SQL operations LIST TABLES, DESCRIBE TABLE, SELECT, and INSERT.

What are the use cases of MCP Iceberg Catalog?

Managing and querying large datasets in data lakes.
Facilitating metadata search for Iceberg tables.
Enabling data operations through a user-friendly SQL interface.

Frequently asked questions

What are the prerequisites for installation?

You need Python 3.10 or higher, the uv package installer (or pip), and access to an Iceberg REST catalog and S3-compatible storage.

Can I use it without Claude Desktop?

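It is designed and documented for Claude Desktop. Because the server speaks the Model Context Protocol over stdio, other MCP-compatible clients may be able to launch it, but only the Claude Desktop setup is described here.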

What SQL operations are supported?

Currently, it supports LIST TABLES, DESCRIBE TABLE, SELECT, and INSERT operations, with more features planned for future updates.

Content

MCP Iceberg Catalog

An MCP (Model Context Protocol) server implementation for interacting with Apache Iceberg. This server provides a SQL interface for querying and managing Iceberg tables through Claude Desktop.

Claude Desktop as your Iceberg Data Lake Catalog

How to Install in Claude Desktop

Installing via Smithery

To install MCP Iceberg Catalog for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @ahodroj/mcp-iceberg-service --client claude

Prerequisites
- Python 3.10 or higher
- uv package installer (recommended) or pip
- Access to an Iceberg REST catalog and S3-compatible storage
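The prerequisites above can be checked with a short script before touching the Claude Desktop configuration. This is a minimal sketch, not part of the project: `check_prerequisites` and `catalog_reachable` are hypothetical helper names, and the reachability probe assumes the catalog implements the Iceberg REST specification's `GET /v1/config` endpoint.

```python
import shutil
import sys
import urllib.request


def check_prerequisites(min_python=(3, 10)):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    # Python 3.10 or higher is required.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    # uv is recommended, with pip as the fallback installer.
    if not any(shutil.which(tool) for tool in ("uv", "pip", "pip3")):
        problems.append("no package installer (uv or pip) found on PATH")
    return problems


def catalog_reachable(uri, timeout=5.0):
    """Return True if GET {uri}/v1/config answers with a 2xx status (hypothetical helper)."""
    url = uri.rstrip("/") + "/v1/config"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError and HTTPError subclass OSError; ValueError covers malformed URLs.
        return False
```

For example, `check_prerequisites()` returns an empty list on a machine with Python 3.10+ and uv installed, and `catalog_reachable("http://localhost:8181")` probes the same catalog URI used in the configuration below.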
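Installing manually

Add the following configuration to claude_desktop_config.json, replacing the placeholder values with the path to your local mcp-iceberg-service directory, your catalog settings, and your S3 credentials: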

{
  "mcpServers": {
    "iceberg": {
      "command": "uv",
      "args": [
        "--directory",
        "PATH_TO_/mcp-iceberg-service",
        "run",
        "mcp-server-iceberg"
      ],
      "env": {
        "ICEBERG_CATALOG_URI" : "http://localhost:8181",
        "ICEBERG_WAREHOUSE" : "YOUR ICEBERG WAREHOUSE NAME",
        "S3_ENDPOINT" : "OPTIONAL IF USING S3",
        "AWS_ACCESS_KEY_ID" : "YOUR S3 ACCESS KEY",
        "AWS_SECRET_ACCESS_KEY" : "YOUR S3 SECRET KEY"
      }
    }
  }
}
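The `env` block is how connection settings reach the server. One plausible way a server could turn these variables into PyIceberg catalog properties is sketched below. The property keys (`uri`, `warehouse`, `s3.endpoint`, `s3.access-key-id`, `s3.secret-access-key`) are standard PyIceberg REST catalog and S3 FileIO properties, but the mapping itself, the choice to treat only the catalog URI as mandatory, and the helper name `catalog_properties` are assumptions rather than details taken from the project's source.

```python
import os
from typing import Dict, Mapping

# Assumed mapping from the config's env vars to PyIceberg property names.
ENV_TO_PROPERTY = {
    "ICEBERG_CATALOG_URI": "uri",
    "ICEBERG_WAREHOUSE": "warehouse",
    "S3_ENDPOINT": "s3.endpoint",
    "AWS_ACCESS_KEY_ID": "s3.access-key-id",
    "AWS_SECRET_ACCESS_KEY": "s3.secret-access-key",
}
# Assumption: only the catalog URI is treated as mandatory.
REQUIRED = ("ICEBERG_CATALOG_URI",)


def catalog_properties(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build PyIceberg REST catalog properties from environment variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required environment variables: {', '.join(missing)}")
    properties = {"type": "rest"}
    for env_name, prop in ENV_TO_PROPERTY.items():
        value = env.get(env_name)
        if value:  # skip unset or empty values, e.g. S3_ENDPOINT on AWS S3
            properties[prop] = value
    return properties
```

The result could then be handed to PyIceberg as `load_catalog("iceberg", **catalog_properties())`. Skipping empty values lets `S3_ENDPOINT` stay unset when talking to AWS S3 directly, matching the "optional if using S3" note in the configuration.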

Design

Architecture

The MCP server is built on three main components:

MCP Protocol Handler
- Implements the Model Context Protocol for communication with Claude
- Handles request/response cycles through stdio
- Manages server lifecycle and initialization
Query Processor
- Parses SQL queries using sqlparse
- Supports operations:
  - LIST TABLES
  - DESCRIBE TABLE
  - SELECT
  - INSERT
Iceberg Integration
- Uses pyiceberg for table operations
- Integrates with PyArrow for efficient data handling
- Manages catalog connections and table operations
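To make the protocol handler and query processor more concrete, here is a minimal sketch of a stdio loop that reads newline-delimited JSON-RPC 2.0 messages (the framing used by MCP's stdio transport) and routes a `tools/call` request to a statement classifier. It is illustrative only: the tool name `execute_query` and its `query` argument are hypothetical, the regular expressions stand in for the sqlparse-based parsing described above, and the MCP `initialize` handshake and `tools/list` are omitted. A real server would normally build on the official MCP Python SDK instead of hand-rolling this.

```python
import json
import re
import sys

# Hypothetical tool name; the real server's tool name and arguments may differ.
TOOL_NAME = "execute_query"

# Regex stand-ins for sqlparse: map a statement's leading keywords to an operation.
_OPERATIONS = [
    ("LIST TABLES", re.compile(r"\s*LIST\s+TABLES\b", re.IGNORECASE)),
    ("DESCRIBE TABLE", re.compile(r"\s*DESCRIBE\s+TABLE\b", re.IGNORECASE)),
    ("SELECT", re.compile(r"\s*SELECT\b", re.IGNORECASE)),
    ("INSERT", re.compile(r"\s*INSERT\s+INTO\b", re.IGNORECASE)),
]


def classify(sql):
    """Return the supported operation a SQL string starts with, or raise ValueError."""
    for operation, pattern in _OPERATIONS:
        if pattern.match(sql):
            return operation
    raise ValueError(f"unsupported statement: {sql.strip()[:40]!r}")


def _tool_result(req_id, text, is_error):
    # MCP tool results carry a list of content items plus an isError flag.
    result = {"content": [{"type": "text", "text": text}], "isError": is_error}
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


def handle_line(line):
    """Handle one JSON-RPC message; return a response line, or None for notifications."""
    message = json.loads(line)
    if "id" not in message:
        return None  # notifications never get a response
    req_id = message["id"]
    if message.get("method") != "tools/call":
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})
    params = message.get("params", {})
    if params.get("name") != TOOL_NAME:
        return _tool_result(req_id, f"unknown tool: {params.get('name')!r}", True)
    sql = params.get("arguments", {}).get("query", "")
    try:
        # A real server would now run the matching PyIceberg operation.
        return _tool_result(req_id, f"operation: {classify(sql)}", False)
    except ValueError as exc:
        return _tool_result(req_id, str(exc), True)


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read one message per line from stdin and write responses to stdout."""
    for line in stdin:
        if line.strip():
            response = handle_line(line)
            if response is not None:
                stdout.write(response + "\n")
                stdout.flush()
```

Calling `serve()` attaches the loop to the process's standard streams, which is the channel Claude Desktop uses when it launches the `command` from the configuration. Unsupported statements come back as tool errors (`isError: true`) rather than protocol errors, so the model can read the message and adjust its query.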

PyIceberg Integration

The server uses PyIceberg in three areas:

Catalog Management
- Connects to REST catalogs
- Manages table metadata
- Handles namespace operations
Data Operations
- Converts between PyIceberg and PyArrow types
- Handles data insertion through PyArrow tables
- Manages table schemas and field types
Query Execution
- Translates SQL to PyIceberg operations
- Handles data scanning and filtering
- Manages result set conversion
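As an illustration of translating SQL into PyIceberg operations, the sketch below maps a deliberately narrow `SELECT` form onto keyword arguments for PyIceberg's `Table.scan` (`selected_fields`, `row_filter`, `limit`). The grammar, the helper name `select_to_scan`, and passing the `WHERE` text straight through as a PyIceberg row-filter string are assumptions; PyIceberg's filter parser accepts only a subset of SQL predicate syntax, and the project's own translation may work differently.

```python
import re
from typing import Any, Dict

# Narrow grammar (an assumption): SELECT <cols> FROM <table> [WHERE <expr>] [LIMIT <n>]
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>[\w.]+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+LIMIT\s+(?P<limit>\d+))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def select_to_scan(sql: str) -> Dict[str, Any]:
    """Translate a simple SELECT into a table identifier plus Table.scan keyword arguments."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError("expected SELECT <cols> FROM <table> [WHERE ...] [LIMIT n]")
    columns = [col.strip() for col in match.group("cols").split(",")]
    scan: Dict[str, Any] = {
        # ("*",) is PyIceberg's default selection, meaning "all columns".
        "selected_fields": ("*",) if columns == ["*"] else tuple(columns),
    }
    if match.group("where"):
        # Passed through as a string; PyIceberg parses string row filters itself.
        scan["row_filter"] = match.group("where").strip()
    if match.group("limit"):
        scan["limit"] = int(match.group("limit"))
    return {"table": match.group("table"), "scan": scan}
```

The resulting plan would be consumed roughly as `catalog.load_table(plan["table"]).scan(**plan["scan"]).to_arrow()`, producing a PyArrow table for the result-set conversion step. Keeping the parser this narrow means anything it does not recognize fails loudly instead of being silently misinterpreted.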

Further Implementation Needed

Query Operations
- Implement UPDATE operations
- Add DELETE support
- Support for CREATE TABLE with schema definition
- Add ALTER TABLE operations
- Implement table partitioning support
Data Types
- Support for complex types (arrays, maps, structs)
- Add timestamp with timezone handling
- Support for decimal types
- Add nested field support
Performance Improvements
- Implement batch inserts
- Add query optimization
- Support for parallel scans
- Add caching layer for frequently accessed data
Security Features
- Add authentication mechanisms
- Implement role-based access control
- Add row-level security
- Support for encrypted connections
Monitoring and Management
- Add metrics collection
- Implement query logging
- Add performance monitoring
- Support for table maintenance operations
Error Handling
- Improve error messages
- Add retry mechanisms for transient failures
- Implement transaction support
- Add data validation
