
Multimodal Model Context Protocol Server
A multimodal MCP server
What is Multimodal Model Context Protocol Server?
The Multimodal Model Context Protocol Server is a server implementation designed to handle multimodal data indexing and querying, including audio, video, images, and documents.
How to use the Multimodal Model Context Protocol Server?
To use the server, clone the repository, install the required packages, and start the services with Docker Compose. Each service exposes its own endpoint: /audio, /video, /image, and /doc.
Key features of the Multimodal Model Context Protocol Server
- Audio file indexing with transcription capabilities
- Video file indexing with frame extraction
- Image indexing with object detection
- Document indexing with text extraction and Retrieval-Augmented Generation (RAG) support
- Multi-index support for various data types
Use cases of the Multimodal Model Context Protocol Server
- Indexing and searching audio files for content-based retrieval.
- Extracting frames from videos for analysis and search.
- Performing similarity searches on images.
- Extracting text from documents for enhanced search capabilities.
FAQ about the Multimodal Model Context Protocol Server
- What types of data can be indexed?
The server can index audio, video, images, and documents.
- How do I run the server locally?
You can run the server locally using Docker by following the installation instructions provided in the repository.
- Is there a community where I can get support?
Yes! You can join the Pixeltable community on Discord for support and discussions.
Multimodal Model Context Protocol Server
This repository contains a collection of server implementations for Pixeltable, designed to handle multimodal data indexing and querying (audio, video, images, and documents). These services are orchestrated using Docker for local development.
🚀 Available Servers
Audio Index Server
Located in servers/audio-index/, this server provides:
- Audio file indexing with transcription capabilities
- Semantic search over audio content
- Multi-index support for audio collections
- Accessible at the /audio endpoint
Video Index Server
Located in servers/video-index/, this server provides:
- Video file indexing with frame extraction
- Content-based video search
- Accessible at the /video endpoint
Image Index Server
Located in servers/image-index/, this server provides:
- Image indexing with object detection
- Similarity search for images
- Accessible at the /image endpoint
Document Index Server
Located in servers/doc-index/, this server provides:
- Document indexing with text extraction
- Retrieval-Augmented Generation (RAG) support
- Accessible at the /doc endpoint
Base SDK Server
Located in servers/base-sdk/, this server provides:
- Core functionality for Pixeltable integration
- Foundation for building specialized servers
📦 Installation
Local Development
pip install pixeltable
git clone https://github.com/pixeltable/mcp-server-pixeltable.git
cd mcp-server-pixeltable/servers
docker-compose up --build # Run locally with docker-compose
docker-compose down # Take down resources
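After docker-compose up has started the containers, it can be useful to confirm that each service is accepting connections before sending requests. The following is a minimal sketch and not part of the repository: it assumes the services listen on localhost at the ports this README lists (8080 to 8083), and the helper names is_port_open and wait_for_ports are hypothetical.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def wait_for_ports(host, ports, deadline_s=60.0, interval_s=1.0):
    """Poll until every port accepts connections or the deadline passes.

    Returns a dict mapping each port to True (reachable) or False.
    """
    end = time.monotonic() + deadline_s
    status = {port: False for port in ports}
    while True:
        for port in ports:
            if not status[port]:
                status[port] = is_port_open(host, port)
        if all(status.values()) or time.monotonic() >= end:
            return status
        time.sleep(interval_s)


# Example call (host is an assumption; ports are the ones named in this README):
# wait_for_ports("localhost", [8080, 8081, 8082, 8083], deadline_s=30.0)
```

A plain TCP check only shows that something is listening on the port. It does not verify that the indexing service behind it has finished initializing.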
🔧 Configuration
- Each service runs on its designated port (8080 for audio, 8081 for video, 8082 for image, 8083 for doc).
- Configure service settings in the respective Dockerfile or through environment variables.
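The port and endpoint assignments above can be collected into a small lookup for client scripts. This is a sketch under stated assumptions rather than the project's own configuration: it assumes each endpoint path is served over plain HTTP on its service's port, that the host is localhost, and that environment variables named like AUDIO_PORT override the defaults. Those variable names and the service_url helper are hypothetical and not defined by the repository.

```python
import os

# Ports and endpoint paths as listed in this README.
SERVICES = {
    "audio": (8080, "/audio"),
    "video": (8081, "/video"),
    "image": (8082, "/image"),
    "doc": (8083, "/doc"),
}


def service_url(name: str, host: str = "localhost") -> str:
    """Build the base URL for a service, e.g. service_url("audio")."""
    default_port, path = SERVICES[name]
    # Hypothetical override such as AUDIO_PORT=9090; the variable name is an
    # assumption for illustration, not something the project documents.
    port = int(os.environ.get(f"{name.upper()}_PORT", default_port))
    return f"http://{host}:{port}{path}"
```

Keeping the mapping in one place means a port change in a Dockerfile or environment variable only has to be mirrored once on the client side.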
📞 Support
- GitHub Issues: Report bugs or request features
- Discord: Join our community
📜 License
This project is licensed under the Apache 2.0 License.