what is mcp-rag-server?
mcp-rag-server is a Model Context Protocol (MCP) server that enables Retrieval Augmented Generation (RAG) capabilities, allowing Large Language Models (LLMs) to answer questions based on indexed document content.
how to use mcp-rag-server?
To use mcp-rag-server, configure it with your MCP client, set the necessary environment variables, and run the server either from source or using Docker Compose.
key features of mcp-rag-server?
- Efficient indexing of documents for retrieval.
- Generation of vector embeddings for document chunks.
- Querying capabilities to retrieve contextually relevant information.
use cases of mcp-rag-server?
- Integrating RAG functionalities into applications.
- Answering user queries based on indexed documents.
- Enhancing LLM responses with relevant document context.
FAQ from mcp-rag-server?
- What types of documents can be indexed?
Supported file types include .txt, .md, .json, .jsonl, and .csv.
- How do I run the server?
You can run the server from source or using Docker Compose for easier setup.
- Is there a recommended way to run the server?
Yes, using Docker Compose is recommended for managing dependencies.
mcp-rag-server - RAG MCP Server
mcp-rag-server is a Model Context Protocol (MCP) server that enables Retrieval Augmented Generation (RAG) capabilities. It empowers Large Language Models (LLMs) to answer questions based on your document content by indexing and retrieving relevant information efficiently.
Table of Contents
- Overview
- MCP Server Usage
- Installation
- Available RAG Tools
- How RAG Works
- Environment Variables
- Integrating with Your Client and AI Agent
- Development
- Contributing
- License
Overview
mcp-rag-server allows you to seamlessly integrate RAG functionalities into your applications. It works by:
- Indexing: Parsing documents and splitting them into manageable chunks.
- Embedding: Generating vector embeddings for each text chunk.
- Querying: Matching query embeddings with stored document chunks to retrieve context.
This enables downstream LLMs (via MCP clients like Claude Desktop) to generate contextually relevant responses.
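To make the chunking step concrete, here is a minimal sketch of a character-window splitter in TypeScript. It is illustrative only and not the server's actual implementation; it simply assumes chunks are measured in characters, matching the CHUNK_SIZE setting described later.

```typescript
// Illustrative only: a naive character-window chunker, not the server's actual code.
// Assumes chunk size is measured in characters, as with the CHUNK_SIZE setting below.
function chunkText(text: string, chunkSize = 500): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// A 1,200-character document yields three chunks of 500, 500, and 200 characters.
console.log(chunkText("x".repeat(1200)).map((chunk) => chunk.length)); // [500, 500, 200]
```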
MCP Server Usage
Basic Configuration
Integrate the server with your MCP client by adding the following to your configuration:
```json
{
  "mcpServers": {
    "rag": {
      "command": "npx",
      "args": ["-y", "mcp-rag-server"]
    }
  }
}
```
Note: Ensure that the required environment variables are set in the environment where your MCP client runs the command.
Advanced Configuration
For custom settings, including environment variables:
```json
{
  "mcpServers": {
    "rag": {
      "command": "npx",
      "args": ["-y", "mcp-rag-server"],
      "env": {
        "BASE_LLM_API": "http://localhost:11434/v1",
        "LLM_API_KEY": "",
        "EMBEDDING_MODEL": "granite-embedding-278m-multilingual-Q6_K-1743674737397:latest",
        "VECTOR_STORE_PATH": "/user-dir/vector_store_locate/",
        "CHUNK_SIZE": "500"
      }
    }
  }
}
```
Note: Environment variable configuration via the client depends on its capabilities. System-level environment variables are generally recommended.
Installation
From Source
- Clone the Repository:
  ```bash
  git clone https://github.com/yourusername/mcp-rag-server.git
  cd mcp-rag-server
  ```
- Install Dependencies:
  ```bash
  npm install
  ```
- Build the Project:
  ```bash
  npm run build
  ```
- Run the Server:
  Ensure your environment variables are set, then start the server:
  ```bash
  npm start
  ```
Running with Docker Compose (Recommended)
This is the recommended way to run the server along with its dependencies (ChromaDB and Ollama) in isolated containers.
- Prerequisites:
  - Install Docker Desktop (or Docker Engine on Linux).
- Clone the Repository (if not already done):
  ```bash
  git clone https://github.com/yourusername/mcp-rag-server.git # Replace with actual repo URL
  cd mcp-rag-server
  ```
- Start the Services:
  From the project root directory, run:
  ```bash
  docker-compose up -d
  ```
  - This command builds the `rag-server` image (if not already built), downloads the official ChromaDB and Ollama images, and starts all three services in the background (`-d`).
  - The first time you run this, it might take a while to download the images.
  - The server automatically starts indexing the project directory on startup (this behavior can be configured via environment variables in `docker-compose.yml`).
- Check Logs (Optional):
  ```bash
  docker-compose logs -f rag-server
  ```
- Stopping the Services:
  ```bash
  docker-compose down
  ```
  This stops and removes the containers, but the data stored in volumes (ChromaDB data, Ollama models) will persist.
Available RAG Tools
The server provides the following operations accessible via MCP:
- `index_documents`: Index documents from a directory or a single file. Supported file types: `.txt`, `.md`, `.json`, `.jsonl`, `.csv`.
- `query_documents`: Retrieve context by querying the indexed documents using RAG.
- `remove_document`: Delete a specific document from the index by its path.
- `remove_all_documents`: Clear the entire document index (confirmation required).
- `list_documents`: Display all indexed document paths.
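For orientation, the sketch below shows how these tools could be invoked programmatically from a TypeScript MCP client built with the official @modelcontextprotocol/sdk. The argument names (`path`, `query`) are illustrative assumptions; use `listTools()` to see the exact input schemas the server exposes.

```typescript
// Sketch: calling the RAG tools from a TypeScript MCP client over stdio.
// The tool argument keys below are assumptions; check the schemas returned by listTools().
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "mcp-rag-server"],
  });
  const client = new Client(
    { name: "rag-demo-client", version: "1.0.0" },
    { capabilities: {} }
  );
  await client.connect(transport);

  // Discover the tool names and input schemas the server actually exposes.
  const { tools } = await client.listTools();
  console.log(tools.map((tool) => tool.name));

  // Hypothetical calls; replace the argument keys with those from the listed schemas.
  await client.callTool({ name: "index_documents", arguments: { path: "/data/docs" } });
  const result = await client.callTool({
    name: "query_documents",
    arguments: { query: "What are the main topics in our latest report?" },
  });
  console.log(result);

  await client.close();
}

main().catch(console.error);
```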
How RAG Works
The RAG process in the server consists of the following steps:
- Indexing: The `index_documents` tool accepts a file or directory path to begin processing.
- Chunking & Embedding: The server splits documents into chunks (configurable via `CHUNK_SIZE`) and generates vector embeddings using the specified `EMBEDDING_MODEL` via the `BASE_LLM_API`.
- Storing: The embeddings and chunks are stored in a local vector database at the path specified by `VECTOR_STORE_PATH`.
- Querying: When `query_documents` is called, the server generates an embedding for your query.
- Searching: It retrieves the top `k` document chunks that match the query.
- Contextualization: The retrieved chunks are returned as context to your LLM, which then generates a final answer.
```mermaid
flowchart LR
    A[User provides document path via index_documents] --> B(RAG Server Reads & Chunks Docs)
    B --> C{Generate Embeddings via LLM API}
    C --> D[Store Embeddings & Chunks in Vector DB]
    E[User asks question via query_documents] --> F{Generate Query Embedding}
    F --> G{Search Vector DB}
    G -- Top k Chunks --> H[Return Context to User/Client]
    H --> I(Client/LLM Generates Final Answer)
```
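The Searching step is a nearest-neighbour lookup over embeddings. The snippet below is a conceptual, self-contained sketch of top-k retrieval by cosine similarity; the server delegates this work to its vector store, so treat it as an explanation rather than the actual code path.

```typescript
// Conceptual sketch of the Searching step: top-k retrieval by cosine similarity.
// The real server delegates this to its vector store; this is for explanation only.
type StoredChunk = { text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(queryEmbedding: number[], chunks: StoredChunk[], k = 5): StoredChunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineSimilarity(queryEmbedding, y.embedding) -
        cosineSimilarity(queryEmbedding, x.embedding)
    )
    .slice(0, k);
}
```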
Environment Variables
The server relies on several environment variables. These can be set at the system level or passed via your MCP client configuration.
Default Environment Settings
If not explicitly set, the following defaults from the code will be used:
- `BASE_LLM_API` (Required): The base URL for the embedding API endpoint. Default: `http://localhost:11434/v1`
- `LLM_API_KEY` (Optional): API key for the embedding service (if required). Default: `""` (empty string)
- `EMBEDDING_MODEL` (Required): The embedding model to use with the API. Default: `granite-embedding-278m-multilingual-Q6_K-1743674737397:latest`
- `VECTOR_STORE_PATH` (Optional): The directory path for storing the vector database. Default: `./vector_store`
- `CHUNK_SIZE` (Optional): The target size (in characters) for splitting documents into chunks. Default: `500`
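As a rough illustration of how these defaults combine with your environment, a configuration loader might look like the sketch below (this is not the server's actual loader):

```typescript
// Illustrative sketch of applying the defaults above; not the server's actual loader.
const config = {
  baseLlmApi: process.env.BASE_LLM_API ?? "http://localhost:11434/v1",
  llmApiKey: process.env.LLM_API_KEY ?? "",
  embeddingModel:
    process.env.EMBEDDING_MODEL ??
    "granite-embedding-278m-multilingual-Q6_K-1743674737397:latest",
  vectorStorePath: process.env.VECTOR_STORE_PATH ?? "./vector_store",
  chunkSize: Number(process.env.CHUNK_SIZE ?? "500"),
};

console.log(config); // Values fall back to the documented defaults when unset.
```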
Configuration Examples for Embedding Providers
1. Ollama (Local)
- Setup: Ensure Ollama is running and the desired model is pulled (e.g., `ollama pull nomic-embed-text`).
- Variables:
  ```bash
  BASE_LLM_API=http://localhost:11434/v1
  LLM_API_KEY=
  EMBEDDING_MODEL=nomic-embed-text
  ```
2. LM Studio (Local)
- Setup: Start the LM Studio server and load an embedding model.
- Variables:
  ```bash
  BASE_LLM_API=http://localhost:1234/v1
  LLM_API_KEY=
  EMBEDDING_MODEL=lm-studio-model
  ```
3. OpenAI API
- Setup: Use your OpenAI credentials.
- Variables:
  ```bash
  BASE_LLM_API=https://api.openai.com/v1
  LLM_API_KEY=YOUR_OPENAI_API_KEY
  EMBEDDING_MODEL=text-embedding-ada-002
  ```
4. OpenRouter
- Setup: Use your OpenRouter API key.
- Variables:
  ```bash
  BASE_LLM_API=https://openrouter.ai/api/v1
  LLM_API_KEY=YOUR_OPENROUTER_API_KEY
  EMBEDDING_MODEL=openai/text-embedding-ada-002
  ```
5. Google Gemini (via OpenAI Compatibility Endpoint)
- Setup: Follow Google’s instructions to enable the compatibility endpoint.
- Variables:
  ```bash
  BASE_LLM_API=https://generativelanguage.googleapis.com/v1beta
  LLM_API_KEY=YOUR_GEMINI_API_KEY
  EMBEDDING_MODEL=embedding-001
  ```
Important: Always refer to your provider’s documentation for precise API endpoints, model names, and authentication requirements.
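All of these providers are addressed through an OpenAI-compatible embeddings endpoint, which is what `BASE_LLM_API` points at. The sketch below assumes the standard `/embeddings` request and response shape; it is illustrative, so verify the exact path, model name, and auth scheme with your provider.

```typescript
// Sketch: requesting an embedding from an OpenAI-compatible /embeddings endpoint.
// Assumes the standard request/response shape; confirm details with your provider.
async function embed(text: string): Promise<number[]> {
  const baseUrl = process.env.BASE_LLM_API ?? "http://localhost:11434/v1";
  const apiKey = process.env.LLM_API_KEY ?? "";

  const response = await fetch(`${baseUrl}/embeddings`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      ...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {}),
    },
    body: JSON.stringify({
      model: process.env.EMBEDDING_MODEL ?? "nomic-embed-text",
      input: text,
    }),
  });

  if (!response.ok) {
    throw new Error(`Embedding request failed: ${response.status} ${response.statusText}`);
  }

  const payload = await response.json();
  return payload.data[0].embedding; // OpenAI-style: { data: [{ embedding: [...] }] }
}
```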
Integrating with Your Client and AI Agent
After setting up the MCP server, integrate it with your client (or AI agent) so that it can leverage RAG operations seamlessly.
Configuring Your MCP Client
Ensure your client configuration includes the RAG server as shown below:
```json
{
  "mcpServers": {
    "rag": {
      "command": "npx",
      "args": ["-y", "mcp-rag-server"],
      "env": {
        "BASE_LLM_API": "http://localhost:11434/v1",
        "LLM_API_KEY": "",
        "EMBEDDING_MODEL": "granite-embedding-278m-multilingual-Q6_K-1743674737397:latest",
        "VECTOR_STORE_PATH": "./vector_store",
        "CHUNK_SIZE": "500"
      }
    }
  }
}
```
Example Chat Conversation
Below is an example conversation demonstrating how an AI agent might instruct the MCP server to index documents and then query them:
User:
Hey, can you add my documents for indexing? I have them stored in `/data/docs`.
AI Agent:
Sure, let me index the documents from `/data/docs` now.
([Tool Call]: The agent issues an `index_documents` command with the path `/data/docs`.)
AI Agent (after processing):
The documents have been successfully indexed.
User:
Great! Now, could you help me find out what the main topics are in our latest report?
AI Agent:
Okay, I'll query the indexed documents to retrieve context related to your report.
([Tool Call]: The agent issues a `query_documents` command with the query "What are the main topics in our latest report?")
AI Agent (after processing):
I found some relevant context from your documents. Based on the retrieved information, the main topics include market trends, customer feedback, and upcoming product features.
Development
Prerequisites
- Node.js (see `package.json` for version requirements)
- npm
Building
```bash
npm run build
```
Testing
To be implemented:
```bash
# npm test
```
Contributing
Contributions are welcome! If you wish to propose changes or add features, please:
- Open an issue for discussion before submitting a pull request.
- Follow the code style and commit guidelines provided in the repository.
License
This project is licensed under the MIT License.