
# Crawl4ai MCP Server

An MCP (Model Context Protocol) server that provides web crawling capabilities using crawl4ai, returning markdown output for LLM consumption.
## Installation

### Prerequisites

- Node.js
- Access to a crawl4ai instance: https://docs.crawl4ai.com/core/docker-deployment/
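The linked guide covers running crawl4ai locally via Docker. As a minimal sketch (the image name and port below follow that guide at the time of writing — verify against the current docs):

```shell
# Pull and start a local crawl4ai API service on the default port
docker pull unclecode/crawl4ai
docker run -d -p 11235:11235 --name crawl4ai unclecode/crawl4ai
```

The port here matches the `CRAWL4AI_API_URL` default used in the configuration example below.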
### Setup

- Clone the repository:

  ```shell
  git clone https://github.com/Kirill812/crawl4ai-mcp-server.git
  cd crawl4ai-mcp-server
  ```

- Install dependencies:

  ```shell
  npm install
  ```

- Build the server:

  ```shell
  npm run build
  ```
- Add the server configuration to your environment:

  ```json
  {
    "mcpServers": {
      "crawl4ai": {
        "command": "node",
        "args": [
          "/path/to/crawl4ai-mcp-server/build/index.js"
        ],
        "env": {
          "CRAWL4AI_API_URL": "http://127.0.0.1:11235",
          "CRAWL4AI_AUTH_TOKEN": "your-auth-token"
        }
      }
    }
  }
  ```
Replace the environment variables with your values:

- `CRAWL4AI_API_URL` (optional): URL of the crawl4ai API service
- `CRAWL4AI_AUTH_TOKEN` (optional): Authentication token for the API
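As a sketch of how these variables might be consumed (`resolveConfig` is a hypothetical helper, not the server's actual implementation; the fallback URL assumes the local Docker default used in the example config above):

```typescript
// Hypothetical config resolution: both variables are optional, so the API URL
// falls back to the local Docker default and the token may be undefined.
function resolveConfig(env: Record<string, string | undefined>) {
  return {
    apiUrl: env.CRAWL4AI_API_URL ?? "http://127.0.0.1:11235",
    // undefined means no Authorization header is sent
    authToken: env.CRAWL4AI_AUTH_TOKEN,
  };
}
```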
## Features

### Tools

#### crawl_urls

Crawl web pages and get markdown content with citations.

Parameters:

- `urls` (required): List of URLs to crawl
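Once the server is configured, an MCP client invokes the tool through the protocol's standard `tools/call` request. A sketch of the request body for the tool above (the URL values are placeholders):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "crawl_urls",
    "arguments": {
      "urls": ["https://example.com", "https://example.org"]
    }
  }
}
```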
### Response Format

The tool returns markdown content with citations for each URL. Multiple URLs are separated by horizontal rules (`---`). Example:

```markdown
This is content from the first URL [^1]

[^1]: https://example.com

---

This is content from the second URL [^2]

[^2]: https://example.org
```
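A client consuming this response can recover the per-URL sections by splitting on the separator. A minimal sketch, assuming the `---` separator and `[^n]: url` footnote format shown above (`splitSections` and `extractCitations` are hypothetical helpers, not part of the server):

```typescript
// Split the combined markdown response into one section per crawled URL.
function splitSections(markdown: string): string[] {
  return markdown.split(/\n---\n/).map((s) => s.trim());
}

// Pull the footnote citation URLs ("[^n]: url" lines) out of one section.
function extractCitations(section: string): string[] {
  const urls: string[] = [];
  for (const match of section.matchAll(/^\[\^\d+\]:\s*(\S+)/gm)) {
    urls.push(match[1]);
  }
  return urls;
}
```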
## Development

For development with auto-rebuild:

```shell
npm run watch
```
## Error Handling
Common issues and solutions:
- Make sure the URLs are valid and accessible
- If using authentication, ensure the token is valid
- Check network connectivity to the crawl4ai API service
- For timeout errors, try reducing the number of URLs per request
- If getting blocked by websites, the service will automatically handle retries with different user agents
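For the timeout case above, a client can split a long URL list into smaller batches before calling `crawl_urls`. A minimal client-side sketch (`chunkUrls` is a hypothetical helper; the batch size of 5 is an arbitrary example):

```typescript
// Split a URL list into batches of at most `size` URLs, to be sent as
// separate crawl_urls requests and reduce the chance of timeouts.
function chunkUrls(urls: string[], size = 5): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}
```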
## License

This MCP server is licensed under the MIT License: you are free to use, modify, and distribute the software, subject to its terms and conditions. For details, see the LICENSE file in the project repository.