PDF Reader MCP Server (@shtse8/pdf-reader-mcp)


By shtse8

An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.

nodejs pdf
Overview

What is PDF Reader MCP?

PDF Reader MCP is a Node.js/TypeScript server that enables AI agents to securely read PDF files from local or URL sources and extract text, metadata, or page counts using the pdf-parse library.

How to use PDF Reader MCP?

To use PDF Reader MCP, configure it in your MCP host settings, either using npx or as a Docker container, ensuring the project root directory is correctly set for local file access.

What are the key features of PDF Reader MCP?

  • Secure project root focus to prevent unauthorized access.
  • Supports reading PDFs from both local files and public URLs.
  • Efficient text and metadata extraction using the pdf-parse library.
  • A single read_pdf tool for various extraction needs.
  • Easy integration with minimal configuration.
  • Available as a Docker image for consistent deployment.
  • Robust validation of incoming tool arguments using Zod schemas.

What are the use cases of PDF Reader MCP?

  1. Extracting metadata and page counts from multiple PDF files.
  2. Retrieving full text from a specific PDF document.
  3. Extracting text from specific pages of different PDF files.

FAQ about PDF Reader MCP

  • Which PDF files can PDF Reader MCP read?

It can read local PDF files within the project root as well as PDFs at publicly accessible URLs.

  • Is PDF Reader MCP easy to integrate into existing projects?

Yes! It is designed for easy integration with minimal setup required.

  • How does PDF Reader MCP ensure security?

It confines all local file operations to the project root directory, preventing unauthorized access.

Content



Empower your AI agents (like Cline/Claude) with the ability to read and extract information from PDF files within your project, using a single, flexible tool.

This Node.js server implements the Model Context Protocol (MCP) to provide a consolidated read_pdf tool for interacting with PDF documents, either local files within a defined project root directory or public URLs.


⭐ Why Use This Server?

  • 🛡️ Secure Project Root Focus:
    • All local file operations are strictly confined to the project root directory (determined by the server's launch context), preventing unauthorized access.
    • Uses relative paths for local files. Important: The server determines its project root from its own Current Working Directory (cwd) at launch. The process starting the server (e.g., your MCP host) must set the cwd to your intended project directory.
  • 🌐 URL Support: Can directly process PDFs from public URLs.
  • ⚡ Efficient PDF Processing:
    • Leverages the pdf-parse library for extracting text, metadata, and page information.
  • 🔧 Flexible & Consolidated Tool:
    • A single read_pdf tool handles various extraction needs via parameters, simplifying agent interaction.
  • 🚀 Easy Integration: Get started quickly using npx with minimal configuration.
  • 🐳 Containerized Option: Also available as a Docker image for consistent deployment environments.
  • ✅ Robust Validation: Uses Zod schemas to validate all incoming tool arguments.
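The project-root rule above can be illustrated with a minimal sketch of how a relative path might be resolved against the server's cwd and rejected if it escapes. The helper name resolveWithinRoot and the decision to reject absolute paths outright are assumptions for illustration; this is not necessarily how the server implements its check.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolves a user-supplied relative path against the
// project root (by default the process cwd at launch) and throws if the
// result would land outside that root.
export function resolveWithinRoot(
  relativePath: string,
  projectRoot: string = process.cwd(),
): string {
  // Assumption: absolute paths are rejected, since the README says local
  // files are addressed by relative paths.
  if (path.isAbsolute(relativePath)) {
    throw new Error(`Absolute paths are not allowed: ${relativePath}`);
  }
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(root, relativePath);

  // path.relative returns ".." or a "../..." path when `resolved` is outside
  // `root` (or an absolute path when it is on another Windows drive).
  const rel = path.relative(root, resolved);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes the project root: ${relativePath}`);
  }
  // Note: symlinks are not followed here; a stricter variant could compare
  // fs.realpathSync results instead.
  return resolved;
}
```

Comparing via path.relative rather than a plain string-prefix check avoids treating a sibling directory such as /tmp/project-old as being inside /tmp/project.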

The simplest way to run the server is via npx, configured in your MCP host's settings file (e.g., mcp_settings.json).

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "npx",
      "args": ["@shtse8/pdf-reader-mcp"],
      "name": "PDF Reader (npx)"
    }
  }
}

(Alternative) Using bunx:

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "bunx",
      "args": ["@shtse8/pdf-reader-mcp"],
      "name": "PDF Reader (bunx)"
    }
  }
}

Important: Ensure your MCP Host launches the command with the cwd set to your project's root directory for local file access.
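If you are writing your own host, setting the working directory comes down to passing cwd when spawning the server. The sketch below uses Node's child_process.spawn; the helper names launchOptions and launchPdfReader are hypothetical, and only the package name and the cwd requirement come from this README.

```typescript
import * as path from "node:path";
import { spawn } from "node:child_process";
import type { ChildProcess, SpawnOptions } from "node:child_process";

// Hypothetical helper: spawn options that make `projectRoot` the server's
// cwd, which the server then treats as its project root.
export function launchOptions(projectRoot: string): SpawnOptions {
  return {
    cwd: path.resolve(projectRoot),
    // MCP's stdio transport carries messages over stdin/stdout;
    // stderr is passed through for logs.
    stdio: ["pipe", "pipe", "inherit"],
  };
}

// Hypothetical launcher: starts the server via npx inside `projectRoot`.
// On Windows, npx is a .cmd shim and may additionally need `shell: true`.
export function launchPdfReader(projectRoot: string): ChildProcess {
  return spawn("npx", ["@shtse8/pdf-reader-mcp"], launchOptions(projectRoot));
}
```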


✨ The read_pdf Tool

This server provides a single, powerful tool: read_pdf.

  • Description: Reads content, metadata, or page count from a PDF file (local or URL), controlled by parameters.
  • Input: An object containing:
    • sources (array): Required. An array of source objects. Each object must contain either path (string, relative path to local PDF) or url (string, URL of PDF). Each source object can optionally include:
      • pages (string | number[], optional): Extract text only from specific pages (1-based) or ranges (e.g., [1, 3, 5] or '1,3-5,7') for this specific source. If provided, the global include_full_text flag is ignored for this source.
    • include_full_text (boolean, optional, default false): Include the full text content for each PDF. Ignored if pages is provided.
    • include_metadata (boolean, optional, default true): Include metadata (info and metadata objects) for each PDF.
    • include_page_count (boolean, optional, default true): Include the total number of pages (num_pages) for each PDF.
  • Output: An object containing a results array. Each element corresponds to a source in the input sources array. Processing continues even if some sources fail. Each result object has the following structure:
    • source (string): The original path or URL provided for identification.
    • success (boolean): Indicates if processing this specific source was successful.
    • error (string, optional): Provides an error message if success is false for this source.
    • data (object, optional): Contains the extracted data if success is true for this source:
      • full_text (string, optional)
      • page_texts (array, optional): Array of { page: number, text: string }.
      • missing_pages (array, optional)
      • info (object, optional)
      • metadata (object, optional)
      • num_pages (number, optional)
      • warnings (array, optional): Non-critical warnings for this source (e.g., requested page out of bounds).
  • Examples:
  1. Get metadata and page count for multiple files:

    {
      "sources": [
        { "path": "report.pdf" },
        { "url": "http://example.com/another.pdf" },
        { "path": "nonexistent.pdf" }
      ]
    }
    

    (Example Output: { "results": [ { "source": "report.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 10 } }, { "source": "http://example.com/another.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 5 } }, { "source": "nonexistent.pdf", "success": false, "error": "File not found..." } ] })

  2. Get full text for one file:

    {
      "sources": [{ "url": "http://example.com/document.pdf" }],
      "include_full_text": true,
      "include_metadata": false,
      "include_page_count": false
    }
    

    (Example Output: { "results": [ { "source": "http://example.com/document.pdf", "success": true, "data": { "full_text": "..." } } ] })

  3. Get text from different pages for different files (include_metadata and include_page_count default to true, so they are set to false explicitly):

    {
      "sources": [
        { "path": "manual.pdf", "pages": "1-2" },
        { "url": "http://example.com/report.pdf", "pages": [5] }
      ],
      "include_metadata": false,
      "include_page_count": false
    }
    

    (Example Output: { "results": [ { "source": "manual.pdf", "success": true, "data": { "page_texts": [...] } }, { "source": "http://example.com/report.pdf", "success": true, "data": { "page_texts": [...] } } ] })
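The pages parameter accepts either a number array or a string such as '1,3-5,7'. A minimal sketch of how such a spec could be expanded into sorted, de-duplicated 1-based page numbers, with pages beyond the document length turned into warnings, follows. The names parsePageSpec and selectPages and the exact validation rules are assumptions for illustration, not the server's actual code.

```typescript
// Hypothetical parser for the `pages` parameter: accepts number[] or a
// string like "1,3-5,7" and returns sorted, unique 1-based page numbers.
export function parsePageSpec(spec: string | number[]): number[] {
  const pages = new Set<number>();
  const add = (n: number): void => {
    if (!Number.isInteger(n) || n < 1) {
      throw new Error(`Invalid page number: ${n}`);
    }
    pages.add(n);
  };

  if (Array.isArray(spec)) {
    spec.forEach(add);
  } else {
    const parts = spec.split(",").map((p) => p.trim()).filter((p) => p !== "");
    for (const part of parts) {
      const range = /^(\d+)\s*-\s*(\d+)$/.exec(part);
      if (range) {
        const start = Number(range[1]);
        const end = Number(range[2]);
        if (start > end) throw new Error(`Invalid range: ${part}`);
        for (let n = start; n <= end; n++) add(n);
      } else if (/^\d+$/.test(part)) {
        add(Number(part));
      } else {
        throw new Error(`Invalid page specification: ${part}`);
      }
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}

// Hypothetical helper: keeps pages that exist and reports the rest as
// warnings, mirroring the "requested page out of bounds" case above.
export function selectPages(
  requested: number[],
  numPages: number,
): { pages: number[]; warnings: string[] } {
  const pages = requested.filter((p) => p <= numPages);
  const warnings = requested
    .filter((p) => p > numPages)
    .map((p) => `Requested page ${p} exceeds total pages (${numPages}).`);
  return { pages, warnings };
}
```

For example, parsePageSpec("1,3-5,7") yields [1, 3, 4, 5, 7], and parsePageSpec([5, 1, 5]) yields [1, 5].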


🐳 Alternative Usage: Docker

Configure your MCP Host to run the Docker container, mounting your project directory to /app.

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-v",
        "/path/to/your/project:/app",
        "shtse8/pdf-reader-mcp:latest"
      ],
      "name": "PDF Reader (Docker)"
    }
  }
}

Note on Volume Mount Path: Instead of hardcoding /path/to/your/project, you can often use shell variables to automatically use the current working directory:

  • Linux/macOS: -v "$PWD:/app"
  • Windows Cmd: -v "%CD%:/app"
  • Windows PowerShell: -v "${PWD}:/app"
  • VS Code Tasks/Launch: You might be able to use ${workspaceFolder} if supported by your MCP host integration.
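If your host is configured from a script rather than a static JSON file, the mount source can be computed instead of hardcoded. The dockerArgs helper below is hypothetical; it simply reproduces the args from the Docker configuration above, defaulting the mounted directory to the current working directory.

```typescript
import * as path from "node:path";

// Hypothetical helper: rebuilds the Docker args from the config above,
// mounting `projectDir` (default: the current working directory) at /app.
export function dockerArgs(
  projectDir: string = process.cwd(),
  image: string = "shtse8/pdf-reader-mcp:latest",
): string[] {
  const source = path.resolve(projectDir);
  // The mount spec is a single array element, so paths containing spaces
  // need no extra shell quoting when passed to spawn.
  return ["run", "-i", "--rm", "-v", `${source}:/app`, image];
}
```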

🛠️ Other Usage Options

Local Build (For Development)

  1. Clone: git clone https://github.com/shtse8/pdf-reader-mcp.git
  2. Install: cd pdf-reader-mcp && npm install
  3. Build: npm run build
  4. Configure MCP Host:
    {
      "mcpServers": {
        "pdf-reader-mcp": {
          "command": "node",
          "args": ["/path/to/cloned/repo/pdf-reader-mcp/build/index.js"],
          "name": "PDF Reader (Local Build)"
        }
      }
    }
    

💻 Development

  1. Clone the repository, then run npm install and npm run build.
  2. Run npm run watch to recompile automatically on changes.

🚢 Publishing (via GitHub Actions)

The GitHub Actions workflow (.github/workflows/publish.yml) publishes to npm and Docker Hub on pushes to main. It requires the NPM_TOKEN, DOCKERHUB_USERNAME, and DOCKERHUB_TOKEN secrets.


🙌 Contributing

Contributions welcome! Open an issue or PR.

Minecraft MCP Server by yuniko-software

A Minecraft MCP server powered by the Mineflayer API. It lets you control a Minecraft character in real time, allowing AI assistants to build structures, explore the world, and interact with the game environment through natural-language instructions.

nodejs javascript

MCP Tools Usage From LangChain ReAct Agent / Example in TypeScript

nodejs typescript

MCP Client Implementation Using LangChain ReAct Agent / TypeScript

nodejs typescript

【Every star you give feeds a hungry developer's motivation!⭐️】A Model Context Protocol (MCP) server implementation that provides Google Jobs search capabilities via SerpAPI integration. Features multi-language support, comprehensive search parameters, and smart error handling.

nodejs typescript

MCP Hub by ravitemer

A centralized manager for Model Context Protocol (MCP) servers with dynamic server management and monitoring

nodejs ai

【Star-crossed coders unite!⭐️】Model Context Protocol (MCP) server implementation providing Google News search capabilities via SerpAPI, with automatic news categorization and multi-language support.

nodejs typescript