MCP Web Extractor

By iemong GitHub

MCP server that extracts web content using Readability.js

Overview

What is MCP Web Extractor?

MCP Web Extractor is a Model Context Protocol (MCP) server that extracts web content using Readability.js, allowing users to fetch and save clean, readable versions of articles.

How to use MCP Web Extractor?

To use MCP Web Extractor, clone the repository, install dependencies, build the project, and start the server. You can then extract content from a URL using the provided client example or integrate it with Obsidian.

Key features of MCP Web Extractor

  • Extracts readable content from any URL
  • Removes ads, sidebars, and other distractions
  • Returns clean text along with metadata (title, excerpt, etc.)
  • Easy integration with Obsidian via MCP

Use cases of MCP Web Extractor

  1. Saving clean versions of articles for note-taking in Obsidian.
  2. Extracting content for research purposes.
  3. Creating a distraction-free reading experience by removing unnecessary elements from web pages.

FAQ about MCP Web Extractor

  • Can MCP Web Extractor extract content from any website?

It accepts any URL, but Readability.js is built for article-style pages, so extraction works best where the page has a clear main body of text.

  • Is there a way to integrate it with other applications?

Yes. It integrates with Obsidian, and other MCP-capable applications can connect to it as well.

  • What technologies does MCP Web Extractor use?

It uses Readability.js for content extraction and is built with TypeScript.

Content

A Model Context Protocol (MCP) server that extracts web content using Readability.js. This tool fetches web pages and extracts the main content, making it ideal for saving clean, readable versions of articles to Obsidian notes.

Installation

# Clone the repository
git clone https://github.com/iemong/mcp-web-extractor.git
cd mcp-web-extractor

# Install dependencies
npm install

# Build the project
npm run build

# Start the server
npm start

The server will start on http://localhost:3000 with the MCP endpoint at http://localhost:3000/mcp.
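
MCP messages are JSON-RPC 2.0, so the body of a request to this endpoint can be sketched without any SDK. The example below only builds the message for a `tools/call` request and does not send it. It assumes the `extract-content` capability listed under API is registered as an MCP tool, and `buildExtractRequest` is a hypothetical helper that is not part of this repository.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper. Assumption: the server registers its capability
// as an MCP tool named "extract-content".
function buildExtractRequest(url: string, id: number): ToolCallRequest {
  // The URL constructor throws on malformed input, so bad URLs fail early.
  const parsed = new URL(url);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract-content", arguments: { url: parsed.href } },
  };
}

const exampleRequest = buildExtractRequest("https://example.com/article", 1);
console.log(JSON.stringify(exampleRequest, null, 2));
```

In practice a client would normally go through an MCP SDK, which also performs the initialization handshake that must precede any tool call.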

Usage

As a standalone service

You can use the included client example to extract content from a URL:

ts-node-esm client-example.ts

With Obsidian

The obsidian-integration.ts file provides an example of how to integrate this MCP server with Obsidian. You can use it as a starting point for creating an Obsidian plugin that extracts web content.
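
As a rough idea of what such an integration might do with a result, the sketch below turns extracted content into a Markdown note with YAML front matter. It is not taken from obsidian-integration.ts: `toObsidianNote` and `toFileName` are hypothetical helpers, the note layout is one possible choice, and the field types are assumptions (the API section only names the fields).

```typescript
// Fields named in the API section. Assumption: every field is a string.
interface ExtractedContent {
  title: string;
  content: string;
  textContent: string;
  excerpt: string;
  siteName: string;
}

// Hypothetical helper: render a result as Markdown with YAML front matter.
function toObsidianNote(result: ExtractedContent, sourceUrl: string): string {
  // JSON string literals are valid YAML double-quoted scalars,
  // so JSON.stringify gives safe quoting for titles containing ":" or quotes.
  const quote = (value: string): string => JSON.stringify(value);
  return [
    "---",
    `title: ${quote(result.title)}`,
    `source: ${quote(sourceUrl)}`,
    `site: ${quote(result.siteName)}`,
    "---",
    "",
    `# ${result.title}`,
    "",
    `> ${result.excerpt}`,
    "",
    result.textContent.trim(),
    "",
  ].join("\n");
}

// Hypothetical helper: replace characters that are unsafe in note file names.
function toFileName(title: string): string {
  return title.replace(/[\\\/:*?"<>|#^\[\]]/g, "-").trim() + ".md";
}

const exampleNote = toObsidianNote(
  {
    title: "Example Article",
    content: "<p>Body text.</p>",
    textContent: "Body text.",
    excerpt: "A short summary.",
    siteName: "Example Site",
  },
  "https://example.com/article",
);
console.log(toFileName("Example Article"));
console.log(exampleNote);
```

The sketch uses `textContent` rather than `content` because the latter is HTML; a plugin that wants to keep headings and links would need an HTML-to-Markdown step instead.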

API

The MCP server provides the following capability:

  • extract-content: Extracts readable content from a given URL
    • Parameters: { url: string }
    • Returns: { title, content, textContent, excerpt, siteName }
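
The parameter and return shapes above can be written down as TypeScript types, with a runtime check for responses that arrive as untyped JSON. This is a minimal sketch: the source names the fields but not their types, so typing every result field as a string is an assumption, and `isExtractContentResult` is a hypothetical helper.

```typescript
// Parameters of extract-content, as listed above.
interface ExtractContentParams {
  url: string;
}

// Result of extract-content. Assumption: all fields are strings.
interface ExtractContentResult {
  title: string;
  content: string;
  textContent: string;
  excerpt: string;
  siteName: string;
}

const RESULT_FIELDS = [
  "title",
  "content",
  "textContent",
  "excerpt",
  "siteName",
] as const;

// Hypothetical type guard: narrows unknown JSON to ExtractContentResult.
function isExtractContentResult(value: unknown): value is ExtractContentResult {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return RESULT_FIELDS.every((field) => typeof record[field] === "string");
}

const exampleParams: ExtractContentParams = { url: "https://example.com/article" };
const rawResult: unknown = JSON.parse(
  '{"title":"Example","content":"<p>Hi</p>","textContent":"Hi","excerpt":"Hi","siteName":"Example"}',
);
if (isExtractContentResult(rawResult)) {
  console.log(`${exampleParams.url} -> ${rawResult.title}`);
}
```

Pages without metadata may leave fields such as the excerpt or site name empty, so a stricter or looser guard may be appropriate depending on what the server actually returns.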

License

MIT
