MCP Webscan Server

By MCP-Mirror GitHub

Overview

What is MCP Webscan Server?

MCP Webscan Server is a Model Context Protocol (MCP) server designed for web content scanning and analysis, providing tools for fetching, analyzing, and extracting information from web pages.

How do you use MCP Webscan Server?

Clone the repository, install the dependencies, build the project, and start the server. An MCP client can then call its tools to fetch pages, extract links, crawl sites, check links, find URLs matching a pattern, and generate sitemaps.

Key features of MCP Webscan Server

Page Fetching: Convert web pages to Markdown for easy analysis
Link Extraction: Extract and analyze links from web pages
Site Crawling: Recursively crawl websites to discover content
Link Checking: Identify broken links on web pages
Pattern Matching: Find URLs matching specific patterns
Sitemap Generation: Generate XML sitemaps for websites

Use cases of MCP Webscan Server

Analyzing web content for research purposes
Checking for broken links on websites
Generating sitemaps for SEO optimization
Extracting specific data from web pages for data analysis

FAQ about MCP Webscan Server

What is the main purpose of MCP Webscan Server?

The main purpose is to provide tools for web content analysis and information extraction.

Is MCP Webscan Server free to use?

Yes. It is open source and free to use.

What technologies are required to run MCP Webscan Server?

You need Node.js (version 18 or later) and npm to build and run the server.

Content

MCP Webscan Server

A Model Context Protocol (MCP) server for web content scanning and analysis. This server provides tools for fetching, analyzing, and extracting information from web pages.

Features

Page Fetching: Convert web pages to Markdown for easy analysis
Link Extraction: Extract and analyze links from web pages
Site Crawling: Recursively crawl websites to discover content
Link Checking: Identify broken links on web pages
Pattern Matching: Find URLs matching specific patterns
Sitemap Generation: Generate XML sitemaps for websites
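
As a rough illustration of the Pattern Matching feature, the sketch below filters a list of already-extracted links with a regular expression. It is not taken from the server's source: the helper name findMatchingUrls and the sample links are hypothetical, and the real tool fetches the links from a page before matching.

```typescript
// Hypothetical helper, not the server's implementation.
// Keeps only the URLs that match the given regular expression source.
function findMatchingUrls(urls: string[], pattern: string): string[] {
  // new RegExp throws a SyntaxError if the pattern is invalid.
  // No "g" flag is used, so regex.test() keeps no state between calls.
  const regex = new RegExp(pattern);
  return urls.filter((url) => regex.test(url));
}

// Sample links, assumed for illustration only.
const sampleLinks = [
  "https://example.com/about",
  "https://example.com/report.pdf",
  "https://example.com/blog/post-1",
];

// Only the link ending in ".pdf" matches.
console.log(findMatchingUrls(sampleLinks, "\\.pdf$"));
```

Because the pattern arrives as a string, backslashes have to be doubled when it is written as a TypeScript string literal, as in the call above.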

Installation

# Clone the repository
git clone <repository-url>
cd mcp-server-webscan

# Install dependencies
npm install

# Build the project
npm run build

Usage

Starting the Server

npm start

The server communicates over the stdio transport, which makes it compatible with MCP clients such as Claude Desktop.
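
To show what the stdio transport means in practice, here is a minimal sketch of the framing MCP uses on stdin/stdout: each JSON-RPC message is written as one line of JSON terminated by a newline. The names encodeMessage and LineDecoder are hypothetical and are not part of this project; an MCP client library normally handles this for you.

```typescript
// Shape of a JSON-RPC 2.0 request as exchanged between MCP client and server.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Serializes one message as a single line. JSON.stringify without an indent
// argument never emits raw newlines, so "\n" is safe as a delimiter.
function encodeMessage(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// Hypothetical decoder: buffers incoming chunks and returns every complete
// message seen so far, keeping any trailing partial line for the next chunk.
class LineDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() || ""; // last element is the incomplete tail
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  }
}

// Simulate a message that arrives split across two chunks.
const wire = encodeMessage({ jsonrpc: "2.0", id: 1, method: "tools/list" });
const decoder = new LineDecoder();
const firstBatch = decoder.push(wire.slice(0, 10)); // partial line, nothing decoded yet
const secondBatch = decoder.push(wire.slice(10));   // line completed, one message decoded
console.log(firstBatch.length, secondBatch.length); // prints: 0 1
```

Since stdout carries protocol messages, a server on this transport should send diagnostic logging to stderr instead.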

Available Tools

fetch_page
- Fetches a web page and converts it to Markdown
- Parameters:
  - url (required): URL of the page to fetch
  - selector (optional): CSS selector to target specific content
extract_links
- Extracts all links from a web page with their text
- Parameters:
  - url (required): URL of the page to analyze
  - baseUrl (optional): Base URL to filter links
crawl_site
- Recursively crawls a website up to a specified depth
- Parameters:
  - url (required): Starting URL to crawl
  - maxDepth (optional, default: 2): Maximum crawl depth
check_links
- Checks for broken links on a page
- Parameters:
  - url (required): URL to check links for
find_patterns
- Finds URLs matching a specific pattern
- Parameters:
  - url (required): URL to search in
  - pattern (required): Regex pattern to match URLs against
generate_sitemap
- Generates a simple XML sitemap
- Parameters:
  - url (required): Root URL for sitemap
  - maxUrls (optional, default: 100): Maximum number of URLs to include
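
For a sense of what generate_sitemap returns, the sketch below builds a minimal XML sitemap in the sitemaps.org format from a list of URLs, honoring a maxUrls cap with the documented default of 100. It is an assumption-laden illustration rather than the server's code: buildSitemap and escapeXml are hypothetical names, and the real tool discovers the URLs by crawling from the root URL.

```typescript
// Hypothetical helper: escapes the five characters that XML reserves.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: emits one <url><loc> entry per unique URL,
// keeping at most maxUrls entries (default 100, as in the parameter list).
function buildSitemap(urls: string[], maxUrls = 100): string {
  const unique = Array.from(new Set(urls)).slice(0, maxUrls);
  const entries = unique.map(
    (url) => `  <url><loc>${escapeXml(url)}</loc></url>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

// Sample input, assumed for illustration only.
console.log(
  buildSitemap(["https://example.com/", "https://example.com/a?x=1&y=2"])
);
```

Escaping matters here because query strings often contain an ampersand, which would otherwise make the XML invalid.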

Example Usage with Claude Desktop

Configure the server in your Claude Desktop settings (the claude_desktop_config.json file), replacing path/to/ with the location of your clone:

{
  "mcpServers": {
    "webscan": {
      "command": "node",
      "args": ["path/to/mcp-server-webscan/dist/index.js"],
      "env": {
        "NODE_ENV": "development"
      }
    }
  }
}
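
Claude Desktop may start the server from a working directory other than the repository, so an absolute path in args is generally the safer choice. The sketch below prints the same configuration with the entry point resolved to an absolute path. The repoDir value is the placeholder from the example above and must be replaced with the real location of your checkout.

```typescript
import * as path from "node:path";

// Placeholder from the example above; replace with your actual checkout
// location. A relative value is resolved against the current directory.
const repoDir = "path/to/mcp-server-webscan";

const config = {
  mcpServers: {
    webscan: {
      command: "node",
      // path.resolve returns an absolute path to the compiled entry point.
      args: [path.resolve(repoDir, "dist", "index.js")],
      env: { NODE_ENV: "development" },
    },
  },
};

// Print JSON that can be pasted into the Claude Desktop configuration file.
console.log(JSON.stringify(config, null, 2));
```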

Use the tools in your conversations:

Could you fetch the content from https://example.com and convert it to Markdown?
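
Behind a prompt like the one above, the MCP client sends the server a JSON-RPC tools/call request naming the tool and its arguments. The sketch below builds such a request by hand. The argument shapes mirror the parameter lists under Available Tools, while ToolArguments and buildToolCall are hypothetical names introduced only for this illustration.

```typescript
// Argument shapes copied from the parameter lists under "Available Tools".
interface ToolArguments {
  fetch_page: { url: string; selector?: string };
  extract_links: { url: string; baseUrl?: string };
  crawl_site: { url: string; maxDepth?: number };
  check_links: { url: string };
  find_patterns: { url: string; pattern: string };
  generate_sitemap: { url: string; maxUrls?: number };
}

// Hypothetical helper: builds an MCP "tools/call" request. The generic
// parameter ties the argument object to the chosen tool name at compile time.
function buildToolCall<T extends keyof ToolArguments>(
  id: number,
  name: T,
  args: ToolArguments[T]
) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}

// The request a client might send for the fetch_page prompt.
const fetchRequest = buildToolCall(1, "fetch_page", {
  url: "https://example.com",
});
console.log(JSON.stringify(fetchRequest));
```

With this typing, passing a pattern to fetch_page or omitting pattern from find_patterns is rejected by the compiler before any request is sent.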

Development

Prerequisites

Node.js >= 18
npm

Project Structure

mcp-server-webscan/
├── src/
│   └── index.ts    # Main server implementation
├── dist/           # Compiled JavaScript
├── package.json
└── tsconfig.json