MCP Webscan Server

By bsmi021

A Model Context Protocol (MCP) server for web content scanning and analysis. This server provides tools for fetching, analyzing, and extracting information from web pages.

Overview

What is MCP Webscan Server?

MCP Webscan Server is a Model Context Protocol (MCP) server designed for web content scanning and analysis, providing tools for fetching, analyzing, and extracting information from web pages.

How do I use MCP Webscan Server?

To use the MCP Webscan Server, install it via Smithery or manually clone the repository, install dependencies, and start the server. You can then use various tools to interact with web content.

What are the key features of MCP Webscan Server?

  • Page Fetching: Convert web pages to Markdown for easy analysis
  • Link Extraction: Extract and analyze links from web pages
  • Site Crawling: Recursively crawl websites to discover content
  • Link Checking: Identify broken links on web pages
  • Pattern Matching: Find URLs matching specific patterns
  • Sitemap Generation: Generate XML sitemaps for websites

What are the use cases of MCP Webscan Server?

  1. Analyzing web content for research purposes
  2. Checking for broken links on a website
  3. Generating sitemaps for SEO optimization
  4. Extracting data from web pages for further analysis

FAQ about MCP Webscan Server

  • Can MCP Webscan Server analyze any website?

Yes! It can analyze any publicly accessible website.

  • Is there a limit to the number of pages I can crawl?

The maximum crawl depth can be set, but there is no strict limit on the number of pages.

  • How do I report issues or contribute?

You can fork the repository, create a feature branch, and submit a pull request.
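The crawl-depth behavior described in the FAQ above (a depth cap, but no page cap) can be sketched as a breadth-first walk. This is an illustrative sketch only, not the server's actual implementation; the link graph below stands in for real fetched pages.

```typescript
// Stand-in link graph: in the real server these edges come from
// fetching each page and extracting its links.
const links: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/a/1"],
  "https://example.com/b": [],
  "https://example.com/a/1": [],
};

// Breadth-first crawl: recursion stops at maxDepth, but each depth
// level may contain any number of pages (hence no strict page limit).
function crawl(start: string, maxDepth: number): string[] {
  const visited = new Set<string>();
  let frontier = [start];
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      if (visited.has(url)) continue;
      visited.add(url);
      next.push(...(links[url] ?? []));
    }
    frontier = next;
  }
  return [...visited];
}
```

With `maxDepth: 0` only the start page is visited; each extra level of depth admits every page linked from the previous level.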

Content

MCP Webscan Server


A Model Context Protocol (MCP) server for web content scanning and analysis. This server provides tools for fetching, analyzing, and extracting information from web pages.


Features

  • Page Fetching: Convert web pages to Markdown for easy analysis
  • Link Extraction: Extract and analyze links from web pages
  • Site Crawling: Recursively crawl websites to discover content
  • Link Checking: Identify broken links on web pages
  • Pattern Matching: Find URLs matching specific patterns
  • Sitemap Generation: Generate XML sitemaps for websites

Installation

Installing via Smithery

To install Webscan for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install mcp-server-webscan --client claude

Manual Installation

# Clone the repository
git clone <repository-url>
cd mcp-server-webscan

# Install dependencies
npm install

# Build the project
npm run build

Usage

Starting the Server

npm start

The server runs on stdio transport, making it compatible with MCP clients like Claude Desktop.
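For intuition, the stdio transport exchanges JSON-RPC 2.0 messages, one JSON document per line, over the server's stdin/stdout. The sketch below shows the wire shape a client such as Claude Desktop would write to the server's stdin; the `example-client` name and version are placeholders, not real values.

```typescript
// Minimal shape of a JSON-RPC 2.0 request as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// One JSON document per line; the trailing newline delimits messages
// on the stdio transport.
function encodeMessage(req: JsonRpcRequest): string {
  return JSON.stringify(req) + "\n";
}

const initRequest: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2024-11-05",
    capabilities: {},
    clientInfo: { name: "example-client", version: "0.1.0" }, // placeholder
  },
};

const wire = encodeMessage(initRequest);
```

Because the transport is just stdin/stdout, any MCP client that can spawn a child process can talk to the server without network configuration.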

Available Tools

  1. fetch-page

    • Fetches a web page and converts it to Markdown.
    • Parameters:
      • url (required): URL of the page to fetch.
      • selector (optional): CSS selector to target specific content.
  2. extract-links

    • Extracts all links from a web page with their text.
    • Parameters:
      • url (required): URL of the page to analyze.
      • baseUrl (optional): Base URL to filter links.
      • limit (optional, default: 100): Maximum number of links to return.
  3. crawl-site

    • Recursively crawls a website up to a specified depth.
    • Parameters:
      • url (required): Starting URL to crawl.
      • maxDepth (optional, default: 2): Maximum crawl depth (0-5).
  4. check-links

    • Checks for broken links on a page.
    • Parameters:
      • url (required): URL to check links for.
  5. find-patterns

    • Finds URLs matching a specific pattern.
    • Parameters:
      • url (required): URL to search in.
      • pattern (required): JavaScript-compatible regex pattern to match URLs against.
  6. generate-site-map

    • Generates a simple XML sitemap by crawling.
    • Parameters:
      • url (required): Root URL for sitemap crawl.
      • maxDepth (optional, default: 2): Maximum crawl depth for discovering URLs (0-5).
      • limit (optional, default: 1000): Maximum number of URLs to include in the sitemap.
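As a concrete illustration of the parameter shapes above, here is a hypothetical `tools/call` request invoking `find-patterns`; the URL and the pattern (match links ending in `.pdf`) are example values only.

```typescript
// Hypothetical tools/call request an MCP client would send for
// find-patterns. The argument names match the parameters listed above.
const callRequest = {
  jsonrpc: "2.0" as const,
  id: 2,
  method: "tools/call",
  params: {
    name: "find-patterns",
    arguments: {
      url: "https://example.com",          // required: URL to search in
      pattern: "\\.pdf$",                  // required: JS-compatible regex
    },
  },
};

// On the server side, the pattern string becomes a RegExp that each
// discovered URL is matched against.
const re = new RegExp(callRequest.params.arguments.pattern);
```

Note that `pattern` is a plain string in the request; escaping (`\\.`) follows JavaScript string rules, so the regex the server sees is `\.pdf$`.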

Example Usage with Claude Desktop

  1. Configure the server in your Claude Desktop settings:
{
  "mcpServers": {
    "webscan": {
      "command": "node",
      "args": ["path/to/mcp-server-webscan/build/index.js"],
      "env": {
        "NODE_ENV": "development",
        "LOG_LEVEL": "info"
      }
    }
  }
}
  2. Use the tools in your conversations:
Could you fetch the content from https://example.com and convert it to Markdown?

Development

Prerequisites

  • Node.js >= 18
  • npm

Project Structure (Post-Refactor)

mcp-server-webscan/
├── src/
│   ├── config/
│   │   └── ConfigurationManager.ts
│   ├── services/
│   │   ├── CheckLinksService.ts
│   │   ├── CrawlSiteService.ts
│   │   ├── ExtractLinksService.ts
│   │   ├── FetchPageService.ts
│   │   ├── FindPatternsService.ts
│   │   ├── GenerateSitemapService.ts
│   │   └── index.ts
│   ├── tools/
│   │   ├── checkLinksTool.ts
│   │   ├── checkLinksToolParams.ts
│   │   ├── crawlSiteTool.ts
│   │   ├── crawlSiteToolParams.ts
│   │   ├── extractLinksTool.ts
│   │   ├── extractLinksToolParams.ts
│   │   ├── fetchPageTool.ts
│   │   ├── fetchPageToolParams.ts
│   │   ├── findPatterns.ts
│   │   ├── findPatternsToolParams.ts
│   │   ├── generateSitemapTool.ts
│   │   ├── generateSitemapToolParams.ts
│   │   └── index.ts
│   ├── types/
│   │   ├── checkLinksTypes.ts
│   │   ├── crawlSiteTypes.ts
│   │   ├── extractLinksTypes.ts
│   │   ├── fetchPageTypes.ts
│   │   ├── findPatternsTypes.ts
│   │   ├── generateSitemapTypes.ts
│   │   └── index.ts
│   ├── utils/
│   │   ├── errors.ts
│   │   ├── index.ts
│   │   ├── logger.ts
│   │   ├── markdownConverter.ts
│   │   └── webUtils.ts
│   ├── initialize.ts
│   └── index.ts    # Main server entry point
├── build/          # Compiled JavaScript
├── node_modules/
├── .clinerules
├── .gitignore
├── Dockerfile
├── LICENSE
├── mcp-consistant-servers-guide.md
├── package.json
├── package-lock.json
├── README.md
├── RFC-2025-001-Refactor.md
├── smithery.yaml
└── tsconfig.json

Building

npm run build

Development Mode

npm run dev

Error Handling

The server implements comprehensive error handling:

  • Invalid parameters
  • Network errors
  • Content parsing errors
  • URL validation

All errors are properly formatted according to the MCP specification.
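One way to picture this (a sketch, not the server's actual code): in MCP, tool-level failures are reported as a normal result carrying `isError: true` and human-readable text content, rather than as JSON-RPC protocol errors. The category labels below mirror the error list above; the function name is hypothetical.

```typescript
// Shape of an MCP tool result; isError marks a tool-level failure.
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Hypothetical helper mapping the server's error categories to
// MCP-formatted tool error results.
function toToolError(
  kind: "invalid-params" | "network" | "parse" | "invalid-url",
  detail: string
): ToolResult {
  const labels: Record<string, string> = {
    "invalid-params": "Invalid parameters",
    network: "Network error",
    parse: "Content parsing error",
    "invalid-url": "URL validation failed",
  };
  return {
    content: [{ type: "text", text: `${labels[kind]}: ${detail}` }],
    isError: true,
  };
}
```

Returning errors in-band this way lets the client (and the model) see what went wrong and retry with corrected parameters, instead of the whole request failing at the protocol layer.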

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT License - see the LICENSE file for details
