
Chrome API MCP Server
A Chrome browser control MCP server that lets AI assistants interact with and control Chrome directly via the Chrome DevTools Protocol.
What is the Chrome Control API?
The Chrome Control API is a streamlined API for controlling the Chrome browser via the Chrome DevTools Protocol, enabling AI assistants to interact directly with Chrome.
How do I use the Chrome Control API?
Import the API into your JavaScript or TypeScript project, create an instance of the ChromeAPI class, and use its methods to manage tabs, navigate, interact with the DOM, and analyze page content.
Key features of the Chrome Control API
- Tab management (create, list, close)
- Navigation (go to URL, back, forward, reload)
- DOM interaction (find elements, click, type text)
- Page content analysis (extract structure, find content sections)
- Intelligent search handling (automatically find and use search fields)
- Screenshot capture (full page or specific elements)
- JavaScript execution
- Network traffic monitoring
Use cases of the Chrome Control API
- Automating web testing and interactions for applications.
- Building AI assistants that can navigate and extract information from web pages.
- Enhancing browser automation tasks for developers and testers.
FAQ about the Chrome Control API
- Can I control multiple tabs at once?
Yes! The API allows you to manage multiple tabs simultaneously.
- Is the Chrome Control API free to use?
Yes! The API is open-source and free to use.
- What programming language is required?
The API is designed for use with JavaScript and TypeScript.
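As a sketch of the multiple-tab answer above: the `navigate` method (listed under API Methods) opens a new tab on each call, so several pages can be opened concurrently with `Promise.all`. The `RpcCall` type and the `openTabs` helper are hypothetical names for this example, the tab ID is assumed to be a string, and the transport that actually sends each JSON-RPC request is passed in rather than hard-coded.

```typescript
// Hypothetical transport: sends one JSON-RPC request to the MCP server
// and resolves with the response's `result` field.
type RpcCall = (method: string, params: Record<string, unknown>) => Promise<unknown>;

// Opens every URL in its own tab concurrently. Per the README example,
// `navigate` resolves with `{ tabId }`; the string type is an assumption.
async function openTabs(call: RpcCall, urls: string[]): Promise<string[]> {
  const results = await Promise.all(
    urls.map((url) => call("navigate", { url }) as Promise<{ tabId: string }>),
  );
  return results.map((r) => r.tabId);
}
```

Because `Promise.all` preserves input order, the returned IDs line up with the input URLs.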
Chrome API MCP Server
A Chrome API MCP (Model Context Protocol) server that provides semantic understanding of web pages for AI assistants like Claude, enabling DOM-based browsing without relying on screenshots.
Features
- Semantic DOM Analysis: Build structured representations of web pages
- Efficient Browsing: Provides content extraction without relying on screenshots
- Interactive Navigation: Identify and interact with elements based on semantics
- Reliable Element Selection: Multiple strategies for finding and interacting with page elements
- Cache Optimization: Smart caching system for improved performance
- Error Handling: Robust error management for reliable operation
- Detailed Logging: Comprehensive logging system for debugging
Requirements
- Node.js 16+
- Chrome browser (must be installed)
- npm or yarn
Quick Start
- Make the startup script executable:
  chmod +x start-custom-mcp.sh
- Start the server:
  ./start-custom-mcp.sh
  This script will:
  - Check if Chrome is running with remote debugging enabled
  - Start Chrome with the correct flags if needed
  - Build the TypeScript code
  - Start the MCP server on port 3001
- Run the example to test functionality:
  npx ts-node examples/analyze-page.ts https://example.com
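The startup script's first step, checking whether Chrome is already listening for remote debugging, can be approximated as follows. This is a sketch, not the script's actual logic: it assumes the check queries Chrome's standard DevTools HTTP endpoint `/json/version` on port 9222, and `isChromeDebuggable` and `FetchLike` are hypothetical names. The HTTP client is passed in so the function can be exercised without a browser.

```typescript
// Minimal shape of the fetch API this sketch relies on.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Returns true if Chrome's DevTools version endpoint answers on `port`
// with a JSON body containing a string "Browser" field.
async function isChromeDebuggable(fetchImpl: FetchLike, port = 9222): Promise<boolean> {
  try {
    const res = await fetchImpl(`http://127.0.0.1:${port}/json/version`);
    if (!res.ok) return false;
    const body = (await res.json()) as { Browser?: unknown };
    return typeof body.Browser === "string";
  } catch {
    // Connection refused or invalid JSON: treat as "not running with debugging".
    return false;
  }
}
```

In Node.js 18 or later, the global `fetch` fits the `FetchLike` shape, so `await isChromeDebuggable(fetch)` performs the real check.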
API Methods
Basic Methods
- initialize: Initialize the connection
- navigate: Open a URL in a new tab
- getContent: Get the raw HTML content of a page
- executeScript: Execute JavaScript code in a tab
- clickElement: Click on an element matching a CSS selector
- takeScreenshot: Capture a screenshot (optional)
- closeTab: Close a tab
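Because `navigate` always opens a new tab, a client built on the basic methods should close that tab even when a later step fails. A minimal TypeScript sketch of that pattern: it assumes `closeTab` accepts `{ tabId }` (its parameters are not documented here), and `RpcCall` and `withTab` are hypothetical names.

```typescript
// Hypothetical transport: sends one JSON-RPC request and resolves with `result`.
type RpcCall = (method: string, params: Record<string, unknown>) => Promise<unknown>;

// Opens `url` in a new tab, runs `work` against it, and always closes the tab.
async function withTab<T>(
  call: RpcCall,
  url: string,
  work: (tabId: string) => Promise<T>,
): Promise<T> {
  const { tabId } = (await call("navigate", { url })) as { tabId: string };
  try {
    return await work(tabId);
  } finally {
    // Assumed parameter shape for closeTab.
    await call("closeTab", { tabId });
  }
}
```

For example, `withTab(call, 'https://example.com', (tabId) => call('getContent', { tabId }))` would fetch a page's HTML and then close its tab, assuming `getContent` also takes `{ tabId }`.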
Semantic Understanding Methods
- getStructuredContent: Get a structured representation of the page content
- analyzePageSemantics: Analyze the page and build a semantic DOM model
- findElementsByText: Find elements containing specific text
- findClickableElements: Find all interactive elements on the page
- clickSemanticElement: Click an element by its semantic ID
- fillFormField: Fill a form field by its semantic ID
- performSearch: Use the page's search functionality
Example Usage
// Requires Node.js 18+ (global fetch) and an ES module context (top-level await)
const MCP_URL = 'http://localhost:3001';
let nextId = 1;
// Send one JSON-RPC 2.0 request to the MCP server and return its result
async function rpc(method, params) {
  const response = await fetch(MCP_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ jsonrpc: '2.0', method, params, id: nextId++ })
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} from MCP server`);
  }
  const { result, error } = await response.json();
  if (error) {
    throw new Error(`${method} failed: ${error.message}`);
  }
  return result;
}
// Navigate to a page (opens a new tab)
const { tabId } = await rpc('navigate', { url: 'https://example.com' });
// Get structured content
const { content } = await rpc('getStructuredContent', { tabId });
console.log(content);
// Find a "Login" element and click it by its semantic ID
const { elements } = await rpc('findElementsByText', { tabId, text: 'Login' });
if (elements.length > 0) {
  await rpc('clickSemanticElement', { tabId, semanticId: elements[0].semanticId });
}
Configuration
- Chrome debugging port: 9222 (default)
- MCP server port: 3001 (configurable via the PORT environment variable)
- Debug mode: Set the environment variable DEBUG=true to enable verbose logging
- Cache settings: Configured in config.ts
  - Default TTL: 10 seconds
  - Max cache size: 200 entries
- Connection timeouts: Configurable in config.ts
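The settings above can be gathered into one typed object. The sketch below is illustrative, not the contents of `config.ts`: `ServerConfig`, its field names, and `loadConfig` are hypothetical, only `PORT` and `DEBUG` are read from the environment (as documented), and the remaining values are the defaults listed above.

```typescript
interface ServerConfig {
  chromeDebugPort: number; // Chrome remote debugging port
  port: number; // MCP server port
  debug: boolean; // verbose logging
  cacheTtlMs: number; // cache entry lifetime in milliseconds
  cacheMaxEntries: number; // cache capacity
}

// Builds the configuration from an environment map such as process.env.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const parsedPort = Number(env.PORT);
  return {
    chromeDebugPort: 9222,
    // Fall back to 3001 when PORT is unset or not a valid TCP port.
    port: Number.isInteger(parsedPort) && parsedPort > 0 && parsedPort < 65536 ? parsedPort : 3001,
    // Only the literal "true" enables verbose mode in this sketch; namespace
    // patterns such as chrome-page-analyzer:* are left to the logger.
    debug: env.DEBUG === "true",
    cacheTtlMs: 10_000,
    cacheMaxEntries: 200,
  };
}
```

Calling `loadConfig(process.env)` at startup would apply the documented overrides, for example `PORT=3002`.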
Debugging
Enable debug mode by setting the DEBUG environment variable:
DEBUG=true ./start-custom-mcp.sh
For more granular debugging of specific modules, use:
DEBUG=chrome-page-analyzer:* ./start-custom-mcp.sh
Log Files
The server creates log files in the logs directory with detailed information about all operations:
- Log location: ./logs/chrome-mcp.log
- Log rotation: Automatically rotates logs when they reach 10MB (configurable)
- Log levels: ERROR, WARN, INFO, DEBUG, TRACE
- Log content: Timestamps, request IDs, method calls, parameters, and results
You can view logs in real time using:
tail -f logs/chrome-mcp.log
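The five log levels form an ordered scale, so deciding whether to write an entry is a single comparison. The sketch below shows one way to filter and format entries with the fields listed above; the line format, `LogLevel`, `shouldLog`, and `formatLogLine` are assumptions for illustration, not the server's actual logger output.

```typescript
// Ordered from most to least severe.
const LEVELS = ["ERROR", "WARN", "INFO", "DEBUG", "TRACE"] as const;
type LogLevel = (typeof LEVELS)[number];

// An entry is written when it is at least as severe as the threshold.
function shouldLog(level: LogLevel, threshold: LogLevel): boolean {
  return LEVELS.indexOf(level) <= LEVELS.indexOf(threshold);
}

// Hypothetical single-line format: timestamp, level, request ID, method, details.
function formatLogLine(
  time: Date,
  level: LogLevel,
  requestId: string | number,
  method: string,
  details: unknown,
): string {
  return `${time.toISOString()} [${level}] req=${requestId} ${method} ${JSON.stringify(details)}`;
}
```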
How It Works
- When a page is loaded, the server builds a semantic model of the page
- The model includes:
  - Semantic element types (navigation, button, link, form, content, etc.)
  - Text content and structure
  - Interactive elements
  - Hierarchical relationships
- AI systems can query this model to understand the page content and structure
- Actions can be performed through semantic references rather than coordinates
This approach is more efficient than working from screenshots: the assistant receives structured text and stable element references instead of images, which gives it better context for understanding and interacting with web pages.
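The model described above maps naturally onto a tree of typed nodes. The following TypeScript sketch is a hypothetical shape, not the server's actual data structure: `SemanticNode`, its fields, and `collectInteractive` are assumptions, and the `semanticId` field only mirrors the identifier that `clickSemanticElement` accepts. It shows how a client could walk such a tree to list the elements it can act on.

```typescript
// Element categories named in this README; "other" is an assumed catch-all.
type SemanticType = "navigation" | "button" | "link" | "form" | "content" | "other";

interface SemanticNode {
  semanticId: string; // reference used for actions instead of coordinates
  type: SemanticType;
  text: string;
  interactive: boolean;
  children: SemanticNode[]; // hierarchical relationships
}

// Depth-first walk that returns interactive nodes in document order.
function collectInteractive(root: SemanticNode): SemanticNode[] {
  const found: SemanticNode[] = [];
  const visit = (node: SemanticNode): void => {
    if (node.interactive) found.push(node);
    node.children.forEach(visit);
  };
  visit(root);
  return found;
}
```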
Architecture
The server is built with a modular architecture:
- ChromeAPI: Main API class exposing methods to clients
- DOMInteractionLayer: Core DOM interaction functionality
- SemanticAnalyzer: Semantic understanding of page structure
- ContentExtractor: Extract structured content from pages
- Error Handler: Centralized error management
- DOM Helpers: Utility functions for DOM manipulation
- Cache: Optimized caching system for improved performance
- Logger: Comprehensive logging system for debugging
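As an illustration of the Cache component, here is a minimal TTL cache bounded by entry count, using the defaults from the Configuration section (10-second TTL, 200 entries). It is a sketch under those assumptions, not the project's implementation: `TtlCache` is a hypothetical name, eviction drops the oldest inserted entry (a `Map` iterates in insertion order), and the clock is injectable for testing.

```typescript
class TtlCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();

  constructor(
    private readonly ttlMs = 10_000,
    private readonly maxEntries = 200,
    private readonly now: () => number = () => Date.now(),
  ) {}

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: K, value: V): void {
    // Deleting first moves a re-inserted key to the end of the insertion order.
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // Evict the oldest inserted entry (first key in iteration order).
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Insertion-order eviction keeps the sketch short; a least-recently-used policy would also move a key to the end of the `Map` on each successful `get`.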
Development
- Install dependencies:
  npm install
- Build the TypeScript code:
  npm run build
- Run the server with debugging:
  DEBUG=true node custom-mcp-server.js
- Run the tests:
  npm test
Troubleshooting
- Chrome connection issues: Make sure Chrome is running with the remote debugging port open. You can start it manually with google-chrome --remote-debugging-port=9222 (the executable name varies by platform; google-chrome is the usual name on Linux).
- Port conflicts: If port 3001 is already in use, choose a different port, for example PORT=3002 ./start-custom-mcp.sh.
- TypeScript build errors: Fix the type errors reported by npm run build before starting the server.
- Element interaction failures: If clicking an element fails, the server tries multiple strategies (mouse events, JavaScript). Check the debug logs for details.
- Memory issues: If memory usage grows too large, reduce the cache size or TTL in config.ts.
- Log file access: If you encounter permission errors with log files, make sure the user running the server has write access to the logs directory.
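The multi-strategy click behavior described under element interaction failures can be sketched as an ordered list of attempts where the first success wins and every failure is kept for logging. This is an illustrative pattern, not the server's code: `ClickStrategy` and `clickWithFallback` are hypothetical names, and the strategies themselves (for example mouse events or a JavaScript `click()`) are passed in as functions.

```typescript
interface ClickStrategy {
  name: string; // illustrative label, e.g. "mouse-events" or "javascript"
  attempt: () => Promise<void>;
}

// Tries each strategy in order and returns the name of the first one that
// succeeds; if all fail, throws one error that lists every failure.
async function clickWithFallback(strategies: ClickStrategy[]): Promise<string> {
  const failures: string[] = [];
  for (const strategy of strategies) {
    try {
      await strategy.attempt();
      return strategy.name;
    } catch (err) {
      failures.push(`${strategy.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All click strategies failed (${failures.join("; ")})`);
}
```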
License
MIT