MCP server w/ Browser Use

By JovaniPink (GitHub)

FastAPI server implementing the Model Context Protocol (MCP) for browser automation via the browser-use library.

python mcp

Overview

What is MCP Browser Use?

MCP Browser Use is a FastAPI server that implements the Model Context Protocol (MCP) for browser automation, allowing AI agents to interact with web browsers using natural language commands.

How do I use MCP Browser Use?

To use MCP Browser Use, clone the repository, set up a virtual environment, install the dependencies, and run the server. You can then send natural language commands to control the browser.

What are the key features of MCP Browser Use?

Automated browser interactions via natural language
Navigation, form filling, clicking, and scrolling capabilities
Tab management and screenshot functionality
Vision-based element detection and structured JSON responses

What are the use cases of MCP Browser Use?

Automating web testing and interactions
Assisting users in filling out forms and navigating websites
Enabling AI agents to perform tasks on the web without manual input

FAQ about MCP Browser Use

Can MCP Browser Use automate any website?

It can automate interactions on most websites, as long as the browser can access them.

Is there a risk of using this in production?

Yes, there are security risks associated with allowing a server to control a browser, and it is not recommended for production use.

What are the system requirements?

It requires Python 3.11 or higher and a compatible web browser like Chrome.

Content


MCP server for browser-use.

Overview

This repository contains the MCP server for the browser-use library, a browser automation system that enables AI agents to interact with web browsers through natural language. The server is built on Anthropic's Model Context Protocol (MCP) and integrates directly with browser-use.
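MCP messages are JSON-RPC 2.0. As a rough sketch of what an MCP client such as Claude Desktop sends when it invokes one of the server's tools, the snippet below builds a `tools/call` request. The tool name `run_browser_task` and its `task` argument are hypothetical placeholders; this listing does not document the server's actual tool names.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a single-line JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # No indent: the stdio transport expects one message per line.
    return json.dumps(message)


# Hypothetical tool name and argument; a client would discover the real ones via `tools/list`.
request = make_tool_call(
    1, "run_browser_task", {"task": "Open example.com and read the page title"}
)
print(request)
```

A real client first completes the MCP `initialize` handshake; over the stdio transport it then writes each message like this as one newline-terminated line to the server's stdin.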

Features

Browser Control

Automated browser interactions via natural language
Navigation, form filling, clicking, and scrolling capabilities
Tab management and screenshot functionality
Cookie and state management

Agent System

Custom agent implementation in custom_agent.py
Vision-based element detection
Structured JSON responses for actions
Message history management and summarization
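The agent asks its model to reply in structured JSON, which has to be parsed and validated before any browser action runs. The sketch below assumes a hypothetical reply shape (a `next_goal` string plus an `actions` list of single-key objects) and caps the action count in the spirit of `MCP_MAX_ACTIONS_PER_STEP`. The schema in custom_agent.py may differ, and the action names are illustrative only.

```python
import json
from dataclasses import dataclass


@dataclass
class AgentStep:
    next_goal: str
    actions: list[dict]


def parse_agent_reply(raw: str, max_actions: int = 5) -> AgentStep:
    """Parse a hypothetical structured JSON reply into an AgentStep.

    Raises ValueError if the reply is not a JSON object with the expected fields.
    Only the first `max_actions` actions are kept.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model reply must be a JSON object")
    goal = data.get("next_goal")
    actions = data.get("actions")
    if not isinstance(goal, str) or not isinstance(actions, list):
        raise ValueError("reply needs a string 'next_goal' and a list 'actions'")
    # Each action is assumed to be {"action_name": {params}}.
    if not all(isinstance(a, dict) and len(a) == 1 for a in actions):
        raise ValueError("each action must be an object with exactly one key")
    return AgentStep(next_goal=goal, actions=actions[:max_actions])


reply = json.dumps({
    "next_goal": "Open the login page",
    "actions": [
        {"go_to_url": {"url": "https://example.com/login"}},
        {"click_element": {"index": 3}},
    ],
})
step = parse_agent_reply(reply, max_actions=1)
print(step.next_goal, step.actions)
```

Rejecting malformed replies with a clear error, rather than guessing, gives the agent something concrete to feed back to the model on the next step.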

Configuration

Environment-based configuration for API keys and settings
Chrome browser settings (debugging port, persistence)
Model provider selection and parameters
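The Chrome settings correspond to standard Chrome command-line switches. As an illustration only (not necessarily how this server launches Chrome), the sketch below builds a launch command from `CHROME_PATH`, `CHROME_USER_DATA`, and `CHROME_DEBUGGING_PORT`. `--remote-debugging-port` and `--user-data-dir` are real Chrome switches. The `chrome_command` function name and the `google-chrome` fallback executable are assumptions.

```python
import os
import shlex


def chrome_command(env: dict[str, str]) -> list[str]:
    """Build a Chrome launch command from CHROME_* settings (illustrative sketch)."""
    # Assumed fallback executable name when CHROME_PATH is unset or empty.
    executable = env.get("CHROME_PATH") or "google-chrome"
    port = int(env.get("CHROME_DEBUGGING_PORT") or "9222")
    cmd = [executable, f"--remote-debugging-port={port}"]
    user_data = env.get("CHROME_USER_DATA")
    if user_data:
        # A dedicated profile directory keeps cookies and other state between launches.
        cmd.append(f"--user-data-dir={user_data}")
    return cmd


cmd = chrome_command(dict(os.environ))
print(shlex.join(cmd))
```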

Dependencies

This project relies on the following Python packages:

Package	Version	Description
Pillow	>=10.1.0	Python Imaging Library (PIL) fork that adds image processing capabilities to your Python interpreter.
browser-use	==0.1.19	A powerful browser automation system that enables AI agents to interact with web browsers through natural language. The core library that powers this project's browser automation capabilities.
fastapi	>=0.115.6	Modern, fast (high-performance) web framework for building APIs with Python based on standard Python type hints. Used to create the server that exposes the agent's functionality.
fastmcp	>=0.4.1	A high-level Python framework for building MCP (Model Context Protocol) servers.
instructor	>=1.7.2	Library for structured output prompting and validation with OpenAI models. Enables extracting structured data from model responses.
langchain	>=0.3.14	Framework for developing applications with large language models (LLMs). Provides tools for chaining together different language model components and interacting with various APIs and data sources.
langchain-google-genai	>=2.1.1	LangChain integration for Google GenAI models, enabling the use of Google's generative AI capabilities within the LangChain framework.
langchain-openai	>=0.2.14	LangChain integrations with OpenAI's models. Enables using OpenAI models (like GPT-4) within the LangChain framework. Used in this project for interacting with OpenAI's language and vision models.
langchain-ollama	>=0.2.2	LangChain integration for Ollama, enabling local execution of LLMs.
openai	>=1.59.5	Official Python client library for the OpenAI API. Used to interact directly with OpenAI's models (if needed, in addition to LangChain).
python-dotenv	>=1.0.1	Reads key-value pairs from a `.env` file and sets them as environment variables. Simplifies local development and configuration management.
pydantic	>=2.10.5	Data validation and settings management using Python type annotations. Provides runtime enforcement of types and automatic model creation. Essential for defining structured data models in the agent.
pyperclip	>=1.9.0	Cross-platform Python module for copy and paste clipboard functions.
uvicorn	>=0.22.0	ASGI web server implementation for Python. Used to serve the FastAPI application.

Components

Resources

The server implements a browser automation system with:

Integration with browser-use library for advanced browser control
Custom browser automation capabilities
Agent-based interaction system with vision capabilities
Persistent state management
Customizable model settings
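One way to read "persistent state management" together with the `CHROME_PERSISTENT_SESSION` setting: with persistence on, a single browser is reused across tasks and stays open; with it off, each task gets a fresh browser that is closed afterwards. The sketch below illustrates only that lifecycle. `FakeBrowser` is a stand-in and `BrowserSessionManager` is a hypothetical name; neither is part of the browser-use API.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class FakeBrowser:
    """Stand-in for a real browser handle; records whether it was closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class BrowserSessionManager:
    """Hypothetical manager implementing CHROME_PERSISTENT_SESSION semantics."""

    def __init__(self, persistent: bool) -> None:
        self.persistent = persistent
        self._shared: Optional[FakeBrowser] = None
        self.launches = 0

    def _launch(self) -> FakeBrowser:
        self.launches += 1
        return FakeBrowser()

    @contextmanager
    def session(self) -> Iterator[FakeBrowser]:
        if self.persistent:
            # Reuse one browser across tasks; it stays open after each task.
            if self._shared is None:
                self._shared = self._launch()
            yield self._shared
        else:
            # Fresh browser per task, always closed when the task ends.
            browser = self._launch()
            try:
                yield browser
            finally:
                browser.close()


manager = BrowserSessionManager(persistent=True)
with manager.session() as first:
    pass
with manager.session() as second:
    pass
print(first is second, manager.launches, first.closed)  # True 1 False
```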

Requirements

Operating system: Linux, macOS, or Windows (we haven't tested Docker or Microsoft WSL)
Python 3.11 or higher
uv (fast Python package installer)
Chrome/Chromium browser
Claude Desktop
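A quick preflight check for the Python, uv, and Chrome requirements above could look like the sketch below. The Chrome executable names probed (`google-chrome`, `chromium`, `chrome`) are assumptions. On macOS and Windows, Chrome is often not on PATH, so a failed probe there may just mean `CHROME_PATH` needs to point at the executable.

```python
import shutil
import sys


def check_requirements(version: tuple = sys.version_info[:2]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if tuple(version[:2]) < (3, 11):
        problems.append(f"Python 3.11+ required, found {version[0]}.{version[1]}")
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    # Assumed executable names; set CHROME_PATH if Chrome is installed elsewhere.
    if not any(shutil.which(name) for name in ("google-chrome", "chromium", "chrome")):
        problems.append("Chrome/Chromium not found on PATH (set CHROME_PATH instead)")
    return problems


for problem in check_requirements():
    print("missing:", problem)
```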

Quick Start

Claude Desktop

The Claude Desktop configuration file is located at:

On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
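If a script needs to find that configuration file, it can resolve the platform-specific location as in this sketch. The function name is hypothetical, and it returns `None` for platforms where no location is given.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def claude_desktop_config_path(platform: str = sys.platform) -> Optional[Path]:
    """Return the Claude Desktop config file location on macOS or Windows."""
    if platform == "darwin":
        return (Path.home() / "Library" / "Application Support" / "Claude"
                / "claude_desktop_config.json")
    if platform == "win32":
        appdata = os.environ.get("APPDATA")
        return Path(appdata) / "Claude" / "claude_desktop_config.json" if appdata else None
    # No location is listed for other platforms.
    return None


print(claude_desktop_config_path())
```

`pathlib` handles the space in "Application Support" directly, so the shell-style backslash escape is not needed here.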

Installing via Smithery

To install Browser Use for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @JovaniPink/mcp-browser-use --client claude

Development Configuration

Add the server under "mcpServers" in claude_desktop_config.json. JSON does not allow comments, so note the following: ANONYMIZED_TELEMETRY is set to "false" to disable anonymized telemetry; set CHROME_PERSISTENT_SESSION to "true" to keep the browser open between AI tasks; to use DeepSeek, add "DEEPSEEK_ENDPOINT": "https://api.deepseek.com" and "DEEPSEEK_API_KEY" to the "env" block.

{
  "mcpServers": {
    "mcp_server_browser_use": {
      "command": "uvx",
      "args": [
        "mcp-server-browser-use"
      ],
      "env": {
        "OPENAI_ENDPOINT": "https://api.openai.com/v1",
        "OPENAI_API_KEY": "",
        "ANTHROPIC_API_KEY": "",
        "GOOGLE_API_KEY": "",
        "AZURE_OPENAI_ENDPOINT": "",
        "AZURE_OPENAI_API_KEY": "",
        "ANONYMIZED_TELEMETRY": "false",
        "CHROME_PATH": "",
        "CHROME_USER_DATA": "",
        "CHROME_DEBUGGING_PORT": "9222",
        "CHROME_DEBUGGING_HOST": "localhost",
        "CHROME_PERSISTENT_SESSION": "false",
        "MCP_MODEL_PROVIDER": "anthropic",
        "MCP_MODEL_NAME": "claude-3-5-sonnet-20241022",
        "MCP_TEMPERATURE": "0.3",
        "MCP_MAX_STEPS": "30",
        "MCP_USE_VISION": "true",
        "MCP_MAX_ACTIONS_PER_STEP": "5",
        "MCP_TOOL_CALL_IN_CONTENT": "true"
      }
    }
  }
}
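Instead of editing claude_desktop_config.json by hand, a small script can merge the server entry into an existing file and leave other configured servers untouched. This is a sketch: the `merge_server` helper is hypothetical, and its `env` block is abbreviated to a few of the keys above. Claude Desktop typically needs a restart to pick up config changes.

```python
import json
import tempfile
from pathlib import Path


def merge_server(config_path: Path, name: str, entry: dict) -> dict:
    """Add or replace one entry under "mcpServers", keeping all other keys."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


entry = {
    "command": "uvx",
    "args": ["mcp-server-browser-use"],
    "env": {  # abbreviated; add the remaining keys as needed
        "ANTHROPIC_API_KEY": "",
        "CHROME_PERSISTENT_SESSION": "false",
        "MCP_MODEL_PROVIDER": "anthropic",
        "MCP_MODEL_NAME": "claude-3-5-sonnet-20241022",
    },
}

# Demonstrated on a temporary file so the real config is not touched.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
    merged = merge_server(path, "mcp_server_browser_use", entry)
    print(sorted(merged["mcpServers"]))  # ['mcp_server_browser_use', 'other']
```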

Environment Variables

Key environment variables:

# API Keys
ANTHROPIC_API_KEY=anthropic_key

# Chrome Configuration
# Optional: Path to Chrome executable
CHROME_PATH=/path/to/chrome
# Optional: Chrome user data directory
CHROME_USER_DATA=/path/to/user/data
# Default: 9222
CHROME_DEBUGGING_PORT=9222
# Default: localhost
CHROME_DEBUGGING_HOST=localhost
# Keep browser open between tasks
CHROME_PERSISTENT_SESSION=false

# Model Settings
# Options: anthropic, openai, azure, deepseek
MCP_MODEL_PROVIDER=anthropic
# Model name
MCP_MODEL_NAME=claude-3-5-sonnet-20241022
MCP_TEMPERATURE=0.3
MCP_MAX_STEPS=30
MCP_USE_VISION=true
MCP_MAX_ACTIONS_PER_STEP=5
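These variables arrive as strings (for example, after python-dotenv's `load_dotenv()` copies the .env values into `os.environ`), so they have to be converted into typed settings. The sketch below shows that conversion, with the values from the example above as fallbacks. `ServerSettings` and `load_settings` are hypothetical names. Treating everything except host and port as defaults is an assumption, and so are the accepted boolean spellings.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    # Assumed convention: "true", "1", or "yes" (any case) mean True.
    return value.strip().lower() in {"true", "1", "yes"}


@dataclass(frozen=True)
class ServerSettings:
    chrome_debugging_host: str
    chrome_debugging_port: int
    chrome_persistent_session: bool
    model_provider: str
    model_name: str
    temperature: float
    max_steps: int
    use_vision: bool
    max_actions_per_step: int


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read CHROME_* and MCP_* variables into typed settings.

    Fallbacks mirror the example .env values; only host and port are documented defaults.
    """
    return ServerSettings(
        chrome_debugging_host=env.get("CHROME_DEBUGGING_HOST", "localhost"),
        chrome_debugging_port=int(env.get("CHROME_DEBUGGING_PORT", "9222")),
        chrome_persistent_session=_as_bool(env.get("CHROME_PERSISTENT_SESSION", "false")),
        model_provider=env.get("MCP_MODEL_PROVIDER", "anthropic"),
        model_name=env.get("MCP_MODEL_NAME", "claude-3-5-sonnet-20241022"),
        temperature=float(env.get("MCP_TEMPERATURE", "0.3")),
        max_steps=int(env.get("MCP_MAX_STEPS", "30")),
        use_vision=_as_bool(env.get("MCP_USE_VISION", "true")),
        max_actions_per_step=int(env.get("MCP_MAX_ACTIONS_PER_STEP", "5")),
    )


settings = load_settings({"MCP_TEMPERATURE": "0.0", "MCP_USE_VISION": "false"})
print(settings.temperature, settings.use_vision, settings.chrome_debugging_port)  # 0.0 False 9222
```

A malformed number such as `MCP_MAX_STEPS=abc` raises `ValueError` at startup in this sketch, so a configuration mistake fails early instead of partway through a task.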

Development

Setup

Clone the repository:

git clone https://github.com/JovaniPink/mcp-browser-use.git
cd mcp-browser-use

Create and activate virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

uv sync

Start the server

uv run mcp-browser-use

Debugging

For debugging, use the MCP Inspector:

npx @modelcontextprotocol/inspector uv --directory /path/to/project run mcp-server-browser-use

The Inspector will display a URL for the debugging interface.

Browser Actions

The server supports various browser actions through natural language:

Navigation: Go to URLs, back/forward, refresh
Interaction: Click, type, scroll, hover
Forms: Fill forms, submit, select options
State: Get page content, take screenshots
Tabs: Create, close, switch between tabs
Vision: Find elements by visual appearance
Cookies & Storage: Manage browser state
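The agent decides which of these actions a natural-language request needs. One common way to structure the action side is a registry that maps action names to handler functions. The sketch below shows that pattern with a decorator-based registry. `ActionRegistry` and the action names are hypothetical, and the handlers return strings instead of driving a browser.

```python
from typing import Any, Callable, Dict

Handler = Callable[..., str]


class ActionRegistry:
    """Hypothetical registry mapping action names to handler functions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def action(self, name: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler under `name`."""
        def register(func: Handler) -> Handler:
            self._handlers[name] = func
            return func
        return register

    def run(self, name: str, **params: Any) -> str:
        if name not in self._handlers:
            raise KeyError(f"unknown action: {name}")
        return self._handlers[name](**params)


registry = ActionRegistry()


@registry.action("go_to_url")
def go_to_url(url: str) -> str:
    return f"navigated to {url}"  # a real handler would drive the browser here


@registry.action("scroll_down")
def scroll_down(pixels: int = 500) -> str:
    return f"scrolled {pixels}px"


print(registry.run("go_to_url", url="https://example.com"))
print(registry.run("scroll_down"))
```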

Security

I want to note that some Chrome settings are configured to allow the server to control the browser. This is a security risk, so use the server with caution; it is not intended for production environments.

Security Details: SECURITY.MD
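The Chrome remote debugging port has no authentication, so anything that can reach it can control the browser. For that reason, `CHROME_DEBUGGING_HOST` should normally stay on a loopback address. A guard like the sketch below can reject non-loopback hosts before the server connects. It is a hypothetical helper, not part of this project, and it does not resolve hostnames other than "localhost".

```python
import ipaddress
import os


def is_loopback_host(host: str) -> bool:
    """Return True if `host` is "localhost" or a loopback IP literal."""
    host = host.strip()
    if host.lower() == "localhost":
        return True
    try:
        # Accept bracketed IPv6 literals such as "[::1]".
        return ipaddress.ip_address(host.strip("[]")).is_loopback
    except ValueError:
        # Other hostnames are not resolved here and are treated as non-loopback.
        return False


def require_loopback(host: str) -> None:
    if not is_loopback_host(host):
        raise RuntimeError(
            f"refusing Chrome debugging host {host!r}: not a loopback address"
        )


require_loopback(os.environ.get("CHROME_DEBUGGING_HOST", "localhost"))
```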

Contributing

We welcome contributions to this project. Please follow these steps:

Fork this repository.
Create your feature branch: git checkout -b my-new-feature
Commit your changes: git commit -m 'Add some feature'
Push to the branch: git push origin my-new-feature
Submit a pull request.

For major changes, open an issue first to discuss what you would like to change. Please update tests as appropriate to reflect any changes made.


School MCP by 54yyyu

A Model Context Protocol (MCP) server for academic tools, integrating with Canvas and Gradescope platforms.

canvas mcp


repo-template by loonghao

A Model Context Protocol (MCP) server for Python package intelligence, providing structured queries for PyPI packages and GitHub repositories. Features include dependency analysis, version tracking, and package metadata retrieval for LLM interactions.

pypi mcp


Google Calendar MCP Server by ymello

google-calendar mcp


strava-mcp by jeremysilva1098

MCP server for strava

strava mcp


Grasshopper MCP サーバー by norioh-japan

Model Context Protocol (MCP) server implementation for Rhinoceros/Grasshopper integration, enabling AI models to interact with parametric design tools

grasshopper mcp


Open Multi-Agent Canvas by CopilotKit

The open-source multi-agent chat interface that lets you manage multiple agents in one dynamic conversation and add MCP servers for deep research

python typescript


MCP Kali Server by Wh0am123

MCP configuration to connect AI agent to a Linux machine.

security mcp
