🦊 MCPBench: A Benchmark for Evaluating MCP Servers

By modelscope on GitHub

An evaluation benchmark for MCP servers

Overview

What is MCPBench?

MCPBench is an evaluation framework designed for assessing the performance of MCP Servers, specifically focusing on Web Search and Database Query tasks. It evaluates various servers like Brave Search and DuckDuckGo based on task completion accuracy, latency, and token consumption.
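The three metrics can be rolled up from per-task records roughly as follows. This is a minimal sketch, assuming a hypothetical TaskResult record that holds a correctness flag, latency in seconds, and a token count. It is not MCPBench's own scoring code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """One evaluated task (hypothetical record layout, not MCPBench's own)."""
    correct: bool      # did the answer match the reference answer?
    latency_s: float   # wall-clock seconds for the whole task
    tokens: int        # LLM tokens consumed (prompt + completion)


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Aggregate per-task records into accuracy, mean latency, and mean tokens."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        "accuracy": sum(r.correct for r in results) / len(results),
        "avg_latency_s": mean(r.latency_s for r in results),
        "avg_tokens": mean(r.tokens for r in results),
    }


# Placeholder values for illustration only.
runs = [
    TaskResult(correct=True, latency_s=2.0, tokens=1200),
    TaskResult(correct=False, latency_s=4.0, tokens=1800),
]
print(summarize(runs))
```

Keeping raw per-task records and aggregating afterwards makes it easy to add further statistics (for example, latency percentiles) later without re-running the evaluation.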

How do I use MCPBench?

To use MCPBench, install the required dependencies, configure your LLM key and endpoint, launch the MCP server with the appropriate configuration, and run evaluations for Web Search or Database Query tasks.
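The "configure your LLM key and endpoint" step can be sketched as reading both values from the environment and failing early if either is missing. The variable names LLM_API_KEY and LLM_ENDPOINT and the helper load_llm_config are hypothetical placeholders, not MCPBench's actual configuration names. Check the project repository for the real ones.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical variable names, for illustration only.
KEY_VAR = "LLM_API_KEY"
ENDPOINT_VAR = "LLM_ENDPOINT"


@dataclass(frozen=True)
class LLMConfig:
    api_key: str
    endpoint: str


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read the LLM key and endpoint, raising before any evaluation starts if one is missing."""
    missing = [name for name in (KEY_VAR, ENDPOINT_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    # Strip a trailing slash so later path joins do not produce '//'.
    return LLMConfig(api_key=env[KEY_VAR], endpoint=env[ENDPOINT_VAR].rstrip("/"))
```

Passing a plain dictionary instead of os.environ, as in load_llm_config({"LLM_API_KEY": "...", "LLM_ENDPOINT": "https://example.com/v1/"}), keeps the function easy to test.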

Key features of MCPBench

Supports evaluation of multiple MCP Servers
Measures task completion accuracy, latency, and token consumption
Compatible with local and remote MCP Servers
Provides datasets for evaluation
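The local/remote distinction could be modeled as sketched below. ServerSpec and its fields are hypothetical. Mapping local servers to the stdio transport and remote servers to an HTTP-based transport follows common MCP practice rather than anything MCPBench documents here.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ServerSpec:
    """Hypothetical description of an MCP server under test."""
    name: str
    command: list[str] = field(default_factory=list)  # local: process to launch
    url: str = ""                                     # remote: HTTP endpoint

    def transport(self) -> str:
        """Return 'stdio' for a launched local process, 'http' for a remote URL."""
        if self.command and self.url:
            raise ValueError(f"{self.name}: set either command or url, not both")
        if self.command:
            return "stdio"
        if urlparse(self.url).scheme in ("http", "https"):
            return "http"
        raise ValueError(f"{self.name}: need a command or an http(s) url")


# "server.py" and the URL are placeholders.
local = ServerSpec("local-search", command=["python", "server.py"])
remote = ServerSpec("remote-search", url="https://mcp.example.com/sse")
print(local.transport(), remote.transport())  # stdio http
```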

Use cases of MCPBench

Evaluating the performance of MCP Servers backed by different web search engines.
Comparing database query efficiency across various MCP Servers.
Analyzing the impact of different configurations on server performance.
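Comparing servers ultimately means ordering them on the measured metrics. This is a minimal sketch, assuming a hypothetical policy that ranks by accuracy first and uses lower latency, then fewer tokens, as tie-breakers. The server names and numbers are made-up placeholders, not benchmark results.

```python
from typing import NamedTuple


class ServerSummary(NamedTuple):
    name: str
    accuracy: float       # fraction of tasks answered correctly
    avg_latency_s: float  # mean seconds per task
    avg_tokens: float     # mean tokens per task


def rank(summaries: list[ServerSummary]) -> list[str]:
    """Order servers: highest accuracy first, then lower latency, then fewer tokens."""
    ordered = sorted(summaries, key=lambda s: (-s.accuracy, s.avg_latency_s, s.avg_tokens))
    return [s.name for s in ordered]


results = [
    ServerSummary("server-a", 0.80, 3.1, 1500),
    ServerSummary("server-b", 0.80, 2.4, 1700),
    ServerSummary("server-c", 0.65, 1.2, 900),
]
print(rank(results))  # ['server-b', 'server-a', 'server-c']
```

A lexicographic key like this never trades accuracy for speed. If a weighted trade-off is preferred, a single combined score would replace the tuple key.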

FAQ about MCPBench

What types of servers can be evaluated with MCPBench?

MCPBench can evaluate both Web Search and Database Query servers.

Is there a specific Python version required?

Yes, MCPBench requires Python version >= 3.11.
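A launcher script could enforce this requirement before doing any real work. This is a minimal sketch. check_python is a hypothetical helper, not part of MCPBench.

```python
import sys

REQUIRED = (3, 11)  # minimum (major, minor) version stated in this FAQ


def check_python(version_info=None) -> None:
    """Raise RuntimeError if the interpreter is older than REQUIRED."""
    current = tuple((version_info or sys.version_info)[:2])
    if current < REQUIRED:
        needed = ".".join(map(str, REQUIRED))
        found = ".".join(map(str, current))
        raise RuntimeError(f"MCPBench requires Python >= {needed}, found {found}")
```

Calling check_python() at startup, before importing modules that rely on 3.11 features, gives users a clear error message instead of a confusing failure later on.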

Where can I find the evaluation report?

The evaluation report is available in the project repository.