Implementing a RAG Application with MCP

By kyopark2014 GitHub

This project shows how to use the Model Context Protocol (MCP).

Overview

What is MCP?

MCP (Model Context Protocol) is an interface that allows generative AI applications to utilize external data efficiently. It has rapidly gained traction since its inception as an open-source project by Anthropic in November 2024, and is now supported by platforms like Cursor and OpenAI.

How to use MCP?

To use MCP, developers can connect their applications to the MCP server using tools like Claude Desktop or Cursor. The MCP server responds to client requests by providing capabilities and executing tasks such as querying local files or external APIs.

Key features of MCP

Integration with various AI tools and applications.
Capability to access and manipulate external data sources.
Support for JSON-RPC protocol for communication between clients and servers.
Customizable application development using LangGraph.

Use cases of MCP

Building applications that require real-time data access and processing.
Implementing RAG (Retrieval-Augmented Generation) services for enhanced information retrieval.
Developing multi-modal applications that analyze text, images, and tables.

FAQ about MCP

What is the main purpose of MCP?

MCP serves as a bridge for generative AI applications to access and utilize external data effectively.

Is MCP free to use?

Yes, MCP is an open-source project and is free for developers to use.

Can MCP be integrated with existing applications?

Yes, MCP can be integrated with various AI tools and applications, making it versatile for different use cases.

Content

Implementing a RAG Application with MCP

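MCP (Model Context Protocol) is rapidly spreading as a key interface through which generative AI applications use external data. It started in November 2024 as an open-source project from Anthropic and is now supported not only by Cursor but also by OpenAI. This document explains how an application built with LangGraph can use MCP through MCP with LangChain. The RAG implemented here is built on Knowledge base, Amazon's fully managed RAG service, so tasks such as extracting text from documents, synchronization, and chunking are handled easily, and images and tables can be analyzed with multimodal models. To let the MCP server reach the RAG easily, the API is built with AWS Lambda.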

The architecture shows how an agent that includes MCP is configured in an AWS environment. The agent uses the MCP server/client structure to draw on external data sources. The MCP client communicates with the MCP server over stdio or SSE, based on the JSON-RPC protocol. With stdio, the MCP server is written in a language such as Python or Java and, when a request arrives from the client, collects or delivers data using RAG, the internet, and so on. With SSE, the MCP client and server communicate over IP. Here, Streamlit provides the application UI, and users reach the application from a browser over HTTPS through ALB - CloudFront. This document also explains how to develop an MCP-based application with LangGraph, which is easy to customize.

Using MCP

MCP Basic

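Users can connect to an MCP server not only through AI tools installed on their own computer, such as Claude Desktop and Cursor, but also through applications that are typically developed as agents. In response to an MCP client's request, the MCP server advertises what it can do as capabilities and carries out the client's requests. An MCP server can query files or databases on the local computer, and it can also look up information through the APIs of external servers on the internet. The MCP client connects to the server using the JSON-RPC 2.0 protocol over either stdio or SSE (Server-Sent Events); it forwards the host's requests to the MCP server and receives the responses for use.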

The main elements of MCP and their behavior are defined as follows.

MCP Hosts: Programs that want to access data through MCP, such as Claude Desktop, Cursor, and user agent applications.
MCP Clients: Clients that maintain a 1:1 connection with an MCP server, connecting over stdio or SSE.
MCP Servers: Lightweight programs that advertise their capabilities to the client. They can query files or databases on the local computer and look up information through external APIs.
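
To make the JSON-RPC 2.0 exchange between client and server more concrete, here is a minimal sketch of the requests a client might send to list and call tools. The method names tools/list and tools/call come from the MCP specification; the helper make_request and the example arguments are illustrative assumptions and are not part of this project.

```python
import json
from itertools import count

_ids = count(1)  # each JSON-RPC request carries an id so its response can be matched


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request as a dict (hypothetical helper)."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Ask the server which tools it offers
list_tools = make_request("tools/list")

# Call a tool named "search" with one argument (illustrative values)
call_search = make_request(
    "tools/call",
    {"name": "search", "arguments": {"keyword": "boiler error code"}},
)

# Over stdio, each message travels as a single line of JSON
print(json.dumps(list_tools))
print(json.dumps(call_search))
```

In practice the ClientSession from the mcp package builds and sends these messages, so application code rarely has to construct them by hand.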

An MCP server has the following components.

Tools (Model-controlled): Perform a specific action, much like an API.

tools = await session.list_tools()

Resources (Application-controlled): Data sources that the generative AI application can access. They return data without significant computation or side effects.

resources = await session.list_resources()

Prompts (User-controlled): Predefined templates used when working with tools or resources; they can be selected before inference.

prompts = await session.list_prompts()

LangChain MCP Adapter

LangChain MCP Adapter is a lightweight wrapper that lets MCP be used with LangGraph agents, and it is open source under the MIT license. Its main role is to help define tools for an MCP server, and to retrieve tool information in the MCP client so that it can be used as a LangGraph tool node.

Prerequisites

Install MCP and LangChain MCP Adapter as follows.

pip install mcp langchain-mcp-adapters

MCP Server

An MCP server for RAG search can be defined as follows. Setting the server's transport to "stdio" is convenient because the client can run the server's Python code directly, so the server does not have to be kept running.

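import sys

from mcp.server.fastmcp import FastMCP

mcp = FastMCP(
    name="Search",
    instructions=(
        "You are a helpful assistant. "
        "You can search the documentation for the user's question and provide the answer."
    ),
)

@mcp.tool()
def search(keyword: str) -> str:
    "search keyword"

    # retrieve_knowledge_base() is defined below and returns a (response, list) tuple;
    # only the response text is handed back to the client.
    response, _ = retrieve_knowledge_base(keyword)
    return response

if __name__ == "__main__":
    # With the stdio transport, stdout carries the JSON-RPC messages, so log to stderr.
    print("###### main ######", file=sys.stderr)
    mcp.run(transport="stdio")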
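
To illustrate why a stdio server does not need to be kept running, the sketch below has a parent process launch a child on demand and exchange one newline-delimited JSON message with it over stdin and stdout. The child is a toy stand-in written for this example, not an MCP server: it skips the initialize handshake and simply echoes the method name. The names CHILD and roundtrip are hypothetical.

```python
import json
import subprocess
import sys

# Toy child process: reads one JSON-RPC request per line from stdin and
# writes one JSON-RPC response per line to stdout.
CHILD = r'''
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    resp = {"jsonrpc": "2.0", "id": req["id"], "result": {"echo": req["method"]}}
    sys.stdout.write(json.dumps(resp) + "\n")
    sys.stdout.flush()
'''


def roundtrip(request):
    """Start the child, send one request on stdin, and read one response from stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # communicate() writes the request, closes stdin, and waits for the child to exit
    out, _ = proc.communicate(json.dumps(request) + "\n", timeout=30)
    return json.loads(out.splitlines()[0])


response = roundtrip({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
print(response)
```

Because the parent parses every line on the child's stdout as a message, any stray output on stdout would corrupt the exchange; diagnostics belong on stderr.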

When a request arrives, the server performs the RAG search with retrieve_knowledge_base(). Because the server's Python code should be lightweight, it is structured to trigger a Lambda function, as shown below. The Lambda function performs the retrieve, grade, and generation steps. You can specify "model_name" as shown, and "grading" can be used optionally as needed. To speed things up with parallel processing, set "multi_region" to "Enable". See lambda-rag for the detailed code.

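import json

import boto3

# bedrock_region, projectName, knowledge_base_name, numberOfDocs, model_name and
# multi_region are module-level settings, assumed to be loaded from the
# application's configuration before this function is called.
def retrieve_knowledge_base(query):
    lambda_client = boto3.client(
        service_name='lambda',
        region_name=bedrock_region
    )
    functionName = f"lambda-rag-for-{projectName}"
    # Request body understood by the RAG Lambda function
    payload = {
        'function': 'search_rag',
        'knowledge_base_name': knowledge_base_name,
        'keyword': query,
        'top_k': numberOfDocs,
        'grading': "Enable",
        'model_name': model_name,
        'multi_region': multi_region
    }
    output = lambda_client.invoke(
        FunctionName=functionName,
        Payload=json.dumps(payload),
    )
    # The Payload field is a stream containing the function's JSON result
    payload = json.load(output['Payload'])
    # Returns the response text together with an empty list
    return payload['response'], []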
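
The last two lines of the function rely on the Lambda result arriving as a readable stream of JSON. The following minimal sketch isolates that decoding step so it can be tried without AWS access. It assumes the result has a "response" field, as in the code above; io.BytesIO stands in for the stream that boto3 returns, and parse_lambda_output and fake_output are hypothetical names.

```python
import io
import json


def parse_lambda_output(output):
    """Decode the JSON body of a Lambda invoke result and return its 'response' field."""
    body = json.load(output["Payload"])  # Payload is a readable, file-like stream
    return body["response"]


# Stand-in for the dict returned by lambda_client.invoke(); the real Payload
# is a streaming object, which io.BytesIO mimics here for illustration.
fake_output = {
    "StatusCode": 200,
    "Payload": io.BytesIO(
        json.dumps({"response": "E01: low water pressure"}).encode("utf-8")
    ),
}

answer = parse_lambda_output(fake_output)
print(answer)
```

A stream can only be read once, so the decoded body should be kept in a variable if it is needed again.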

MCP Client

When the MCP client talks to only one MCP server, it can be implemented with stdio_client and StdioServerParameters as shown below. The MCP server information can be read from config.json or taken from what the user enters in Streamlit. load_mcp_server_parameters() reads mcp_json and builds the StdioServerParameters. The MCP server information in config.json comes from the output generated after deploying with AWS CDK.

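import json

from mcp import ClientSession, StdioServerParameters

def load_mcp_server_parameters():
    # mcp_config is the JSON string holding the "mcpServers" settings
    # (read from config.json or entered in the Streamlit UI)
    mcp_json = json.loads(mcp_config)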
    mcpServers = mcp_json.get("mcpServers")

    command = ""
    args = []
    if mcpServers is not None:
        for server in mcpServers:
            config = mcpServers.get(server)
            if "command" in config:
                command = config["command"]
            if "args" in config:
                args = config["args"]
            break

    return StdioServerParameters(
        command=command,
        args=args
    )

The stdio_client is set up with the MCP server information as shown below. Tool information is fetched with load_mcp_tools. The agent binds the tools and performs the requested work through ainvoke.

from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools

async def mcp_rag_agent_single(query, st):
    server_params = load_mcp_server_parameters()

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await load_mcp_tools(session)
            with st.status("thinking...", expanded=True, state="running") as status:       
                agent = create_agent(tools)
                agent_response = await agent.ainvoke({"messages": query})                

                result = agent_response["messages"][-1].content
            st.markdown(result)
            st.session_state.messages.append({
                "role": "assistant", 
                "content": result
            })            
            return result

Run the MCP client as follows. asyncio is used to run it asynchronously. Afterwards, if the user updates the MCP Config in the UI, the server information can be updated.

asyncio.run(mcp_rag_agent_single(query, st))
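
As a small illustration of this pattern, the sketch below drives a coroutine from ordinary synchronous code, which is the situation a Streamlit script is in. fake_agent and handle_user_input are hypothetical stand-ins; the real coroutine would open the MCP session and invoke the agent.

```python
import asyncio


async def fake_agent(query):
    """Stand-in coroutine; a real one would await the MCP session and the model."""
    await asyncio.sleep(0)
    return f"answer to: {query}"


def handle_user_input(query):
    """Synchronous entry point that runs the coroutine to completion."""
    # asyncio.run() creates an event loop, runs the coroutine, and closes the loop
    return asyncio.run(fake_agent(query))


result = handle_user_input("hello")
print(result)
```

asyncio.run() cannot be called from code that is already running inside an event loop, so this pattern assumes the caller is plain synchronous code.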

When there are multiple servers, use MultiServerMCPClient from langchain-mcp-adapters. First, load the server information as shown below.

def load_multiple_mcp_server_parameters():
    mcp_json = json.loads(mcp_config)
    mcpServers = mcp_json.get("mcpServers")

    server_info = {}
    if mcpServers is not None:
        for server in mcpServers:
            config = mcpServers.get(server)
            # Read the values per server so that a missing key never
            # inherits the value of the previous server
            command = config.get("command", "")
            args = config.get("args", [])

            server_info[server] = {
                "command": command,
                "args": args,
                "transport": "stdio"
            }
    return server_info

Next, define the client from the MCP server information with MultiServerMCPClient, as shown below. The tool information from the MCP servers is fetched with client.get_tools() and used when creating the agent. As with a single MCP server, run it with ainvoke to get the result.

import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient

# Note: the "async with" style below matches earlier langchain-mcp-adapters releases;
# newer releases may instead expect "tools = await client.get_tools()" without it.
async def mcp_rag_agent_multiple(query, st):
    server_params = load_multiple_mcp_server_parameters()
    async with MultiServerMCPClient(server_params) as client:
        with st.status("thinking...", expanded=True, state="running") as status:
            tools = client.get_tools()
            agent = create_agent(tools)
            response = await agent.ainvoke({"messages": query})
            result = response["messages"][-1].content

        st.markdown(result)
        st.session_state.messages.append({
            "role": "assistant",
            "content": result
        })
    return result

# Run the coroutine once the function has been defined
asyncio.run(mcp_rag_agent_multiple(query, st))

The agent is defined here so that it is easy to customize.

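from typing import Annotated, Literal, TypedDict

from langchain_core.messages import AIMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

def create_agent(tools):
    tool_node = ToolNode(tools)

    # get_chat() is the project's own helper that returns the chat model
    chatModel = get_chat(extended_thinking="Disable")
    model = chatModel.bind_tools(tools)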

    class State(TypedDict):
        messages: Annotated[list, add_messages]

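    def call_model(state: State, config):
        # System prompt (in Korean): the assistant is named Seoyeon, answers in a
        # friendly way, gives enough specific detail, admits when it does not
        # know the answer, and replies in Korean.
        system = (
            "당신의 이름은 서연이고, 질문에 친근한 방식으로 대답하도록 설계된 대화형 AI입니다."
            "상황에 맞는 구체적인 세부 정보를 충분히 제공합니다."
            "모르는 질문을 받으면 솔직히 모른다고 말합니다."
            "한국어로 답변하세요."
        )
        try:
            prompt = ChatPromptTemplate.from_messages(
                [
                    ("system", system),
                    MessagesPlaceholder(variable_name="messages"),
                ]
            )
            chain = prompt | model
            response = chain.invoke(state["messages"])
        except Exception as e:
            # Surface the failure with context instead of returning an undefined response
            raise RuntimeError("model invocation failed in call_model") from e
        return {"messages": [response]}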

    def should_continue(state: State) -> Literal["continue", "end"]:
        messages = state["messages"]    
        last_message = messages[-1]
        if isinstance(last_message, AIMessage) and last_message.tool_calls:
            return "continue"        
        else:
            return "end"

    def buildChatAgent():
        workflow = StateGraph(State)
        workflow.add_node("agent", call_model)
        workflow.add_node("action", tool_node)
        workflow.add_edge(START, "agent")
        workflow.add_conditional_edges(
            "agent",
            should_continue,
            {
                "continue": "action",
                "end": END,
            },
        )
        workflow.add_edge("action", "agent")
        return workflow.compile() 
    
    return buildChatAgent()
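
The control flow of this graph (agent, then action, then back to agent until no tool call remains) can be sketched without LangGraph. The version below is a simplified illustration using plain dicts; run_agent_loop, fake_model, fake_search, and the max_steps limit are hypothetical and only mirror the routing done by should_continue.

```python
def run_agent_loop(model, tools, user_message, max_steps=5):
    """Alternate between the model and tool execution until no tool call remains."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = model(messages)                 # corresponds to the "agent" node
        messages.append(reply)
        if not reply.get("tool_calls"):         # should_continue returns "end"
            return messages
        for call in reply["tool_calls"]:        # corresponds to the "action" node
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("agent did not finish within max_steps")


# Hypothetical stand-ins for the chat model and the MCP search tool
def fake_model(messages):
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": f"Found: {messages[-1]['content']}"}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"name": "search", "args": {"keyword": "water pressure"}}],
    }


def fake_search(keyword):
    return f"results for {keyword}"


history = run_agent_loop(fake_model, {"search": fake_search}, "Find pressure errors")
print(history[-1]["content"])
```

The step limit is a design choice for this sketch: it keeps a model that always requests another tool call from looping forever.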

Using MCP Servers

Model Context Protocol servers also provides information on available servers.

On Smithery, you can browse MCP servers and, once you find the one you need, retrieve its connection information in JSON form.

The Google search MCP server information found at Smithery - Google Search Server is shown below. It requires a search engine ID and an API key.

{
  "mcpServers": {
    "google-search-mcp-server": {
      "command": "npx",
      "args": [
        "-y",
        "@smithery/cli@latest",
        "run",
        "@gradusnikov/google-search-mcp-server",
        "--config",
        "{\"googleCseId\":\"b5cd8c527fbd64b72\",\"googleApiKey\":\"AIzbSyDQlYpck8-9TbBSuxoew1luOGVB6unRPNk\"}"
      ]
    }
  }
}

You can update the server information in JSON format as shown below. The example below uses the search tool defined in mcp-server.py.

{
  "mcpServers": {
    "search": {
      "command": "python",
      "args": [
        "application/mcp-server.py"
      ]
    }
  }
}
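
Since this JSON can also be entered by hand in the UI, it may help to check its shape before using it. The sketch below is one possible check, assuming only the structure shown in the examples (an "mcpServers" object whose entries have a "command" string and an optional "args" list); validate_mcp_config is a hypothetical helper and is not part of the project.

```python
import json


def validate_mcp_config(raw):
    """Return the server names in an MCP config string; raise ValueError if malformed."""
    servers = json.loads(raw).get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        raise ValueError('config needs a non-empty "mcpServers" object')
    for name, spec in servers.items():
        if not isinstance(spec, dict) or not isinstance(spec.get("command"), str):
            raise ValueError(f'server "{name}" needs a "command" string')
        if not isinstance(spec.get("args", []), list):
            raise ValueError(f'server "{name}": "args" must be a list')
    return list(servers)


config_text = """
{
  "mcpServers": {
    "search": {"command": "python", "args": ["application/mcp-server.py"]}
  }
}
"""

names = validate_mcp_config(config_text)
print(names)
```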

Running the Application

Copy the contents of environmentformcprag from the Output and create application/config.json. Credentials must already be configured with "aws configure". If you use Visual Studio Code, you can create config.json with the command below.

code application/config.json

Install the required packages as follows.

python3 -m venv venv
source venv/bin/activate
pip install streamlit streamlit_chat 
pip install boto3 langchain_aws langchain langchain_community langgraph

Follow deployment.md to deploy Lambda, Knowledge base, OpenSearch Serverless, and the IAM roles needed for security with AWS CDK. Then start Streamlit with the command below.

streamlit run application/app.py

MCP Inspector

In development mode, you can use MCP Inspector to test the MCP server. Install the CLI as follows.

pip install 'mcp[cli]'

Then run the command below to easily test how mcp-server.py behaves. On startup it provides a URL such as http://localhost:5173.

mcp dev mcp-server.py

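Results

Using RAG Through MCP

Download error_code.pdf and then upload the file. Next, enter a request such as "Please search for boiler error codes related to water pressure." The agent fetches the tool information through MCP and uses the information obtained with the search tool to present the results. The search tool invokes Lambda, which queries the knowledge base, a fully managed RAG service, with the search term, evaluates relevance, and passes along only the relevant documents. The agent then produces its answer from the information retrieved from RAG.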
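
The retrieve, grade, and generate flow described above can be sketched in a few lines. This is a toy illustration only: keyword overlap stands in for both the Knowledge base query and the relevance grading, generation is stubbed out, and the function names and sample corpus are hypothetical rather than taken from lambda-rag.

```python
def retrieve(query, corpus, top_k):
    """Rank documents by naive keyword overlap (stand-in for a knowledge base query)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def grade(query, docs):
    """Keep documents that share a term with the query (stand-in for model-based grading)."""
    terms = set(query.lower().split())
    return [doc for doc in docs if terms & set(doc.lower().split())]


def search_rag(query, corpus, top_k=2, grading="Enable"):
    docs = retrieve(query, corpus, top_k)
    if grading == "Enable":
        docs = grade(query, docs)  # drop retrieved documents judged irrelevant
    # Generation is stubbed: a real system would pass the documents to a model
    return {"response": " ".join(docs)}


corpus = ["E01 low water pressure", "E02 ignition failure", "user manual cover page"]
graded = search_rag("water pressure error", corpus)
ungraded = search_rag("water pressure error", corpus, grading="Disable")
print(graded["response"])
print(ungraded["response"])
```

With grading enabled, only the document that actually matches the query survives; with it disabled, everything returned by retrieval is passed on.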

Using Internet Search Through MCP

Visit smithery-Tavily and obtain the settings that match your environment. Below is the connection information in JSON format for Mac/Linux.

{
  "mcpServers": {
    "mcp-tavily": {
      "command": "npx",
      "args": [
        "-y",
        "@smithery/cli@latest",
        "run",
        "mcp-tavily",
        "--key",
        "132c5abd-6f2e-4e42-89a1-d0b1fcb75613"
      ]
    }
  }
}

Below is the default configuration for RAG.

{
  "mcpServers": {
    "search": {
      "command": "python",
      "args": [
        "application/mcp-server.py"
      ]
    }
  }
}

Below is the config used when multiple MCP servers are set up.

{
   "mcpServers":{
      "RAG":{
         "command":"python",
         "args":[
            "application/mcp-server.py"
         ]
      },
      "mcp-tavily":{
         "command":"npx",
         "args":[
            "-y",
            "@smithery/cli@latest",
            "run",
            "mcp-tavily",
            "--key",
            "132c5abd-6f2e-4e42-89a1-d0b1fcb75613"
         ]
      }
   }
}
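
A multiple-server config like this one can also be assembled from single-server configs. The sketch below is one way to do that, assuming every input follows the "mcpServers" layout shown above; merge_mcp_configs is a hypothetical helper, and the key value is a placeholder rather than a working credential.

```python
import json


def merge_mcp_configs(*configs):
    """Combine several {"mcpServers": {...}} JSON strings into one config dict."""
    merged = {"mcpServers": {}}
    for raw in configs:
        servers = json.loads(raw).get("mcpServers", {})
        for name, spec in servers.items():
            if name in merged["mcpServers"]:
                raise ValueError(f"duplicate server name: {name}")
            merged["mcpServers"][name] = spec
    return merged


rag_config = '{"mcpServers": {"RAG": {"command": "python", "args": ["application/mcp-server.py"]}}}'
tavily_config = json.dumps({
    "mcpServers": {
        "mcp-tavily": {
            "command": "npx",
            # "<YOUR_KEY>" is a placeholder for the key issued to you
            "args": ["-y", "@smithery/cli@latest", "run", "mcp-tavily", "--key", "<YOUR_KEY>"],
        }
    }
})

multi = merge_mcp_configs(rag_config, tavily_config)
print(json.dumps(multi, indent=2))
```

Rejecting duplicate names is a deliberate choice in this sketch, because the server name is the dictionary key and a silent overwrite would drop one of the servers.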