Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. This comprehensive analysis covers the mcp-builder skill's 4-phase workflow, Python implementation patterns, and practical evaluation strategies.

📚 Source Information

Original article: Anthropic Skills Repository

Author: Anthropic

ℹ️ This article was automatically imported and translated using Claude AI.

Analyzing mcp-builder: A Complete Guide to MCP Server Development

mcp-builder is a Claude skill that provides comprehensive guidance for creating high-quality MCP (Model Context Protocol) servers. This skill enables LLMs to interact with external services through well-designed tools, supporting both Python (FastMCP) and Node/TypeScript (MCP SDK) implementations.

This is a production-ready skill from the Anthropic skills repository, designed to guide developers through the complete MCP server development lifecycle from planning to evaluation.

Overview

What is mcp-builder?

The skill's own description reads: "Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK)."

Core Purpose

The mcp-builder skill aims to:

Guide developers through agent-centric MCP server design
Provide comprehensive implementation patterns for Python and TypeScript
Establish evaluation-driven development practices
Create tools optimized for LLM context limitations
Enable systematic external service integration

Target Audience

This skill is designed for:

Developers building MCP servers for external API integration
Teams creating reusable tools for Claude and other LLMs
Engineers implementing Model Context Protocol specifications
Anyone interested in LLM-external service communication patterns

Skill Anatomy

Directory Structure

SKILL.md

SKILL.md Structure

Every skill begins with metadata in YAML frontmatter:

---
name: mcp-builder
description: "Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK)."
license: Complete terms in LICENSE.txt
---

Key Components

Scripts

Scripts provide deterministic, reusable code that Claude can execute. mcp-builder bundles two Python scripts, one for MCP server connection handling and one for evaluation:

connections.py: Abstract MCP connection handling across different transport protocols (stdio, SSE, HTTP)

evaluation.py: Comprehensive evaluation harness for testing MCP server effectiveness with Claude

Technical Deep Dive

4-Phase MCP Server Development Workflow

The skill follows a structured workflow with four major phases:

Phase 1: Deep Research and Planning - Understand agent-centric design principles, study MCP protocol and API documentation, create comprehensive implementation plan

Phase 2: Implementation - Set up project structure, implement core infrastructure, build tools systematically following language-specific best practices

Phase 3: Review and Refine - Code quality review, test and build verification, quality checklist validation

Phase 4: Create Evaluations - Design comprehensive evaluation scenarios, create 10 complex questions, generate evaluation XML

How It Works

mcp-builder demonstrates sophisticated skill design with progressive disclosure, extensive reference documentation, and practical tooling.

Trigger Detection: Claude identifies when this skill should be used based on MCP server development queries and the detailed description

Context Loading: SKILL.md content (13KB, 329 lines) loads into Claude's context window with comprehensive workflow documentation

Resource Access: Reference documentation loaded on-demand during implementation phases for Python, TypeScript, and evaluation patterns

Execution: Claude follows the 4-phase process, using bundled scripts for connection handling and evaluation as needed

Phase 1: Deep Research and Planning

This phase establishes the foundation for agent-centric MCP server design:

1.1 Understand Agent-Centric Design Principles

Build for Workflows

Don't simply wrap API endpoints—build thoughtful, high-impact workflow tools that consolidate related operations and enable complete tasks

Optimize for Limited Context

Agents have constrained context windows—return high-signal information, provide concise/detailed options, and treat context budget as a scarce resource

Design Actionable Error Messages

Error messages should guide agents toward correct usage patterns with specific next steps and educational feedback

Follow Natural Task Subdivisions

Tool names should reflect human task thinking with consistent prefixes for discoverability around natural workflows

Use Evaluation-Driven Development

Create realistic evaluation scenarios early and let agent feedback drive tool improvements through rapid prototyping and iteration

1.3-1.6 Research and Planning Activities

The skill guides developers through:

MCP Protocol Documentation: Fetch from https://modelcontextprotocol.io/llms-full.txt
Framework Documentation: Load SDK-specific guides for Python or TypeScript
API Documentation: Exhaustively study target service API documentation
Implementation Plan: Create detailed plans for tool selection, shared utilities, input/output design, and error handling

Phase 2: Implementation

This phase focuses on systematic MCP server construction:

2.1 Set Up Project Structure

Python:

Create single .py file or organize into modules
Use MCP Python SDK for tool registration
Define Pydantic models for input validation

TypeScript:

Create proper project structure with package.json and tsconfig.json
Use MCP TypeScript SDK
Define Zod schemas for input validation

2.3 Implement Tools Systematically

Each tool requires careful design of input schemas, comprehensive documentation, and proper error handling

For each tool, developers should:

Define Input Schema: Use Pydantic (Python) or Zod (TypeScript) with proper constraints and examples
Write Comprehensive Docstrings: Include one-line summary, detailed explanation, parameter types with examples, return schema, usage examples, and error handling
Implement Tool Logic: Use shared utilities, follow async/await patterns, support multiple response formats, respect pagination, and check character limits
Add Tool Annotations: Include readOnlyHint, destructiveHint, idempotentHint, and openWorldHint as appropriate
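The input-schema step can be sketched without any dependencies. The example below uses a plain dataclass as a stand-in for a Pydantic model, so it is not the skill's own code; the tool name `search_issues`, its fields, and the 1 to 100 range are all hypothetical. It shows constraints checked at construction time, with error text that tells the agent what to try next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchIssuesInput:
    """Hypothetical input schema for a hypothetical `search_issues` tool."""

    query: str                         # free-text search, e.g. "login timeout"
    limit: int = 20                    # page size, constrained to 1..100
    response_format: str = "markdown"  # "markdown" or "json"

    def __post_init__(self) -> None:
        # Validate every field up front so the tool body can trust its input.
        if not self.query.strip():
            raise ValueError("query must not be empty. Example: query='login timeout'")
        if not 1 <= self.limit <= 100:
            raise ValueError(
                f"limit must be between 1 and 100, got {self.limit}. Try limit=20."
            )
        if self.response_format not in ("markdown", "json"):
            raise ValueError("response_format must be 'markdown' or 'json'.")
```

With Pydantic, the same constraints would normally be declared on the fields themselves rather than in a hand-written check.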

Phase 3: Review and Refine

MCP servers are long-running processes that wait for requests over stdio or SSE/HTTP. Running one directly in the foreground blocks until it is killed, which looks like a hang.

3.2 Test and Build - Critical Considerations

Safe Testing Approaches:

Use the evaluation harness (recommended)
Run the server in tmux to keep it outside the main process
Use timeout when testing: timeout 5s python server.py

Python Verification:

python -m py_compile your_server.py

TypeScript Build:

npm run build  # Verify dist/index.js is created
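The `timeout` approach can also be scripted from Python. This is a minimal sketch, assuming the server is started by an ordinary command line; the function name `smoke_test` is hypothetical. It treats "still running when the timeout fires" as the expected outcome for a server, and reports an early non-zero exit as a likely startup error.

```python
import subprocess
import sys


def smoke_test(command: list[str], timeout_s: float = 5.0) -> str:
    """Start a process and report whether it exits or runs until the timeout."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout, so nothing is left hanging.
        return "still running at timeout (expected for a server waiting for requests)"
    if result.returncode != 0:
        return f"exited with code {result.returncode}: {result.stderr.strip()[:200]}"
    return "exited cleanly before the timeout"


if __name__ == "__main__":
    # Stand-in for `python server.py`: a child process that just sleeps.
    print(smoke_test([sys.executable, "-c", "import time; time.sleep(30)"], 1.0))
```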

Phase 4: Create Evaluations

The most sophisticated aspect of mcp-builder is its evaluation framework:

4.2 Create 10 Evaluation Questions

Each question must meet six strict requirements:

Independent: Not dependent on other questions
Read-only: Only non-destructive operations required
Complex: Requiring multiple tool calls and deep exploration
Realistic: Based on real use cases humans would care about
Verifiable: Single, clear answer that can be verified by string comparison
Stable: Answer won't change over time

4.4 Output Format

Evaluations create XML files with this structure:

<evaluation>
  <qa_pair>
    <question>Find discussions about AI model launches with animal codenames...</question>
    <answer>3</answer>
  </qa_pair>
  <!-- More qa_pairs... -->
</evaluation>
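A file in this shape can be read with the standard library. The sketch below is not taken from evaluation.py; the function names are hypothetical, and it assumes only the `<evaluation>`, `<qa_pair>`, `<question>`, and `<answer>` elements shown above. It also shows the string comparison that the "Verifiable" requirement implies.

```python
import xml.etree.ElementTree as ET


def load_qa_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Return (question, answer) tuples from an <evaluation> document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for qa in root.findall("qa_pair"):
        question = (qa.findtext("question") or "").strip()
        answer = (qa.findtext("answer") or "").strip()
        if question and answer:  # skip incomplete pairs instead of failing
            pairs.append((question, answer))
    return pairs


def is_correct(expected: str, actual: str) -> bool:
    """Compare answers as strings, ignoring surrounding whitespace."""
    return expected.strip() == actual.strip()
```

Exact string comparison is why each question needs a single, stable answer: anything that can be phrased two ways will be marked wrong.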

Script Analysis

connections.py - Multi-Transport MCP Connection Handler

This 152-line Python module provides sophisticated connection abstraction:

from abc import ABC


class MCPConnection(ABC):
    """Base class for MCP server connections."""

    async def __aenter__(self):
        """Initialize MCP server connection."""
        # Handles AsyncExitStack, session initialization, and error cleanup

Key Features:

Abstract base class pattern for consistent interface
Support for three transport protocols: stdio, SSE, HTTP
Async context manager for resource cleanup
Automatic session initialization and error handling
Factory function create_connection() for transport-agnostic instantiation

Transport Classes:

MCPConnectionStdio: Standard input/output for local servers
MCPConnectionSSE: Server-Sent Events for streaming connections
MCPConnectionHTTP: Streamable HTTP for web-based MCP servers
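The pattern described above (abstract base class, async context manager, factory function) can be illustrated with stand-in classes. This is a sketch of the pattern only, not the contents of connections.py: `Connection`, `FakeStdio`, and `FakeHttp` are hypothetical, the "session" is a plain dict, and no real MCP transport is opened.

```python
import asyncio
from abc import ABC, abstractmethod
from contextlib import AsyncExitStack, asynccontextmanager


class Connection(ABC):
    """Hypothetical base class: owns an AsyncExitStack and a session."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.session = None
        self._stack = AsyncExitStack()

    @abstractmethod
    def _open(self):
        """Return an async context manager that yields a session-like object."""

    async def __aenter__(self):
        try:
            self.session = await self._stack.enter_async_context(self._open())
        except BaseException:
            await self._stack.aclose()  # release anything opened so far
            raise
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self._stack.aclose()
        self.session = None


class FakeStdio(Connection):
    @asynccontextmanager
    async def _open(self):
        # A real implementation would spawn the server process here.
        yield {"transport": "stdio", "target": self.target}


class FakeHttp(Connection):
    @asynccontextmanager
    async def _open(self):
        yield {"transport": "http", "target": self.target}


def create_connection(transport: str, target: str) -> Connection:
    """Pick a connection class by transport name."""
    registry = {"stdio": FakeStdio, "http": FakeHttp}
    if transport not in registry:
        raise ValueError(
            f"Unsupported transport {transport!r}. Use one of: {sorted(registry)}"
        )
    return registry[transport](target)


async def demo() -> dict:
    async with create_connection("stdio", "python my_server.py") as conn:
        return conn.session


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Callers only see `create_connection` and `async with`, so adding a transport means adding one subclass and one registry entry.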

evaluation.py - Comprehensive MCP Server Evaluation Harness

This 374-line module provides end-to-end evaluation capabilities:

The evaluation harness tests whether LLMs can effectively use MCP servers to answer realistic, complex questions.

Core Components:

Evaluation Prompt: Sophisticated system prompt requiring:
- Tool usage with step-by-step summaries
- Constructive feedback on tool design
- Properly formatted XML responses
XML Parsing: Robust parsing of evaluation files with error handling
Agent Loop: Async interaction with Claude API and MCP server tools
Metrics Collection: Comprehensive tracking of:
- Accuracy rates
- Task durations
- Tool call counts and performance
- Feedback quality
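The metrics listed above reduce to simple aggregation once each task has been run. A rough sketch, assuming one record per evaluation question; the `TaskResult` fields and the `summarize` function are hypothetical and do not mirror evaluation.py's actual report format.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    expected: str      # answer from the evaluation file
    actual: str        # answer the agent produced
    duration_s: float  # wall-clock time for the task
    tool_calls: int    # number of tool invocations the agent made


def summarize(results: list[TaskResult]) -> dict:
    """Aggregate accuracy, average duration, and average tool calls."""
    total = len(results)
    if total == 0:
        return {"total": 0, "correct": 0, "accuracy": 0.0,
                "avg_duration_s": 0.0, "avg_tool_calls": 0.0}
    correct = sum(r.expected.strip() == r.actual.strip() for r in results)
    return {
        "total": total,
        "correct": correct,
        "accuracy": correct / total,
        "avg_duration_s": sum(r.duration_s for r in results) / total,
        "avg_tool_calls": sum(r.tool_calls for r in results) / total,
    }
```

A high tool-call average with low accuracy is a useful signal that tools are too granular or their descriptions are unclear.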

Usage Example:

# Evaluate a local stdio MCP server
python evaluation.py -t stdio -c python -a my_server.py eval.xml

# Evaluate an SSE MCP server with authentication
python evaluation.py -t sse -u https://example.com/mcp \
  -H "Authorization: Bearer token" eval.xml

Usage Examples

Basic Usage - Creating an MCP Server

mcp-builder guides you through the complete MCP server development process

Start with Research: Study MCP protocol documentation and your target API

Design Agent-Centric Tools: Focus on workflows, not just API endpoint mapping

Implement Systematically: Follow language-specific best practices with proper validation

Evaluate Thoroughly: Create 10+ complex evaluation scenarios and test with the harness

Advanced Scenario - Complex API Integration

When integrating complex APIs, mcp-builder emphasizes consolidation and workflow thinking

Example: Calendar Integration

Instead of separate tools:

❌ check_availability, create_event, send_invitation

Create workflow-oriented tools:

✅ schedule_meeting: Checks availability, creates event, and sends invitations in one operation

This approach:

Reduces context window usage
Minimizes agent decision points
Provides better error handling
Enables complete task accomplishment
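To make the consolidation concrete, here is a toy version of `schedule_meeting` backed by an in-memory calendar. Everything except the tool name is an assumption: `FakeCalendar` stands in for a real calendar API, time slots are plain strings, and "sending invitations" is only reported in the return text.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCalendar:
    """Hypothetical in-memory stand-in for a calendar service."""

    busy: dict[str, set[str]] = field(default_factory=dict)  # attendee -> slots
    events: list[dict] = field(default_factory=list)

    def is_free(self, attendee: str, slot: str) -> bool:
        return slot not in self.busy.get(attendee, set())

    def create_event(self, title: str, slot: str, attendees: list[str]) -> dict:
        event = {"id": len(self.events) + 1, "title": title,
                 "slot": slot, "attendees": list(attendees)}
        self.events.append(event)
        for attendee in attendees:
            self.busy.setdefault(attendee, set()).add(slot)
        return event


def schedule_meeting(cal: FakeCalendar, title: str, slot: str,
                     attendees: list[str]) -> str:
    """One tool call: check availability, create the event, report invitations."""
    conflicts = [a for a in attendees if not cal.is_free(a, slot)]
    if conflicts:
        # Actionable failure: say who is busy and what to do next.
        return (f"Could not schedule '{title}' at {slot}: "
                f"{', '.join(conflicts)} busy. Try another slot.")
    event = cal.create_event(title, slot, attendees)
    return (f"Scheduled '{title}' at {slot} (event {event['id']}). "
            f"Invitations sent to: {', '.join(attendees)}.")
```

The agent makes one decision and reads one short result, instead of chaining three calls and carrying intermediate state in its context.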

Best Practices

Based on the design of mcp-builder, here are key principles for MCP server development:

Agent-Centric Design Principles

Context Budget Awareness

Treat the agent's context window as a scarce resource. Return only essential information and provide concise/detailed options

Workflow Consolidation

Combine related operations into single tools that enable complete tasks rather than exposing raw API endpoints

Educational Error Messages

Make errors actionable with specific guidance: "Try using filter='active_only' to reduce results"

Progressive Disclosure

Provide summary information by default with options for detailed exploration when needed

Idempotent Operations

Design tools that can be safely retried without side effects when appropriate

Implementation Best Practices

Input Validation: Use Pydantic (Python) or Zod (TypeScript) with proper constraints, examples, and descriptive field documentation

Comprehensive Documentation: Every tool needs one-line summary, detailed explanation, parameter types with examples, return schema, usage examples, and error handling guidance

Async Patterns: Use async/await for all I/O operations to prevent blocking

Error Handling: Implement graceful failure modes with clear, LLM-friendly error messages that prompt further action

Response Formatting: Support both JSON and Markdown formats with configurable detail levels
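The response-formatting practice might look like the following. The parameter names `response_format` and `detail`, and the user records, are assumptions for illustration; the point is that one tool can serve both a compact Markdown view and a full JSON view.

```python
import json


def format_users(users: list[dict], response_format: str = "markdown",
                 detail: str = "concise") -> str:
    """Render user records as Markdown or JSON at a chosen detail level."""
    if response_format not in ("markdown", "json"):
        raise ValueError("response_format must be 'markdown' or 'json'.")
    if detail not in ("concise", "detailed"):
        raise ValueError("detail must be 'concise' or 'detailed'.")

    if detail == "concise":
        # Keep only the high-signal fields to save context.
        users = [{"id": u["id"], "name": u["name"]} for u in users]

    if response_format == "json":
        return json.dumps({"count": len(users), "users": users}, indent=2)

    lines = [f"Found {len(users)} user(s):"]
    for u in users:
        line = f"- {u['name']} (id: {u['id']})"
        extras = {k: v for k, v in u.items() if k not in ("id", "name")}
        if extras:
            line += " | " + ", ".join(f"{k}: {v}" for k, v in extras.items())
        lines.append(line)
    return "\n".join(lines)
```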

Common Pitfalls

Common mistakes developers make when building MCP servers

API-First Design (Wrong Approach)

Symptom: Tools map 1:1 to API endpoints

Problem: Forces agents to understand API structure and make multiple calls for simple workflows

Solution: Design tools around workflows and tasks, not API endpoints

Excessive Information Return

Symptom: Tools return complete API responses with all fields

Problem: Wastes context window on irrelevant data

Solution: Return high-signal information only, provide detailed/concise options

Poor Error Messages

Symptom: "Error 404: Not Found"

Problem: Agents don't know how to recover

Solution: "Resource not found. Try using list_resources() to see available options"
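One way to apply this is to translate raw HTTP statuses into guidance before they reach the agent. A small sketch: the function name and the exact wording are hypothetical, and `list_resources()` is the example tool named above, not a real API.

```python
def explain_http_error(status: int, resource: str) -> str:
    """Turn an HTTP status into a message that suggests a next step."""
    hints = {
        401: "Authentication failed. Check that the API token is set and has not expired.",
        403: f"Access to {resource} is denied. Check the token's permissions.",
        404: f"{resource} not found. Try using list_resources() to see available options.",
        429: "Rate limit reached. Wait before retrying, or request fewer items per call.",
    }
    fallback = (f"Request for {resource} failed with HTTP {status}. "
                "Retry once; if it persists, check the service status.")
    return hints.get(status, fallback)
```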

Missing Tool Annotations

Symptom: Tools lack readOnlyHint, destructiveHint, etc.

Problem: Without these hints, Claude and other clients cannot tell which tools are read-only, destructive, or safe to retry

Solution: Add appropriate annotations to all tools
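A simple lint pass can catch missing annotations before review. In this sketch the annotations are plain dicts keyed by the four hint names mentioned above; how they are attached to a tool depends on the SDK and is not shown. The tool names are hypothetical, and requiring all four hints on every tool is this sketch's own rule, not something the protocol mandates.

```python
REQUIRED_HINTS = ("readOnlyHint", "destructiveHint", "idempotentHint", "openWorldHint")

# Hypothetical tools and their annotations.
TOOLS = {
    "search_issues": {
        "readOnlyHint": True,
        "destructiveHint": False,
        "idempotentHint": True,
        "openWorldHint": True,
    },
    "delete_issue": {
        "readOnlyHint": False,
        "destructiveHint": True,
        "idempotentHint": True,
        # openWorldHint left out on purpose to show the check firing.
    },
}


def missing_hints(tools: dict[str, dict]) -> dict[str, list[str]]:
    """Map each tool name to the hints it does not declare."""
    report = {}
    for name, annotations in tools.items():
        missing = [hint for hint in REQUIRED_HINTS if hint not in annotations]
        if missing:
            report[name] = missing
    return report
```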

Insufficient Testing

Symptom: No evaluation harness, manual testing only

Problem: Cannot measure tool effectiveness or iterate based on feedback

Solution: Create comprehensive evaluations using mcp-builder's evaluation.py

Integration with Other Skills

mcp-builder works well with:

skill-creator - For creating new skills based on MCP patterns
skill-article-writer - For documenting MCP server implementations
api-contract-manager - For API specification management
testing-frameworks - For comprehensive MCP server testing

Real-World Applications

Use Case 1: Enterprise API Integration

Scenario: A company wants to enable Claude to interact with their internal CRM system

mcp-builder Workflow:

Research: Study CRM API documentation and identify common workflows (create lead, update contact, query opportunities)
Design: Create workflow-oriented tools like qualify_lead (combines data enrichment, scoring, and CRM updates)
Implement: Build Python MCP server with Pydantic validation and comprehensive error handling
Evaluate: Create 10+ scenarios testing lead qualification, opportunity tracking, and reporting capabilities

Outcome: Agents can accomplish complex CRM tasks through natural conversation without understanding API details

Use Case 2: Data Analysis Platform

Scenario: Building an MCP server for a business intelligence platform

mcp-builder Workflow:

Research: Understand data schemas, query capabilities, and common analysis patterns
Design: Tools like analyze_trend (combines data extraction, statistical analysis, and visualization)
Implement: Async Python server handling large dataset queries with pagination and truncation
Evaluate: Test complex analytical questions requiring multi-step data processing

Outcome: Claude can perform sophisticated data analysis through conversational interfaces

Use Case 3: DevOps Automation

Scenario: Creating MCP server for infrastructure management

mcp-builder Workflow:

Research: Study cloud provider APIs and common DevOps workflows
Design: Safety-focused tools with confirmation prompts for destructive operations
Implement: TypeScript server with strict validation and comprehensive logging
Evaluate: Create tests for deployment, scaling, and monitoring scenarios

Outcome: Safe, auditable infrastructure management through natural language

Troubleshooting

MCP Server Hanging on Startup

Symptom: Server starts but process hangs indefinitely

Cause: MCP servers are long-running processes waiting for requests

Solution: Run in tmux or use timeout for testing:

timeout 5s python server.py

Evaluation Harness Connection Failures

Symptom: evaluation.py cannot connect to MCP server

Cause: Transport protocol mismatch or authentication issues

Solution:

Verify correct transport type (stdio/SSE/HTTP)
Check authentication credentials for remote servers
Ensure server is running before starting evaluation

Tool Call Failures in Evaluation

Symptom: Tools execute but return errors

Cause: Input validation failures or API changes

Solution:

Review tool input schemas for proper validation
Check that API endpoints are accessible and unchanged
Verify error handling provides actionable feedback

Poor Evaluation Scores

Symptom: Low accuracy on evaluation questions

Cause: Tools not designed for agent workflows or poor documentation

Solution:

Review agent-centric design principles
Improve tool descriptions with better examples
Consolidate related operations into workflow tools
Add comprehensive error handling and guidance

Context Overflow Issues

Symptom: Tools return too much data, exceeding context limits

Cause: No pagination or truncation strategy

Solution:

Implement pagination parameters
Add character limit checks
Provide concise/detailed response options
Use a 25,000-token ceiling as guidance
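The first two items above can be sketched in a few lines. The helper names, the offset/limit parameters, and the truncation notice are assumptions; the size limit is passed in by the caller and counts characters, so it is only a rough proxy for a token budget.

```python
def paginate(items: list[str], offset: int = 0, limit: int = 20) -> dict:
    """Return one page plus the metadata an agent needs to fetch the next."""
    page = items[offset:offset + limit]
    next_offset = offset + len(page)
    has_more = next_offset < len(items)
    return {
        "items": page,
        "total": len(items),
        "has_more": has_more,
        "next_offset": next_offset if has_more else None,
    }


def truncate(text: str, max_chars: int) -> str:
    """Cut oversized output and tell the agent how to get the rest."""
    if len(text) <= max_chars:
        return text
    notice = "\n[Truncated. Use a smaller limit or add filters to see the rest.]"
    return text[:max_chars] + notice
```

Returning `has_more` and `next_offset` lets the agent decide whether another call is worth the context it will cost.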

Next Steps

To use mcp-builder effectively:

Clone the repository: git clone https://github.com/anthropics/skills
Study the skill: Read SKILL.md thoroughly to understand the 4-phase process
Examine the scripts: Review connections.py and evaluation.py for implementation patterns
Start building: Choose a target API and begin Phase 1 research
Follow the workflow: Complete all 4 phases systematically
Evaluate thoroughly: Create and run comprehensive evaluations
Iterate based on feedback: Use evaluation results to improve tool design

Anthropic Skills Repository: github.com/anthropics/skills
Model Context Protocol: modelcontextprotocol.io
Python MCP SDK: github.com/modelcontextprotocol/python-sdk
TypeScript MCP SDK: github.com/modelcontextprotocol/typescript-sdk
MCP Builder Skill: /development/analyzing-mcp-builder

Conclusion

mcp-builder demonstrates exceptional Claude skill design through:

✅ Comprehensive Workflow: 4-phase process covering research, implementation, review, and evaluation
✅ Agent-Centric Design: Principles focused on LLM context limitations and workflow optimization
✅ Practical Tooling: Production-ready Python scripts for connection handling and evaluation
✅ Language Support: Detailed guidance for both Python and TypeScript implementations
✅ Quality Assurance: Systematic evaluation framework with rigorous testing requirements
✅ Progressive Disclosure: Structured SKILL.md with clear phases and actionable steps

The key insights from this skill can transform how developers approach MCP server development, leading to tools that truly enable LLMs to accomplish complex tasks through natural interaction with external services.

Summary

This comprehensive analysis covered:

✅ Skill structure and anatomy (4-phase workflow)
✅ Agent-centric design principles and best practices
✅ Python implementation patterns with Pydantic validation
✅ TypeScript implementation patterns with Zod schemas
✅ Sophisticated connection handling across transport protocols
✅ Comprehensive evaluation framework and harness
✅ Real-world applications and use cases
✅ Troubleshooting guide for common issues
✅ Integration strategies with related skills

Next Steps

Ready to build your first MCP server?

Study the complete SKILL.md: Understand all 4 phases in detail
Choose a target API: Start with a well-documented external service
Follow Phase 1: Conduct thorough research and create an implementation plan
Implement systematically: Use language-specific best practices
Evaluate thoroughly: Create 10+ complex scenarios and test with evaluation.py
Iterate and improve: Refine based on evaluation feedback
Share with the community: Contribute your MCP server to the ecosystem