Rate Limiting MCP Tool Calls: A Practical Guide

A coding agent created 47 GitHub issues in 90 seconds. Each one was titled “Bug: stale cache in auth module” — identical, because the agent was stuck in a loop. By the time the developer noticed, the repository’s issue tracker was unusable.

Rate limiting MCP tool calls means capping how many times an AI agent can invoke a specific tool within a given time window. Unlike API-level rate limits set by the service provider, MCP rate limits are enforced at the proxy layer before requests reach the upstream server, giving you control over agent behaviour regardless of the API’s own limits.

But MCP has no built-in rate limiting mechanism. Every tools/call request the client sends reaches the server unrestricted. If you want limits, you need to add them yourself.

Intercept adds rate limiting by sitting between the agent and the MCP server as a proxy. Policies are defined in YAML, evaluated on every tool call, and enforced at the transport layer. Enforcement there is deterministic, which prompt-based guardrails can’t guarantee.

Here’s how to implement rate limits at different levels of granularity.

Per-Tool Rate Limits

The most straightforward rate limit caps the number of times a tool can be called within a time window. Intercept provides a shorthand:

version: "1"
description: "GitHub MCP server policies"

tools:
  create_issue:
    rules:
      - name: "hourly issue limit"
        rate_limit: 5/hour
        on_deny: "Hourly limit of 5 new issues reached"

The format is <count>/<window>, where window is minute, hour, or day. This expands internally to a stateful counter that tracks calls and resets at the start of each window (top of the hour, midnight UTC, etc.).
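Here is a minimal Python sketch that parses a `<count>/<window>` string, to make the shorthand concrete. The function name `parse_rate_limit` is hypothetical and this is not Intercept’s parser. It only illustrates the format described above.

```python
import re

def parse_rate_limit(spec: str) -> tuple[int, str]:
    """Split a shorthand like '5/hour' into (5, 'hour').

    Hypothetical helper: accepts only the windows named in the article.
    """
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(minute|hour|day)\s*", spec)
    if match is None:
        raise ValueError(f"invalid rate_limit spec: {spec!r}")
    count = int(match.group(1))
    if count <= 0:
        raise ValueError("count must be a positive integer")
    return count, match.group(2)

print(parse_rate_limit("3/minute"))  # (3, 'minute')
```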

When the agent hits the limit, it receives:

[INTERCEPT POLICY DENIED] Hourly limit of 5 new issues reached

The agent can see the denial reason and adapt — wait, inform the user, or move on to other work.

You can stack multiple limits on the same tool to handle different time scales — a burst limit and a daily cap:

tools:
  create_issue:
    rules:
      - name: "burst limit"
        rate_limit: 3/minute
        on_deny: "Slow down — max 3 issues per minute"

      - name: "daily limit"
        rate_limit: 20/day
        on_deny: "Daily issue creation limit (20) reached"

Both rules are evaluated on every call. The burst limit prevents runaway loops. The daily limit prevents gradual accumulation. All rules for a tool must pass — if either one denies the call, it’s blocked.
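The following rough in-memory sketch shows how stacked limits might interact. It assumes fixed windows keyed by Unix time. `FixedWindowLimit` and `check_all` are hypothetical names, and Intercept’s real counters live in its state store. A call is counted against every limit only when all of them have room, so a denied call consumes nothing.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}

@dataclass
class FixedWindowLimit:
    """One hypothetical rate_limit rule: at most `count` calls per window."""
    count: int
    window: str
    used: dict = field(default_factory=dict)  # window index -> calls counted

    def _bucket(self, now: int) -> int:
        # Unix time counts from midnight UTC, so integer division lines the
        # buckets up with the start of each UTC minute, hour, or day.
        return now // WINDOW_SECONDS[self.window]

    def has_room(self, now: int) -> bool:
        return self.used.get(self._bucket(now), 0) < self.count

    def record(self, now: int) -> None:
        bucket = self._bucket(now)
        self.used[bucket] = self.used.get(bucket, 0) + 1

def check_all(limits: list, now: int) -> bool:
    """Allow the call only if every stacked limit has room, then count it in all."""
    if not all(limit.has_room(now) for limit in limits):
        return False  # a single denial blocks the call; nothing is counted
    for limit in limits:
        limit.record(now)
    return True

burst = FixedWindowLimit(3, "minute")
daily = FixedWindowLimit(20, "day")
print([check_all([burst, daily], now=0) for _ in range(4)])  # [True, True, True, False]
print(check_all([burst, daily], now=60))  # True: new minute, daily count is only 3
```

Here the check and the increment are separate steps. That is fine for a single-threaded illustration, but a real proxy needs them to happen atomically, as described under How Counter State Works.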

Different tools get different limits based on risk. Read operations are generally safe to call frequently, write operations need tighter controls, and destructive operations get hidden entirely.

Global Rate Limits

Per-tool limits don’t catch an agent that spreads its activity across many different tools. An agent making 10 calls each to 20 different tools in a short burst is making 200 calls. Each tool’s count may sit comfortably within its own limit, but the total likely signals a runaway loop.

The wildcard key "*" applies to every tool call:

tools:
  "*":
    rules:
      - name: "global rate limit"
        rate_limit: 60/minute

This caps total MCP traffic at 60 calls per minute regardless of which tools are being called. Combine this with per-tool limits for layered protection:

version: "1"
description: "Layered rate limiting"

tools:
  create_issue:
    rules:
      - name: "issue limit"
        rate_limit: 5/hour

  create_pull_request:
    rules:
      - name: "pr limit"
        rate_limit: 3/hour

  "*":
    rules:
      - name: "global rate limit"
        rate_limit: 60/minute

Wildcard rules are evaluated after tool-specific rules. A call must pass both its tool-specific rules and all wildcard rules.
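The lookup order can be sketched as a small Python function over a dict that mirrors the YAML above, as if it had been loaded with a YAML parser. `rules_for` is a hypothetical helper and not part of Intercept. It assumes tool-specific rules come first and the `"*"` rules are appended to every call.

```python
def rules_for(policy: dict, tool: str) -> list:
    """Rules that apply to `tool`: its own rules first, then the "*" rules."""
    tools = policy.get("tools", {})
    specific = tools.get(tool, {}).get("rules", [])
    wildcard = tools.get("*", {}).get("rules", [])
    return list(specific) + list(wildcard)

# Mirrors the "Layered rate limiting" policy above.
policy = {
    "tools": {
        "create_issue": {"rules": [{"name": "issue limit", "rate_limit": "5/hour"}]},
        "create_pull_request": {"rules": [{"name": "pr limit", "rate_limit": "3/hour"}]},
        "*": {"rules": [{"name": "global rate limit", "rate_limit": "60/minute"}]},
    }
}

print([r["name"] for r in rules_for(policy, "create_issue")])
# ['issue limit', 'global rate limit']
print([r["name"] for r in rules_for(policy, "list_issues")])
# ['global rate limit']
```

A tool with no rules of its own, such as `list_issues` here, is still covered by the global limit.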

Stateful Counters: Beyond Simple Counting

The rate_limit shorthand counts calls. But sometimes you need to count something else — dollars spent, bytes transferred, records modified. Stateful counters with dynamic increments handle this:

tools:
  create_charge:
    rules:
      - name: "daily spend cap"
        conditions:
          - path: "state.create_charge.daily_spend"
            op: "lte"
            value: 1000000
        on_deny: "Daily spending cap of $10,000.00 reached"
        state:
          counter: "daily_spend"
          window: "day"
          increment_from: "args.amount"

Instead of incrementing by 1 on each call, this counter increments by args.amount, the amount of each charge in cents. The condition checks the cumulative total, not the call count, so the value 1000000 corresponds to $10,000.00.

This is the difference between “you can make 100 charges per day” and “you can spend $10,000 per day.” The second is usually what you actually want. For a complete walkthrough of spending controls, see our step-by-step guide.
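The difference between counting calls and summing an argument fits in a few lines of Python. This is a sketch under assumptions: `next_total` and `resolve_path` are hypothetical helpers, and amounts are integers in cents, as in the policy above.

```python
from typing import Optional

DAILY_CAP_CENTS = 1_000_000  # $10,000.00, matching the policy above

def resolve_path(context: dict, path: str):
    """Follow a dotted path such as "args.amount" through nested dicts."""
    value = context
    for key in path.split("."):
        value = value[key]
    return value

def next_total(current: int, call_args: dict, increment_from: Optional[str] = None) -> int:
    """Counter value after this call: +1 by default, or +<argument value>
    when increment_from names an argument path (hypothetical semantics)."""
    if increment_from is None:
        return current + 1
    amount = resolve_path({"args": call_args}, increment_from)
    if not isinstance(amount, int) or amount < 0:
        raise ValueError(f"{increment_from} must be a non-negative integer")
    return current + amount

total = next_total(995_000, {"amount": 4_000, "currency": "usd"}, "args.amount")
print(total, total <= DAILY_CAP_CENTS)  # 999000 True
```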

How Counter State Works

Counters persist in Intercept’s state store (SQLite by default), so they survive process restarts. If the daily spend counter stands at $8,000 before a redeploy, it still stands at $8,000 afterward.

Windows are calendar-aligned in UTC:

minute resets at the start of each UTC minute
hour resets at the top of each UTC hour
day resets at midnight UTC
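Calendar alignment means each counter window can be identified by its start time. The following minimal Python sketch uses the standard `datetime` module, and the `window_start` name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

def window_start(now: datetime, window: str) -> datetime:
    """Start of the UTC calendar window (minute, hour, or day) containing `now`."""
    if now.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    now = now.astimezone(timezone.utc)
    if window == "minute":
        return now.replace(second=0, microsecond=0)
    if window == "hour":
        return now.replace(minute=0, second=0, microsecond=0)
    if window == "day":
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown window: {window!r}")

t = datetime(2024, 5, 1, 23, 59, 30, tzinfo=timezone.utc)
print(window_start(t, "day"))                          # 2024-05-01 00:00:00+00:00
print(window_start(t + timedelta(seconds=30), "day"))  # 2024-05-02 00:00:00+00:00
```

Keying a counter by (counter name, window start) produces the reset behaviour automatically. At 00:00:00 UTC the key changes and the count starts from zero.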

Intercept uses a two-phase model for counter updates:

Reserve: atomically read the counter and check the post-increment value. If it would exceed the limit, deny immediately without modifying the counter; otherwise apply the increment as a tentative reservation.
Forward: send the call to the upstream MCP server.
Commit or rollback: if the upstream call succeeds, the reservation stands. If it fails, the increment is rolled back.

This means a failed Stripe charge doesn’t consume spend quota. A 500 error from GitHub doesn’t count against your rate limit. Only successful calls are counted.
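The reserve, forward, and commit-or-rollback sequence can be sketched in Python, with a lock standing in for the state store’s atomic update. `SpendCounter` and `guarded_call` are hypothetical names. The sketch keeps state in memory and treats an upstream failure as a raised exception, whereas a real proxy would also need to recognise error responses.

```python
import threading
from typing import Callable

class SpendCounter:
    """Hypothetical in-memory counter; Intercept persists its counters instead."""

    def __init__(self, limit: int):
        self.limit = limit
        self.value = 0
        self._lock = threading.Lock()

    def reserve(self, amount: int) -> bool:
        # Atomic check-and-increment: deny without touching the counter
        # if the post-increment value would exceed the limit.
        with self._lock:
            if self.value + amount > self.limit:
                return False
            self.value += amount
            return True

    def rollback(self, amount: int) -> None:
        with self._lock:
            self.value -= amount

def guarded_call(counter: SpendCounter, amount: int, forward: Callable[[], object]):
    """Reserve quota, forward the call, and undo the reservation if it fails."""
    if not counter.reserve(amount):
        raise PermissionError(
            "[INTERCEPT POLICY DENIED] Daily spending cap of $10,000.00 reached")
    try:
        result = forward()  # stands in for the upstream MCP tools/call
    except Exception:
        counter.rollback(amount)  # failed calls don't consume quota
        raise
    return result  # commit: on success the reservation simply stands

counter = SpendCounter(limit=1_000_000)  # $10,000.00 in cents
guarded_call(counter, 400_000, lambda: "charged")
print(counter.value)  # 400000
```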

Combining Rate Limits with Argument Validation

Rate limits alone don’t prevent an agent from making individual calls that are too large. Combine them with argument validation for comprehensive controls:

tools:
  create_charge:
    rules:
      - name: "max single charge"
        conditions:
          - path: "args.amount"
            op: "lte"
            value: 50000
        on_deny: "Single charge cannot exceed $500.00"

      - name: "allowed currencies"
        conditions:
          - path: "args.currency"
            op: "in"
            value: ["usd", "eur"]
        on_deny: "Only USD and EUR charges are permitted"

      - name: "daily spend cap"
        conditions:
          - path: "state.create_charge.daily_spend"
            op: "lte"
            value: 1000000
        on_deny: "Daily spending cap of $10,000.00 reached"
        state:
          counter: "daily_spend"
          window: "day"
          increment_from: "args.amount"

      - name: "daily call count"
        rate_limit: 100/day
        on_deny: "Daily charge call limit (100) reached"

Four layers of protection: no single charge over $500, only USD/EUR, no more than $10,000 per day total, and no more than 100 charges per day. Each rule is evaluated independently, and all must pass.
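The following Python sketch evaluates the two stateless rules above (`max single charge` and `allowed currencies`). The rule dicts mirror the YAML. `OPS`, `resolve`, and `first_denial` are hypothetical names, and only the `lte` and `in` operators used in this policy are modelled. The stateful rules would additionally go through the reserve step described earlier.

```python
import operator

# Hypothetical evaluator for the two condition operators used in this policy.
OPS = {
    "lte": operator.le,
    "in": lambda actual, allowed: actual in allowed,
}

def resolve(context: dict, path: str):
    """Follow a dotted path such as "args.amount" through nested dicts."""
    value = context
    for key in path.split("."):
        value = value[key]
    return value

def first_denial(rules: list, context: dict):
    """Return the on_deny message of the first rule whose conditions fail, else None."""
    for rule in rules:
        for cond in rule.get("conditions", []):
            if not OPS[cond["op"]](resolve(context, cond["path"]), cond["value"]):
                return rule["on_deny"]
    return None

STATELESS_RULES = [
    {"name": "max single charge", "on_deny": "Single charge cannot exceed $500.00",
     "conditions": [{"path": "args.amount", "op": "lte", "value": 50000}]},
    {"name": "allowed currencies", "on_deny": "Only USD and EUR charges are permitted",
     "conditions": [{"path": "args.currency", "op": "in", "value": ["usd", "eur"]}]},
]

print(first_denial(STATELESS_RULES, {"args": {"amount": 75000, "currency": "usd"}}))
# Single charge cannot exceed $500.00
```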

Wiring It Into Your MCP Client

Once your policy is written, configure your MCP client to connect through Intercept. For clients that use .mcp.json:

{
  "mcpServers": {
    "github": {
      "command": "intercept",
      "args": [
        "-c", "/path/to/github-policy.yaml",
        "--",
        "npx", "-y", "@modelcontextprotocol/server-github"
      ],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."
      }
    }
  }
}

For remote MCP servers over HTTP/SSE:

intercept -c policy.yaml --upstream https://mcp.example.com \
  --header "Authorization: Bearer tok_..."

The agent connects to Intercept as if it were the MCP server. Same tools, same schemas, same protocol. The rate limits are invisible until a rule triggers.

Choosing the Right Limits

There’s no universal formula for rate limits. The right values depend on your use case. But here are some heuristics:

Start tight, loosen as needed. It’s easier to increase a limit that’s too low than to recover from a limit that’s too high. If your agent hits a rate limit during normal operation, that’s a signal to adjust — not a failure.

Separate read and write. Read operations are usually safe to call frequently. Write operations create state changes that are harder to undo. Use different limits for each.

Stay below upstream limits. If the GitHub API allows 5,000 requests per hour, there’s no point setting your limit at 10,000. Setting it well below the upstream limit gives you headroom and keeps your agent from consuming your entire API quota.

Use multiple windows. A daily limit of 100 doesn’t prevent 100 calls in one minute. Add a per-minute or per-hour limit alongside your daily cap.

Monitor denials. Frequent rate limit denials mean either your limits are too tight or your agent has a bug, and both are worth investigating. Intercept returns each denial message to the agent, so you can capture those messages to log and alert on them.
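If you collect the text returned for each tool call, a tally of denial reasons is a simple starting point for alerting. This Python sketch assumes denials start with the `[INTERCEPT POLICY DENIED]` prefix shown earlier. How you capture the messages, the helper names, and the threshold of 5 are all assumptions to adapt.

```python
from collections import Counter

DENIAL_PREFIX = "[INTERCEPT POLICY DENIED]"

def denial_counts(messages) -> Counter:
    """Tally denial reasons from the text of tool-call results."""
    counts = Counter()
    for text in messages:
        if text.startswith(DENIAL_PREFIX):
            counts[text[len(DENIAL_PREFIX):].strip()] += 1
    return counts

def reasons_to_alert(messages, threshold: int = 5) -> list:
    """Denial reasons seen at least `threshold` times (hypothetical alert rule)."""
    return [reason for reason, n in denial_counts(messages).items() if n >= threshold]

log = ["[INTERCEPT POLICY DENIED] Hourly limit of 5 new issues reached"] * 6
log.append("Created issue #12")
print(reasons_to_alert(log))  # ['Hourly limit of 5 new issues reached']
```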

Start with a global rate limit of 60/minute and per-tool limits on any write operation. You can always loosen them. You can’t undo what an unthrottled agent does in the first 30 seconds.

For the next level of control beyond rate limiting, add spending controls with stateful counters that track cumulative dollar amounts, not just call counts.

FAQ

What’s the difference between MCP rate limits and API rate limits?

API rate limits are set by the service provider (e.g. GitHub’s 5,000 requests/hour). MCP rate limits are enforced at the proxy layer by you, before requests reach the API. They give you independent control over agent behaviour — you can set much tighter limits than the API allows, and different limits per tool based on risk.

Do MCP rate limits persist across agent restarts?

Yes. Intercept stores counter state in a persistent store (SQLite by default). If your agent restarts mid-session, the rate limit counters pick up where they left off. With a limit of 20 issues per day, if 15 were created before the restart, the counter still reads 15 afterward and only 5 more are allowed that day.

How do I choose the right rate limit values?

Start tight and loosen as needed. A good starting point: 60/minute global, 3-5/minute for write operations, 10-20/day for high-impact operations. Monitor denial logs — if the agent hits limits during normal operation, increase them incrementally. It’s easier to loosen a tight limit than to recover from a limit that was too high.