6 of 6 official MCP servers cluster at 56–60/100 on schema-description density

After ten days of running the v1.1.0 publishability rubric against every MCP server I can find on npm under the official @modelcontextprotocol scope, the cluster pattern is now
hard to ignore.

6 of 6 official Anthropic-shipped MCP servers score 56–60/100 on the v1.1.0 publishability composite. The cap that fires is the same axis every time: description-five-axis.

| Server | Composite | Protocol | Edge cases | Publish | Per-tool axis avg | Cap |
|---|---:|---:|---:|---:|---:|---|
| server-sequential-thinking | 60 | 100 | 100 | 20 | n/a (single tool) | description-five-axis |
| server-memory | 60 | 100 | 85 | 50 | 1.00 / 5 | description-five-axis |
| server-everything | 60 | 100 | 94 | 20 | 0.55 / 5 | description-five-axis |
| server-filesystem | 60 | 100 | 57 | 50 | 0.88 / 5 | description-five-axis |
| server-github (legacy) | 60 | 100 | 26 | 50 | 0.44 / 5 | description-five-axis |
| server-puppeteer (deprecated) | 56 | 100 | 50 | 20 | 0.17 / 5 | description-five-axis |

Every protocol score is 100. The wire format is right on every server. The 40-point gap is entirely how the schemas read.

## What "0.17 / 5" looks like in practice

Take Puppeteer's puppeteer_navigate. The full schema description is:

Navigate to a URL.
Score that against the 5 axes:

Purpose — "navigate to a URL" ✓ (1 axis)
Mutation signal — does it read or write? Silent. ✗
Side-effects — network call, can hit any URL, executes JS, arbitrary cookie state. High-blast. Silent. ✗
Invariants — does it close existing tabs? Open a new one? Same tab? Silent. ✗
Examples — none. ✗

1 / 5. The other six Puppeteer tools score the same way. Average 0.17.

A planner LLM that has to decide whether to call puppeteer_navigate from a tool list of 7 has nothing to pattern-match on. It cannot tell the difference between puppeteer_navigate (mutates browser state, can hit any URL) and puppeteer_screenshot (read-only, current page only) from the schema alone — they read identically.

## Why this matters more than it looks

The reference servers are calibration anchors. When a server author opens the docs to figure out "what does a good MCP server look like", they read these. When an LLM coding agent autocompletes a new MCP server skeleton, it pattern-matches on these. When the spec doc shows "here's how to write a tool", it links to these.

If the bar Anthropic ships at is 56–60/100, that's the bar most third-party servers will start from too — and probably stay at, because there's no public benchmark telling them they're under it.

That's the v1.1.0 thesis: surface the bar so authors can decide where they want to land. mcp-probe score is one command.

```bash npx -y @incultnitollc/mcp-probe score "" --full




  The 5-axis breakdown tells you exactly which axis is empty on which tool. Per-tool axis avg below 3.0/5 fires the ≤60 publishability cap. Fix two axes per tool (mutation signal + one concrete example is usually fastest) and the cap lifts.

  ## Methodology

  - v1.1.0 spec: <https://github.com/Incultnitollc/mcp-probe/blob/main/docs/specs/publishability-score-v1.1.0.md>
  - Calibration drift notes: <https://github.com/Incultnitollc/mcp-probe/blob/main/docs/specs/publishability-score-v1.1.0-amendments.md>
  - 6-server summary (canonical): <https://github.com/Incultnitollc/mcp-probe/blob/main/docs/publishability-scorecards/SUMMARY.md>
  - Individual server scorecards: under `docs/publishability-scorecards/` in the same repo

  ## Caveat — install-time security is a different lane

  `mcp-probe` is pre-publish quality (server authors, before they ship). For install-time security (server installers, before they connect a third-party server), see[`@stephenywilson/mcp-doctor`](https://www.npmjs.com/package/@stephenywilson/mcp-doctor). Different audience, different lane, complementary tool.