JSON or XML Tags for LLM Output: The Format That Holds Under Pressure

Book: Prompt Engineering Pocket Guide: Techniques for Getting the Most from LLMs
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

Your extraction endpoint has run clean for weeks. Then a support ticket arrives in mixed German and English, the model decides to be helpful, and the response comes back as:

Here's the JSON you asked for:
{"name": "Müller GmbH", "role": "Lead"}
Let me know if you need anything else!

json.loads throws. It hits the friendly preamble and dies on the very first character, long before it reaches the payload or the sign-off. The data was right there. The format around it was wrong.

This is the recurring fight with structured LLM output, and it splits into two separate decisions people tend to blur together: what shape the data takes, and what wraps it so you can find it. JSON and XML tags answer different halves of that question.

Two jobs, not one

JSON is a data format. It describes a payload: keys, values, arrays, nesting. XML tags, the way most people use them in prompts, are a delimiter. They mark where a span starts and ends so you can slice it out of a stream of text.

When someone asks "JSON or XML for LLM output," they are usually conflating those jobs. The honest answer is that you often want both: an XML tag as the envelope, JSON as the letter inside it.

<result>
{"name": "Müller GmbH", "role": "Lead", "start": "2026-01"}
</result>

Now the model can ramble before <result> and apologize after </result>, and your extraction is a regex away. The JSON inside stays strict and typed. The tag absorbs the model's urge to talk.

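import json
import re

def extract(text: str) -> dict:
    # Find the first <result>...</result> span; re.S lets . match newlines.
    m = re.search(r"<result>(.*?)</result>", text, re.S)
    if not m:
        raise ValueError("no <result> block found")
    body = m.group(1).strip()
    # Models sometimes put a ```json fence inside the tags too; peel it off.
    body = re.sub(r"^```(?:json)?\s*|\s*```$", "", body)
    return json.loads(body)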

That single pattern handles the preamble, the sign-off, and the markdown code fences models love to wrap JSON in. The tag gives you a landmark. JSON gives you the schema.

Where raw JSON wins

If the API can constrain decoding for you, skip the wrapper. Many current APIs expose a structured-output mode that constrains generation to a JSON Schema you supply. The model cannot emit a stray sentence because the decoder rejects any token that would break the grammar. A plain "JSON mode" is weaker: it typically guarantees syntactically valid JSON, not your schema. When schema-constrained output is available, ask for raw JSON and still validate it against the same schema you sent.

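from pydantic import BaseModel

class Candidate(BaseModel):
    name: str
    role: str
    start: str

# Send this schema through your API's structured-output /
# response_format parameter (the exact name varies by provider).
schema = Candidate.model_json_schema()

# raw_response is the JSON text the API returns; a stand-in here.
raw_response = '{"name": "Müller GmbH", "role": "Lead", "start": "2026-01"}'
# Validate with the same model that produced the schema.
parsed = Candidate.model_validate_json(raw_response)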

Raw JSON is the safer bet when:

The API enforces a schema at decode time (constrained decoding).
The payload is flat or shallow, with named keys.
You want to hand the output straight to a typed model like Pydantic or Zod.
You stream into a JSON-aware parser that can read incomplete objects.

When the decoder enforces the schema, an XML wrapper adds a parsing step that buys you nothing: the shape is already guaranteed.

Where XML tags win

Tags earn their place the moment the response holds more than one kind of thing, or the model needs room to think before it answers.

A chain-of-thought task is the clean example. You want the reasoning and a clean payload, and you do not want the reasoning inside your JSON:

<scratchpad>
The street name contains a comma, so the naive split
would break the address into two fields. Keep it whole.
</scratchpad>
<result>
{"address": "Hauptstraße 4, Hinterhaus", "city": "Berlin"}
</result>

Parse <result>, ignore <scratchpad>. The model gets its thinking space, you get a payload that never had prose mixed in.
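A minimal sketch of that split, assuming each tag appears at most once and sections are not nested inside one another. The helper names `split_sections` and `parse_response` are hypothetical; the regex pairs each opening tag with a closing tag of the same name, and only the `result` body is ever handed to `json.loads`.

```python
from __future__ import annotations

import json
import re

# Matches <name>...</name> where the closing tag repeats the opening name.
SECTION = re.compile(r"<(\w+)>(.*?)</\1>", re.S)

def split_sections(text: str) -> dict[str, str]:
    """Map each tag name to its stripped body (first occurrence wins)."""
    sections: dict[str, str] = {}
    for name, body in SECTION.findall(text):
        sections.setdefault(name, body.strip())
    return sections

def parse_response(text: str) -> tuple[dict, str | None]:
    """Return (payload, reasoning). Only <result> is decoded as JSON."""
    sections = split_sections(text)
    if "result" not in sections:
        raise ValueError("no <result> block found")
    payload = json.loads(sections["result"])
    # The scratchpad is returned as plain text for logging, never parsed.
    return payload, sections.get("scratchpad")

response = """<scratchpad>
The street name contains a comma, so the naive split
would break the address into two fields. Keep it whole.
</scratchpad>
<result>
{"address": "Hauptstraße 4, Hinterhaus", "city": "Berlin"}
</result>"""

payload, reasoning = parse_response(response)
print(payload["address"])  # Hauptstraße 4, Hinterhaus
```

Keeping the scratchpad as a string instead of dropping it is a small design choice: it costs nothing and is handy in logs when a payload looks wrong.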

Tags are the safer bet when:

The output has distinct sections (reasoning, answer, citations, confidence).
You want a place for the model to think that you then discard.
The model tends to wrap or annotate JSON no matter how firmly you ask.
You are mixing free text and structured data in one response.

There is a quieter reason too. Models have seen enormous amounts of tag-delimited text in training, and tags are forgiving. A missing closing brace breaks JSON. A missing closing tag still leaves you a recoverable opening landmark. The wrapper degrades more gracefully than the payload.
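To make the "recoverable landmark" point concrete, here is a hedged sketch assuming the model opened `<result>` but stopped before `</result>` (for example, the response was cut off after the JSON was already complete). The standard library's `json.JSONDecoder.raw_decode` parses the first JSON value starting at a given index and ignores whatever follows, so the opening tag alone is enough to anchor the parse. The name `extract_lenient` is hypothetical.

```python
import json
import re

_decoder = json.JSONDecoder()

def extract_lenient(text: str) -> dict:
    """Prefer a closed <result> block; fall back to the opening tag alone."""
    closed = re.search(r"<result>(.*?)</result>", text, re.S)
    if closed:
        return json.loads(closed.group(1).strip())
    start = text.find("<result>")
    if start == -1:
        raise ValueError("no <result> landmark found")
    # raw_decode does not skip leading whitespace, so step over it first.
    idx = start + len("<result>")
    while idx < len(text) and text[idx].isspace():
        idx += 1
    # raw_decode returns (value, end_index) and ignores any trailing text.
    value, _end = _decoder.raw_decode(text, idx)
    return value

truncated = 'Sure!\n<result>\n{"name": "Müller GmbH", "role": "Lead"}\nHope that helps'
print(extract_lenient(truncated))  # {'name': 'Müller GmbH', 'role': 'Lead'}
```

The limit is the asymmetry described above: this rescues a missing envelope, but if the JSON itself is cut off, `raw_decode` raises `json.JSONDecodeError`.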

Nesting is where JSON pulls ahead

For deep, repeating structure, JSON is the format that holds. Nested XML built by a language model gets unreliable fast: the model loses track of which tag it opened, closes them in the wrong order, or invents a tag name halfway down.

<order>
  <items>
    <item><sku>A1</sku><qty>2</qty></item>
    <item><sku>B7</sku><qty>1</item>   <!-- missing </qty> -->
  </items>
</order>

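That malformed block is a common failure mode for model-authored nested XML. The same data as JSON gives the model less to track, since closing brackets carry no names to misremember, and one json.loads plus a schema check tells you whether it is valid: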

{
  "order": {
    "items": [
      {"sku": "A1", "qty": 2},
      {"sku": "B7", "qty": 1}
    ]
  }
}
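A quick way to see the difference using only the standard library: `xml.etree.ElementTree` rejects the malformed block outright, while the JSON version parses and can be checked in a few lines. The `validate_order` helper is a hypothetical, hand-rolled stand-in for a real schema validator such as Pydantic.

```python
import json
import xml.etree.ElementTree as ET

bad_xml = """<order>
  <items>
    <item><sku>A1</sku><qty>2</qty></item>
    <item><sku>B7</sku><qty>1</item>
  </items>
</order>"""

good_json = '{"order": {"items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}}'

def xml_parses(doc: str) -> bool:
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:  # "mismatched tag": </item> tries to close <qty>
        return False

def validate_order(data: dict) -> list[dict]:
    """Check the shape by hand: every item needs a str sku and an int qty."""
    items = data["order"]["items"]
    for item in items:
        if not isinstance(item.get("sku"), str) or not isinstance(item.get("qty"), int):
            raise ValueError(f"bad item: {item!r}")
    return items

print(xml_parses(bad_xml))                         # False
print(len(validate_order(json.loads(good_json))))  # 2
```

In production you would express the same check declaratively (Pydantic, or Zod on the TypeScript side); the manual version only keeps the example dependency-free.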

Rule of thumb: tags for the outer envelope, JSON for anything nested or repeating. One level of tags is a landmark. Five levels of tags is a parser bug waiting to happen.

Partial streams: the part that bites in production

Streaming is where the two formats behave least alike, and where most people get surprised.

Stream raw JSON token by token and every intermediate state is invalid. {"name": "Mül is not parseable. You either buffer the whole response and parse once at the end (losing the point of streaming), or you reach for a tolerant incremental JSON parser that reads partial objects and emits keys as they complete. Those parsers exist and work, but they are an extra dependency and need extra care.

XML tags stream differently. You can watch the byte stream for <result> and </result> and know exactly when the payload is complete, without parsing anything mid-flight. A common production shape combines both: tags tell you the boundary, then you parse the JSON once the closing tag arrives.

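import json
import re

async def read_result(stream):
    buf = ""
    async for chunk in stream:  # assumes the stream yields text chunks
        buf += chunk
        if "</result>" in buf:
            m = re.search(r"<result>(.*?)</result>", buf, re.S)
            if m is None:
                raise ValueError("</result> arrived without <result>")
            # The payload is complete; parse it exactly once.
            return json.loads(m.group(1).strip())
    raise ValueError("stream ended without </result>")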

If you want to render reasoning live and only commit the payload at the end, tags give you a clean seam. As soon as <scratchpad> opens, stream its contents to the UI, and hold back <result> until it has closed and parsed as valid JSON. The user sees the model think; your application only ever acts on a complete, validated object.
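One way to build that seam, sketched under a few assumptions: the stream yields plain text chunks, the model emits `<scratchpad>` before `<result>`, and `on_thinking` is a hypothetical callback standing in for whatever pushes text to your UI. Because a closing tag can be split across chunks, the sketch holds back a short tail of the buffer instead of forwarding it immediately. The function name `stream_with_thinking` is also hypothetical.

```python
import asyncio
import json
import re
from typing import AsyncIterator, Callable

OPEN, CLOSE = "<scratchpad>", "</scratchpad>"
RESULT = re.compile(r"<result>(.*?)</result>", re.S)

async def stream_with_thinking(
    stream: AsyncIterator[str],
    on_thinking: Callable[[str], None],
) -> dict:
    buf = ""
    sent = None  # index in buf up to which scratchpad text was forwarded
    done_thinking = False
    async for chunk in stream:
        buf += chunk
        if not done_thinking:
            start = buf.find(OPEN)
            if start != -1:
                if sent is None:
                    sent = start + len(OPEN)
                end = buf.find(CLOSE, sent)
                if end != -1:
                    limit, done_thinking = end, True
                else:
                    # Hold back a tail that might be a half-arrived CLOSE tag.
                    limit = max(sent, len(buf) - (len(CLOSE) - 1))
                if limit > sent:
                    on_thinking(buf[sent:limit])
                    sent = limit
        m = RESULT.search(buf)
        if m:
            # Act on the payload only once it is closed and decodes cleanly.
            return json.loads(m.group(1).strip())
    raise ValueError("stream ended without a complete <result> block")

async def fake_stream():
    # Tags deliberately split across chunks to exercise the holdback.
    for piece in ["<scratch", "pad>Comma in street; ", "keep it whole.</scra",
                  "tchpad>\n<result>{\"city\": ", "\"Berlin\"}</res", "ult>"]:
        yield piece

thoughts: list[str] = []
payload = asyncio.run(stream_with_thinking(fake_stream(), thoughts.append))
print("".join(thoughts))  # Comma in street; keep it whole.
print(payload)            # {'city': 'Berlin'}
```

The holdback of `len(CLOSE) - 1` characters is the design choice that matters: without it, a chunk ending in `</scra` would leak half a tag into the UI.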

A decision you can keep

Boil it down to three questions:

Does the API enforce a schema at decode time? If yes, ask for raw JSON and validate against the same schema. The wrapper is redundant.
Does the response carry more than the payload, such as reasoning, citations, or a second section? If yes, wrap each section in tags and put JSON inside the one that holds data.
Is the data deeply nested or repeating? Keep it as JSON. Do not ask a model to author nested XML by hand.

The combination that survives the widest range of model behavior is an XML envelope around a JSON payload, parsed by finding the tag first and decoding the JSON second. It tolerates preambles, sign-offs, code fences, and a model that wandered off before it found the schema. It costs you one regex.

The mistake is treating this as a single either/or. JSON describes data. Tags mark territory. Reach for the one that matches the job in front of you, and reach for both when the model needs room to talk and you need a payload that parses.

If this was useful

Output formatting is one of those decisions that feels trivial until a multilingual edge case or a streaming UI turns a clean parser into a 2 a.m. page. The Prompt Engineering Pocket Guide has a chapter on structured output that goes deeper on schema design, when constrained decoding is worth the latency, and how to keep the model inside the lines without burning tokens on instructions it ignores.