Four Ways to Truncate Enormous Tool Output Without Breaking UTF-8

The 50,000-token tool result

The agent had a search tool. The search tool called grep over a large codebase. The grep result came back. It was 50,000 tokens of output: file paths, line numbers, matched lines, context lines, the works.

The agent stuffed it all into a tool_result content block and submitted the next turn. The API returned a 400. Context window exceeded.

The fix in that moment was one line: take the first 8,000 characters of the grep result, truncate there, move on. It worked. Then the same thing happened with a database query tool that returned full table dumps. A one-line fix there too. Then a log file reader tool. Another one-liner.

By the fourth time, it was clear this should be a library. Not because any individual fix was complex, but because each one was slightly different: some truncated by character count, some by line count, one truncated from the wrong end and lost the file path context, and none of them were UTF-8 safe.

The shape of the fix

Four strategies, one crate:

use tool_output_truncate::{truncate, Strategy};

let raw_output = run_grep_tool();  // might be enormous

// Keep the first 8000 chars, discard the rest
let safe = truncate(&raw_output, 8000, Strategy::Head);

// Keep the last 8000 chars (useful for log files: recent entries matter more)
let safe = truncate(&raw_output, 8000, Strategy::Tail);

// Keep first 40% and last 40%, drop middle, insert marker
let safe = truncate(&raw_output, 8000, Strategy::Middle);

// Same as Middle but counted by lines instead of chars
let safe = truncate(&raw_output, 200, Strategy::MiddleLines);

The output from Middle looks like:

/src/auth/login.ts:42: const token = await generateToken(user)
/src/auth/login.ts:43: return { token, user }
[...truncated 31204 chars...]
/src/auth/session.ts:88: if (!token) throw new AuthError("expired")
/src/auth/session.ts:89: return session.fromToken(token)

The first 40% gives context about where the search started. The last 40% gives the end of the result, which in a grep output often contains the most specific matches. The middle drop marker tells the model exactly how much was cut.

What this is NOT

This is not a token counter. The max_size parameter is in characters, not tokens. A rough rule for most LLM tokenizers is 3.5-4 characters per token, so multiply your token budget by 3.5 to get a conservative character limit (a 2,000-token budget becomes about 7,000 characters). A proper token counter (like prompt-token-counter) can give you exact numbers, but for truncation purposes the character approximation is usually close enough and has zero dependencies.
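The conversion from a token budget to a character limit is a single multiplication. A minimal sketch, assuming a helper named char_limit_for_tokens that is not part of the crate, and an average characters-per-token ratio that you pick yourself:

```rust
/// Convert a token budget into a character limit.
///
/// `chars_per_token` is an assumed average, not a property of any specific
/// tokenizer. Using the low end of the 3.5-4 range keeps the limit
/// conservative: fewer characters are kept, so the result is less likely
/// to overshoot the token budget.
fn char_limit_for_tokens(token_budget: usize, chars_per_token: f64) -> usize {
    (token_budget as f64 * chars_per_token).floor() as usize
}
```

The result is what you would pass as max_size. Text that tokenizes poorly (dense code, non-Latin scripts) can fall below 3.5 characters per token, so leave extra headroom for those tools.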

This is not a smart summarizer. Nothing in this crate reads the content and decides what is important. It keeps characters from the requested positions. If you need semantic compression, that is a different layer, usually involving a summarization call.

This is not a streaming chunker. This crate takes a complete string and returns a truncated string. If you are processing a stream and need to chunk it into context-sized pieces, that is a different use case.

Inside the lib

The UTF-8 safety is the part worth explaining. Rust strings are guaranteed to be valid UTF-8, but an arbitrary byte offset is not guaranteed to fall on a character boundary. If you write &s[..8000] and byte 8000 happens to be in the middle of a three-byte Japanese character, you get a panic.
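To see the failure mode without crashing, the standard library's str::get is the non-panicking counterpart of slice indexing: it returns None where indexing would panic. A small illustration (try_prefix is a made-up name for this post, not a crate function):

```rust
/// Try to take the first `n` bytes of `s` as a string slice.
/// Returns None when `n` is past the end or falls inside a multi-byte
/// character, which are the cases where `&s[..n]` would panic.
fn try_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

Each character in "日本語" is three bytes, so offsets 3 and 6 are valid cut points and offsets 1, 2, 4 and 5 are not.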

The crate uses is_char_boundary to walk back to the last valid character boundary at or before the target:

fn safe_char_boundary(s: &str, byte_target: usize) -> usize {
    if byte_target >= s.len() {
        return s.len();
    }
    // Walk back to a char boundary
    let mut pos = byte_target;
    while !s.is_char_boundary(pos) {
        pos -= 1;
    }
    pos
}

This is called for both the head cut and the tail cut in the Middle strategy to make sure neither slice lands mid-character.
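Here is roughly how the two cuts could fit together. This is a sketch under stated assumptions, not the crate's source: the function name truncate_middle_sketch is invented, the budget is measured in bytes for simplicity, and the marker is inserted inline with no surrounding newlines.

```rust
/// Largest char boundary at or before `byte_target` (same helper as above).
fn safe_char_boundary(s: &str, byte_target: usize) -> usize {
    if byte_target >= s.len() {
        return s.len();
    }
    let mut pos = byte_target;
    // Offset 0 is always a boundary, so this loop cannot underflow.
    while !s.is_char_boundary(pos) {
        pos -= 1;
    }
    pos
}

/// Hypothetical Middle-style truncation: keep about 40% of `max_bytes`
/// from each end and replace the middle with a marker.
fn truncate_middle_sketch(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let keep = max_bytes * 2 / 5; // 40% of the budget per side
    let head_end = safe_char_boundary(s, keep);
    let tail_start = safe_char_boundary(s, s.len() - keep);
    // Report the dropped span in characters, not bytes.
    let removed = s[head_end..tail_start].chars().count();
    format!(
        "{}[...truncated {} chars...]{}",
        &s[..head_end],
        removed,
        &s[tail_start..]
    )
}
```

Because the tail start is also walked backward, the tail can come out a byte or two longer than 40% of the budget. That is why the sketch leaves 20% of slack for the marker and the rounding.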

The MiddleLines strategy splits on \n, counts lines, and drops the middle block of lines. It reconstructs with join("\n") after dropping. This avoids any byte-level slicing entirely and is safe by construction.
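The split, drop, and join steps can be sketched as below. The function name, the 40% split per side (carried over from the Middle description), and the "lines" wording in the marker are all assumptions for illustration. The crate's actual behavior may differ.

```rust
/// Hypothetical MiddleLines-style truncation: keep about 40% of
/// `max_lines` from the top and bottom, drop the lines in between.
fn truncate_middle_lines_sketch(s: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = s.split('\n').collect();
    if lines.len() <= max_lines {
        return s.to_string();
    }
    let keep = max_lines * 2 / 5; // lines kept on each side
    let dropped = lines.len() - 2 * keep;
    let marker = format!("[...truncated {} lines...]", dropped);

    // Only whole lines are moved around, so no slice can land mid-character.
    let mut kept: Vec<&str> = Vec::with_capacity(2 * keep + 1);
    kept.extend_from_slice(&lines[..keep]);
    kept.push(marker.as_str());
    kept.extend_from_slice(&lines[lines.len() - keep..]);
    kept.join("\n")
}
```

One caveat with any line-based budget: a single very long line (minified JSON, a base64 blob) still counts as one line, so a line limit alone does not bound the character count.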

The marker string is configurable:

let safe = truncate_with_marker(
    &raw_output,
    8000,
    Strategy::Middle,
    "[...{n} chars removed...]",
);

The {n} placeholder is replaced with the number of characters removed. The default marker is [...truncated {n} chars...].
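The substitution itself needs nothing beyond str::replace. A minimal sketch of how a marker template could be rendered (render_marker is an illustrative name, not the crate's internal function):

```rust
/// Fill the `{n}` placeholder in a marker template with the number of
/// characters that were removed. A template without `{n}` is returned
/// unchanged.
fn render_marker(template: &str, removed_chars: usize) -> String {
    template.replace("{n}", &removed_chars.to_string())
}
```

Keeping the count in the marker is what lets the model judge how much is missing, so it is worth preserving {n} in any custom template.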

When this is useful

Any tool that can return unbounded output. Search tools, log readers, database queries, file readers, shell command output. If the tool can return more than your context budget, you need truncation before the result goes into the message history.

Tool pipelines where the result feeds into the next call. If grep output is being passed to a summarization call, you want to truncate before the summarization call, not after, so you do not pay for tokens you then immediately drop.

Multi-tool agent loops. In a loop, context accumulates. Truncating each tool result aggressively early is cheaper than compressing the whole history later.

When NOT to use this

When the full output fits comfortably in context. Truncation is a last resort. If your tool output is consistently under 2,000 tokens, add the full result and skip the overhead.

When the middle section is the most relevant part. The Middle strategy assumes the ends of the output are more informative than the middle. That is true for grep (file paths at top, specific matches at bottom) and false for a sorted ranking list (most relevant at top, least relevant at bottom). Use Head for sorted results.

When you are building a document processing pipeline. Tool output truncation is for agent tool results. For document chunking and retrieval, use a dedicated chunking library that understands document structure.

Install

[dependencies]
tool-output-truncate = "0.1"

GitHub: MukundaKatta/tool-output-truncate

crates.io: tool-output-truncate

Siblings

Crate / lib	What it does
prompt-token-counter-rs	Approximate token counts for full message payloads
agentfit-rs	Token-aware message truncation for full conversation history
agent-message-window	Sliding window with paired tool_use/tool_result protection
tool-output-format	Render tool output as LLM-friendly markdown before truncating

What is next

A token-aware mode is the main thing missing. Right now max_size is in characters. A truncate_tokens entry point that accepts a token budget and a tokenizer function would make the API more natural for most agent code that thinks in tokens.
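To make the idea concrete, here is one possible shape for a Head-only version. Everything in it is hypothetical: the name, the signature, and the assumption that the supplied counter never reports fewer tokens for a longer prefix. Real BPE tokenizers can break that assumption by a token or two near the cut, so treat the result as approximate.

```rust
/// Hypothetical token-aware Head truncation: return the longest prefix of
/// `s` (cut on a char boundary) whose token count fits `token_budget`.
/// Assumes `count_tokens` is non-decreasing as the prefix grows.
fn truncate_tokens_head_sketch<'a, F>(s: &'a str, token_budget: usize, count_tokens: F) -> &'a str
where
    F: Fn(&str) -> usize,
{
    if s.is_empty() || count_tokens(s) <= token_budget {
        return s;
    }
    // Candidate cut points: the byte offset of every character start.
    let boundaries: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    // Binary search. `lo` indexes a prefix assumed to fit (the empty one),
    // `hi` stands for the full string, which is known not to fit.
    let (mut lo, mut hi) = (0usize, boundaries.len());
    while hi - lo > 1 {
        let mid = (lo + hi) / 2;
        if count_tokens(&s[..boundaries[mid]]) <= token_budget {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    &s[..boundaries[lo]]
}
```

The binary search calls the tokenizer O(log n) times instead of once per character, which matters when counting tokens is the expensive step.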

A Smart strategy that prefers cutting at logical boundaries (blank lines, section headers, --- separators) rather than raw character counts is also worth building. For now, MiddleLines gets you part of the way there since line boundaries are at least meaningful units.
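As a rough illustration of the boundary-preferring idea, the sketch below does a Head cut and then backs up to the last blank line inside the kept window. The function name and the choice of a blank line as the only boundary are assumptions. A real Smart strategy would presumably recognize more kinds of separators.

```rust
/// Hypothetical boundary-aware Head cut: keep at most `max_bytes`, then
/// prefer to end at the last blank line ("\n\n") inside that window.
/// Falls back to the plain char-boundary cut when there is no blank line.
fn head_at_blank_line_sketch(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // First make the raw cut UTF-8 safe.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    let window = &s[..end];
    // Then back up to a logical boundary if one exists.
    match window.rfind("\n\n") {
        Some(i) => &window[..i],
        None => window,
    }
}
```

The trade-off is that backing up to a boundary can discard a large share of the budget when blank lines are rare, so a production version would likely cap how far it is willing to retreat.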

The four current strategies cover the common cases. Add the crate, pick a strategy, stop rewriting the same one-liner in every tool handler.