How `sync.Pool` Helped Me Stabilize p99 Latency in a High-Throughput Log Processing Pipeline

A while ago, I was working on a log processing system that was very similar in spirit to Sentry.

The system was responsible for receiving events from different services, parsing JSON payloads, normalizing metadata, enriching logs with extra context, and then pushing the final events into storage and downstream queues.

At first, everything looked fine.

The code was clean. The architecture was simple. The throughput was acceptable in local testing.

But when the traffic increased, something interesting happened:

The average latency was still okay, but the p99 latency started to become unstable.

That was the first sign that the bottleneck was not just CPU or database performance. Something deeper was happening inside the runtime.

In this article, I want to explain how I found the problem, why the system became slower under load, and how using sync.Pool helped reduce allocation pressure, make the garbage collector work less, and keep latency more stable.

This is not a magical optimization. It is a very specific tool for a very specific kind of problem.

But when you are building high-throughput systems, especially systems that process JSON, logs, network payloads, or temporary buffers, understanding object pooling can make a big difference.

The System I Was Building

The service was a log ingestion pipeline.

The flow looked something like this:

Client / SDK
   ↓
HTTP ingestion API
   ↓
JSON decode
   ↓
Validation
   ↓
Normalization
   ↓
Enrichment
   ↓
Batching
   ↓
Queue / Storage

Each incoming request contained one or more log events.

A simplified event looked like this:

{
  "service": "payment-api",
  "level": "error",
  "message": "failed to charge customer",
  "timestamp": "2026-05-28T12:40:00Z",
  "trace_id": "6f9c9b9d7a1a",
  "metadata": {
    "customer_id": "cus_123",
    "region": "eu-west",
    "retry_count": 2
  }
}

For every request, the service had to:

read the request body
decode JSON
create internal event structs
normalize fields
create temporary buffers
encode the final payload again
send it to another component

This kind of workload creates many short-lived objects.

And in Go, short-lived objects are usually fine — until you create too many of them in a hot path.

The First Symptom: Average Latency Looked Fine, p99 Did Not

At low traffic, the service looked healthy.

Something like this:

Requests/sec:        2,000
Average latency:     8ms
p95 latency:         18ms
p99 latency:         35ms
CPU usage:           Normal
Memory usage:        Stable

But when the traffic increased, the picture changed:

Requests/sec:        15,000+
Average latency:     16ms
p95 latency:         90ms
p99 latency:         280ms - 400ms
CPU usage:           Higher than expected
Memory usage:        Sawtooth pattern
GC activity:         Frequent

The average latency was not terrible, but the tail latency was bad.

And for this kind of system, p99 matters a lot.

If a log processing system becomes slow during incidents, it creates a very bad situation: when you need observability the most, your observability pipeline becomes the bottleneck.

That is exactly the kind of failure mode I wanted to avoid.

Why p99 Latency Matters More Than Average Latency

Average latency can hide real production problems.

For example, imagine this:

95 requests finish in 10ms
4 requests finish in 50ms
1 request finishes in 500ms

The average may still look acceptable.

But that one slow request took 50 times longer than a typical one, and it only becomes visible when you look at the tail of the distribution.

In high-throughput backend systems, p99 latency usually tells a more honest story than average latency.
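To put numbers on the example above, here is a minimal sketch that computes the mean and a few percentiles for that 100-request distribution. The helpers `percentile`, `mean`, and `exampleLatencies` are illustrative names, not part of the pipeline, and the nearest-rank definition used here is only one of several ways to compute a percentile.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank percentile p (1..100) of the samples.
// It sorts a copy so the caller's slice is left untouched.
func percentile(samples []float64, p int) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // ceil(p/100 * n) using integer math
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// mean returns the arithmetic average of the samples.
func mean(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

// exampleLatencies builds the 100-request distribution from the text, in ms.
func exampleLatencies() []float64 {
	out := make([]float64, 0, 100)
	for i := 0; i < 95; i++ {
		out = append(out, 10)
	}
	for i := 0; i < 4; i++ {
		out = append(out, 50)
	}
	return append(out, 500)
}

func main() {
	l := exampleLatencies()
	// prints: mean=16.5ms p50=10ms p99=50ms max=500ms
	fmt.Printf("mean=%.1fms p50=%.0fms p99=%.0fms max=%.0fms\n",
		mean(l), percentile(l, 50), percentile(l, 99), percentile(l, 100))
}
```

The mean of 16.5ms sits close to the typical request, while the tail values are 5 to 50 times higher. That gap is what the average hides.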

When p99 starts jumping under load, it usually means some part of the system occasionally blocks, pauses, waits, or does too much work.

In my case, one of the main causes was allocation pressure and frequent garbage collection.

The Initial Code: Simple, But Allocation Heavy

The first version of the code was easy to read.

For every request, I created new buffers and temporary objects.

Something like this:

package main

import (
    "bytes"
    "encoding/json"
    "io"
    "net/http"
)

type LogEvent struct {
    Service   string                 `json:"service"`
    Level     string                 `json:"level"`
    Message   string                 `json:"message"`
    Timestamp string                 `json:"timestamp"`
    TraceID   string                 `json:"trace_id"`
    Metadata  map[string]interface{} `json:"metadata"`
}

func ingestHandler(w http.ResponseWriter, r *http.Request) {
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "failed to read body", http.StatusBadRequest)
        return
    }

    var events []LogEvent
    if err := json.Unmarshal(body, &events); err != nil {
        http.Error(w, "invalid json", http.StatusBadRequest)
        return
    }

    normalized := make([]LogEvent, 0, len(events))

    for _, event := range events {
        normalized = append(normalized, normalizeEvent(event))
    }

    var buf bytes.Buffer
    if err := json.NewEncoder(&buf).Encode(normalized); err != nil {
        http.Error(w, "failed to encode events", http.StatusInternalServerError)
        return
    }

    // sendToQueue(buf.Bytes())

    w.WriteHeader(http.StatusAccepted)
}

func normalizeEvent(event LogEvent) LogEvent {
    if event.Level == "" {
        event.Level = "info"
    }

    if event.Metadata == nil {
        event.Metadata = make(map[string]interface{})
    }

    return event
}

At first glance, this is not bad code.

Actually, for many applications, this is completely fine.

But under heavy load, the problem was that this path was executed thousands of times per second.

That means the service was constantly creating:

new byte slices
new buffers
new maps
new event slices
new encoder objects
new temporary JSON structures

Most of these objects were short-lived.

They were created, used for a few milliseconds, and then became garbage.

The garbage collector had to clean them again and again.

The Real Problem: Too Many Temporary Objects

In Go, allocation itself is not always expensive.

The real cost often appears later.

Every object that escapes to the heap becomes something the garbage collector may need to track.

When the system creates too many temporary objects, the GC has more work to do.

That can lead to:

more frequent GC cycles
more CPU used by the runtime
less CPU available for actual request processing
latency spikes during high traffic
unstable p95 and p99 latency

This was exactly what I saw.

The service was not slow because the logic was complex.

It was slow because the hot path was creating too much garbage.

That is an important distinction.

Sometimes performance problems are not caused by bad algorithms. Sometimes they are caused by too much memory churn.

How I Confirmed the Problem

Before changing anything, I wanted to confirm the source of the issue.

I checked runtime metrics and profiling data.

The signs were clear:

high allocation rate
frequent GC cycles
bytes.Buffer allocations in the hot path
JSON processing allocations
temporary slices created per request
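One way to see signals like these without a full profiler is to read the runtime's own counters around a piece of work. The sketch below is an assumption about how such a check could look, not the exact tooling used on the service: `churn` and `allocDelta` are hypothetical helpers, and `churn` only imitates per-request temporary buffers.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps each allocation reachable for a moment so the compiler
// cannot place it on the stack or optimize it away.
var sink []byte

// churn allocates n short-lived buffers of the given size, similar in
// spirit to per-request temporary buffers.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
}

// allocDelta runs fn and reports how many heap objects and bytes were
// allocated, and how many GC cycles completed, while it ran.
func allocDelta(fn func()) (objects, allocated uint64, gcCycles uint32) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs,
		after.TotalAlloc - before.TotalAlloc,
		after.NumGC - before.NumGC
}

func main() {
	objects, allocated, cycles := allocDelta(func() { churn(100_000, 4096) })
	fmt.Printf("objects=%d bytes=%d gc_cycles=%d\n", objects, allocated, cycles)
}
```

`runtime.ReadMemStats` briefly stops the world, so it suits a spot check or a periodic metrics export better than a call on every request.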

A simplified benchmark showed the same pattern.

func BenchmarkIngestWithoutPool(b *testing.B) {
    payload := generatePayload(100)

    b.ReportAllocs()
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        _, err := processLogsWithoutPool(payload)
        if err != nil {
            b.Fatal(err)
        }
    }
}

The result looked like this in my local benchmark:

BenchmarkIngestWithoutPool-10       8200    145000 ns/op    128 KB/op    420 allocs/op

The exact numbers are not the important part.

The important part was the pattern:

allocs/op was high
bytes/op was high
GC activity increased with throughput
p99 latency became unstable under pressure

That told me the optimization target was not just request logic.

The target was allocation behavior.

Where `sync.Pool` Fits

sync.Pool is a temporary object pool provided by Go.

It allows you to reuse objects instead of allocating new ones every time.

A good mental model is this:

Instead of creating and throwing away the same type of object thousands of times per second, you keep reusable objects in a pool.

You get one when you need it.

You reset it.

You use it.

Then you put it back.

Get  →  Reset  →  Use  →  Put back

This can reduce pressure on the garbage collector because fewer temporary objects are allocated in the hot path.
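As a minimal sketch of that cycle, assuming a pooled `bytes.Buffer` and a hypothetical `formatLine` helper that is not part of the real pipeline:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// linePool hands out reusable buffers; New runs only when the pool is empty.
var linePool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// formatLine builds a single log line using a pooled buffer.
func formatLine(service, level, message string) string {
	buf := linePool.Get().(*bytes.Buffer) // Get
	buf.Reset()                           // Reset
	defer linePool.Put(buf)               // Put back once this call is done

	// Use
	buf.WriteString(service)
	buf.WriteString(" [")
	buf.WriteString(level)
	buf.WriteString("] ")
	buf.WriteString(message)

	// String copies the bytes, so the result stays valid after Put.
	return buf.String()
}

func main() {
	// prints: payment-api [error] failed to charge customer
	fmt.Println(formatLine("payment-api", "error", "failed to charge customer"))
	// prints: auth-api [info] ok
	fmt.Println(formatLine("auth-api", "info", "ok"))
}
```

The returned string is a copy, which is why it is safe to hand the buffer back in the deferred `Put`.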

But there is an important warning:

sync.Pool is not a cache.

Objects inside the pool can be dropped at any time without notification, typically during garbage collection.

So you should not use it to store important state.

It is best for temporary, reusable objects like:

bytes.Buffer
[]byte buffers
temporary encoders
scratch objects
serialization helpers

That made it a good fit for my log processing pipeline.

The Improved Version: Reusing Buffers

The first improvement was to reuse bytes.Buffer objects.

Instead of creating a new buffer for every request, I created a pool:

package main

import (
    "bytes"
    "encoding/json"
    "io"
    "net/http"
    "sync"
)

var bufferPool = sync.Pool{
    New: func() any {
        return new(bytes.Buffer)
    },
}

type LogEvent struct {
    Service   string                 `json:"service"`
    Level     string                 `json:"level"`
    Message   string                 `json:"message"`
    Timestamp string                 `json:"timestamp"`
    TraceID   string                 `json:"trace_id"`
    Metadata  map[string]interface{} `json:"metadata"`
}

func ingestHandler(w http.ResponseWriter, r *http.Request) {
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "failed to read body", http.StatusBadRequest)
        return
    }

    var events []LogEvent
    if err := json.Unmarshal(body, &events); err != nil {
        http.Error(w, "invalid json", http.StatusBadRequest)
        return
    }

    normalized := make([]LogEvent, 0, len(events))

    for _, event := range events {
        normalized = append(normalized, normalizeEvent(event))
    }

    buf := bufferPool.Get().(*bytes.Buffer)
    buf.Reset()

    defer func() {
        buf.Reset()
        bufferPool.Put(buf)
    }()

    if err := json.NewEncoder(buf).Encode(normalized); err != nil {
        http.Error(w, "failed to encode events", http.StatusInternalServerError)
        return
    }

    // Important:
    // If the downstream function stores this data or uses it asynchronously,
    // copy it before returning the buffer to the pool.
    payload := append([]byte(nil), buf.Bytes()...)

    // sendToQueue(payload)
    _ = payload

    w.WriteHeader(http.StatusAccepted)
}

func normalizeEvent(event LogEvent) LogEvent {
    if event.Level == "" {
        event.Level = "info"
    }

    if event.Metadata == nil {
        event.Metadata = make(map[string]interface{})
    }

    return event
}

This change looks small, but it matters under load.

The buffer is no longer allocated from scratch for every request.

Instead, the service reuses an existing buffer.

That means fewer allocations, less memory churn, and less GC pressure.

The Most Important Rule: Never Reuse Dirty Objects

When using a pool, always reset the object before reusing it.

This is critical.

For bytes.Buffer, call:

buf.Reset()

For a slice, reset it like this:

items = items[:0]

For a struct, clear the fields manually or create a reset method:

type EventBuilder struct {
    service string
    level   string
    message string
    fields  map[string]string
}

func (b *EventBuilder) Reset() {
    b.service = ""
    b.level = ""
    b.message = ""

    for k := range b.fields {
        delete(b.fields, k)
    }
}

If you forget to reset objects properly, you can accidentally leak data between requests.

That is not just a performance bug.

In a log processing system, it can become a serious correctness or security issue.

Imagine one customer's metadata appearing inside another customer's event because a pooled object was not cleaned correctly.

That is why object pooling should be used carefully.
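The leak is easy to reproduce without a pool at all, because the problem is the reuse itself. The sketch below is illustrative only: `fill` and `reuse` are hypothetical helpers, and the builder is reduced to a single map field.

```go
package main

import "fmt"

// EventBuilder is a reduced version of a reusable scratch object.
type EventBuilder struct {
	Fields map[string]string
}

// Reset removes every field but keeps the map's storage for reuse.
func (b *EventBuilder) Reset() {
	for k := range b.Fields {
		delete(b.Fields, k)
	}
}

// fill adds the given fields to the builder and returns a copy of
// everything the builder holds afterwards.
func fill(b *EventBuilder, fields map[string]string) map[string]string {
	for k, v := range fields {
		b.Fields[k] = v
	}
	out := make(map[string]string, len(b.Fields))
	for k, v := range b.Fields {
		out[k] = v
	}
	return out
}

// reuse processes two events with the same builder and returns what the
// second event ends up containing.
func reuse(resetBetween bool) map[string]string {
	b := &EventBuilder{Fields: make(map[string]string)}
	fill(b, map[string]string{"customer_id": "cus_123"}) // first customer
	if resetBetween {
		b.Reset()
	}
	return fill(b, map[string]string{"region": "eu-west"}) // second customer
}

func main() {
	fmt.Println(reuse(false)) // prints: map[customer_id:cus_123 region:eu-west]
	fmt.Println(reuse(true))  // prints: map[region:eu-west]
}
```

Without the reset, the second event silently carries the first customer's ID.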

A Better Pool for Reusable Event Builders

In my case, buffers were only one part of the problem.

The pipeline also had a temporary event builder used during normalization and enrichment.

A simplified version looked like this:

type EventBuilder struct {
    Service string
    Level   string
    Message string
    TraceID string
    Fields  map[string]string
}

func NewEventBuilder() *EventBuilder {
    return &EventBuilder{
        Fields: make(map[string]string, 16),
    }
}

Creating this for every event caused extra allocations, especially because of the map.

So I moved it to a pool as well:

var eventBuilderPool = sync.Pool{
    New: func() any {
        return &EventBuilder{
            Fields: make(map[string]string, 16),
        }
    },
}

type EventBuilder struct {
    Service string
    Level   string
    Message string
    TraceID string
    Fields  map[string]string
}

func (b *EventBuilder) Reset() {
    b.Service = ""
    b.Level = ""
    b.Message = ""
    b.TraceID = ""

    for k := range b.Fields {
        delete(b.Fields, k)
    }
}

func buildEvent(raw LogEvent) *EventBuilder {
    builder := eventBuilderPool.Get().(*EventBuilder)
    builder.Reset()

    builder.Service = raw.Service
    builder.Level = raw.Level
    builder.Message = raw.Message
    builder.TraceID = raw.TraceID

    for k, v := range raw.Metadata {
        builder.Fields[k] = stringify(v)
    }

    return builder
}

func releaseEventBuilder(builder *EventBuilder) {
    builder.Reset()
    eventBuilderPool.Put(builder)
}

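// stringify converts a decoded JSON value to its string form.
// encoding/json decodes every JSON number into float64 when the target is
// interface{}, so the int case only matters for values set by Go code.
// Requires the fmt and strconv imports.
func stringify(v interface{}) string {
    switch value := v.(type) {
    case string:
        return value
    case int:
        return strconv.Itoa(value)
    case float64:
        // 'f' with precision -1 prints 2 as "2", not "2.000000".
        return strconv.FormatFloat(value, 'f', -1, 64)
    case bool:
        return strconv.FormatBool(value)
    default:
        return fmt.Sprintf("%v", value)
    }
}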

The idea is simple:

Do not allocate a new builder for every event.
Reuse a builder.
Clean it carefully.
Return it to the pool.

This helped reduce the number of allocations per event.

But again, this requires discipline.

If the builder is still used somewhere else, do not put it back into the pool.

Only return an object to the pool when you are 100% sure nobody else will use it.
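One way to keep that ownership obvious is to let a single function get the builder, use it, and release it, so that only freshly allocated output leaves the function. The sketch below assumes a reduced `EventBuilder` and a hypothetical `process` function. Encoding with `json.Marshal` is a stand-in for whatever the real pipeline emits.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// EventBuilder is a reduced scratch object used while assembling one event.
type EventBuilder struct {
	Service string            `json:"service"`
	Fields  map[string]string `json:"fields"`
}

// Reset clears the builder but keeps the map's storage for reuse.
func (b *EventBuilder) Reset() {
	b.Service = ""
	for k := range b.Fields {
		delete(b.Fields, k)
	}
}

var builderPool = sync.Pool{
	New: func() any {
		return &EventBuilder{Fields: make(map[string]string, 16)}
	},
}

// process owns the builder for its whole lifetime: it gets it, fills it,
// encodes it, and releases it. The returned bytes are newly allocated by
// json.Marshal and do not alias the builder.
func process(service string, metadata map[string]string) ([]byte, error) {
	b := builderPool.Get().(*EventBuilder)
	b.Reset()
	defer func() {
		b.Reset()
		builderPool.Put(b)
	}()

	b.Service = service
	for k, v := range metadata {
		b.Fields[k] = v
	}
	return json.Marshal(b)
}

func main() {
	out, err := process("payment-api", map[string]string{
		"customer_id": "cus_123",
		"region":      "eu-west",
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// prints: {"service":"payment-api","fields":{"customer_id":"cus_123","region":"eu-west"}}
	fmt.Println(string(out))
}
```

Because the builder never escapes `process`, there is no caller that can forget to release it or keep using it after release.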

A Common Bug: Returning a Buffer Too Early

This is one of the most dangerous mistakes with sync.Pool.

Bad example:

buf := bufferPool.Get().(*bytes.Buffer)
buf.Reset()

defer bufferPool.Put(buf)

json.NewEncoder(buf).Encode(event)

sendAsync(buf.Bytes()) // dangerous

This is dangerous because sendAsync may use the byte slice later, after the buffer has already been returned to the pool.

Another request can get the same buffer and overwrite the data.

The result can be corrupted payloads.

The safer version is:

buf := bufferPool.Get().(*bytes.Buffer)
buf.Reset()

defer func() {
    buf.Reset()
    bufferPool.Put(buf)
}()

json.NewEncoder(buf).Encode(event)

payload := append([]byte(nil), buf.Bytes()...)
sendAsync(payload)

Yes, this copy creates an allocation.

But correctness is more important.

Pooling should not create hidden data races or corrupted messages.

The goal is not to remove every allocation from the system.

The goal is to remove unnecessary allocations safely.
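The aliasing behind this bug can be shown with a single buffer and no goroutines. This is a small sketch with a hypothetical `aliasDemo` helper. The overwrite it shows depends on how `bytes.Buffer` reuses its backing array, which is exactly why the `Bytes` documentation says the slice is valid only until the next buffer modification.

```go
package main

import (
	"bytes"
	"fmt"
)

// aliasDemo writes one event, keeps both an aliased slice and a copy,
// then reuses the buffer for a second event, as a pool would allow.
func aliasDemo() (aliased, copied string) {
	var buf bytes.Buffer
	buf.WriteString("event-A")

	alias := buf.Bytes()                        // shares the buffer's backing array
	safe := append([]byte(nil), buf.Bytes()...) // independent copy

	// Simulate another request picking up the same buffer.
	buf.Reset()
	buf.WriteString("event-B")

	return string(alias), string(safe)
}

func main() {
	aliased, copied := aliasDemo()
	// The copy still holds "event-A". The aliased slice now typically
	// shows the second event's bytes, because the array was reused.
	fmt.Printf("aliased=%q copied=%q\n", aliased, copied)
}
```

With concurrent goroutines the same aliasing becomes a data race instead of a clean overwrite, which is harder to spot and harder to debug.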

Benchmark Before and After

After applying pooling to the hottest temporary objects, the benchmark improved.

The simplified benchmark before pooling:

BenchmarkIngestWithoutPool-10       8200    145000 ns/op    128 KB/op    420 allocs/op

After pooling reusable buffers and temporary event builders:

BenchmarkIngestWithPool-10         13500     89000 ns/op     62 KB/op    210 allocs/op

In this benchmark, the improvement was roughly:

~39% lower processing time per operation
~52% fewer bytes allocated per operation
50% fewer allocations per operation
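The allocation side of this comparison can be reproduced in miniature with `testing.AllocsPerRun`. The sketch below is not the benchmark behind the numbers above: `encodeFresh`, `encodePooled`, `sampleEvents`, and `compareAllocs` are hypothetical helpers, and the event type is reduced to three fields.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
	"testing"
)

type LogEvent struct {
	Service string `json:"service"`
	Level   string `json:"level"`
	Message string `json:"message"`
}

var bufferPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// encodeFresh allocates a new buffer for every call.
func encodeFresh(events []LogEvent) (int, error) {
	var buf bytes.Buffer
	if err := json.NewEncoder(&buf).Encode(events); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// encodePooled reuses a buffer, and its grown capacity, across calls.
func encodePooled(events []LogEvent) (int, error) {
	buf := bufferPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufferPool.Put(buf)
	if err := json.NewEncoder(buf).Encode(events); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// sampleEvents builds n identical events to encode.
func sampleEvents(n int) []LogEvent {
	events := make([]LogEvent, n)
	for i := range events {
		events[i] = LogEvent{
			Service: "payment-api",
			Level:   "error",
			Message: "failed to charge customer",
		}
	}
	return events
}

// compareAllocs reports the average allocations per call for both variants.
func compareAllocs(n int) (fresh, pooled float64) {
	events := sampleEvents(n)
	fresh = testing.AllocsPerRun(100, func() { _, _ = encodeFresh(events) })
	pooled = testing.AllocsPerRun(100, func() { _, _ = encodePooled(events) })
	return fresh, pooled
}

func main() {
	fresh, pooled := compareAllocs(100)
	fmt.Printf("allocs per call: fresh=%.0f pooled=%.0f\n", fresh, pooled)
}
```

The absolute counts depend on the Go version and payload, so treat the difference between the two numbers as the signal, not the numbers themselves.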

But the real win was visible under load.

Before:

Requests/sec:        15,000+
Average latency:     16ms
p95 latency:         90ms
p99 latency:         280ms - 400ms
GC cycles:           Frequent

After:

Requests/sec:        15,000+
Average latency:     11ms
p95 latency:         42ms
p99 latency:         95ms - 140ms
GC cycles:           Reduced

These numbers are from a simplified internal benchmark scenario, not a universal promise.

Your results will depend on payload size, CPU, memory, JSON structure, traffic pattern, and how many allocations exist in your hot path.

But the direction was clear:

less allocation pressure
less GC work
more stable tail latency
better throughput under pressure

That was exactly what the system needed.

Why This Works Well for Log Processing Systems

Log processing systems are a good use case for pooling because they usually process many similar objects repeatedly.

For example:

network buffers
JSON payload buffers
temporary event builders
batch buffers
compression buffers
serialization buffers

The pattern repeats thousands of times per second.

That makes object reuse valuable.

In a normal CRUD API, this optimization may not matter.

But in ingestion systems, queues, observability pipelines, proxies, gateways, and stream processors, small allocation costs can become very large at scale.

This is where senior-level performance work usually starts:

Not by guessing.

Not by making the code complex immediately.

But by measuring the system, finding the hot path, and reducing unnecessary work where it actually matters.

When Not to Use `sync.Pool`

sync.Pool is powerful, but it is not something I use everywhere.

I avoid it when:

the object is very small
the code is not in a hot path
allocation rate is already low
pooling makes the code harder to understand
the object contains sensitive data and cleanup is risky
ownership is unclear

For example, pooling a tiny struct in a low-traffic admin endpoint is probably useless.

It makes the code more complex without improving the system.

That is not good engineering.

Good performance engineering is not about adding clever tricks everywhere.

It is about knowing where the system actually pays the cost.

Practical Rules I Follow

Here are the rules I personally follow when using sync.Pool:

1. Measure first

Do not add pooling just because it looks professional.

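Report allocations from inside the benchmark:

b.ReportAllocs()

Or enable allocation statistics for every benchmark from the command line:

go test -bench=. -benchmem

For production-like analysis, use pprof (for example, go test -bench=. -memprofile mem.out followed by go tool pprof mem.out) and runtime metrics.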

2. Pool only hot-path temporary objects

Good candidates:

bytes.Buffer
large []byte slices
temporary builders
compression buffers
serialization helpers

Bad candidates:

business state
long-lived objects
request-specific objects still used asynchronously
objects with unclear ownership
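For the `[]byte` case, a common recommendation is to pool a pointer to the slice, since putting a bare slice into the pool allocates a new slice header each time. The sketch below is one possible shape, under stated assumptions: `checksum` is a hypothetical helper, and `maxRetained` is an arbitrary cap added here so that one unusually large payload does not keep a huge buffer alive in the pool.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sync"
)

// maxRetained is an illustrative limit, not a recommended value.
const maxRetained = 64 << 10

// scratchPool stores *[]byte so that Put does not allocate a slice header.
var scratchPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// checksum joins the parts in a pooled scratch slice and hashes the result.
// Only the checksum leaves the function, so nothing aliases the scratch.
func checksum(parts ...string) uint32 {
	bp := scratchPool.Get().(*[]byte)
	b := (*bp)[:0] // reset the length, keep the capacity

	for _, p := range parts {
		b = append(b, p...)
	}
	sum := crc32.ChecksumIEEE(b)

	// Keep any growth for the next caller, unless the slice became too large.
	if cap(b) <= maxRetained {
		*bp = b[:0]
		scratchPool.Put(bp)
	}
	return sum
}

func main() {
	joined := checksum("failed to ", "charge customer")
	whole := checksum("failed to charge customer")
	fmt.Println(joined == whole) // prints: true
}
```

A slice that grew past the cap is simply not returned, and the garbage collector reclaims it like any other temporary object.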

3. Always reset before reuse

Never trust the object you get from the pool.

Clean it before using it.

Clean it before putting it back.

4. Be careful with references

Do not return an object to the pool while another goroutine, function, or queue still references it.

This is especially important with:

[]byte
bytes.Buffer
maps
slices
pointers

5. Do not use `sync.Pool` as a cache

The garbage collector can clear pooled objects.

So never depend on the pool for correctness.

The system must work even if the pool is empty.
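A pool declared without a `New` function makes this rule explicit, because `Get` then returns nil whenever the pool is empty. The following is a minimal sketch with hypothetical `getBuffer`, `putBuffer`, and `quote` helpers. It shows code that behaves the same whether or not the pool has anything to offer.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// pool has no New function, so Get returns nil when nothing is available.
var pool sync.Pool

// getBuffer returns a clean buffer, from the pool if possible and
// freshly allocated otherwise.
func getBuffer() *bytes.Buffer {
	if v := pool.Get(); v != nil {
		buf := v.(*bytes.Buffer)
		buf.Reset()
		return buf
	}
	return new(bytes.Buffer)
}

// putBuffer cleans the buffer and offers it back to the pool.
func putBuffer(buf *bytes.Buffer) {
	buf.Reset()
	pool.Put(buf)
}

// quote wraps s in double quotes using a temporary buffer.
func quote(s string) string {
	buf := getBuffer()
	defer putBuffer(buf)

	buf.WriteByte('"')
	buf.WriteString(s)
	buf.WriteByte('"')
	return buf.String()
}

func main() {
	fmt.Println(quote("first call, empty pool")) // prints: "first call, empty pool"
	fmt.Println(quote("later call"))             // prints: "later call"
}
```

The pool only changes how often a buffer is allocated. The result of `quote` never depends on what the pool happened to contain.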

The Senior Engineering Lesson

The biggest lesson for me was this:

Performance problems are not always about slow code.

Sometimes they are about how much temporary work the code creates.

In my log processing service, the business logic was not very complicated.

The real issue was that every request created too many short-lived objects.

At low traffic, this was invisible.

At high traffic, it became a GC problem.

And once GC became more active, p99 latency became unstable.

Using sync.Pool did not magically make the system fast.

But it removed unnecessary pressure from the runtime.

That gave the service more breathing room under load.

The final architecture was still simple:

receive logs
parse JSON
normalize events
reuse temporary buffers
batch efficiently
send downstream

But the implementation became more careful about memory.

That is the difference between code that works and code that survives real production traffic.

Final Thoughts

sync.Pool is not something every Go application needs.

But if you are building high-throughput systems like:

log processors
observability pipelines
API gateways
stream processors
network services
JSON-heavy ingestion APIs

then object pooling is worth understanding.

The key is not to use it blindly.

The key is to measure first, identify allocation-heavy hot paths, and then reuse objects safely.

For my Sentry-like log processing pipeline, pooling buffers and temporary builders helped reduce allocations, lower GC pressure, and stabilize p99 latency under high load.

And in production systems, stable p99 latency is often more important than a beautiful average latency number.

Because users do not feel your average.

They feel the slow requests.
