In the previous post, I argued that object storage is best understood as a distributed key-value system rather than a traditional file system.
That raises an interesting question:
If storing objects is relatively straightforward, what makes distributed storage systems so difficult to build?
The answer is usually not the data itself.
It's the metadata.
What is metadata?
When you upload an object, the system stores more than just the object's contents.
For example:
Key: photos/cats.png
Size: 4.2 MB
Created: 2026-06-17
Version: v3
Checksum: abc123
Location: Node-7
All of this information is metadata.
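To make the record concrete, here is a minimal sketch of how it might look in code. The class name, field names, and the in-memory `index` dictionary are my own assumptions for illustration, not the layout of any particular storage system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ObjectMetadata:
    """One metadata record; the fields mirror the example above."""
    key: str          # object name, e.g. "photos/cats.png"
    size_bytes: int   # payload size in bytes
    created: date     # creation date
    version: str      # version label
    checksum: str     # digest of the payload (placeholder value here)
    location: str     # node currently holding the data


record = ObjectMetadata(
    key="photos/cats.png",
    size_bytes=4_200_000,        # 4.2 MB, assuming decimal megabytes
    created=date(2026, 6, 17),
    version="v3",
    checksum="abc123",
    location="Node-7",
)

# The index maps keys to records; the payload bytes live somewhere else.
index = {record.key: record}
print(index["photos/cats.png"].location)
```

Notice that nothing in the record contains the image itself. The record only describes the object and points to it.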
Without metadata, the system would have no idea:
- where the object lives
- whether it exists
- how large it is
- which version is current
- whether the object is healthy
In other words:
Data is the payload. Metadata is the map.
Storing data is surprisingly easy
Imagine a storage cluster with ten nodes.
When an object arrives, writing the data is often the simple part:
- Choose a destination
- Write the object
- Replicate or erasure-code it
- Return success
Modern disks, SSDs, and networks are very good at moving bytes around.
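The steps above can be sketched in a few lines. This is a toy model under stated assumptions: the hash-based placement, the replication factor of 3, and the `disks` dictionary standing in for real disks are all hypothetical choices of mine, not how any specific system does it.

```python
import hashlib

NODES = [f"Node-{i}" for i in range(10)]   # the ten-node cluster
REPLICAS = 3                               # assumed replication factor


def choose_nodes(key: str, replicas: int = REPLICAS) -> list[str]:
    """Pick `replicas` consecutive nodes, starting from a hash of the key."""
    digest = hashlib.sha256(key.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replicas)]


def put_object(disks: dict[str, dict[str, bytes]], key: str, data: bytes) -> list[str]:
    """Write `data` to every chosen node and return the placement."""
    placement = choose_nodes(key)
    for node in placement:
        disks.setdefault(node, {})[key] = data   # stand-in for a disk write
    return placement


disks: dict[str, dict[str, bytes]] = {}
placement = put_object(disks, "photos/cats.png", b"fake image bytes")
print(placement)
```

The bytes are now on three nodes, but only the caller holds `placement`. Nothing durable records it yet.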
The harder question is:
How does every node agree on where that object is?
That's a metadata problem.
The hidden database inside every storage system
Many people think of object storage as primarily a problem of storing bytes.
In reality, every serious storage system contains a metadata system that behaves a lot like a database.
It must answer questions such as:
- Does this object exist?
- Which version is current?
- Which nodes store the data?
- Has replication completed?
- Is the object being deleted?
- Has the object become corrupted?
Every read and write depends on accurate answers.
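A rough sketch of that database-like behavior follows. The `State` values, the `MetadataStore` class, and the extra node name `Node-2` are hypothetical, chosen only to show how a read can be gated on what the metadata says.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    PENDING = "pending"       # upload accepted, replication still running
    COMMITTED = "committed"   # replication finished, safe to read
    DELETING = "deleting"     # delete in progress
    CORRUPT = "corrupt"       # a health check found bad data


@dataclass
class Entry:
    version: int
    nodes: list[str]
    state: State


class MetadataStore:
    """In-memory stand-in for the metadata layer (hypothetical API)."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def record(self, key: str, entry: Entry) -> None:
        self._entries[key] = entry

    def lookup(self, key: str) -> Optional[Entry]:
        """Return the entry only if the object is readable right now."""
        entry = self._entries.get(key)
        if entry is None or entry.state is not State.COMMITTED:
            return None
        return entry


store = MetadataStore()
store.record("photos/cats.png", Entry(3, ["Node-7", "Node-2"], State.PENDING))
print(store.lookup("photos/cats.png"))        # None: replication not finished

store.record("photos/cats.png", Entry(3, ["Node-7", "Node-2"], State.COMMITTED))
print(store.lookup("photos/cats.png").nodes)
```

Here the data could already be sitting on disk while the object is still unreadable. In this sketch, the metadata state decides what clients see, not the presence of bytes.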
A storage cluster can survive losing a disk.
It cannot survive losing track of its metadata.
Why metadata becomes difficult at scale
Imagine a system storing:
100 billion objects
The object data may be spread across hundreds of servers.
Metadata now faces several challenges:
Consistency
Suppose a client uploads an object:
photos/cats.png
Every node must eventually agree that the object exists.
If one node thinks the upload succeeded while another thinks it failed, strange things happen:
- objects disappear
- stale versions appear
- reads become inconsistent
Concurrency
Two clients may attempt to update the same object simultaneously.
Which version wins?
Who decides?
The metadata layer must coordinate those decisions.
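One common way to coordinate this is a conditional update, often called compare-and-swap: a write is accepted only if the version the client last saw is still current. The sketch below is a single-process illustration using a lock. The class and method names are assumptions, and a real cluster would need a distributed equivalent of that lock.

```python
import threading


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedIndex:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions: dict[str, int] = {}

    def current(self, key: str) -> int:
        with self._lock:
            return self._versions.get(key, 0)   # 0 means "no version yet"

    def commit(self, key: str, expected: int) -> int:
        """Advance `key` to expected + 1 only if nobody else got there first."""
        with self._lock:
            actual = self._versions.get(key, 0)
            if actual != expected:
                raise VersionConflict(f"{key}: expected v{expected}, found v{actual}")
            self._versions[key] = expected + 1
            return expected + 1


index = VersionedIndex()
seen = index.current("photos/cats.png")      # both clients read version 0
index.commit("photos/cats.png", seen)        # client A commits first
try:
    index.commit("photos/cats.png", seen)    # client B still expects version 0
except VersionConflict as err:
    print("rejected:", err)
```

The losing client is told explicitly that it lost, so it can re-read and retry instead of silently overwriting the winner.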
Failure recovery
Nodes fail.
Disks fail.
Networks fail.
During failures, metadata must remain correct.
Incorrect metadata is often worse than missing data because the system may confidently return the wrong answer.
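One widely used guard is to keep a checksum in the metadata and verify it on every read, so a mismatch surfaces as an error instead of a wrong answer. A minimal sketch, assuming SHA-256 as the digest and a hypothetical `verified_read` helper:

```python
import hashlib


class CorruptObject(Exception):
    """Raised when the payload does not match the recorded checksum."""


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verified_read(data: bytes, expected_checksum: str) -> bytes:
    """Return the payload only if it matches what the metadata recorded."""
    if checksum(data) != expected_checksum:
        raise CorruptObject("payload does not match recorded checksum")
    return data


payload = b"cat picture bytes"
recorded = checksum(payload)           # stored in metadata at write time

verified_read(payload, recorded)       # matches, so the payload is returned

try:
    verified_read(b"some other object", recorded)   # wrong bytes for this key
except CorruptObject as err:
    print("refused:", err)
```

This catches both damaged bytes and metadata that points at the wrong bytes, but it only works if the checksum in the metadata is itself correct.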
Why listing objects is so expensive
Many developers assume:
List all objects in photos/
should be simple.
In distributed storage, it often isn't.
The system may need to:
- query multiple metadata partitions
- aggregate results
- sort keys
- remove duplicates
- handle concurrent updates
The actual object data may never be touched.
The entire operation is mostly metadata work.
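Here is a small sketch of the merge step, assuming each metadata partition can return its keys already sorted. The partition contents and the `list_prefix` function are made up for illustration, and concurrent updates are ignored.

```python
import heapq
from itertools import groupby


def list_prefix(partitions: list[list[str]], prefix: str) -> list[str]:
    """Merge sorted per-partition key lists, keep one prefix, drop duplicates."""
    # Filtering a sorted list keeps it sorted, which heapq.merge requires.
    filtered = ([k for k in part if k.startswith(prefix)] for part in partitions)
    merged = heapq.merge(*filtered)               # k-way merge of sorted inputs
    return [key for key, _ in groupby(merged)]    # adjacent duplicates collapse


partitions = [
    ["docs/a.txt", "photos/cats.png", "photos/dogs.png"],
    ["photos/birds.png", "photos/cats.png"],      # same key seen on two partitions
    ["videos/intro.mp4"],
]
print(list_prefix(partitions, "photos/"))
# ['photos/birds.png', 'photos/cats.png', 'photos/dogs.png']
```

Even in this toy version, every partition has to be consulted before the first result can be trusted, which is why the cost grows with the number of partitions rather than with the size of the objects.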
The trade-off every storage system faces
Storage systems constantly balance three goals:
Performance
Fast reads and writes.
Consistency
Every client sees the same truth.
Availability
The system continues operating during failures.
Improving one often impacts another.
Many architectural decisions ultimately become metadata decisions.
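Quorum replication is one well-known way to see this trade-off in numbers. I am using it here only as an illustration, not as a claim about how any particular system works. With N replicas, W acknowledgements required per write, and R replicas consulted per read, a read is guaranteed to overlap the latest write only when R + W > N. The function name below is hypothetical.

```python
def quorum_properties(n: int, w: int, r: int) -> dict[str, object]:
    """Summarize what N replicas, W write acks, and R read acks imply."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return {
        # Every read set intersects every write set only when r + w > n.
        "reads_see_latest_write": r + w > n,
        # Writes keep succeeding while at least w replicas are up.
        "write_failures_tolerated": n - w,
        # Reads keep succeeding while at least r replicas are up.
        "read_failures_tolerated": n - r,
    }


for w, r in [(3, 1), (2, 2), (1, 1)]:
    print(f"N=3 W={w} R={r}", quorum_properties(3, w, r))
```

With W=3 and R=1, reads are cheap and always current, but a single failed replica blocks writes. With W=1 and R=1, both operations wait on only one replica and survive two failures, but a read may miss the latest write.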
Why metadata often determines system architecture
When engineers discuss storage systems, conversations frequently focus on:
- SSDs
- throughput
- networking
- replication
Those things matter.
But in many systems, the metadata architecture determines whether the entire platform succeeds.
Questions such as:
- centralized or distributed metadata?
- strong or eventual consistency?
- database-backed or custom metadata engine?
can have a larger impact than raw storage performance.
Where systems like RustFS fit
While exploring object storage architectures, one lesson has become increasingly clear:
The most interesting engineering problems are often not about storing data.
They are about coordinating information about data.
Systems like RustFS, MinIO, Ceph, and others all approach the same fundamental challenge differently:
How do we maintain an accurate view of billions of objects across many machines while failures are constantly happening?
That question sits at the heart of modern storage engineering.
Key takeaway
When people think about object storage, they usually focus on where the data lives.
Experienced storage engineers often focus on something else:
Metadata is the real system. The stored objects are just the payload.
Once you understand that idea, many storage architecture decisions start to make much more sense.
Next in this series
Part 3: Strong vs Eventual Consistency in Distributed Storage (Without the Confusion)
We'll look at why two healthy nodes can sometimes disagree, and how storage systems decide what "truth" actually means.