Cloud thinking: Data in 3rd denormalised form

One schema to rule them all and in the darkness bind them.

That’s the SQL way. One schema, many tables, be efficient with space. Join, don’t duplicate. Third normal form and structured data. More structure.

But that’s not always the best way to scale. How important is it that everything is identical everywhere? How important is history? When a customer moves house, do you want to know the difference between the address they want their next order delivered to and the address the last order was delivered to?

When you let go of SQL and joins, and when storage is cheap, does it matter if you have multiple copies, so long as each document has the right copy for itself?
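Here is a minimal sketch of that idea in Python, using plain dictionaries as stand-ins for documents. The names (`place_order`, `delivery_address`, the sample customer) are made up for illustration and do not come from any particular database. The point is only that each order carries its own copy of the address, so a later house move does not rewrite history.

```python
import copy

# A hypothetical customer document; the field names are illustrative only.
customer = {
    "id": "c1",
    "name": "Sam",
    "address": {"street": "1 Old Road", "city": "Leeds"},
}


def place_order(customer, items):
    """Build an order document that embeds its own copy of the address."""
    return {
        "customer_id": customer["id"],
        "items": list(items),
        # Deep copy: later changes to the customer must not reach this order.
        "delivery_address": copy.deepcopy(customer["address"]),
    }


last_order = place_order(customer, ["kettle"])

# The customer moves house. Only the customer document changes.
customer["address"] = {"street": "2 New Street", "city": "York"}

next_order = place_order(customer, ["toaster"])
```

After this runs, `last_order` still says where the kettle actually went, and `next_order` says where the toaster should go. No join is needed to answer either question, at the cost of storing the address more than once.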

When you aggregate data, when your change log turns multiple orders into one delivery schedule for your driver, does it matter that the driver has their own copy for when they lose signal? Does it matter that there’s a copy of the addresses in a route that will be deleted tomorrow because it no longer serves a purpose?
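As a rough sketch, assuming orders shaped like simple dictionaries, the aggregation could look like this. `build_route`, `stops` and the sample data are hypothetical names chosen for the example, not a real API. The route is a throwaway document that owns everything the driver needs for the day.

```python
import copy
from datetime import date


def build_route(driver_id, orders, day):
    """Fold several orders into one self-contained route document."""
    return {
        "driver_id": driver_id,
        "date": day.isoformat(),
        "stops": [
            {
                "order_id": order["order_id"],
                # The route keeps its own copy, usable with no signal.
                "address": copy.deepcopy(order["delivery_address"]),
            }
            for order in orders
        ],
    }


# Illustrative orders; in practice these would come from your change log.
orders = [
    {"order_id": "o1", "delivery_address": {"street": "1 Old Road", "city": "Leeds"}},
    {"order_id": "o2", "delivery_address": {"street": "9 Mill Lane", "city": "Leeds"}},
]

route = build_route("d7", orders, date(2024, 1, 15))

# Editing an order afterwards does not disturb the route already handed out.
orders[0]["delivery_address"]["street"] = "5 Changed Close"
```

The route can be sent to the driver's device once and thrown away the next day. Whether a late change to an order should trigger a new route is a separate decision, and one you get to make explicitly.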

If storage is cheap and joins and transmissions are expensive, wouldn’t you want to cache data and trade storage for speed?

Shouldn’t you embrace heterogeneous data? The address on your payment looks like the address on the order, but it serves a different purpose, it’s embedded in a different context, and it has a different lifecycle.

It’s sometimes hard to change your mental model, but it can save you from doing the wrong thing.
