MongoDB in Production: Lessons Learned from Scaling to 1 Billion Documents

MongoDB's flexibility makes it easy to start with, but scaling to billions of documents requires careful planning. Here are the lessons we learned scaling a MongoDB deployment from prototype to production at scale.

Schema Design Matters

MongoDB does not enforce a schema, but you still have to design one. The choice between embedding and referencing has a large effect on performance at scale.

Embed when the data is always accessed together and the embedded array cannot grow without bound (a single document is capped at 16 MB)
Reference when the data is accessed independently or the related collection is large
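
To make the two rules above concrete, here is a minimal sketch using plain JavaScript objects. The users/orders domain and every field name are assumptions chosen for illustration; they are not taken from the deployment described in this article.

```javascript
// Hypothetical document shapes; names are illustrative only.

// Embedded: a user's addresses are few, bounded, and always read with the user.
const userDoc = {
  _id: "user-1",
  name: "Ada",
  addresses: [
    { label: "home", city: "London" },
    { label: "work", city: "Cambridge" },
  ],
};

// Referenced: orders keep accumulating and are often queried on their own,
// so each order is a separate document that points back at the user.
const orderDocs = [
  { _id: "order-1", userId: "user-1", total: 42.5 },
  { _id: "order-2", userId: "user-1", total: 17.0 },
];

// Application-side join: one extra lookup, but the user document stays small.
function ordersForUser(user, orders) {
  return orders.filter((order) => order.userId === user._id);
}
```

The trade-off in this sketch is one extra query for orders in exchange for a user document whose size does not depend on how many orders exist.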

Indexing Strategy

Every query should use an index -- full collection scans are death at scale
Compound indexes should follow the ESR rule: Equality, Sort, Range
Monitor index usage with db.collection.aggregate([{ $indexStats: {} }]) and remove unused indexes
Consider TTL indexes for time-series data that should auto-expire
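
The points above can be sketched in plain JavaScript as follows. The helper names (esrIndexSpec, unusedIndexes), the collection names in the comments, and the field names are all hypothetical. The sample usage data uses plain numbers for the access counter, whereas real driver output may use a Long type.

```javascript
// Build a compound index key spec in ESR order: Equality, Sort, Range.
// Field names are hypothetical; 1 = ascending, -1 = descending.
function esrIndexSpec(equality, sort, range) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, direction] of sort) spec[field] = direction;
  for (const field of range) spec[field] = 1;
  return spec;
}

// For a query such as:
//   find({ status: "shipped", total: { $gt: 100 } }).sort({ createdAt: -1 })
const orderIndex = esrIndexSpec(["status"], [["createdAt", -1]], ["total"]);
// In mongosh this could be passed as: db.orders.createIndex(orderIndex)

// Pipeline that lists per-index usage counters.
const indexStatsPipeline = [{ $indexStats: {} }];
// In mongosh: db.orders.aggregate(indexStatsPipeline)

// Given $indexStats output, return the names of indexes with zero accesses.
// The default _id index is skipped because it cannot be dropped.
function unusedIndexes(stats) {
  return stats
    .filter((s) => s.name !== "_id_" && Number(s.accesses.ops) === 0)
    .map((s) => s.name);
}

// TTL index definition: documents become eligible for deletion about
// 30 days after the date stored in createdAt.
const ttlIndex = {
  keys: { createdAt: 1 },
  options: { expireAfterSeconds: 30 * 24 * 60 * 60 },
};
// In mongosh: db.events.createIndex(ttlIndex.keys, ttlIndex.options)
```

Usage counters from $indexStats are kept per node and reset when that node restarts, so it is safer to check every replica set member over a representative period before dropping anything.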

Sharding

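Choose your shard key carefully -- changing it later is expensive. Older MongoDB releases offered no way to change a collection's shard key short of dumping and reloading the data. MongoDB 4.4 added refineCollectionShardKey and 5.0 added reshardCollection, but resharding still rewrites the entire collection. A good shard key has high cardinality, distributes writes evenly, and supports your most common query patterns.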
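
One rough way to compare candidate shard keys before committing is to measure cardinality and skew over a sample of documents. The sketch below assumes an in-memory sample; the shardKeyStats helper and the customerId and region fields are hypothetical.

```javascript
// Summarize how a candidate shard key behaves over a sample of documents.
// A real check would sample the production collection instead.
function shardKeyStats(docs, field) {
  if (docs.length === 0) return { distinctValues: 0, maxShare: 0 };
  const counts = new Map();
  for (const doc of docs) {
    const value = doc[field];
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  const maxCount = Math.max(...counts.values());
  return {
    distinctValues: counts.size,      // cardinality within the sample
    maxShare: maxCount / docs.length, // fraction held by the most common value
  };
}

const sampleDocs = [
  { customerId: "c1", region: "eu" },
  { customerId: "c2", region: "eu" },
  { customerId: "c3", region: "eu" },
  { customerId: "c4", region: "us" },
];

const byCustomer = shardKeyStats(sampleDocs, "customerId");
const byRegion = shardKeyStats(sampleDocs, "region");
```

In this sample customerId has four distinct values with no value holding more than a quarter of the documents, while region has two values and one of them holds three quarters. This check says nothing about monotonically increasing keys, which can have high cardinality and still send all new writes to a single shard.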
