MongoDB's flexibility makes it easy to start with, but scaling to billions of documents requires careful planning. Here are the lessons we learned scaling a MongoDB deployment from prototype to production at scale.

Schema Design Matters

MongoDB being schema-less does not mean you should not design your schema. Embedding vs referencing decisions have massive performance implications at scale.

  • Embed when data is always accessed together and the embedded array does not grow unbounded
  • Reference when data is accessed independently or the related collection is large

Indexing Strategy

  • Every query should use an index -- full collection scans are death at scale
  • Compound indexes should follow the ESR rule: Equality, Sort, Range
  • Monitor index usage with db.collection.aggregate($indexStats) and remove unused indexes
  • Consider TTL indexes for time-series data that should auto-expire

Sharding

Choose your shard key carefully -- it cannot be changed without rebuilding the cluster. A good shard key has high cardinality, evenly distributed writes, and supports your most common query patterns.

Get More Insights