mongoDB: 9 months on – The good to know
This article is part of my "mongoDB 9 months" series:
- Setting the stage – introducing why and how I started to use mongoDB, which provides some context for what is discussed afterwards,
- The good – the things I really appreciate about mongoDB,
- The bad – hurdles which could/should be fixed – mostly minor yet irritating points about the mongoDB stack,
- mongoDB "Take it or leave it" technical choices,
- The good to know – side discoveries which don't change the world but are better known beforehand – the current article,
- Conclusion – a small attempt at concluding all of this while taking a step back.
Anyway, let’s have a look at "The good to know".
A document encompasses all related concepts, as well as potentially some denormalized data.
As such, a document can easily have its own content plus a list of Cities, each city carrying most of its own content. Having a list inside a document is normal in mongoDB, and while it is a change from SQL, one shouldn't be surprised by it. In fact, mongoDB is even very good at indexing such lists. Then, for query/sorting purposes, denormalization kicks in quickly. While developing, one often finds oneself adding more and more to a document, which can then feel "big" and, even worse, seem to be growing quickly.
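To make this concrete, here is a minimal sketch of such a document – a hypothetical country document from an imagined "countries" collection, with its embedded cities list and some denormalized city data:

```javascript
// Hypothetical country document: the cities live INSIDE the document,
// with enough denormalized data (e.g. population) to query and sort on.
const country = {
  _id: "FR",
  name: "France",
  population: 65000000,
  cities: [
    { name: "Paris",     population: 2200000 },
    { name: "Marseille", population: 850000 }
  ]
};

// mongoDB can index inside that list; in the shell this would look like
// (assuming the hypothetical "countries" collection):
//   db.countries.ensureIndex({ "cities.name": 1 })
console.log(country.cities.length); // 2
```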
But then, it mostly doesn't matter: if you're only after one document, its size shouldn't matter much. mongoDB serves data fast, and the network between the DB server and the application server should be blazing fast anyway. On top of that, the document most likely "contains it all": additional queries for other documents should seldom be needed.
For lists, one uses field selection. Then, no matter how big some documents grow, you'll only fetch what you need, making their size irrelevant.
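Field selection (projection) in the shell looks like the commented query below, against the same hypothetical "countries" collection; the plain-JavaScript function underneath just simulates the idea, to show that only the selected fields would cross the wire:

```javascript
// In the mongo shell, field selection is the second argument of find()
// (hypothetical "countries" collection):
//   db.countries.find({}, { name: 1, population: 1 })
// However big each document is, only name, population and _id come back.

// The same idea, simulated in plain JavaScript:
function project(doc, fields) {
  const out = { _id: doc._id };        // _id is returned by default
  for (const f of fields) {
    if (f in doc) out[f] = doc[f];
  }
  return out;
}

const doc = {
  _id: 1,
  name: "France",
  population: 65000000,
  cities: [{ name: "Paris" }, { name: "Marseille" }] // potentially huge
};
const slim = project(doc, ["name"]);   // { _id: 1, name: "France" }
console.log(slim);
```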
Next to the query aspects, mongoDB also compresses stored data very well and has a high upper size limit for documents: 16MB as of 1.8. This is huge! Our biggest documents don't come close to it. Actually, apart from embedded binary data (images and the like), I hardly see how content made of strings and numbers could exceed 16MB once compressed. Such a document would be huge and very unusual. So don't worry: if you need some data in a document, put it in. Don't try to reduce its size, for example by shortening the JSON keys. That is premature optimization at best (and purely wasted time most probably): press forward and enjoy!
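A quick back-of-the-envelope check gives a feel for how far a text-only document sits from that cap. This sketch uses the JSON string length as a coarse proxy for the stored size (the actual on-disk representation differs):

```javascript
// A document carrying ~100KB of raw text – already far more prose than
// most real documents hold.
const bigDoc = {
  title: "Some article",
  body: "x".repeat(100000),            // ~100KB of text
  tags: ["a", "b", "c"],
  views: 12345
};

const sizeBytes = JSON.stringify(bigDoc).length;
const limitBytes = 16 * 1024 * 1024;  // the 16MB document size limit
const percent = (100 * sizeBytes / limitBytes).toFixed(2);
console.log(sizeBytes + " bytes, " + percent + "% of the limit");
// ~100KB of text is still well under 1% of the limit.
```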
While mongoDB provides some nice master/slave asynchronous replication, query routing is done solely on the application server side, on a per-query basis, with all queries going to the master by default. This is IMHO quite error prone and thus, if needed, requires either carefully written queries or dedicated tooling. Distributed reads must therefore be handled with care and don't come "out of the box".
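The kind of per-query routing decision the application is left with can be sketched as follows. This is purely illustrative – the function and server names are made up, not a real driver API – but it shows the shape of the problem: everything hits the master unless a query is explicitly flagged as tolerating replication lag:

```javascript
// Illustrative sketch only: pick a server for one query.
// "slaveOk" marks a query as accepting possibly stale data.
function pickServer(query, servers) {
  if (query.slaveOk && servers.slaves.length > 0) {
    // naive round-robin over the slaves for lag-tolerant reads
    const s = servers.slaves[servers.next % servers.slaves.length];
    servers.next++;
    return s;
  }
  return servers.master;   // the safe default: everything goes to master
}

const servers = { master: "db1", slaves: ["db2", "db3"], next: 0 };
console.log(pickServer({ find: "countries" }, servers));                // db1
console.log(pickServer({ find: "countries", slaveOk: true }, servers)); // db2
```

The error-prone part is exactly that `slaveOk` flag: forget it and a slave sits idle; add it to the wrong query and you read stale data.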
mongoDB has very good performance. We ran some tests and were amazed. Yet, in retrospect, this seems logical: mongoDB has no joins and no transactions. All the complexity of joining is absent – no cartesian products and the like. The lack of transactions also significantly reduces locking and version handling: no need for a multiversion concurrency control engine. It almost has to be fast. Performance is pretty much the sole justification for all these ACID/transaction restrictions, so make good use of it!
Yet, while all this is true and well, let's not forget that better than a (fast) DB query is… no query at all! As such, application-server-side first-level and second-level caches still provide a welcome performance boost and should be considered, especially if some relationships are planned. They save quite a few additional, sequential queries – a danger always lurking in mongoDB.
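A first-level cache can be as small as an identity map wrapped around the finder. This is a minimal sketch, with `fetchFromDb` standing in for a real driver call, showing how repeated lookups of the same document stop hitting the database:

```javascript
// Minimal first-level cache sketch: remember every document already
// fetched during this unit of work, and serve repeats from memory.
function makeCachedFinder(fetchFromDb) {
  const cache = new Map();
  return function find(id) {
    if (!cache.has(id)) {
      cache.set(id, fetchFromDb(id));  // only the first lookup hits the DB
    }
    return cache.get(id);
  };
}

let dbHits = 0;
const find = makeCachedFinder(id => { dbHits++; return { _id: id }; });
find("FR"); find("FR"); find("FR");    // three lookups...
console.log(dbHits);                   // 1 – only one actual DB query
```

Exactly the sequential-query pattern that hurts – resolving the same referenced document again and again in a loop – is the one this neutralizes.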