mongoDB 9 months on – Setting the stage
At first, I was about to write a single entry on my 9 months of mongoDB use. However, the article grew tremendously, which made it slower to write and harder to read. As such, I’ve decided to split it into several parts:
- Setting the stage – the current article, introducing why and how I started to use mongoDB, which provides some context for the parts that follow,
- The good – stuff I really appreciate with mongoDB,
- The bad – hurdles which could/should be fixed – mostly minor yet irritating points about the mongoDB stack,
- mongoDB "Take it or leave it" technical choices – you have to be well aware of them,
- The good to know – side discoveries which don’t change the world but are better known beforehand,
- Conclusion – a small attempt at concluding on all of this while taking a step back.
How I discovered mongoDB
My use of mongoDB came to me in an unexpected way. Indeed, the choice was made by our tech lead in order to figure out whether mongoDB could fit some pretty demanding needs we will have to fulfil some time in the future. As such, this post is a bit of a retrospective on the lessons learned along this journey.
Reasons behind our mongoDB choice
The choice of mongoDB came from the need to combine large amounts of data, regular queries, geo queries and full text queries all in one datastore. The reason for this is pretty simple: when you do your filtering and your full text search in 2 distinct systems, you then have to merge the hits. Transporting all these ids can be an issue, especially if you have many of them on each side but few present on both sides of the search: you could have spared some of the back and forth. When this merge implies a network hop in between, performance ends up taking quite a hit.
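The cost described above can be sketched in a few lines. This is a hypothetical illustration, not code from our project: each "system" is stood in for by a plain id list, and the point is that both lists must be shipped in full even when their overlap is tiny.

```python
# Sketch of the "two systems" problem: filtering in one store and full text
# search in another forces you to merge two potentially large id sets.
# All names here are hypothetical; this only illustrates the merge cost.

def merge_hits(filter_ids, fulltext_ids):
    """Intersect the ids returned by each system.

    Both inputs may be huge even when the intersection is tiny, which is
    exactly the wasted transport described above.
    """
    return set(filter_ids) & set(fulltext_ids)

# Many hits on each side, but only a small overlap:
filter_hits = range(0, 100_000)          # ids matching the attribute filters
fulltext_hits = range(99_990, 200_000)   # ids matching the full text query
common = merge_hits(filter_hits, fulltext_hits)
print(len(common))  # → 10 ids survive, yet ~200,000 crossed the wire
```

Doing the filtering and the full text search in the same datastore avoids transporting the two big candidate sets altogether.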
This "collocation" of data filtering/fetching should of course come with high performance and scalability in mind (notably in terms of continuity of service, read failover and the like).
When searching for a solution, there seem to be many contenders: most NoSQL databases look like a fit from their descriptions, and even "old" RDBMSs may qualify.
For example, mysql + sphinx looked like a potential match, since sphinx provides full text searches and could be used for geo queries as well. We did try it and even used it for a full text search intensive project. Yet, in the end, we didn’t retain it. This was mainly due to:
- its relative immaturity (lack of widespread usage/feedback, at least back in 2008/2009 when we looked at it),
- some hits were missing from the results, compared to those returned by the system we were about to replace. At the time we didn’t manage to explain these missing hits, and partial results weren’t "good enough" for our use cases,
- its relative complexity in terms of infrastructure (we wanted indexes and the like to be managed from our hibernate entities).
So, why mongoDB? The killer feature for us was the ability to search keyword lists efficiently. Indeed, geo queries aren’t, generally, hard to get. Massive volumes of data also looked kind of manageable. However, full text search is a different matter. It’s commonly addressed by building a list of keywords (by stemming the text) and then looking into that list on each search.
However, efficiently searching such a list of words isn’t a task most databases tackle with ease, rather the other way around. For example, it was the pain point of full text searches in PostgreSQL, making them slow in the end. Yet mongoDB proved blazing fast in the tests we did. 10gen’s focus on failover and the like also rang a bell. On top of that, when proceeding further into putting it into practice, the document approach (spoken of in The good) proved really great.
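The keyword-list approach above can be sketched as follows. This is a toy illustration under stated assumptions: the stemmer is deliberately crude (a real setup would use something like Porter stemming), and the in-memory dict stands in for the datastore. In mongoDB, the keyword list would typically be an indexed array field in each document, queried with the `$all` operator.

```python
# Toy sketch of keyword-list full text search: stem each document's text
# into a set of keywords at write time, then match query keywords against
# those sets at search time. Names and the stemmer are illustrative only.

def stem(word):
    """Very crude stemmer for illustration: lowercase, strip plural endings."""
    w = word.lower()
    if w.endswith("es"):
        return w[:-2]
    if w.endswith("s"):
        return w[:-1]
    return w

def keywords(text):
    return {stem(w) for w in text.split()}

# Hypothetical documents, keyed by id:
documents = {
    1: "Fast geo queries and full text searches",
    2: "Classic relational tables and joins",
}
# Built once at write time, like an indexed array field per document:
index = {doc_id: keywords(text) for doc_id, text in documents.items()}

def search(query):
    """Return ids of documents whose keyword set contains all query keywords."""
    wanted = keywords(query)
    return [doc_id for doc_id, kws in index.items() if wanted <= kws]

print(search("text search"))  # → [1]
```

The expensive part a database must handle well is the "contains all these keywords" lookup over millions of such lists, which is where mongoDB's multikey array indexes did well in our tests.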
Scope of the present series
Up to now, the project on which we use mongoDB involves a vast number of collections (the mongoDB equivalent of a table) and interactions among some of them, which may not be the perfect use case for mongoDB. However, it required us to dig quite deep into it and provides some nice ground for comparison with more classic RDBMSs. For the time being we’re missing first-hand experience on geo stuff and full text search (or rather the keyword-based kind), so I won’t speak of them.
Although we’re using morphia on top of mongoDB + Java Driver, I will only speak of these two latter elements, for clarity’s sake (even if recent developments make it likely that some parts of morphia will go into the driver). I might speak of morphia in some later post.
This document recaps the various things discovered over this 9-month period. Although I tried to update them at the time of writing (April 2011), some of these could easily be outdated when you read them. Do not hesitate to double check… and corrections are welcome!
About corrections, I’m no guru, just a clueless Joe. So help is welcome, as are diverging opinions: I have almost no definitive ones on the matter, since the inner workings and implications of database design and implementation aren’t at all my everyday work.
Enough preamble for now, let’s see mongoDB 9 months on – The good!
This document was updated on 29/04/2011, mainly thanks to discussions we had at the company after the first publication of the article.