mongoDB 9 months on – setting the stage

At first, I planned to write a single entry on my 9 months of mongoDB use. However, the article grew tremendously, which made it slower to write and harder to read. So I’ve decided to split it into several parts:

Setting the stage

How I discovered mongoDB

I came to mongoDB in an unexpected way: the choice was made by our tech lead, in order to figure out whether mongoDB could meet some pretty demanding needs we will have to fulfil some time in the future. As such, this post is a bit of a retrospective on the lessons learned along this journey.

Reasons behind our mongoDB choice

The choice of mongoDB came from the need to combine a large amount of data, normal queries, geo queries and full text queries in a single datastore. The reason is pretty simple: when you do your filtering and your full text search in 2 distinct systems, you then have to merge the hits. Transporting all these ids can be an issue, especially when each side returns many of them but few are present on both sides: most of the ids travelled back and forth for nothing. When this merge involves a network hop in between, performance ends up taking quite a hit.
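To make the merge cost concrete, here is a minimal sketch in Python. The function name `merge_hits` and the id ranges are made up for illustration; they are not taken from our actual system. It intersects the hits of two hypothetical stores and counts how many ids were shipped without ending up in the final result.

```python
def merge_hits(filter_ids, fulltext_ids):
    """Intersect the hits of two separate systems.

    Returns the ids found on both sides, plus the number of ids that
    were transported but did not make it into the merged result.
    """
    filter_set = set(filter_ids)
    fulltext_set = set(fulltext_ids)
    common = filter_set & fulltext_set
    transported = len(filter_set) + len(fulltext_set)
    # Each common id was shipped once by each side, hence the factor 2.
    wasted = transported - 2 * len(common)
    return sorted(common), wasted


# Hypothetical hit lists: many ids on each side, few shared.
filter_hits = range(0, 10000)        # ids matching the "normal" filters
fulltext_hits = range(9990, 20000)   # ids matching the full text query

common, wasted = merge_hits(filter_hits, fulltext_hits)
print(len(common), "ids kept,", wasted, "ids transported for nothing")
```

With these made-up numbers, roughly twenty thousand ids cross the wire to keep ten of them, which is the kind of waste a single datastore is meant to avoid.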

This “colocation” of data filtering and fetching obviously has to come with high performance and scalability in mind (notably in terms of continuity of service, read failover and the like).

When searching for a solution, there seem to be many contenders: most NoSQL databases look like a fit from their descriptions, and even “old” RDBMSs may do.

For example, mysql + sphinx looked like a potential match, since sphinx provides full text search and could be used for geo queries as well. We did try it and even used it for a full text search intensive project. Yet, in the end, we didn’t keep it. This was mainly due to:

  • its relative immaturity (lack of widespread usage/feedback, at least back in 2008/2009 when we looked at it),
  • some hits were missing from the results, compared to the hits returned by the current system we’re about to replace. At the time we didn’t manage to explain these missing hits, and partial results weren’t “good enough” for our use cases,
  • its relative complexity in terms of infrastructure (we wanted indexes and the like to be managed from our hibernate entities).

So, why mongoDB? The killer feature for us was the ability to efficiently search a list of keywords. Indeed, geo queries generally aren’t hard to get, and massive volumes of data looked kind of manageable as well. Full text search, however, is a different matter. It’s commonly addressed by building a list of keywords (by stemming the text) and then looking into this list on each search.
Efficiently searching such a list of words isn’t a task most databases tackle with ease, rather the other way around. For example, it was the pain point of full text searches in PostgreSQL, making them slow in the end. Yet mongoDB proved blazing fast in the tests we did. 10gen’s focus on failover and the like also struck a chord. On top of that, when we went further and put it into practice, the document approach (discussed in The good) proved really great.
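The keyword list approach can be sketched as follows, in plain Python and without any database. Everything here is an assumption made for illustration: the helper names (`keywords`, `build_document`, `search`), the sample texts, and above all the “stemming”, which is a crude stand-in that only lowercases words and strips a trailing “s”. A real project would use a proper stemmer and an indexed keyword field rather than a linear scan.

```python
import re


def keywords(text):
    """Crude keyword extraction (NOT a real stemmer): lowercase the text,
    keep alphabetic words of 3+ letters, strip a trailing 's'."""
    stems = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if len(word) < 3:
            continue
        if word.endswith("s") and len(word) > 3:
            word = word[:-1]
        stems.add(word)
    return sorted(stems)


def build_document(doc_id, text):
    """Store the keyword list next to the text, inside the same document."""
    return {"_id": doc_id, "text": text, "keywords": keywords(text)}


def search(documents, query):
    """Return the ids of documents whose keyword list contains every
    keyword of the query (linear scan, for illustration only)."""
    wanted = set(keywords(query))
    return [d["_id"] for d in documents if wanted.issubset(d["keywords"])]


docs = [
    build_document(1, "Fast geo queries on large datasets"),
    build_document(2, "Full text search over keywords"),
    build_document(3, "Keyword search and geo query"),
]

print(search(docs, "keywords search"))
print(search(docs, "geo"))
```

The point of the sketch is only the shape of the data: each document carries its own list of keywords, and a search boils down to checking that list for all the query terms, which is exactly the operation the datastore has to make fast.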

Scope of the present series

Up to now, the project on which we use mongoDB involves a vast number of collections (the mongoDB equivalent of a table) and interactions among some of them, which may not be the perfect use case for mongoDB. However, it required us to dig quite deep into it and provides a nice basis for comparison with more classic RDBMSs. For the time being we lack first-hand experience with geo queries and full text search (or rather keyword-based search), so I won’t speak of them.

Although we’re using morphia on top of mongoDB + the Java driver, I will only speak of the latter 2, for clarity’s sake (even if the latest developments make it likely that some parts of morphia will go into the driver). I might speak of morphia in a later post.

Disclaimer

This document recaps the different things we discovered over this 9-month period. Although I tried to bring them up to date at the time of writing (April 2011), some of them could easily be outdated by the time you read this. Do not hesitate to double check… and corrections are welcome!

About corrections: I’m no guru, just a clueless Joe 🙂 So help is welcome, as are diverging opinions: I have almost no definitive ones on the matter, since the inner workings and implications of database design and implementation aren’t at all my everyday work.

Enough preamble for now, let’s see mongoDB 9 months on – The good!

Updates

This document was updated on 29/04/2011, mainly thanks to the discussion we had at the company after the first publication of the article.
