Posts Tagged ‘mongodb’

mongoDB: 9 months on – conclusion

This article is part of my « mongoDB 9 months » series.

First of all, if you made it this far, congratulations. I had a lot on my mind and I fear I wrote it too quickly; even after some updates to different parts of the series, I’m still not happy with the way they’re written. I guess I’ll have to improve them over time. So I’m eager to know how readers feel about it: please let me know!

Back to mongoDB: it’s a powerful tool whose document orientation really helps reduce the infamous impedance mismatch, thus greatly reducing mapping issues. Its indexing abilities, especially for « in document » lists, are also impressive, as is its performance overall.

This comes at the price of some radical technical choices: no joins and no transactions (mostly).

Hence, to me, the following « rules » on when to use it:

  • Domain logic which can be mapped to documents without needing relationships: go for it, mongoDB is the perfect match. Its high performance will do wonders. And since documents can embed so much, this covers more cases than one could expect.
  • Domain logic with relationships and few if any transactional needs, where performance matters: mongoDB should be looked at. One should first carefully consider the handling of relationships; a proof of concept for it is a must, I think. The transactional needs are a different beast: one should make sure they can somehow fit with mongoDB’s non-existent support for such things. And remember it’s highly unlikely 10gen will ever introduce new features there, due to their (potential) impact on performance. This actually leads me to the third point:
  • Domain logic with a lot of transactional operations: mongoDB doesn’t fit.

Of course, these are only my humble opinions and I’m eager to hear other ones.

Some side notes come to mind as well:

  • if a real need for transactions ever pops up, it won’t be solved for free. A solution will have to be found. For example, it could be the limited two phase commit (more on it in the No transaction & limited ACID chapter). It could also be some other mechanism for ensuring transactional behaviour (some people went to mysql for this, but one could also consider something like JTA or, rather, multiverse). Finally, one could split the data in two, between a transaction-capable DB and mongoDB. Anyway, the bottom line stays: transactions won’t come for free, whereas it’s a feature usually taken for granted due to the massive use of « traditional » RDBMS. This leads me to the next point:
  • Plan for quite some explaining of the mongoDB choice. From JSON content and queries to the document approach, all this without joins or transactions, mongoDB has a lot which differs and which will, for sure, unsettle and disturb. It’s true for developers, leaving their safe RDBMS land (« ugh, where is my transaction? »), for admins (« what, I can’t export to a CSV file? ») and for business analysts/product leads with a past in IT or some knack for it. They’ll all be surprised! Take the time to win their hearts & minds properly.
  • Plan to keep an eye on mongoDB’s progress: new features come often and bugs get fixed. One has to follow carefully so as not to miss some of them.
  • Finally, a point which I haven’t really talked about, but which matters as well: watch your driver/mapping framework. It would be a pity for it to be the limiting factor, yet at the same time the limitations (mostly regarding type info) and flexibility of JSON make the issue pretty hard, at least on the mapping side. On top of this, refinements like first and second level caches are still welcome performance boosters. Some support for relationships and their management would also be welcome. In the end, the driver business isn’t straightforward.

On a more personal note, in retrospect, I’m also wondering whether putting writes and reads all into (one) mongodb is the right approach. Indeed, denormalization means that, quickly, some decisions will be taken based on denormalized data which could be stale. Even without going that far, documents can quickly end up pretty messy, with denormalized content for specific views and the like. And the lack of transactions is still lurking in the background: how can one be sure that several updates in a row, each of which may fail, are handled properly?

As such, my current interest in CQRS and EDA, which advocate separating the view database from the write one, rings a bell. Indeed, mongoDB makes a perfect fit for the view database: it can handle both full text search and complex queries, while being quite flexible in terms of mapping for your views. On the other hand, the write database could stick to some RDBMS capable of joins and transactions, where needed (which in CQRS should be less often than in the traditional « dump it all into the RDBMS » approach). Sure, it may involve extra work, but if you chose mongoDB for writes you most likely weren’t afraid of extra work anyway. And the clarity and flexibility gained might well pay off quickly. Yet these are just wild thoughts: I haven’t had any occasion to test them, even if akka/scala for events, some RDBMS for writes and mongoDB for views feels truly appealing to me.

Actually, I would also love a document oriented db for the write part. Basically, mongoDB with better relationship support (from integrity constraints to joins) and transactions would be perfect. Yet, while the relationship support can and is likely to improve over time, transactions feel way more remote. They simply don’t match the current performance minded approach and, furthermore, would imply a massive amount of changes… Pity!

Before we part, let me thank codesmell, my tech lead, a hell of a lot: he has always been willing to endure my lengthy questions and talks on mongodb and related matters. Without him this series wouldn’t have seen the light of day, it’s as simple as that!

And don’t forget: your views on this series are more than welcome!


mongoDB: 9 months on – The good to know

This article is part of my « mongoDB 9 months » series.

Anyway, let’s have a look at « The good to know ».

Documents are (mostly) large beasts, and we don’t care

A document encompasses all related concepts, as well as potentially some denormalized data.

As such, a document can easily have its own content plus a list of Cities, with each city actually carrying most of its own content. Indeed, having a list in a document is normal in mongoDB, and while it is a change from SQL, one shouldn’t be surprised by it. In fact, mongoDB is even very good at indexing such lists. Then, for querying/sorting purposes, denormalization kicks in quickly. While developing, one often finds oneself adding more and more to some document, which can then feel « big » and, even worse, fast-growing.

But then, it mostly doesn’t matter: if you’re only after one document, its size normally shouldn’t matter much. mongoDB serves data fast and the network between the DB server and the application server should be blazing fast anyway. On top of that, the document most likely « contains it all »: additional queries for other documents should seldom happen.

For lists, one uses field selection. Then, no matter how big some documents grow, you’ll only fetch what you need, making the size issue irrelevant.
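To make field selection a bit more concrete, here is a minimal sketch in plain Python, with no driver involved. The `select_fields` helper and the sample document are made up for illustration; with the real database, the same effect comes from the second argument of `find`, for instance `db.building.find({}, {name: 1})` in the shell (collection and field names being my own examples).

```python
def select_fields(document, fields):
    """Keep only the requested top-level fields, plus _id (hypothetical helper)."""
    selected = {"_id": document["_id"]}
    for name in fields:
        if name in document:
            selected[name] = document[name]
    return selected


# A document with a large embedded list (made-up data).
building = {
    "_id": 1,
    "name": "Town hall",
    "cities": [{"name": f"City {i}", "population": 1000 * i} for i in range(500)],
}

# Only the small part travels, however big the embedded list is.
summary = select_fields(building, ["name"])
print(summary)
```

The embedded list stays in the stored document; it is simply left out of what gets fetched.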

Beyond the query aspects, mongoDB also has some very good compression of the stored data and a high upper size limit for documents: 16 MB in 1.8. This is huge! Our biggest documents don’t come close to it. Actually, apart from embedding binary data (images and the like), I hardly see how content made of strings and numbers can take more than 16 MB when compressed. Such a document would be huge and very unusual. So don’t worry: if you need some data in a document, put it in. Don’t try to reduce its size, for example by shortening the JSON keys. It’s premature optimization at best (and most probably pure lost time): press forward and enjoy!

Read/write query routing not predefined

While mongoDB provides some nice asynchronous master/slave replication, query routing is done solely on the application server side, on a per query basis, with all queries going to the master by default. This is IMHO quite error prone and thus, if needed, requires either carefully written queries or some tooling. Distributed read queries therefore have to be handled carefully and don’t come directly « out of the box ».
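As an illustration of the kind of tooling this calls for, here is a small sketch of a per query router in plain Python. `QueryRouter`, its parameters and the host names are all hypothetical; it is not how the driver does it, only one way to keep the routing decision in a single place, with the master as the safe default.

```python
import itertools


class QueryRouter:
    """Hypothetical helper: picks a server for each operation."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves, if there are any.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def target(self, operation, allow_stale_read=False):
        # Only reads that explicitly tolerate stale data may go to a slave:
        # replication is asynchronous, so a slave can lag behind the master.
        if operation == "read" and allow_stale_read and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = QueryRouter("master:27017", ["slave1:27017", "slave2:27017"])
print(router.target("write"))
print(router.target("read"))
```

Making the stale read an explicit opt-in keeps the error prone part visible at each call site.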

About performance

mongoDB has very good performance. We did some tests and were amazed. Yet, in retrospect, this seems logical: mongoDB has no joins and no transactions. All the complexity of joining is absent: no cartesian products and the like. The lack of transactions also significantly reduces lock and version handling: no such thing as a multiversion concurrency control engine. This almost has to be fast. Performance is almost the sole reason for all these ACID/transaction restrictions, so make good use of it!

Yet, while this is all true and well, let’s not forget that better than a (fast) DB query is… no query at all! As such, first level and second level caches on the application server side still provide a welcome performance boost and should be considered, especially if some relationships are planned. They indeed save quite a few additional, sequential queries, a danger always lurking in mongoDB.
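A first level cache can be sketched in a few lines of plain Python. Everything here is made up for illustration (`CountingStore` stands in for the database and only counts the queries it receives); a real one would also need invalidation and a bounded lifetime, which this sketch leaves out.

```python
class CountingStore:
    """Stand-in for the database: counts how many queries it receives."""

    def __init__(self, documents):
        self._documents = documents
        self.queries = 0

    def find_one(self, collection, doc_id):
        self.queries += 1
        return self._documents[collection].get(doc_id)


class FirstLevelCache:
    """Remembers documents already loaded during one unit of work."""

    def __init__(self, store):
        self._store = store
        self._loaded = {}

    def get(self, collection, doc_id):
        key = (collection, doc_id)
        if key not in self._loaded:
            # Only the first request for a given document hits the store.
            self._loaded[key] = self._store.find_one(collection, doc_id)
        return self._loaded[key]


store = CountingStore({"city": {1: {"_id": 1, "name": "Lyon"}}})
cache = FirstLevelCache(store)
for _ in range(3):
    city = cache.get("city", 1)
print(store.queries)
```

Three requests for the same city cost a single query here.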


mongoDB: 9 months on – « Take it or leave it » technical choices

May 11, 2011, 1 comment

This article is part of my « mongoDB 9 months » series.

Anyway, let’s have a look at mongoDB’s « Take it or leave it » technical choices.

No join and relationships

Coming from a key value store background, mongoDB provides no equivalent of RDBMS relationships. Actually, the only help provided is the DBRef. However, DBRefs are very limited: a DBRef is just a convention for storing a collection name and a document identifier. They don’t behave like foreign keys at all: mongoDB keeps no count/trace of them and a document can be deleted regardless of the DBRefs pointing at it.

Actually, a query can act upon only one collection at a time. It’s possible to get multiple documents from the same collection in one query, but fetching 2 documents from 2 different collections requires 2 queries. Be it for retrieval or only to take part in some query, joining is simply not available.

In more detail, displaying a document means as many extra queries as there are distinct collections referenced. Let’s say you have a building collection and you want to display a building and where it is. Maybe for the location you have the City, Country and Region. Then you have to query the City, Country and Region collections one at a time.

Doing four queries instead of one to display a document might sound « not perfect but bearable ». However, matters get worse if you want to display a list of buildings and their locations. There, for each building, you would have to run 3 more queries. For a list of 20 buildings, you would end up with 61 queries (1 for the list plus 20 × 3). This most likely won’t be acceptable anymore.
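The arithmetic can be checked with a small simulation in plain Python. `CountingDb` is a made-up in-memory stand-in which only counts the queries it serves, and the collections and their content are invented for the example.

```python
class CountingDb:
    """In-memory stand-in for the database that counts the queries it serves."""

    def __init__(self, collections):
        self.collections = collections
        self.queries = 0

    def find_all(self, collection):
        self.queries += 1
        return list(self.collections[collection].values())

    def find_by_id(self, collection, doc_id):
        self.queries += 1
        return self.collections[collection][doc_id]


def make_db(building_count):
    ids = range(building_count)
    return CountingDb({
        "city": {i: {"_id": i, "name": f"City {i}"} for i in ids},
        "country": {i: {"_id": i, "name": f"Country {i}"} for i in ids},
        "region": {i: {"_id": i, "name": f"Region {i}"} for i in ids},
        "building": {
            i: {"_id": i, "name": f"Building {i}", "city": i, "country": i, "region": i}
            for i in ids
        },
    })


def list_buildings_naive(db):
    rows = []
    for building in db.find_all("building"):  # 1 query for the list
        # 3 more queries per building, one per referenced collection
        location = [
            db.find_by_id(kind, building[kind])["name"]
            for kind in ("city", "country", "region")
        ]
        rows.append((building["name"], *location))
    return rows


db = make_db(20)
rows = list_buildings_naive(db)
print(db.queries)  # 1 + 20 * 3 = 61
```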

Starting from there, a long list of improvements is possible:

  • You could have some kind of unit of work on the client side in order to avoid requesting the same document twice. While this again involves some tooling, you can’t rely on it to make queries fast enough, since the buildings may all reference different locations, as well as many other documents in your database.
  • Going further than the unit of work, one could have a second level cache and do the relevant caching on the application server side. One could then avoid hitting the database for each document, skipping the network overhead and potentially some serialization/deserialization on the way.
  • Another improvement would be to first collect all the City, Country and Region ids, in order to do only 3 extra queries on top, and then to rewire the fetched locations to their original buildings. While doable, this once again requires extra tooling.
  • Depending on the use case, lazily loading the referenced documents could also be of some help.
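The « collect first, then fetch » idea from the list above can be sketched as follows, again in plain Python with a made-up in-memory `CountingDb`. Its `find_by_ids` method plays the role of one query with an `$in` condition on the ids; the data and names are invented for the example.

```python
class CountingDb:
    """In-memory stand-in for the database that counts the queries it serves."""

    def __init__(self, collections):
        self.collections = collections
        self.queries = 0

    def find_all(self, collection):
        self.queries += 1
        return list(self.collections[collection].values())

    def find_by_ids(self, collection, ids):
        # One query for many ids, in the spirit of an $in condition on _id.
        self.queries += 1
        docs = self.collections[collection]
        return [docs[i] for i in ids if i in docs]


def list_buildings_batched(db):
    buildings = db.find_all("building")  # 1 query
    names = {}
    for kind in ("city", "country", "region"):  # 3 queries, whatever the list size
        ids = {b[kind] for b in buildings}
        names[kind] = {d["_id"]: d["name"] for d in db.find_by_ids(kind, ids)}
    # Rewire the fetched locations to their buildings.
    return [
        (b["name"], names["city"][b["city"]], names["country"][b["country"]],
         names["region"][b["region"]])
        for b in buildings
    ]


db = CountingDb({
    "city": {1: {"_id": 1, "name": "Lyon"}, 2: {"_id": 2, "name": "Ghent"}},
    "country": {10: {"_id": 10, "name": "France"}, 11: {"_id": 11, "name": "Belgium"}},
    "region": {20: {"_id": 20, "name": "Rhone-Alpes"}, 21: {"_id": 21, "name": "Flanders"}},
    "building": {
        100: {"_id": 100, "name": "Tower", "city": 1, "country": 10, "region": 20},
        101: {"_id": 101, "name": "Library", "city": 2, "country": 11, "region": 21},
        102: {"_id": 102, "name": "Museum", "city": 1, "country": 10, "region": 20},
    },
})
rows = list_buildings_batched(db)
print(db.queries)  # 4
```

The query count no longer depends on the number of buildings, but the rewiring code is exactly the kind of extra tooling one has to write and maintain.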

In the end, each of these improvements requires some server side tooling, and sometimes some serious tooling. Applying them all isn’t that trivial either. In the case of a list of entities, lazy loading of references could in fact be a real downer (if lazy loads happen a lot).

These options are actually even better combined. A unit of work extending over the lazy loading, for example, sounds like the way to go. The same goes for the second level cache.

On top of that, the query to get the building list in the first place matters as well. It’s not just about fetching entities.

Indeed, when creating the building/location list, you may want to order by country and city names. You most likely want to offer pagination as well, to make it all user friendly. If you stick to plain DBRefs and the like, it means:

  • fetching all the buildings of interest (and not only the first 20 – hopefully you don’t have too many of them),
  • for all of these buildings, fetching in one go all the relevant countries and then all the relevant cities,
  • sorting the original list of buildings according to the names of the countries and cities fetched,
  • displaying only the first 20.

On the way, you’ll also need to use field selection.

However, field selection/filtering comes with its own issues as well. Indeed, in the given example, you may have to first work on all buildings with some fields filtered out, but then fetch the 20 remaining ones in full, adding one more query on the way.

Overall, the improvements tackling the lack of joins are quite something to build, use and maintain properly. Furthermore, in the end, the number of queries might still be too high, depending on the use case.

In the end, using only documents and references won’t always fit. You then need to denormalize your documents to embed the relevant parts of the referenced documents. With the list of buildings example, it would most probably mean storing the country and city names with each building. Ordering and limiting can then be done in one query.
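Here is a minimal sketch of what the denormalized copies buy, in plain Python over made-up documents. The field names `city_name` and `country_name` are my own examples; against the real database this would be a single query along the lines of `db.building.find().sort({country_name: 1, city_name: 1}).limit(20)` in the shell.

```python
# Made-up buildings, each carrying a denormalized copy of its city and country names.
buildings = [
    {"name": "Tower", "city_name": "Lyon", "country_name": "France"},
    {"name": "Library", "city_name": "Ghent", "country_name": "Belgium"},
    {"name": "Museum", "city_name": "Amiens", "country_name": "France"},
    {"name": "Station", "city_name": "Antwerp", "country_name": "Belgium"},
]


def building_page(documents, page_size, page_number=0):
    """Sort on the embedded names, then keep one page: no other collection needed."""
    ordered = sorted(documents, key=lambda d: (d["country_name"], d["city_name"]))
    start = page_number * page_size
    return ordered[start:start + page_size]


first_page = [b["name"] for b in building_page(buildings, page_size=2)]
print(first_page)
```

Since everything needed for ordering lives in the building documents themselves, sorting and limiting can happen in one place.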

However, all this denormalized data now requires special care. If the referenced city name changes, you would like the denormalized names in the building collection to reflect this, wouldn’t you? So, how to do it?

Well, first of all, that’s where, IMHO, the mongoDB documentation falls short by quite some extent. The documentation and « company line » are fairly simple. One can sum them up in 4 words: « you should rather embed », as seen in their Schema design page. If you’re to denormalize, you’re on your own. From what I gathered/read, they didn’t provide more hints on how to proceed there because it’s a client side matter with many options. Yet, personally, I would have loved at least some presentation of the main ones. No guidance at all is really not enough.

Currently, I see these options:

  • ad hoc denormalization support: when saving/updating/deleting some Foo entity, one explicitly goes to the linked documents and updates the denormalized data there. Since this happens synchronously when saving, this solution is only possible on a small scale. It is usually the first approach.
  • home made tools for update propagation: you have some way of declaring that the Bar entities contain denormalized content of the Foo entities. The tools in turn take care of propagating changes. This approach scales better than the first one. It works well as long as no complex logic is needed to build the denormalized content. Otherwise, one has to switch to the 3rd option, which follows. Furthermore, one may want to avoid polluting some documents with plenty of use case specific denormalized data. Again, the 3rd option might be the way to go.
  • pre aggregation for some specific queries/views. The idea there is to build a document for a given query/view in order to fulfill it in one query. The document’s content is then highly specific and efficient. Tools are still required to maintain it, and these pre aggregated queries/views can likely live with different update delays (some might be ok with a nightly batch, for example).
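To give an idea of what update propagation involves, here is a deliberately naive sketch in plain Python over in-memory dictionaries. The document layout (a `city` sub-document holding a `ref` and a copied `name`) and the `rename_city` function are made up for illustration; a real tool would find the copies through a query and write each one back to the database.

```python
# Made-up data: each building embeds a denormalized copy of its city's name.
cities = {1: {"_id": 1, "name": "Bombay"}, 2: {"_id": 2, "name": "Pune"}}
buildings = {
    100: {"_id": 100, "name": "Gateway", "city": {"ref": 1, "name": "Bombay"}},
    101: {"_id": 101, "name": "Fort", "city": {"ref": 2, "name": "Pune"}},
}


def rename_city(city_id, new_name):
    """Update the city, then propagate the new name to every copy of it."""
    cities[city_id]["name"] = new_name
    touched = 0
    for building in buildings.values():
        if building["city"]["ref"] == city_id:
            # Against a real database each of these is a separate write:
            # a failure half way leaves some copies stale.
            building["city"]["name"] = new_name
            touched += 1
    return touched


touched = rename_city(1, "Mumbai")
print(touched, buildings[100]["city"]["name"])
```

Even in this toy form, the propagation is a loop of independent writes, which is where the missing transactions start to hurt.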

The last 2 options need some sort of modification/insertion/deletion propagation. Ideally, some kind of trigger would be welcome, but triggers aren’t available in mongoDB. It means you have to do it yourself on the application server side.

Regarding tooling for denormalization, you’re on your own as well. As said before, the DBRef is the only « tool » provided. You won’t get a database provided list of the documents referencing some other one. There’s no tool to propagate updates or handle deletions.

While not impossible to achieve, denormalization handling requires skills and time to get right. This extra work on the client side isn’t common in database work. Building such an update/deletion propagation mechanism is not trivial, especially with the limited toolset available.

While not casting a shadow over mongoDB as a whole, the complexity of this extra work shouldn’t be underestimated. You should really plan to develop your denormalization tools early and test them thoroughly, to know how they perform and what you can expect from them. The same goes for the improvements (first/second level cache, lazy loading, collect and then fetch strategy) you may choose. And don’t forget mongoDB – The bad, because some other current limitations of mongoDB may bite you as well: better to find them before having to deploy some update propagation mechanism in a hurry for a business critical reason!

No transaction & limited ACID

Due to the lack of transactions, mongoDB provides limited support for ACID:

  • atomicity applies only to changes on one document. If your update batch affecting 10 documents fails somewhere, some of the documents will be updated but not the others. Rolling back the changes already made is left to the application server.
  • consistency isn’t present either. A long running update could see new documents inserted somewhere in the collection and not affect them. A failed batch update leaves the database in an inconsistent state as long as no one cleans it up.
  • finally, isolation is not present at all. Ongoing updates can be seen half way through their execution by other operations.

Some implementation and API choices favouring performance can also lead to unexpected behaviors. For example, when listing all the documents of a collection, getting the same document twice is possible with the default settings.

To me, a transaction usually implies mostly two things:

  • a certain level of atomicity of multiple operations: you can make sure some operations won’t be seen before they are fully completed, preventing stale data from being taken into account anywhere,
  • an « easy » safety net: without having to think about it, if one operation of a transaction fails then all of them will be rolled back. It comes « out of the box ».

For the atomicity part, the documentation provides some workarounds, namely a two phase commit approach. However, this approach comes at a price: it is complex and ad hoc, since it needs to be implemented each time you need such a two phase commit. This solution is also limited to updates of documents of the same collection (and to queries not involving other collections either).

IMHO, mongoDB simply doesn’t fit if you have many updates involving related documents, like the classic account transfer example. And while having transactions and some ACID properties for sure doesn’t come cheap in RDBMS either, it can still be done there when needed, and without this much hoop jumping. Actually, having read the two phase commit page, I almost feel compelled to say that transactions are possible even in high load environments. Indeed, I know of a finance company using transactions with the read committed isolation level for writes on their DB2, even for the tables handling stock exchange back office operations, so with a hell of a lot of transactions. Sure, it isn’t cheap. But it’s possible.

Regarding the safety net part, it’s still somehow doable with the proper write concern. Indeed, the SAFE write concern makes db operations synchronous and throws an exception in case of an issue. One can and should then handle this on the application server side. In case of multiple operations in a row, one should roll them back manually. However, while possible, this isn’t always easy. Indeed, one doesn’t necessarily know which documents were actually updated, in the failing operation or in earlier ones. As such, proper rollback doesn’t come for free and might even, depending on the use case, be fairly hard to achieve.
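A manual safety net could look like the following sketch, in plain Python over an in-memory dictionary. `WriteFailed` stands in for whatever exception a failed, acknowledged write raises, and `transfer` with its `fail_on_credit` switch is made up to simulate a failure on the second write; none of this is driver code.

```python
class WriteFailed(Exception):
    """Stands in for the exception a failed, acknowledged write would raise."""


def transfer(accounts, source, target, amount, fail_on_credit=False):
    undo_log = []  # compensating changes, most recent last
    try:
        accounts[source]["balance"] -= amount
        undo_log.append((source, amount))
        if fail_on_credit:
            raise WriteFailed("simulated failure of the second write")
        accounts[target]["balance"] += amount
        undo_log.append((target, -amount))
    except WriteFailed:
        # Manual roll back: undo what was applied, in reverse order.
        for account_id, delta in reversed(undo_log):
            accounts[account_id]["balance"] += delta
        raise


accounts = {"a": {"balance": 100}, "b": {"balance": 50}}
try:
    transfer(accounts, "a", "b", 30, fail_on_credit=True)
except WriteFailed:
    pass
print(accounts)
```

Note what the sketch does not cover: the compensating writes can fail too, and between the debit and its undo other readers can see the intermediate state, since there is no isolation.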


On the 27/05/2011: enumeration of the possible de normalization strategies.

On the 06/06/2011: introduction of the part about the isolation and safety net aspects of transactions.


mongoDB 9 months on – the bad

April 18, 2011, 1 comment

This article is part of my « mongoDB 9 months » series.

The bad – hurdles which could/should be fixed

Bad surprises in the query language

I guess our project is a bit more complex than what most people do with mongoDB. Indeed, we quickly needed quite complex (and dynamic) queries, which were impeded by various limitations in the current query language.

In the end, not all the query logic was expressible in queries, which had many time consuming consequences:

    • some queries were reconsidered, either by changing them slightly or, even, adding more denormalized content to make them possible,
    • for the more complex ones, we had to resort to partial queries on the database which are then « completed » on the server side. This is pretty bad since the logic is then scattered around and ends up quite hard to follow.
    • dynamic queries, built on top of some kind of query object can easily end up failing (not applying all the logic they contain) without us having any clue of it, apart from some testers/users spotting broken data…

JavaScript: best to be avoided

Queries can embed some JavaScript, and one can also write Map/reduce batches or even « direct » functions, through db.eval, all in JavaScript.

However, some 10gen employees said at a mongoDB conference that running JavaScript on the database wasn’t really the best decision they made, and indeed JavaScript suffers from several issues:

  • all JS runs on only one thread per server
  • JavaScript can even « stop the world »
  • it is hard to debug: the console is picky about the characters it gets (no tabs, for example) and the scripts/batches/db.eval have no proper development environment to speak of. Figuring out what goes wrong is really hard, be it some dumb syntax error or a more serious logic issue which would require debugging (which isn’t possible)
  • JavaScript is slow because it isn’t optimized, even if V8 is being investigated
  • no JavaScript exception handling: the script just crashes silently, meaning it’s all up to the developer to handle it properly.

In the end, JavaScript is really not appropriate for synchronous queries, since all of them are queued in one thread, which is quite a performance bottleneck. And even asynchronous JavaScript is hard to rely on, since it’s hard to write in the first place and then to monitor. Worse, code might « rely » on some current restrictions, making it incorrect once proper JavaScript support is there.

The only good point there is, after all, that splitting your application logic between different servers (database and application) as well as languages isn’t really recommended anyway. Still, it could have been handy sometimes, especially to work around some query language restrictions.

Java Driver issues

First of all, the Java Driver code isn’t the best I’ve ever seen.
It’s rather confusing IMHO. Debugging through it isn’t always a good/easy experience. Similarly, the driver has its rough edges, for example this Make the Mongo class proxy safe issue, which triggered some unexpected behaviour at some point.
In the end, it sometimes feels like the code was written by good « low/kernel level » developers who aren’t Java developers. This can sometimes be a bit unsettling for someone like me, used to many higher level Java frameworks.

A big plus though: when we had to dig in for performance matters (hence this post a while ago: Multithreaded performance testing checklist), no contention or big gotcha was spotted, so the driver feels like it’s doing its job.

One last point about it: we weren’t thrilled by its performance. When reading some text content, it was in the same ballpark as the mysql driver. Request times degrade linearly with the number of concurrent threads, up to the number of cores of the machine running the driver, and then get way worse. Most likely some compression of the wire protocol would help, but somehow we were expecting better. The Memcached java driver, for example, behaves differently: requests take the same time whatever the number of threads, up to the number of cores. On top of that, a single request was significantly faster than with the mysql/mongodb drivers (with the same text content).

Still, this doesn’t say much about the overall performance of mongoDB, where its architecture helps a lot. I’ll discuss this point further in the About performance chapter of the « The good to know » article, which isn’t ready yet.

That’s all for now folks, the next part of the series, mongoDB « Take it or leave it » technical choices, will be published soon!


mongoDB: 9 months on – the good

This article is part of my « mongoDB 9 months » series.

The good

Document orientation rocks

mongoDB is document oriented. It means its tables, named collections, contain documents, which themselves are kind of loosely defined.

Indeed, instead of having one definition of the structure for the whole collection, each document embeds its own description, in a hierarchical form. The various documents’ hierarchy descriptions don’t have to match at all. In fact, each document in the same collection could have a completely different hierarchy. However, for consistency’s sake and for querying, they usually share some common structure, even if every document can diverge at any point.

This « document definition » is done through JSON data, so key value pairs which can be nested. For example, it can be something like this:

{ author: 'clueless joe',
  created : new Date('04/10/2011'),
  title : 'Yet another blog post',
  text : 'Here is the text...',
  tags : [ 'example', 'joe' ],
  comments : [ { author: 'jim', comment: 'I disagree' },
              { author: 'nancy', comment: 'Good post' } ]
}

The square brackets represent a list (an array) of values.

Overall, the document approach really feels like a perfect mapping for the Aggregate root pattern of Domain Driven Design. As such, related elements are put all together, without dispersion around the database like proper database normalisation would often dictate.

As such, the mapping is potentially very close to your application server side data representation, even if it differs slightly in some parts. Relationships are also way less of an issue: most of them are now put directly in the document itself, either through some list or even a map. One no longer has to wonder where to put this lazy relationship and the like: being part of the same document, they’re all loaded together by default, in one inexpensive query. Similarly, no issue with sorted content and the like: it’s just a matter of writing the data down in the proper order. In the end, writing a document is really a breeze and is done quickly.

Furthermore, mongoDB queries are really flexible and able to pick up only the documents with the relevant structure and content. One doesn’t have to care about documents missing part of the structure: these will just be skipped. Writing queries ends up being quite easy, once the syntax becomes familiar.

All in all, this document approach feels able to reasonably match Object Oriented Design, getting way less in the way than a traditional RDBMS.
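The « missing structure is just skipped » behaviour can be mimicked with a few lines of plain Python. The `find` function below is a made-up, much simplified imitation of an equality query on a dotted path such as `author.city` (it ignores lists, for instance); it is only there to show the principle.

```python
_MISSING = object()  # marker for "this path does not exist in the document"


def get_path(document, dotted_path):
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(documents, criteria):
    """Keep the documents where every dotted path exists and equals the given value."""
    return [
        doc for doc in documents
        if all(get_path(doc, path) == value for path, value in criteria.items())
    ]


# Three documents of the same collection, with diverging structures.
posts = [
    {"_id": 1, "title": "First", "author": {"name": "joe", "city": "Paris"}},
    {"_id": 2, "title": "Second", "author": {"name": "jim"}},
    {"_id": 3, "title": "Third"},
]
hits = find(posts, {"author.city": "Paris"})
print([doc["_id"] for doc in hits])
```

Documents 2 and 3 lack part of the path and are silently left out, without any error.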

Fast evolution

mongoDB is evolving fast. In 9 months, I think we moved from version 1.6 to 1.8, as well as 2.2 to 2.5 for the Java Driver.

This means a lot of new features were added, some of which we made good use of in the meantime, which is always good.

Even more important, it shows that 10gen sticks to its word about fast paced development, which is especially nice since they consider new features rather frequently, when the need arises.

In a broader view, 10gen puts a lot of effort into mongoDB, from the worldwide conferences to the mailing list and support, which is really nice.

As you might guess, though, it’s not all rosy: announcements, communication and documentation of new features are sometimes lacking. Similarly, the scope of mongoDB extends a bit each time, making it harder to get a proper full picture, especially since I haven’t found any « roadmap-ish » kind of document (at least to my knowledge) which would help to figure out where it’s heading. For example, I discovered a lot of upcoming stuff when digging in for this article, mostly from the Jira. I would prefer it to be summed up somewhere as well…

Overall, though, I still think this fast paced development is good, and it lets me trust the mongoDB staff when they say something will come in some later release: it won’t be after I’m dead ;).

The mongoDB 9 months on series continues: The bad – hurdles which could/should be fixed.


mongoDB 9 months on – setting the stage

At first, I was about to write a single entry on my 9 months of mongoDB use. However, the article started to grow tremendously, which made it slower to write and harder to read. As such, I’ve decided to split it into several parts.

Setting the stage

How I discovered mongoDB

My use of mongoDB came to me in an unexpected way. Indeed, the choice was made by our tech lead in order to figure out whether mongoDB could fit some pretty demanding needs we have to fulfil some time in the future. As such, this post is a bit of a retrospective on the lessons learned on this journey.

Reasons behind our mongoDB choice

The choice of mongoDB came from the need to combine a large amount of data, normal queries as well as geo queries and full text queries, all in one datastore. The reason for this is pretty simple: when you do your filtering and your full text search in 2 distinct systems, you then have to merge the hits. Transporting all these ids can be an issue, especially if you have many of them but only few present on both sides of the search: most of them travel back and forth for nothing. When this merge goes over the network, performance takes quite a hit.

This « collocation » of data filtering/fetching should of course come with high performance and scalability in mind (notably in terms of continuity of service, read failover and the like).

When searching for a solution, there look to be many contenders: most NoSql dbs seem to fit from their descriptions, and even « old » RDBMS may fit.

For example, mysql + sphinx looked like a potential match, since sphinx provides full text search and could be used for geo queries as well. We did try it and even used it for a full text search intensive project. Yet, in the end we didn’t keep it. This was mainly due to:

  • its relative immaturity (lack of widespread usage/feedback, at least back in 2008/2009 when we looked at it),
  • some hits missing from the results, compared to the hits of the current system we’re about to replace. At the time we didn’t manage to explain these missing hits, and partial results weren’t « good enough » for our use cases,
  • its relative complexity in terms of infrastructure (we wanted indexes and the like to be managed from our hibernate entities).

So, why mongoDB? The killer feature for us was the ability to efficiently search a list of keywords. Indeed, geo queries aren’t, generally, hard to get. Massive volumes of data looked kind of manageable as well. However, full text search is a different matter. It’s commonly addressed by building a list of keywords (by stemming the text) and then looking into it on each search.
However, efficiently searching such a list of words isn’t a task most databases tackle with ease, rather the other way around. For example, it was the pain point of full text searches in PostgreSQL, making them slow in the end. Yet mongoDB proved blazing fast in the tests we did. 10gen’s focus on failover and the like also rang a bell. On top of that, when we went further and put it into practice, the document approach (spoken of in The good) proved really great.
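The keyword list idea can be sketched as follows. This is only an illustration under assumptions: the `keywords` field name, the sample text and the naive tokenizer (a stand-in for real stemming) are mine, and the matching is emulated in memory rather than run against a server. In mongoDB the list would be stored as an array inside the document, indexed, and queried with something of the shape `{"keywords": {"$all": [...]}}`.

```python
import re


def extract_keywords(text):
    """Naive stand-in for stemming: lowercase, split on non-alphanumerics, dedupe."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted(set(words))


def matches_all(document, query):
    """Emulate in memory a query of the shape {"keywords": {"$all": [...]}}."""
    required = query["keywords"]["$all"]
    return all(word in document["keywords"] for word in required)


# Hypothetical document: the keyword list is embedded next to the data itself.
title = "Cheap flights to Berlin"
doc = {"title": title, "keywords": extract_keywords(title)}

# The user's search terms go through the same extraction before querying.
query = {"keywords": {"$all": extract_keywords("berlin flights")}}

print(doc["keywords"])
print(matches_all(doc, query))
```

The point of the sketch is the data shape: the keywords live in the same document as the fields used for filtering, so a single query can combine both.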

Scope of the present series

Up to now, the project on which we use mongoDB involves a vast number of collections (the mongoDB equivalent of a table) and interactions among some of them, which may not be the perfect use case for mongoDB. However, it required us to dig quite deep into it and provides some nice ground for comparison with more classic RDBMS. For the time being we’re missing first hand experience on geo stuff and full text search (or rather keyword based search), so I won’t speak of them.

Although we’re using morphia on top of mongoDB + Java Driver, I will only speak of the latter two, for clarity’s sake (even if the latest developments make it likely that some parts of morphia will go into the driver). I might speak of morphia in some later post.


This document recaps the different things discovered over this 9 month period. Although I tried to update them at the time of writing (April 2011), some of them could easily be outdated when you read this. Do not hesitate to double check… and corrections are welcome!

About corrections, I’m no guru, just a clueless Joe 🙂 So help is welcome, as well as diverging opinions: I have almost no definitive ones on the matter, since the inner workings and implications of database design and implementation aren’t at all my everyday work.

Enough preamble for now, let’s see mongoDB 9 months on – The good!


This document was updated on 29/04/2011, mainly thanks to discussions we had at the company after the first publication of the article.


Quest for a MongoDB ubuntu admin ui: quick feedback

October 20, 2010

EDIT: since I wrote this article, Antoine Girbal released a promising admin ui, JMongoBrowser. My first experiences with it are good. Let’s see! 🙂
As a side note, I’m also writing a series on my mongoDB experience, so feel free to have a look and comment!


Lately, I’ve been looking for an admin ui for MongoDB that beats the command line. The reason is simple: large documents, spanning more than a console screen, are a pain to work with in the console. You have to scroll heavily, there’s no « content » search on the document output, and filtering means an extra query with no auto completion.

The MongoDB website offers a list of admin ui, but with no details about them. So one has to figure out which one to use.

As an ubuntu user, all non linux solutions were out, making the list shorter and leaving me mostly with php based web admin ui. I narrowed it down to rockmongo and phpmoadmin, thanks in part to Mitch Pirtle, who also told me he was working on a joomla based admin ui, not yet ready for prime time. Still nice to know and maybe something to look at later.

Installation of the php server and mongo php driver was pretty straightforward, once I found this blog entry: How To : Install PHP MongoDB (mongo) Driver on Linux (Ubuntu). One extra note about the installation process: the blog entry isn’t crystal clear on the need to add the line « » (without the double quotes) to the php.ini file.

Installing rockmongo and phpmoadmin was just as easy: you just have to drop the extracted files in /var/www, preferably in new folders. As we’ll see later, memory consumption by the php pages quickly becomes an issue, so you may be better off increasing the memory_limit in the php.ini file upfront. Personally I went for 128 MB.

So, let’s look at the 2 ui in detail.

  • rockmongo 1.0.8: the interface is pretty user friendly. Accessible list of databases, easy to browse to the right collection. Pagination by default for the collection content. Diverse admin tools like master/slave, import/export and the like.
    However, the ui doesn’t scale nicely. Browsing large documents is just damn slow. Actually, at first, it simply didn’t work, failing for lack of memory. And then one couldn’t even do a simple query on the collection, since the query form was part of the crashing page. With 128 MB, it’s just dead slow. Furthermore, it burns resources like hell: a chromium 6 instance with just one tab on rockmongo frequently took (when browsing the large documents) as much as… 1.1 GB of RAM!!
  • phpmoadmin 1.0.8: as they put it, it’s « Built on a stripped-down version of the Vork high-performance framework ». They really mean it: the ui lets you do one thing at a time, no more no less. To see the other collections from one collection, one has to navigate away from the current page.
    On the large document collection, the memory error was shown as well, yet the query interface was still accessible: it’s still possible to go to just one doc and display it. On top of that, one can easily limit the results (by default the 100 items of my collection were displayed, or at least it tried to display them but ran out of memory in the middle of it). Limiting the page to 10 made it usable.

Overall, the scaling issue of rockmongo rules it out, even if its ui looks and feels better. So I’m left with phpmoadmin for now. Let’s see how it withstands the test of daily use!

