mongoDB 9 months on – the bad
This article is part of my "mongoDB 9 months" series:
- Setting the stage – introducing why and how I started to use mongoDB, which gives some context for what is discussed afterwards,
- The good – things I really appreciate about mongoDB,
- The bad – hurdles which could/should be fixed (the current article): mostly minor yet irritating points about the mongoDB stack,
- mongoDB "Take it or leave it" technical choices – you have to be well aware of them,
- The good to know – side discoveries which don’t change the world but are better known beforehand,
- Conclusion – a small attempt at drawing conclusions from all of this while taking a step back.
The bad – hurdles which could/should be fixed
Bad surprises in the query language
I guess our project is a bit more complex than what most people do with mongoDB. Indeed, we quickly needed quite complex (and dynamic) queries, which were impeded by various limitations in the current query language:
- $or cannot be nested.
- no $and operator, only an implicit one, with a subtle and quite hidden behaviour: if the same property path is used twice, only one of the two conditions is taken into account
- the $ positional operator doesn’t handle nested collections
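The implicit $and pitfall is easy to reproduce outside the database. Here is a minimal sketch, assuming the query object behaves like a plain map (a Python dict stands in for it here; the `age` field and the `merge_conditions` helper are made up for illustration). Writing the same property path twice silently keeps only the last condition, whereas merging the operators into a single sub-document keeps both.

```python
# A query object is modelled here as a plain dict; "age" is a hypothetical field.

# Intended meaning: 18 < age < 65. The second "age" key silently replaces the first.
naive_query = {"age": {"$gt": 18}, "age": {"$lt": 65}}


def merge_conditions(*conditions):
    """Combine single-field conditions, merging operators that target the same path.

    Raises ValueError instead of silently dropping a condition when two
    conditions cannot be merged (e.g. the same operator twice on one path).
    """
    merged = {}
    for condition in conditions:
        for path, spec in condition.items():
            if path not in merged:
                # Copy sub-documents so the caller's dicts are never mutated.
                merged[path] = dict(spec) if isinstance(spec, dict) else spec
                continue
            existing = merged[path]
            if not (isinstance(existing, dict) and isinstance(spec, dict)):
                raise ValueError(f"conflicting conditions on {path!r}")
            clash = existing.keys() & spec.keys()
            if clash:
                raise ValueError(f"operator(s) {sorted(clash)} used twice on {path!r}")
            existing.update(spec)
    return merged


safe_query = merge_conditions({"age": {"$gt": 18}}, {"age": {"$lt": 65}})
print(naive_query)  # only the $lt condition survives
print(safe_query)   # both operators end up in one sub-document
```

Failing loudly on a clash is a deliberate choice in this sketch: it is exactly the silent variant which is so hard to spot when queries are built dynamically.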
In the end, not all the query logic was expressible in queries, which had several time-consuming consequences:
- some queries were reconsidered, either by changing them slightly or even by adding more denormalized content to make them possible,
- for the more complex ones, we had to resort to partial queries on the database, which are then "completed" on the server side. This is pretty bad since the logic is then scattered around and ends up quite hard to follow.
- dynamic queries, built on top of some kind of query object, can easily end up failing (not applying all the logic they contain) without us having any clue about it, apart from some testers/users spotting broken data…
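To make the "partial query, completed on the server side" approach more concrete, here is a rough sketch. It is not our actual code: the documents, field names and the `matches_rest` and `complete_query` helpers are invented, and the list of dicts simply stands in for what a deliberately simple database query would return.

```python
# Documents as they might come back from a deliberately simple database query,
# e.g. {"project": "alpha"}; the data and field names are invented for this sketch.
candidates = [
    {"project": "alpha", "status": "open", "priority": 1, "owner": None},
    {"project": "alpha", "status": "closed", "priority": 5, "owner": "bob"},
    {"project": "alpha", "status": "closed", "priority": 5, "owner": "eve"},
    {"project": "alpha", "status": "closed", "priority": 2, "owner": "bob"},
]


def matches_rest(doc):
    """Nested OR logic that is applied in application code.

    status is open, OR (priority > 3 AND (no owner OR owner is bob))
    """
    urgent = doc["priority"] > 3
    ours = doc["owner"] is None or doc["owner"] == "bob"
    return doc["status"] == "open" or (urgent and ours)


def complete_query(docs, predicate):
    """Finish on the application side what the database query left out."""
    return [doc for doc in docs if predicate(doc)]


results = complete_query(candidates, matches_rest)
print([(d["status"], d["owner"]) for d in results])
```

The drawback shows even in such a small example: half of the selection logic lives in the database query and the other half in `matches_rest`, and nothing forces the two to stay in sync.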
JavaScript: best to be avoided
Queries can embed some JavaScript, and one can also write map/reduce batches or even "direct" functions, through db.eval, all in JavaScript.
However, some 10gen employees said at a mongoDB conference that including JavaScript to be executed on the database wasn’t really the best decision they made, and indeed JavaScript suffers from several issues:
- all JS runs on only one thread per server
- JavaScript can even "stop the world"
- it is hard to debug: the console is picky about the characters it accepts (no tabs, for example), and scripts, batches and db.eval calls have no proper development environment to speak of. Figuring out what goes wrong is really hard, be it a dumb syntax error or a more serious logic issue which would have required debugging (which isn’t possible)
- JavaScript is slow because it isn’t optimized, even though V8 is being investigated
- no JavaScript exception handling: the script just crashes silently, meaning it’s all up to the developer to handle it properly.
In the end, JavaScript is really not appropriate for synchronous queries: since all of them are queued in one thread, it is quite a performance bottleneck. Even asynchronous JavaScript is hard to rely on, since it’s hard to write in the first place and then hard to monitor. Worse, code might come to depend on some of the current restrictions, and would no longer be correct once proper JavaScript support is there.
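To get a feel for the single-thread bottleneck, here is a back-of-the-envelope model. It is an idealised sketch: the 50 ms cost and the 10 simultaneous queries are made-up numbers, not measurements, and `completion_times` is a hypothetical helper.

```python
def completion_times(n_queries, cost_ms, workers=1):
    """Completion time (ms) of each query when all arrive at t=0.

    Idealised model: every query costs the same and each worker handles one
    query at a time, in arrival order.
    """
    return [((i // workers) + 1) * cost_ms for i in range(n_queries)]


# One JavaScript thread: every query waits for all the ones queued before it.
single = completion_times(n_queries=10, cost_ms=50)
# Hypothetical comparison point: one worker per query, so nobody queues.
parallel = completion_times(n_queries=10, cost_ms=50, workers=10)

print(single[-1])    # completion time of the last query with a single thread
print(parallel[-1])  # completion time of the last query without queueing
```

In this model the wait of the last caller grows linearly with the number of queued queries, which is what makes synchronous use so painful.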
The only upside is that, after all, splitting your application logic between different servers (database and application) as well as languages isn’t really recommended anyway. Still, it could have been handy sometimes, especially to work around some query language restrictions.
Java Driver issues
First of all, the Java Driver code isn’t the best I’ve ever seen.
It’s rather confusing IMHO, and debugging through it isn’t always a good or easy experience. Similarly, the driver has its rough edges, for example the "Make the Mongo class proxy safe" issue, which triggered some unexpected behaviour at some point.
In the end, it sometimes feels like the code was written by good "low/kernel level" non-Java developers. This can sometimes be a bit unsettling for someone like me, used to many higher level Java frameworks.
A big good point though: when I had to dig in for performance matters (hence this post a while ago: Multithreaded performance testing checklist), no contention was spotted, nor any big gotcha, so the driver does seem to do its job.
One last point about it: we weren’t thrilled by its performance. When reading some text content, it was in the same ballpark as the MySQL driver. Response times degrade linearly with the number of concurrent threads, up to the number of cores of the machine running the driver, and then get much worse. Most likely some compression of the wire protocol would help, but somehow we were expecting better. The Memcached Java driver, for example, behaves differently: requests take the same time whatever the number of threads, up to the number of cores. On top of that, a single request was significantly faster than with the MySQL or mongoDB driver (with the same text content).
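For those wanting to run this kind of comparison themselves, here is a minimal sketch of a measurement harness. It is not the code we used: `fake_request` is a hypothetical stand-in to be replaced with a real driver read, and the thread counts and number of requests are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request():
    """Hypothetical stand-in for one driver read; replace with a real call."""
    time.sleep(0.001)


def mean_latency(request, threads, requests_per_thread=20):
    """Average wall-clock seconds per request with `threads` concurrent callers."""

    def worker():
        # Each caller issues its requests one after the other and times each one.
        timings = []
        for _ in range(requests_per_thread):
            start = time.perf_counter()
            request()
            timings.append(time.perf_counter() - start)
        return timings

    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker) for _ in range(threads)]
        samples = [t for future in futures for t in future.result()]
    return sum(samples) / len(samples)


latencies = {n: mean_latency(fake_request, n) for n in (1, 2, 4, 8)}
for n, seconds in latencies.items():
    print(f"{n} thread(s): {seconds * 1000:.2f} ms per request")
```

Plotting the per-request latency against the number of threads is what tells a "flat up to the number of cores" driver apart from one whose latency grows with every added thread.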
Still, this doesn’t say much about the overall performance of mongoDB, where its architecture helps a lot. I’ll discuss this point more in the About performance chapter of the "The good to know" article, which isn’t ready yet.
That’s all for now folks: the next part of the series, mongoDB "Take it or leave it" technical choices, will be published soon!