database - The Road to Silicon Valley

You may recall my previous post on MongoDB and how powerful it is as an alternative to a relational database. Since then I’ve had a bunch of discussions with other software engineers around this space and even met up with MongoDB core engineer Chris Westin from 10gen at Red Rock to gain further insights into MongoDB.

New kick ass GUI for MongoDB

Chris introduced me to JMongoBrowser, written by Antoine Girbal (a 10gen engineer). It’s written in Java, so you can run it on Linux, Windows and Mac OS X. So far this GUI has proven to be a success and fills the holes that MongoHub couldn’t. Out goes MongoHub and in goes JMongoBrowser.

Fast crash recovery using Journaling

MongoDB writes data to memory first rather than directly to the data files on disk. This is where huge performance gains are attained. It also has Journaling, a write-ahead log of operations that facilitates fast crash recovery in the storage engine. This means changes still held in memory are also recorded in a log in case your server goes down, with little impact on MongoDB’s performance.

So what happens if your box goes down? This is a common question among engineers new to MongoDB. Does this data also get lost? The answer lies in how you balance your performance needs against the risks you are willing to take with your data.

The journal is synced to disk every 100ms, so the most that can be lost is about 100ms worth of changes. At the cost of some additional performance, you can protect your application against even that. The j option to getLastError will cause the application to block until the journal entries containing the last change have been written to disk. See http://www.mongodb.org/display/DOCS/getLastError+Command . Of course, if you use this, your call to getLastError can wait up to 100ms, depending on where in the flush cycle you are. Whether to trade that wait for the extra durability is left up to you.
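
To make that trade-off concrete, here is a minimal sketch in JavaScript (the language of the mongo shell). It builds the getLastError command document with the j option and estimates how long such a call might block. The helper names buildGetLastError and estimateWaitMs are made up for illustration, and the estimate assumes a simplified model in which the journal is flushed exactly every 100ms.

```javascript
// Hypothetical helper: builds the getLastError command document.
// With j set, the server replies only after the journal entry for the
// last write has reached disk.
function buildGetLastError(waitForJournal) {
  var cmd = { getlasterror: 1 };
  if (waitForJournal) {
    cmd.j = true;
  }
  return cmd;
}

// Hypothetical helper: estimated blocking time under a simplified model
// where the journal is flushed exactly every flushIntervalMs milliseconds.
// msSinceLastFlush is how far into the current cycle the write happened.
function estimateWaitMs(msSinceLastFlush, flushIntervalMs) {
  var positionInCycle = msSinceLastFlush % flushIntervalMs;
  return flushIntervalMs - positionInCycle;
}

// In the mongo shell the command document would be sent like this:
//   db.runCommand(buildGetLastError(true));
```

In this model a write made 30ms into the cycle would block for roughly 70ms. Without the j option the call returns sooner, but up to 100ms of changes are at risk if the box goes down.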

Therefore, always start MongoDB with Journaling enabled, like this:

mongod --journal

When you restart after a crash, MongoDB replays the journal automatically and puts the journaled changes back into the data files.

Don’t be alarmed when you see this

I found this in the journal directory (1GB files):

-rw------- 1 root root 1.0G 2011-06-23 02:26 prealloc.0
-rw------- 1 root root 1.0G 2011-06-23 02:25 prealloc.1
-rw------- 1 root root 1.0G 2011-06-23 02:26 prealloc.2

With Journaling enabled the server always creates those three 1GB files. It rotates through them, recycling them. They won’t grow any more. But they are always that size, regardless of the size of your database. If the server dies unexpectedly, the files remain, and contain the material necessary for the automated recovery that happens when you restart the server.

More here:

Journaling: http://www.mongodb.org/display/DOCS/Journaling
Durability and Repair: http://www.mongodb.org/display/DOCS/Durability+and+Repair

Checking server memory usage

As mentioned above, MongoDB uses memory to speed things up. The more memory you have the better; MongoDB will use your RAM as it needs it, taking other server resources into consideration.

It’s always a good practice to check memory usage.
Details here: http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage

Your MongoDB configuration file

Here’s a recommended set of switches to have enabled in your mongodb.conf.

sudo nano /etc/mongodb.conf

add or update your settings to these:

journal=true
directoryperdb=true
logappend=true

journal = enables journaling for fast recovery from crashes, as discussed above.
directoryperdb = creates a new physical directory for each new database. A clean way to separate your databases.
logappend = whether the log file will be appended to or overwritten. Always set this to true, otherwise your old logs will be overwritten after a restart and you may lose important crash-specific data.

More detail here: http://www.mongodb.org/display/DOCS/File+Based+Configuration
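
If you want to double check a config file against the three switches recommended above, a small script can do it. The sketch below is only an illustration: parseMongoConf and missingRecommended are names I made up, and it assumes the simple key=value format shown above, with lines starting with # treated as comments.

```javascript
// Hypothetical helper: parse "key=value" lines into a plain object.
// Blank lines, comment lines (starting with #) and lines without "="
// are skipped.
function parseMongoConf(text) {
  var settings = {};
  text.split('\n').forEach(function (line) {
    var trimmed = line.trim();
    if (trimmed === '' || trimmed.charAt(0) === '#') {
      return;
    }
    var eq = trimmed.indexOf('=');
    if (eq === -1) {
      return;
    }
    var key = trimmed.slice(0, eq).trim();
    var value = trimmed.slice(eq + 1).trim();
    settings[key] = value;
  });
  return settings;
}

// Hypothetical helper: list the recommended switches that are not "true".
function missingRecommended(settings) {
  var recommended = ['journal', 'directoryperdb', 'logappend'];
  return recommended.filter(function (key) {
    return settings[key] !== 'true';
  });
}
```

Feed it the contents of /etc/mongodb.conf and an empty result from missingRecommended means all three switches are on.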

MySQL to MongoDB

Finally, I found this fantastic chart illustrating the difference in commands between MySQL and MongoDB. Should help the transition for us MySQL folks. Click the image below to download a large PDF version (Size: 213Kb).

There is also a SQL to Mongo mapping chart on the MongoDB website: http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart

Have more questions? Attend the weekly MongoDB Office Hours in Mountain View Red Rock: http://www.meetup.com/San-Francisco-MongoDB-User-Group/events/16985746/ or post a question/comment below.

~ Ernest

<strong opinions>

Whoah! That was my 1st and 2nd, and 3rd.. and goes on… and on.. impression of this amazing database. Having worked with SQL Server and MySQL (both relational databases) for a few good years, I decided to take this NoSQL database for a real world run. I also think the massive signs for the MongoDB conference in San Francisco on the 101 had subliminally stamped a mark on my neurons 😉 I also read some interesting articles here and here comparing MongoDB to other NoSQL solutions.. After all that I was convinced that MongoDB would be the NoSQL database I would invest some serious time into.

As expressed above, I was impressed by MongoDB. I hooked it into a Zend MVC (Model–view–controller) application, accessing the Shanty-Mongo ORM through custom Model classes which I wrote from the ground up. Everything just fit in so snugly.. and when I threw data against MongoDB it created collections (in the SQL world you’d call these tables) on the fly. Yes on the fly! That was super cool: loosely coupled interfacing -> create a Class Model and let the DB handle the rest. Plus, this baby just flies! Everything from how fast it retrieves, stores, updates (even partial updates) and searches your data, to how it stores it as Binary JSON both on disk and in memory (during an open connection) on the server to speed things up. Everything about this database is impressive.

</strong opinions>

Ok enough of my ramblings. I think you get the picture. I am impressed.

If you are impressed and want to give MongoDB a try, read on. Next let’s dig in and explore stuff that is important (and what I learnt) about MongoDB, how to set it up and common commands to keep handy when working in the terminal.

Hello MongoDB

“MongoDB (from “humongous”) is a scalable, high-performance, open source, document-oriented database. MongoDB bridges the gap between key-value stores (which are fast and highly scalable) and traditional RDBMS systems (which provide rich queries and deep functionality).” ~ from MongoDB

Then (RDBMS) and now

Tables as you know them in SQL are called “collections” in MongoDB.
A relational DB has records (rows); MongoDB calls them “documents”.
MongoDB represents all data as JSON-style objects and serializes them to BSON (Binary JSON) for storage. CouchDB (which you may also know of) stores plain JSON.
In MongoDB, the “ObjectId” stored in each document’s _id field plays a role similar to an auto-incrementing ID in a relational database table: it uniquely identifies the document. Unlike an auto-increment value, it is a generated 12-byte value rather than a sequential counter.
Here’s a nice mapping chart between SQL and Mongo: http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
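
One practical difference from an auto-incrementing ID is that an ObjectId carries its own creation time: the first 4 of its 12 bytes are a big-endian count of seconds since the Unix epoch. Here is a small sketch that decodes it from the 24-character hex form. The function name objectIdToDate is made up for illustration and is not part of any driver.

```javascript
// Hypothetical helper: extract the creation time embedded in an ObjectId.
// hex is the 24-character hexadecimal string form of the ObjectId.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('expected a 24-character hex string');
  }
  // The first 8 hex characters (4 bytes) hold seconds since the Unix epoch.
  var seconds = parseInt(hex.slice(0, 8), 16);
  // JavaScript dates are measured in milliseconds.
  return new Date(seconds * 1000);
}
```

So even without a separate created-at field you can get a rough idea of when a document was inserted.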

More FYI notes:

You never have to create a database or collection explicitly; MongoDB does not require it. As soon as you insert something, MongoDB creates the underlying collection and database.
If you query a collection that does not exist, MongoDB treats it as an empty collection.
Switching to a database with the use command won’t immediately create the database – the database is created lazily the first time data is inserted. This means that if you use a database for the first time it won’t show up in the list provided by `show dbs` until data is inserted.
Mongo uses memory mapped files to access data, which results in large numbers being displayed in tools like top for the mongod process. Think performance! You can get a feel for the “inherent” memory footprint of Mongo by starting it fresh, with no connections, with an empty /data/db directory and looking at the resident bytes.
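
The lazy creation behaviour in the notes above is easy to picture with a toy model. The sketch below is NOT the MongoDB API: ToyServer and its methods are hypothetical names that only mimic, in memory, what use, show dbs, insert and find do with databases and collections that don’t exist yet.

```javascript
// Toy in-memory model of lazy creation. This is NOT the MongoDB API;
// every name here is hypothetical and exists only for illustration.
function ToyServer() {
  this.databases = {};  // database name -> { collection name -> [documents] }
  this.current = null;  // name selected by use()
}

// Selecting a database only remembers the name; nothing is created yet.
ToyServer.prototype.use = function (name) {
  this.current = name;
};

// Lists only databases that actually hold data.
ToyServer.prototype.showDbs = function () {
  return Object.keys(this.databases);
};

// The first insert creates the database and the collection on demand.
ToyServer.prototype.insert = function (collection, doc) {
  if (!this.databases[this.current]) {
    this.databases[this.current] = {};
  }
  var db = this.databases[this.current];
  if (!db[collection]) {
    db[collection] = [];
  }
  db[collection].push(doc);
};

// Querying a collection that does not exist behaves like an empty one.
ToyServer.prototype.find = function (collection) {
  var db = this.databases[this.current] || {};
  return (db[collection] || []).slice();
};
```

Call use('mydb') on a fresh ToyServer and showDbs() is still empty, while find('items') returns an empty list. After the first insert('items', ...) both the database and the collection exist.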

Installing MongoDB on a Debian-based OS (Ubuntu)

If you’re using Ubuntu Server (I used 10.10), you can install MongoDB using apt. The default Ubuntu sources do not contain the 10gen MongoDB packages, so you need to add the 10gen repository to your /etc/apt/sources.list file. That is easily done. First open sources.list in a terminal editor (nano) like this:

sudo nano /etc/apt/sources.list

and add this line to the end of the file, then save:

deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen

Exit nano and add the following 10gen GPG key, or apt will disable the repository (apt uses signing keys to verify that the repository is trusted and disables untrusted ones).

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

Now you’re ready to install the package with apt-get by executing the following commands:

shell> sudo apt-get update
shell> sudo apt-get install mongodb-10gen

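Finally, start mongod as a background daemon. Note that --fork only detaches the process from your terminal; it does not by itself make mongod start on boot.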

shell> sudo mongod --fork --logpath /var/log/mongodb.log --logappend

You can now use the command-line client to access the MongoDB database server, as below:

shell> mongo

You may want to make MongoDB accessible from your PHP application by editing your php.ini and loading mongo as an extension. First find your php.ini and open it in an editor, substituting the path that find reports:

sudo find / -name php.ini
sudo nano /php.ini

Then add & save mongo as an extension inside php.ini under “Dynamic Extensions”:

extension=mongo.so

Save & restart Apache:

sudo /etc/init.d/apache2 restart

Common MongoDB commands

Here’s a short list of the most common commands you will end up using when interfacing with the database. If you want to use a GUI to access MongoDB, I found MongoHub the best GUI administration tool for Mac. There is also very comprehensive MongoDB documentation on the MongoDB website.

Purpose	Shell Command
Login to interface	mongo
Show all dbs on record	show dbs
Switch to my database	use mydb
Show all collections on record	show collections
List db version	db.version()
Insert data into a new collection	db.items.insert({ name: 'eggs', quantity: 10, price: 1.50 })
Display all documents in a collection	db.items.find({})
Display selected documents in a collection	db.items.find({ guid: 'xyz' })
Remove all documents in a collection	db.items.remove({})
Remove documents where n == 1	db.items.remove({ n: 1 })
Drop the collection	db.items.drop()

Cons: when you delete a document you cannot get its ObjectId back. Would be nice to have this feature. MongoDB folks?

MongoDB start ritual (habit forming):

mongo
show dbs
use mydbname
show collections

This will become a habit so don’t resist.

Don’t forget to read the follow up to this post located here: http://www.theroadtosiliconvalley.com/technology/mongodb-update/
Plenty of new knowledge on tools, crash recovery & best practices.

Give MongoDB a spin

If you are in doubt, give MongoDB a spin and make up your own mind ~ http://www.mongodb.org/
Don’t forget to let me know how your experience goes.. and if you have questions on getting this set up please contact me and I will be more than happy to help you out!