Introduction to MongoDB NoSQL database for SQL Developers – Part 4
January 30th, 2014 / No Comments » / by admin
In the last three posts in this series (post 1 HERE, post 2 HERE and post 3 HERE) I outlined the core aspects of provisioning MongoDB instance as well as a few fundamental concepts relating to CRUD database operations using Mongo’s syntax. In this post I would like to further elaborate on this information which should hopefully bring those details together in a concise and simple tutorial.
I already assume you have MongoDB files downloaded and extracted on your machine (if not, please see post 1 HERE) so let’s start the server, Mongo’s console and create a sample collection with a few documents in it. Below you can see my command line in Microsoft Windows with all the start-up entries included. The steps outlined here are responsible for the following:
- Session 1 – change working directory where mongod demon is installed and start mongoDB server.
- Session 2 – change working directory to where MongoDB is extracted to, start Mongo’s Shell and connect to database ‘foo’ (notice that the database does not need to be created prior to issuing ‘use foo’ to connect to it).
Next, we will insert 10 sample records (documents) into ‘Customers’ collection on ‘foo’ database. If you remember from reading post 2 and post 3, MongoDB uses JSON format to structure its data and an ‘insert’ function to add data to an existing or new collection. Let’s continue with Session 2 and run the following to populate the new ‘Customers’ collection:
db.purchase_hist.insert([{ "_id" : ObjectId("52ce3648a4a0bb15a8e07b75"), "CustomerID" : 23424, "First_Name" : "Mary", "Middle_Name" : "Joeanne", "Last_Name" : "Black", "Email_Address" : "mary0120@yahoo.com.au", "Phone_Number" : "03-8573-9455", "Purchased_Units" : 1, "Purchased_Value" : 29.99, "Item_SKU" : "RMK973", "Rec_Created_Date" : ISODate("2011-12-31T13:00:00Z") }, { "_id" : ObjectId("52ce3648a4a0bb15a8e07b76"), "CustomerID" : 84933, "First_Name" : "John", "Middle_Name" : "Lorance", "Last_Name" : "Moore", "Email_Address" : "johnnym@awol.com", "Phone_Number" : "03-3423-1155", "Purchased_Units" : 1, "Purchased_Value" : 49.0, "Item_SKU" : "MKK833", "Rec_Created_Date" : ISODate("2011-12-31T13:00:00Z") }, { "_id" : ObjectId("52ce3648a4a0bb15a8e07b77"), "CustomerID" : 19583, "First_Name" : "Martin", "Middle_Name" : null, "Last_Name" : "Laser", "Email_Address" : "mlaser91@aol.com", "Phone_Number" : "03-2355-1109", "Purchased_Units" : 5, "Purchased_Value" : 2099.49, "Item_SKU" : "HHY009", "Rec_Created_Date" : ISODate("2012-03-31T13:00:00Z") }, { "_id" : ObjectId("52ce3648a4a0bb15a8e07b78"), "CustomerID" : 34311, "First_Name" : "Spencer", "Middle_Name" : "Chris", "Last_Name" : "McEvans", "Email_Address" : "spencerdude@hotmail.com", "Phone_Number" : "03-1122-0007", "Purchased_Units" : 32, "Purchased_Value" : 0.99, "Item_SKU" : "YNK227", "Rec_Created_Date" : ISODate("2012-03-31T13:00:00Z") }, { "_id" : ObjectId("52ce3648a4a0bb15a8e07b79"), "CustomerID" : 74998, "First_Name" : "Mark", "Middle_Name" : null, "Last_Name" : "King", "Email_Address" : "mk038722@gmail.com", "Phone_Number" : "03-3423-1155", "Purchased_Units" : 1, "Purchased_Value" : 88.0, "Item_SKU" : "JKL777", "Rec_Created_Date" : ISODate("2011-12-31T13:00:00Z") }, { "_id" : ObjectId("52ce3648a4a0bb15a8e07b7a"), "CustomerID" : 30002, "First_Name" : "Mary", "Middle_Name" : "Susan", "Last_Name" : "Grey", "Email_Address" : "mmgrey@gmail.com", "Phone_Number" : "03-1299-3859", "Purchased_Units" : 2, "Purchased_Value" : 22.99, "Item_SKU" : "AAN199", "Rec_Created_Date" : ISODate("2011-12-31T13:00:00Z") }, { "_id" : ObjectId("52ce3648a4a0bb15a8e07b7b"), "CustomerID" : 22374, "First_Name" : "Luis", "Middle_Name" : "Blake", "Last_Name" : "Shimaro", "Email_Address" : "shimarolou@yahoo.com.au", "Phone_Number" : "03-0385-3999", "Purchased_Units" : 77, "Purchased_Value" : 1229.49, "Item_SKU" : "NMN800", "Rec_Created_Date" : ISODate("2012-11-10T13:00:00Z") }, { "_id" : ObjectId("52ce3648a4a0bb15a8e07b7c"), "CustomerID" : 99931, "First_Name" : "Natalie", "Middle_Name" : "G", "Last_Name" : "Chin", "Email_Address" : "nataliechin@mediasmarts.com.au", "Phone_Number" : "03-3759-1001", "Purchased_Units" : 1, "Purchased_Value" : 123.0, "Item_SKU" : "HZX201", "Rec_Created_Date" : ISODate("2012-07-22T14:00:00Z") }, { "_id" : ObjectId("52ce3648a4a0bb15a8e07b7d"), "CustomerID" : 22374, "First_Name" : "Marian", "Middle_Name" : null, "Last_Name" : "McErin", "Email_Address" : "marianmcerin@gmail.com", "Phone_Number" : "03-3400-3331", "Purchased_Units" : 10, "Purchased_Value" : 999.99, "Item_SKU" : "YQP232", "Rec_Created_Date" : ISODate("2012-12-18T13:00:00Z") }, { "_id" : ObjectId("52ce3648a4a0bb15a8e07b7e"), "CustomerID" : 23424, "First_Name" : "Rick", "Middle_Name" : "Tony", "Last_Name" : "Webster", "Email_Address" : "rikky69@gmail.com", "Phone_Number" : "03-9459-1112", "Purchased_Units" : 3, "Purchased_Value" : 239.99, "Item_SKU" : "AAL155", "Rec_Created_Date" : ISODate("2012-01-22T13:00:00Z") }])
Once the insert has been committed, let’s display the data to confirm the collection and its documents have indeed been created using db.collection.find() function as per below:
Now that we have out sample database, collection and a number of documents created, we can run a few queries against those to output the data we’re interested. For example:
/*================================================================================= To select all customers where their middle name is missing in a sorted order: =================================================================================*/ db.purchase_hist.find({"Middle_Name": null}).sort({"CustomerID": 1}); --Which is equivalent to the following SQL in RDBMS: SELECT * FROM purchase_hist WHERE Middle_Name IS NULL ORDER BY CustomerID; /*================================================================================= To sum up purchased value and purchased units (where there was more than 1), grouping by when the record was first created: =================================================================================*/ db.purchase_hist.group({ "key": { "Rec_Created_Date": true }, "initial": { "Units": 0, "Value": 0 }, "reduce": function(obj, prev) { prev.Units = prev.Units + obj.Purchased_Units - 0; prev.Value = prev.Value + obj.Purchased_Value - 0; }, "cond": { "Purchased_Units": { "$gt": 1 } } }); --Which is equivalent to the following SQL in RDBMS: SELECT SUM(Purchased_Units) as Units, SUM(Purchased_Value) as Value, Rec_Created_Date FROM purchase_hist WHERE Purchased_Units > 1 GROUP BY Rec_Created_Date; /*================================================================================= To select average purchase value grouped by customer ID where their last name is either 'Webster' or 'Grey' and averaged purchased value is greater than 0: =================================================================================*/ db.purchase_hist.group({ "key": { "CustomerID": true }, "initial": { "sumforaverageAvg_Purchased_Value": 0, "countforaverageAvg_Purchased_Value": 0 }, "reduce": function(obj, prev) { prev.sumforaverageAvg_Purchased_Value += obj.Purchased_Value; prev.countforaverageAvg_Purchased_Value++; }, "finalize": function(prev) { prev.Avg_Purchased_Value = prev.sumforaverageAvg_Purchased_Value / prev.countforaverageAvg_Purchased_Value; delete prev.sumforaverageAvg_Purchased_Value; delete prev.countforaverageAvg_Purchased_Value; }, "cond": { "Last_Name": { "$in": ["Webster", "Grey"] } } }); --Which is equivalent to the following SQL in RDBMS: SELECT AVG(Purchased_Value) as Avg_Purchased_Value, CustomerID FROM purchase_hist WHERE Last_Name IN ('Webster', 'Grey') GROUP BY CustomerID HAVING AVG(Purchased_Value) > 0;
If you’re working with such RDBMS systems as Microsoft SQL Server, MySQL or PostgeSQL and need a tool to migrate your relational data into MongoDB, write some queries in an interactive manner or simply to relieve yourself from the ‘burden’ of working with Mongo’s Shell all the time, there are a few tools that can help with that. I personally tried a few different applications but eventually settled for two I consider the best MongoVue and RoboMongo. Both are fairly simplistic in their design but making a transition from Mongo’s Shell to either one of those will be like playing Pong on a monochrome Atari from 1972 and the latest version of Angry Birds on the current smartphone. Having said that, if you came from SQL Server management Studio or Oracle’s SQL Developer background, chances are you will be underwhelmed. Each tool has its strengths and advantages and they somewhat complement each other. I, for example, much prefer RoboMongo’s Shell implementation which also includes syntax highlighting, auto-completion, different view modes (text, tree, custom) etc. Below is a sample view of RoboMongo’s interface.
On the other hand, MongoVue (paid version) has a relational database data import tool (providing you’re using MS SQL Server, MySQL or PostreSQL) which in case of MS SQL Server worked really well. It also provides some basic database monitoring/profiling capabilities, GridFS explorer and Map Reduce feature. MongoVue interface look as per below.
Final Thoughts
This is the last post in this four part series on MongoDB for SQL developers – the other three can be found HERE, HERE and HERE. So far I only scratched the surface of MongoDB’s features and functionality not to mention the whole NoSQL paradigm. Concepts such as indexing, MapReduce, replication, sharding, monitoring, backups or application design are unfortunately out of scope for this series but there is plenty of great resources (including MongoDB’s own documentation) that can be found – as always Google is your friend. Also, Amazon has got a nice collection of books which cover both admin and development side of using MongoDB so with a little bit of reading/hands-on training it’s not as overwhelming as it may initially seem.
RDBMS systems have long been the staple, go-to applications for data storage layer for a long time now, however, with the recent rise of data volume, velocity and variety a new crop of technologies was required to adapt to these changes. NoSQL databases, in specific contexts such as big data and real-time web applications, provide better simplicity of design, horizontal scaling and finer control over availability when compared to relational databases. At present (Jan 2014), MongoDB specifically is the most popular NoSQL database on the market and the fact that it has been adopted as backend software by a number of major websites and services, including Craigslist, eBay, Foursquare, SourceForge, and The New York Times, among others is a testament to its credibility.
Working with MongoDB is not without challenges though and, depending how you look at it, it has a number of faults. For starters, Mongo requires a lot of memory, preferring to put as much data as possible into working memory for fast access. Furthermore, Mongo doesn’t support the notion of ACID transactions, which is a touchstone of the RDBMS world. Data isn’t immediately written to disk upon an insert (although you can optionally require this via a flag) – a background process eventually writes unsaved data to disk. This makes writes extremely fast, but corresponding reads can occasionally be inconsistent. Coming from a relational database background, it is also hard to wrap your head around the fact that queries don’t support joins between collections, not to mention sparse security features and the lack of Business Intelligence tools integration. You can read some interesting insights into why MongoDB is not a great fit for all data storage needs HERE or HERE. There is also an interesting video on migrating Diaspora database from MongoDB to MySQL due issues the developers encountered when using is for social network data HERE. Finally, Andrew C. Oliver from the Open Software Integrators has written a short post on the challenges of porting RDBMS schema to Mongo HERE.
With all those points considered, the relational database is still the staple data store for the vast majority of applications built today. But for some applications, the flexibility offered by Mongo provides advantages with respect to development speed and overall application performance. Additionally, Mongo’s relative ease-of-use and low price tag make it an attractive option for companies whose project objective can fit into the NoSQL space and MongoDB’s document storage architecture.