Introduction to MongoDB NoSQL database for SQL Developers – Part 4

January 30th, 2014 / No Comments » / by admin

In the last three posts in this series (post 1 HERE, post 2 HERE and post 3 HERE) I outlined the core aspects of provisioning MongoDB instance as well as a few fundamental concepts relating to CRUD database operations using Mongo’s syntax. In this post I would like to further elaborate on this information which should hopefully bring those details together in a concise and simple tutorial.

I already assume you have MongoDB files downloaded and extracted on your machine (if not, please see post 1 HERE) so let’s start the server, Mongo’s console and create a sample collection with a few documents in it. Below you can see my command line in Microsoft Windows with all the start-up entries included. The steps outlined here are responsible for the following:

  • Session 1 – change working directory where mongod demon is installed and start mongoDB server.
  • Session 2 – change working directory to where MongoDB is extracted to, start Mongo’s Shell and connect to database ‘foo’ (notice that the database does not need to be created prior to issuing ‘use foo’ to connect to it).

MongoDB-part4-mongo-startup-session-cmd

Next, we will insert 10 sample records (documents) into ‘Customers’ collection on ‘foo’ database. If you remember from reading post 2 and post 3, MongoDB uses JSON format to structure its data and an ‘insert’ function to add data to an existing or new collection. Let’s continue with Session 2 and run the following to populate the new ‘Customers’ collection:

db.purchase_hist.insert([{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b75"),
  "CustomerID" : 23424,
  "First_Name" : "Mary",
  "Middle_Name" : "Joeanne",
  "Last_Name" : "Black",
  "Email_Address" : "mary0120@yahoo.com.au",
  "Phone_Number" : "03-8573-9455",
  "Purchased_Units" : 1,
  "Purchased_Value" : 29.99,
  "Item_SKU" : "RMK973",
  "Rec_Created_Date" : ISODate("2011-12-31T13:00:00Z")
},
{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b76"),
  "CustomerID" : 84933,
  "First_Name" : "John",
  "Middle_Name" : "Lorance",
  "Last_Name" : "Moore",
  "Email_Address" : "johnnym@awol.com",
  "Phone_Number" : "03-3423-1155",
  "Purchased_Units" : 1,
  "Purchased_Value" : 49.0,
  "Item_SKU" : "MKK833",
  "Rec_Created_Date" : ISODate("2011-12-31T13:00:00Z")
},
{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b77"),
  "CustomerID" : 19583,
  "First_Name" : "Martin",
  "Middle_Name" : null,
  "Last_Name" : "Laser",
  "Email_Address" : "mlaser91@aol.com",
  "Phone_Number" : "03-2355-1109",
  "Purchased_Units" : 5,
  "Purchased_Value" : 2099.49,
  "Item_SKU" : "HHY009",
  "Rec_Created_Date" : ISODate("2012-03-31T13:00:00Z")
},
{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b78"),
  "CustomerID" : 34311,
  "First_Name" : "Spencer",
  "Middle_Name" : "Chris",
  "Last_Name" : "McEvans",
  "Email_Address" : "spencerdude@hotmail.com",
  "Phone_Number" : "03-1122-0007",
  "Purchased_Units" : 32,
  "Purchased_Value" : 0.99,
  "Item_SKU" : "YNK227",
  "Rec_Created_Date" : ISODate("2012-03-31T13:00:00Z")
},
{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b79"),
  "CustomerID" : 74998,
  "First_Name" : "Mark",
  "Middle_Name" : null,
  "Last_Name" : "King",
  "Email_Address" : "mk038722@gmail.com",
  "Phone_Number" : "03-3423-1155",
  "Purchased_Units" : 1,
  "Purchased_Value" : 88.0,
  "Item_SKU" : "JKL777",
  "Rec_Created_Date" : ISODate("2011-12-31T13:00:00Z")
},
{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b7a"),
  "CustomerID" : 30002,
  "First_Name" : "Mary",
  "Middle_Name" : "Susan",
  "Last_Name" : "Grey",
  "Email_Address" : "mmgrey@gmail.com",
  "Phone_Number" : "03-1299-3859",
  "Purchased_Units" : 2,
  "Purchased_Value" : 22.99,
  "Item_SKU" : "AAN199",
  "Rec_Created_Date" : ISODate("2011-12-31T13:00:00Z")
},
{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b7b"),
  "CustomerID" : 22374,
  "First_Name" : "Luis",
  "Middle_Name" : "Blake",
  "Last_Name" : "Shimaro",
  "Email_Address" : "shimarolou@yahoo.com.au",
  "Phone_Number" : "03-0385-3999",
  "Purchased_Units" : 77,
  "Purchased_Value" : 1229.49,
  "Item_SKU" : "NMN800",
  "Rec_Created_Date" : ISODate("2012-11-10T13:00:00Z")
},
{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b7c"),
  "CustomerID" : 99931,
  "First_Name" : "Natalie",
  "Middle_Name" : "G",
  "Last_Name" : "Chin",
  "Email_Address" : "nataliechin@mediasmarts.com.au",
  "Phone_Number" : "03-3759-1001",
  "Purchased_Units" : 1,
  "Purchased_Value" : 123.0,
  "Item_SKU" : "HZX201",
  "Rec_Created_Date" : ISODate("2012-07-22T14:00:00Z")
},
{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b7d"),
  "CustomerID" : 22374,
  "First_Name" : "Marian",
  "Middle_Name" : null,
  "Last_Name" : "McErin",
  "Email_Address" : "marianmcerin@gmail.com",
  "Phone_Number" : "03-3400-3331",
  "Purchased_Units" : 10,
  "Purchased_Value" : 999.99,
  "Item_SKU" : "YQP232",
  "Rec_Created_Date" : ISODate("2012-12-18T13:00:00Z")
},
{
  "_id" : ObjectId("52ce3648a4a0bb15a8e07b7e"),
  "CustomerID" : 23424,
  "First_Name" : "Rick",
  "Middle_Name" : "Tony",
  "Last_Name" : "Webster",
  "Email_Address" : "rikky69@gmail.com",
  "Phone_Number" : "03-9459-1112",
  "Purchased_Units" : 3,
  "Purchased_Value" : 239.99,
  "Item_SKU" : "AAL155",
  "Rec_Created_Date" : ISODate("2012-01-22T13:00:00Z")
}])

Once the insert has been committed, let’s display the data to confirm the collection and its documents have indeed been created using db.collection.find() function as per below:

MongoDB-part4-dbfind-cmd

Now that we have out sample database, collection and a number of documents created, we can run a few queries against those to output the data we’re interested. For example:

/*=================================================================================
To select all customers where their middle name is missing in a sorted order:
=================================================================================*/
db.purchase_hist.find({"Middle_Name": null}).sort({"CustomerID": 1});

--Which is equivalent to the following SQL in RDBMS:
SELECT * FROM purchase_hist WHERE Middle_Name IS NULL
ORDER BY CustomerID;

/*=================================================================================
To sum up purchased value and purchased units (where there was more than 1),
grouping by when the record was first created:
=================================================================================*/
db.purchase_hist.group({
    "key": {
        "Rec_Created_Date": true
    },
    "initial": {
        "Units": 0,
        "Value": 0
    },
    "reduce": function(obj, prev) {
        prev.Units = prev.Units + obj.Purchased_Units - 0;
        prev.Value = prev.Value + obj.Purchased_Value - 0;
    },
    "cond": {
        "Purchased_Units": {
            "$gt": 1
        }
    }
});

--Which is equivalent to the following SQL in RDBMS:
SELECT
SUM(Purchased_Units) as Units, SUM(Purchased_Value) as Value, Rec_Created_Date
FROM purchase_hist
WHERE Purchased_Units > 1
GROUP BY Rec_Created_Date;

/*=================================================================================
To select average purchase value grouped by customer ID where their last name
is either 'Webster' or 'Grey' and averaged purchased value is greater than 0:
=================================================================================*/
db.purchase_hist.group({
    "key": {
        "CustomerID": true
    },
    "initial": {
        "sumforaverageAvg_Purchased_Value": 0,
        "countforaverageAvg_Purchased_Value": 0
    },
    "reduce": function(obj, prev) {
        prev.sumforaverageAvg_Purchased_Value += obj.Purchased_Value;
        prev.countforaverageAvg_Purchased_Value++;
    },
    "finalize": function(prev) {
        prev.Avg_Purchased_Value = prev.sumforaverageAvg_Purchased_Value / prev.countforaverageAvg_Purchased_Value;
        delete prev.sumforaverageAvg_Purchased_Value;
        delete prev.countforaverageAvg_Purchased_Value;
    },
    "cond": {
        "Last_Name": {
            "$in": ["Webster", "Grey"]
        }
    }
});

--Which is equivalent to the following SQL in RDBMS:
SELECT AVG(Purchased_Value) as Avg_Purchased_Value, CustomerID
  FROM purchase_hist
  WHERE Last_Name IN ('Webster', 'Grey')
  GROUP BY CustomerID
  HAVING AVG(Purchased_Value) > 0;

If you’re working with such RDBMS systems as Microsoft SQL Server, MySQL or PostgeSQL and need a tool to migrate your relational data into MongoDB, write some queries in an interactive manner or simply to relieve yourself from the ‘burden’ of working with Mongo’s Shell all the time, there are a few tools that can help with that. I personally tried a few different applications but eventually settled for two I consider the best MongoVue and RoboMongo. Both are fairly simplistic in their design but making a transition from Mongo’s Shell to either one of those will be like playing Pong on a monochrome Atari from 1972 and the latest version of Angry Birds on the current smartphone. Having said that, if you came from SQL Server management Studio or Oracle’s SQL Developer background, chances are you will be underwhelmed. Each tool has its strengths and advantages and they somewhat complement each other. I, for example, much prefer RoboMongo’s Shell implementation which also includes syntax highlighting, auto-completion, different view modes (text, tree, custom) etc. Below is a sample view of RoboMongo’s interface.

MongoDB-part4-robomongo-gui

On the other hand, MongoVue (paid version) has a relational database data import tool (providing you’re using MS SQL Server, MySQL or PostreSQL) which in case of MS SQL Server worked really well. It also provides some basic database monitoring/profiling capabilities, GridFS explorer and Map Reduce feature. MongoVue interface look as per below.

MongoDB-part4-mongovue-gui

Final Thoughts

This is the last post in this four part series on MongoDB for SQL developers – the other three can be found HERE, HERE and HERE. So far I only scratched the surface of MongoDB’s features and functionality not to mention the whole NoSQL paradigm. Concepts such as indexing, MapReduce, replication, sharding, monitoring, backups or application design are unfortunately out of scope for this series but there is plenty of great resources (including MongoDB’s own documentation) that can be found – as always Google is your friend. Also, Amazon has got a nice collection of books which cover both admin and development side of using MongoDB so with a little bit of reading/hands-on training it’s not as overwhelming as it may initially seem.

RDBMS systems have long been the staple, go-to applications for data storage layer for a long time now, however, with the recent rise of data volume, velocity and variety a new crop of technologies was required to adapt to these changes. NoSQL databases, in specific contexts such as big data and real-time web applications, provide better simplicity of design, horizontal scaling and finer control over availability when compared to relational databases. At present (Jan 2014), MongoDB specifically is the most popular NoSQL database on the market and the fact that it has been adopted as backend software by a number of major websites and services, including Craigslist, eBay, Foursquare, SourceForge, and The New York Times, among others is a testament to its credibility.

MongoDB-part4-dbrankings-Jan2014

Working with MongoDB is not without challenges though and, depending how you look at it, it has a number of faults. For starters, Mongo requires a lot of memory, preferring to put as much data as possible into working memory for fast access. Furthermore, Mongo doesn’t support the notion of ACID transactions, which is a touchstone of the RDBMS world. Data isn’t immediately written to disk upon an insert (although you can optionally require this via a flag) – a background process eventually writes unsaved data to disk. This makes writes extremely fast, but corresponding reads can occasionally be inconsistent. Coming from a relational database background, it is also hard to wrap your head around the fact that queries don’t support joins between collections, not to mention sparse security features and the lack of Business Intelligence tools integration. You can read some interesting insights into why MongoDB is not a great fit for all data storage needs HERE  or HERE. There is also an interesting video on migrating Diaspora database from MongoDB to MySQL due issues the developers encountered when using is for social network data HERE. Finally, Andrew C. Oliver from the Open Software Integrators has written a short post on the challenges of porting RDBMS schema to Mongo HERE.

With all those points considered, the relational database is still the staple data store for the vast majority of applications built today. But for some applications, the flexibility offered by Mongo provides advantages with respect to development speed and overall application performance. Additionally, Mongo’s relative ease-of-use and low price tag make it an attractive option for companies whose project objective can fit into the NoSQL space and MongoDB’s document storage architecture.

Tags: , ,

Introduction to MongoDB NoSQL database for SQL Developers – Part 3

January 30th, 2014 / No Comments » / by admin

In the last post I outlined some basic MongoDB CRUD commands and their relationship/mapping to Structured Query Language. In this post I will go over some key commands and their variables used for data extraction.

Read operations, or queries, retrieve data stored in the database. In MongoDB, queries select documents from a single collection. Queries specify criteria, or conditions, that identify the documents that MongoDB returns to the clients. A query may include a projection that specifies the fields from the matching documents to return. The projection limits the amount of data that MongoDB returns to the client over the network.

For query operations, MongoDB provide a db.collection.find() method. The method accepts both the query criteria and projections and returns a cursor to the matching documents. You can optionally modify the query to impose limits, skips, and sort orders. The following diagram highlights the components of a MongoDB query operation.

MongoDB-part3-crud-annotated-findThe next diagram shows the same query in SQL.

MongoDB-part3-crud-annotated-SQL-select

MongoDB queries exhibit the following behavior:

  • All queries in MongoDB address a single collection.
  • You can modify the query to impose limits, skips, and sort orders.
  • The order of documents returned by a query is not defined and is not defined unless you specify a sort().
  • Operations that modify existing documents (i.e. updates) use the same query syntax as queries to select documents to update.
  • In aggregation pipeline, the $match pipeline stage provides access to MongoDB queries.

MongoDB provides a db.collection.findOne() method as a special case of find() that returns a single document. Consider the following diagram of the query process that specifies a query criteria and a sort modifier.

MongoDB-part3-crud-query-stages

In the diagram, the query selects documents from the users collection. Using a query selection operator to define the conditions for matching documents, the query selects documents that have age greater than (i.e. $gt) 18. Then the sort() modifier sorts the results by age in ascending order.

Queries in MongoDB return all fields in all matching documents by default. To limit the amount of data that MongoDB sends to applications, include a projection in the queries. By projecting results with a subset of fields, applications reduce their network overhead and processing requirements. Projections, which are the the second argument to the find() method, may either specify a list of fields to return or list fields to exclude in the result documents. Consider the following diagram of the query process that specifies a query criteria and a projection.

MongoDB-part3-crud-query-w-projection-stages

In the above diagram, the query selects from the users collection. The criteria matches the documents that have age equal to 18. Then the projection specifies that only the name field should return in the matching documents.

Naturally, MongoDB also supports other query constructs which are used to achieve further functionality for data extraction and retrieval e.g. grouping, case statements, ordering etc. Below is a graphical depiction of the sample SQL statements and their corresponding equivalent in the syntax used by MongoDB.

MongoDB-part3-MySQL-to-MongoDB-mapping

In the next post I will dive deeper into MongoDB’s query syntax and introduce a couple of handy tools which can be used for basic administrative and data retrieval tasks.

Tags: , ,