Sunday, February 8, 2015

MongoDB for Sitecore

Starting with Sitecore 7.5, MongoDB has become an integral part of the Sitecore ecosystem. This post will walk developers through the process of installing MongoDB, and cover some of the basics of MongoDB CRUD operations, and then look at how to access Sitecore data with MongoDB.

Installation

If you fire up a fresh install of Sitecore 8, and you do not have MongoDB installed, you will see the following errors in your log file:
5004 14:07:43 ERROR MongoDbDictionary.Store() has failed.
Exception: Sitecore.Analytics.DataAccess.DatabaseNotAvailableException
Message: Database not available
Source: Sitecore.Analytics.MongoDB
   at Sitecore.Analytics.Data.DataAccess.MongoDb.MongoDbCollection.Execute(Action action, ExceptionBehavior exceptionBehavior)
   at Sitecore.Analytics.Data.DataAccess.MongoDb.MongoDbCollection.Save(Object value)
   at Sitecore.Analytics.Data.DataAccess.MongoDb.MongoDbDictionary.Store(Object value)

Fortunately, this is pretty easy error to resolve. To make this go away, do the following:
  1. Go to http://mongodbB.org/downdloads, and download the latest MongoDB zip file for windows. Unzip the file to a folder of your choosing. In this walkthrough, I will use C:\mongo\ You should see a bin directory, and a few text documents (a readme, a license, etc.)
  2. Create a folder with the path C:\data\db. This is the built in default location that MongoDB uses to store database files, but the can be changed if desired.
  3. Open a command prompt, and run the following: C:\mongo\bin\mongod.exe. Recycle your Sitecore 8 app pool. You should see the error gone, and the mongod console window will show  some requests from Sitecore:
    2015-02-07T14:45:45.402-0500 [conn2] CMD: dropIndexes sc8hacking_analytics.Autom
    ationStates
    2015-02-07T14:45:45.636-0500 [conn2] CMD: dropIndexes sc8hacking_analytics.Inter
    actions
    Congratulations, you are now running MongoDB!

Running MongoDB as as Windows Service

You probably don't want to open a command prompt every time you work with Sitecore, so let's set this up as a Windows Service. This is just about as easy as running from the command prompt. The only extra wrinkle is that you need to specify a log file location:
  1. Close the existing command window if you still have it open. If you enter Ctrl-C, MongoDB will shut down gracefully:
    2015-02-07T14:51:23.594-0500 [consoleTerminate] got CTRL_C_EVENT, will terminate
     after current cmd ends
    2015-02-07T14:51:23.597-0500 [consoleTerminate] now exiting
     [...]
    2015-02-07T14:51:23.849-0500 [consoleTerminate] dbexit: really exiting now
    
  2. Choose a location and name for your MongoDB log file. Let's put ours in c:\data\mongolog.txt.
  3. Open a command prompt, and run this command: c:\mongo\bin\mongod --logpath c:\data\mongolog.txt --install
  4. Go to your Local Services, and you should find a new called MongoDB. It will be set to Automatic, so will start on the next reboot. Click Start to get it running. You can check the log file to confirm it is running.


    Note you can specify a number of options when creating the service, such as --dbpath to specify where you want databases to be stored, or --port to change from the MongoDB default of 28017. You can also define a configuration file, so that you can change options without having to recreate the service, using the  --config option.  The MongoDB configuration format is described here.

    Exploring MongoDB

    Now that we have MongoDB running, let's take a look at how we can write and read data from it. I recommend getting started with the shell that is provided by the MongoDB distribution. When you are confident with the mechanics of direct interaction with MongoDB, you can select a tool you prefer for day-to-day work. (Robomongo is my personal preference, because it respects the JavaScript focus of the shell, allowing you to define variables and functions.)

    The MongoDB Shell


    You can interact with the mongod process by launching the mongo shell c:\mongo\bin\mongo (note "mongo", not "mongod"). By default, this will make a connection to localhost:27107, so will connect to your local MongoDB instance. The MongoDB shell is a JavaScript interpreter, so you can do things like create functions and define variables. You will find it is handy to be able to store interim results to variables, and to create convenience methods. It also had good online documentation, both at a root level

    >help

    and at an object level


    >db.help() 
    >db.collection.help()


    Creating Data


    Let's take a look at how we can store data in MongoDB. First thing we need to do is create a database. As with much in MongoDB, this is a purely declarative operation. Simply refer to it and start using it, and MongoDB takes care of the rest:

    >use mynewdb

    If you were to look at your c:\data\db folder, you would not notice anything new, but if you now do this:
    > db.mynewcollection.insert({name: "Dan"})
    you would see two new files, "mynewdb.0" and "mynewdb.ns", to store data and indexes.

    It's interesting to compare this to the syntax to create a database in SQL Server, which requires specifying a location for the data and log files when you issue the command. With MongoDB, this configuration is owned by the MongoDB instance, so that from the application's point of view, MongoDB database creation is frictionless. (I was struck by this when I enabled the MongoDB session state provider with Sitecore 7.5. All I had to do was make the configuration change and add a connection string, and the Sitecore application created the database and schema on its own.) In a similar way, we created a collection (roughly equivalent to a "table" in SQL) simply by referring to it.

    Let's go take another look at the insert command. You'll notice that the thing being inserted was a JSON document. MongoDB stores JSON documents (hence the name "document database"), and does not impose any rules on how they are structured. You can view it as a place to park JSON, but one with really powerful querying and indexing capabilities.

    The one requirement that MongoDB does impose on documents is that each one must have a field called "_id", with a unique value per document. All MongoDB drivers, including the shell, will add this value in if you do not supply it. So if I were to query for this document, I would find an _id created for me:

    > db.mynewcollection.find()

    { "_id" : ObjectId("54d6d4c5b054372742f03578"), "name" : "Dan" }


    There is no requirement to use the ObjectID data type for IDs; you could use a Guid, a product code, or even a subdocument.  ObjectIDs are guaranteed to be unique per collection, and have the neat property of containing an embedded, accessible creation timestamp, so each document has an embedded "CreatedTimestamp" field, which you can access with getTimestamp():

    >   db.mynewcollection.findOne()._id.getTimestamp()
    Let's update this document by adding a few interests:

    >db.mynewcollection.update({name: "Dan"}, {$push: {interests: "Sitecore"} })
    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
    >db.mynewcollection.update({name: "Dan"}, {$push: {interests: "MongoDB"} })
    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
    >db.mynewcollection.findOne()


    And some address information:

    > db.mynewcollection.update({name: "Dan"}, {$set: {address: {work: {city: "Some
    ville", state: "MA"}}}})
    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
    
    
    
    So now we have a rich document, with an array and a nested sub-document:
    > db.mynewcollection.findOne()
    {
            "_id" : ObjectId("54d6d4c5b054372742f03578"),
            "name" : "Dan",
            "interests" : [
                    "Sitecore",
                    "MongoDB"
            ],
            "address" : {
                    "work" : {
                            "city" : "Somerville",
                            "state" : "MA"
                    }
            }
    }
    
    
    
    It's in querying for documents like this that MongoDB really shines.  We could find documents for people named "Dan":
    >db.mynewcollection.find({name: "Dan" })
    or use a regular expression to find names beginning with "D":
    >db.mynewcollection.find({name: /^D/ })
    or who have an interest in Sitecore:
    >db.mynewcollection.find({interests: "Sitecore"})
    or work in Massachusetts:
    >db.mynewcollection.find({"address.work.state": "MA"})
    or get a count of people who work in MA and have an interest in MongoDB:
    >db.mynewcollection.count({interests: "MongoDB", "address.work.state": "MA"})
    Here we see a number of MongoDB querying features:

    • Arrays are treated like tags, so if you specify an array element, then all documents that have that array element will match.
    • You can use "dot notation" to specify subdocuments. 
    • You can use regular expressions, which are especially efficient if an index is in place on the field and the expression is specified to match strings beginning with a pattern ("prefix query").
    For a full comparison of MongoDB and SQL querying syntax, I recommend the MongoDB comparison chart, avalable at bit.ly/mongo2sql

    Looking at Sitecore data

    Ok, lets now use the Mongo shell to explore Sitecore analytics data.  First, we can use "show db" to find the analytics database, and "show collections" to see what document collections are in this database:


    > show dbs
    admin                        (empty)
    local                        0.078GB
    mynewdb                      0.078GB
    sc8hacking_analytics         0.078GB
    sc8hacking_tracking_contact  0.078GB
    sc8hacking_tracking_live     0.078GB
    > use sc8hacking_analytics
    switched to db sc8hacking_analytics
    > show collections
    ClassificationsMap
    Contacts
    Devices
    Interactions
    OperationStatuses
    


    Sitecore uses the "Interactions" collection to store information about a user visit. Let's take a look at one of these documents:

    
    > db.Interactions.findOne()
    {
            "_id" : BinData(3,"iUI0c5nx30mRxsHzpiUgqA=="),
            "_t" : "VisitData",
            "ContactId" : BinData(3,"a1k5n+FLBkG6myUtAeBn4A=="),
            "StartDateTime" : ISODate("2015-02-07T19:45:40.258Z"),
            "EndDateTime" : ISODate("2015-02-07T19:45:40.258Z"),
            "SaveDateTime" : ISODate("2015-02-07T20:13:45.312Z"),
            "ChannelId" : BinData(3,"8uQYtBMQQkugU7bU3KmIvw=="),
            "Browser" : {
                    "BrowserVersion" : "42.0",
                    "BrowserMajorName" : "Chrome",
                    "BrowserMinorName" : "42.0"
            },
            "Screen" : {
                    "ScreenHeight" : 480,
                    "ScreenWidth" : 640
            },
            "ContactVisitIndex" : 2,
            "Ip" : BinData(0,"fwAAAQ=="),
            "Language" : "en",
            "LocationId" : BinData(3,"1B2M2Y8AsgTpgAmY7PhCfg=="),
            "MvTest" : {
                    "ValueAtExposure" : 0
            },
            "OperatingSystem" : {
                    "_id" : "WinNT"
            },
            "Pages" : [
                    {
                            "DateTime" : ISODate("2015-02-07T19:45:40.259Z"),
                            "Duration" : 0,
                            "Item" : {
                                    "_id" : BinData(3,"n1UNEaXe6kKcHIpd9+cO+Q=="),
                                    "Language" : "en",
                                    "Version" : 2
                            },
                            "PageEvents" : [
                                    {
                                            "Name" : "Error",
                                            "ItemId" : BinData(3,"n1UNEaXe6kKcHIpd9+
    cO+Q=="),
                                            "Timestamp" : NumberLong(0),
                                            "Data" : "Sitecore.Analytics.DataAccess.
    DatabaseNotAvailableException: Database not available\r\n   at Sitecore.Analytic
    s.Data.DataAccess.MongoDb.MongoDbCollection.Execute(Action action, ExceptionBeha
    vior exceptionBehavior)\r\n   at Sitecore.Analytics.Data.DataAccess.MongoDb.Mong
    oDbCollection.Save(Object value)\r\n   at Sitecore.Analytics.Data.DataAccess.Mon
    goDb.MongoDbDictionary.Store(Object value)",
                                            "DataKey" : "Sitecore.Analytics.DataAcce
    ss.DatabaseNotAvailableException: Database not available",
                                            "Text" : "Sitecore.Analytics.DataAccess.
    DatabaseNotAvailableException: Database not available",
                                            "PageEventDefinitionId" : BinData(3,"SiW
    /yMycFk6QCYK3zTPkvg=="),
                                            "DateTime" : ISODate("2015-02-07T19:45:4
    0.397Z"),
                                            "Value" : 0
                                    }
                            ],
                            "SitecoreDevice" : {
                                    "_id" : BinData(3,"339d/sCJmU2ao7X70AnJ8w=="),
                                    "Name" : "Default"
                            },
                            "MvTest" : {
                                    "ValueAtExposure" : 0
                            },
                            "Url" : {
                                    "Path" : "/"
                            },
                            "VisitPageIndex" : 1
                    }
            ],
            "SiteName" : "website",
            "TrafficType" : 20,
            "UserAgent" : "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (K
    HTML, like Gecko) Chrome/42.0.2292.0 Safari/537.36",
            "Value" : 0,
            "VisitPageCount" : 1
    }
    > 
    

    A few things are worth noting here:

    • Note the number of BinData values. This is how MongoDB represents GUIDs.  I will come back to how you can translate these into .NET Guids in the next section.
    • Sitecore uses a number of subdocuments. For example, the Browser is stored as a subdocument, so you can query for sessions using Chrome with "Browser.MajorName", or a specific version with "Browser.BrowserVersion."
    • Pages are stored in a nested array, and page events are stored as a nested array within Pages.  This nesting of object hierarchies is typical of how MongoDB stores data, and avoids the performance hit of doing table seeks that are required when joining tables. In DMS, this data would be stored in five tables: Visitors, Visits, Pages, PageEvents, and PageEventDefinitions.
    • Session data is written to MongoDB when the SessionEnd event is fired. We can see amusing proof of this in the fact that the page event captured is the exception I referred to at the beginning of this post. Obviously, the error that MongoDB is unavailable must have been persisted in memory until MongoDB became available.  You can work around this while doing development by recycling your app pool, or wiring up a button to call Session.Abandon() to end the ASP.NET session.
    • The xDB Overview and Architecture document states: "In the xDB, currently the only type of interaction is an online visit but in the future this will be expanded to include offline interactions."  We can see this provided for in the field "_t", set to "VisitData". This refers the Sitecore analytics class (Sitecore.Analytics.Model.VisitData) that has been serialized into this document. By specifying the type, Sitecore provides for storing other sorts of interactions in this collection.  This will allow doing things like aggregating engagement value (in the "Value" field) across multiple interaction types, seeing for example whether web, in store, or telephone contacts are generating the most engagement value.  Here we see how MongoDB's lack of defined schema is used to allow storing a variety of classes that share the same base class, allowing for querying at either a base or subclass level.  This sort of open-ended design, allowing for future evolution of the product without requiring schema changes, shows why MongoDB was a compelling choice as a foundational technology for the xDB.

    Guids and MongoDB

    As I mentioned above, Guids are a bit of a challenge to work with in the MongoDB shell, since the binary object is represented in a manner completely different from in .NET. Fortunately, MongoDB provides for that in a JavaScript file, uuidhelpers.js, available on the C# driver github repository. If you download this file, and reference it when you start the mongo shell (>c:\mongo\bin\mongo c:\uuidhelpers.js --shell) you now have two methods available, .toCSUUID() which converts a BinData object into a .NET Guid, and CSUUID("some guid") which creates a BinData object from a string representing a C# GUID. So with this file loaded, we can now see what Sitecore item the user viewed on the first page of the session, represented by the MongoDB shell as BinData(3,"n1UNEaXe6kKcHIpd9+cO+Q==").

    > db.Interactions.findOne().Pages[0].Item._id.toCSUUID()
    CSUUID("110d559f-dea5-42ea-9c1c-8a5df7e70ef9")


    Because there's no place like home. :)

    10 comments:

    1. Thanks for a great post, I got MongoDB up and running because of it. In your Sitecore data example, you show a page visit where Pages.PageEvents.Data is "Database not available" and URL.Path is "/" (root). This entry is also found in my database, even though I have now visited my page many times on many different URLs. It seems like an error. Is anything missing? Do I need to do something else to be able to track page visits correctly?

      ReplyDelete
    2. Hi Esben, glad this was helpful. Are you seeing that "Database not available" more than once? In my case, it is showing up because the same session existed both before and after I started MongoDB. You can do a query like this to get the count:

      >db.Interactions.count({"Pages.PageEvents.DataKey": "Sitecore.Analytics.DataAcce
      ss.DatabaseNotAvailableException: Database not available"})

      If you are seeing only one occurrence (or a small number that does not increase), then you are fine. You are just seeing records created before MongoDB was wired up.

      ReplyDelete
    3. Hi Dan. The count is 1 using that query. But if I do db.Interactions.count(), I also get 1, so it seems that it is the only row in the database.

      I have checked that the connection is correct in ConnectionStrings.config and that analytics is enabled in Sitecore.Analytics.config (it also seems to be configured correctly, as the analytics database is in fact present in MongoDB).

      Any ideas?

      ReplyDelete
      Replies
      1. This is not an issue. You are seeing the record that got created when you got started. It won't go away unless you delete it, but there is no reason to do so.

        Delete
      2. Just noticed that you said this was the only row in your DB. I would try adding a few different content items, and then end your session. I believe you can do this by recycling the app pool and hitting the web site, or you can call Session.Abandon(). Sitecore does not write to MongoDB until the session ends.

        Delete
      3. It didn't help recycling the app pool, but stopping and starting the app pool did! Now, my page visits are logged in the database. Thank you for helping out.

        Delete
    4. Great post! Great explanation, why MongoDB is the choice for the Sitecore analytics data and why SQL server is not.
      But there is a tiny error:
      c:\mongo\bin\mongo --logpath c:\data\mongolog.txt --install
      Should be:
      c:\mongo\bin\mongod --logpath c:\data\mongolog.txt --install

      ReplyDelete
    5. My mongo cluster with 2 app, 3 config and 2 replica set of 3 vms each only shows config db :(

      ReplyDelete
    6. This is great information. Especially converting the GUID data. Is there any chance you can show us how to decode the IP data as well?

      ReplyDelete