Friday 17 April 2015

NoSQL / MongoDB Basics

As a software developer or architect, you have probably encountered or used NoSQL. Even if you haven't used it, it is good to have the basics of NoSQL and MongoDB at your fingertips. A couple of times, I faltered while discussing NoSQL / MongoDB. Hence this compilation of basic points, as a ready reference.

The basics that you need to have at your fingertips are: when you would choose NoSQL, what Brewer's Theorem is and how it helps in making this choice, what the various types of NoSQL databases are (with examples of each), the basic concepts of MongoDB, some unique features of MongoDB, and any criticism of it that you have heard.

For the first question, when you would choose NoSQL, a Google search turns up a lot of links; the three that I read are:
http://blogs.shephertz.com/2013/06/20/a-developers-dilemma-when-to-use-nosql/
http://www.itworld.com/article/2833291/essential-reading-for-choosing-a-nosql-database.html
http://www.informationweek.com/big-data/big-data-analytics/nosql-newsql-or-rdbms-how-to-choose/a/d-id/1297861

Basically, the main points are high data volumes, the need to scale out horizontally, and schemaless data. The third link presents the selection criteria in the form of a table:

Table 1: 10 Selection Criteria For Choosing Database Types

Characteristic                                 RDBMS  NoSQL     NewSQL
ACID compliance (Data, Transaction integrity)  Yes    No        Yes
OLAP/OLTP                                      Yes    No        Yes
Data analysis (aggregate, transform, etc.)     Yes    No        Yes
Schema rigidity (Strict mapping of model)      Yes    No        Maybe
Data format flexibility                        No     Yes       Maybe
Distributed computing                          Yes    Yes       Yes
Scale up (vertical)/Scale out (horizontal)     Yes    Yes       Yes
Performance with growing data                  Fast   Fast      Very Fast
Performance overhead                           Huge   Moderate  Minimal
Popularity/community Support                   Huge   Growing   Slowly growing

Moving on to the next question: what is Brewer's Theorem, and how does it help in making this choice? Essentially, this theorem (better known as the CAP theorem) states that a distributed system can guarantee only two of Consistency, Availability and Partition Tolerance. More formally,
"it's impossible for a distributed computer system to simultaneously provide all three of these guarantees:
  • Consistency (all nodes see the same data at the same time)
  • Availability (node failures don't prevent survivors from continuing to operate)
  • Partition tolerance (no failures less than total network failures cause the system to fail)
Since only two of these characteristics are guaranteed for any given scalable system, use your functional specification and business SLA (service level agreement) to determine what your minimum and target goals for CAP are, pick the two that meet your requirements, and proceed to implement the appropriate technology."
Source: [1]

Here's the visual depiction:

[Figure: visual depiction of the CAP theorem]

Source: the itworld article by Matthew Mombrea (second link given above).
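
To make the trade-off concrete for myself, here is a minimal Python sketch of what happens when a network partition hits a two-node system. The `Replica` and `TinyCluster` classes and the "CP" / "AP" mode names are my own hypothetical toy model, not part of any real database; the only point is that during a partition a system either refuses the write (staying consistent) or accepts it (staying available, but letting the replicas disagree).

```python
class Replica:
    """One node holding a single value."""

    def __init__(self):
        self.value = None


class TinyCluster:
    """Two replicas; mode is 'CP' or 'AP' (hypothetical toy model)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, value):
        """Write via replica A. Returns True if the write was accepted."""
        if not self.partitioned:
            self.a.value = self.b.value = value  # replicate to both nodes
            return True
        if self.mode == "CP":
            return False  # refuse: stay consistent, give up availability
        self.a.value = value  # AP: accept locally, replicas now disagree
        return True

    def consistent(self):
        return self.a.value == self.b.value


cp, ap = TinyCluster("CP"), TinyCluster("AP")
for cluster in (cp, ap):
    cluster.write("v1")
    cluster.partitioned = True  # network split between A and B

cp_accepted = cp.write("v2")
ap_accepted = ap.write("v2")
print(cp_accepted, cp.consistent())  # False True
print(ap_accepted, ap.consistent())  # True False
```

Real systems are far more nuanced (quorums, tunable consistency, reconciliation after the partition heals), so treat this only as a memory aid for the interview answer.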

Moving on to the next question: what are the various types of NoSQL databases, with examples of each? Here is the answer:

Key-Value stores
Redis, BerkeleyDB, Riak

Document Databases
MongoDB, Couchbase. Similar databases: Lucene and Solr / Elasticsearch (both built on top of Lucene)

Column-based stores
Cassandra, HBase

Graph databases
Neo4j, OrientDB

XML Databases
MarkLogic, eXist-db, xDB
Source: [2]
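
To see how these categories differ in practice, it helps me to picture the same small record in each shape. The sketch below uses only plain Python structures; the user "alice", the field names and the key formats are all made up for illustration, and each structure is a rough simplification of what the real products store.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: a user "alice" who lives in Oslo and follows "bob".

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:alice": '{"city": "Oslo", "follows": ["bob"]}'}

# Document database: the value is structured and can be queried by field.
document = {"_id": "alice", "city": "Oslo", "follows": ["bob"]}

# Column-based store: each row key maps to a sparse set of named columns.
column_family = {"alice": {"profile:city": "Oslo", "follows:bob": "1"}}

# Graph database: nodes plus explicit, first-class edges.
nodes = {"alice": {"city": "Oslo"}, "bob": {}}
edges = [("alice", "FOLLOWS", "bob")]

# XML database: a tree of elements.
xml_doc = ET.fromstring(
    "<user id='alice'><city>Oslo</city><follows>bob</follows></user>"
)

# The same question ("whom does alice follow?") asked of each shape.
from_kv = json.loads(key_value["user:alice"])["follows"]
from_doc = document["follows"]
from_cols = [c.split(":", 1)[1] for c in column_family["alice"] if c.startswith("follows:")]
from_graph = [dst for src, rel, dst in edges if src == "alice" and rel == "FOLLOWS"]
from_xml = [e.text for e in xml_doc.findall("follows")]
print(from_kv, from_doc, from_cols, from_graph, from_xml)
```

Notice that the key-value version has to parse the whole value before it can answer, while the graph version answers directly from the edges.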

Moving on to the next question, what are the basic concepts of MongoDB:
  • A document is the basic unit of data for MongoDB, roughly equivalent to a row in a relational database management system (but much more expressive).
  • Similarly, a collection can be thought of as the schema-free equivalent of a table.
  • A single instance of MongoDB can host multiple independent databases, each of which can have its own collections and permissions.
  • MongoDB comes with a simple but powerful JavaScript shell, which is useful for the administration of MongoDB instances and data manipulation.
  • Every document has a special key, "_id", that is unique across the document's collection.
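
The concepts above can be sketched in a few lines of plain Python. This is an in-memory stand-in that I wrote for illustration, not the MongoDB API: `ToyCollection`, its `insert` and `find` methods, and the database and collection names are all hypothetical. MongoDB generates an ObjectId when a document has no "_id"; a random UUID stands in for that here.

```python
import uuid


class ToyCollection:
    """In-memory stand-in for a collection (not the MongoDB API)."""

    def __init__(self):
        self._docs = {}  # maps _id -> document

    def insert(self, doc):
        doc = dict(doc)
        # Generate an "_id" if the caller did not supply one.
        doc.setdefault("_id", uuid.uuid4().hex)
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def find(self, **criteria):
        """Return documents whose top-level fields equal the criteria."""
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]


# One instance hosts several independent databases, each with its own collections.
instance = {"blog": {"posts": ToyCollection()}, "shop": {"orders": ToyCollection()}}

posts = instance["blog"]["posts"]
posts.insert({"_id": 1, "title": "NoSQL basics", "tags": ["nosql", "mongodb"]})
posts.insert({"title": "Untitled draft", "draft": True})  # different shape, same collection

try:
    posts.insert({"_id": 1, "title": "Clash"})
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True

print(len(posts.find()), duplicate_rejected)  # 2 True
```

The two posts have different fields yet live in the same collection, which is the "schema-free" part; the rejected third insert shows "_id" being unique within the collection.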
Moving on to the fifth question: what are some unique features of MongoDB? It has some really nice, unique tools that are not (all) present in any other solution.
Indexing
MongoDB supports generic secondary indexes, allowing a variety of fast queries, and provides unique, compound, and geospatial indexing capabilities as well.

Stored JavaScript
Instead of stored procedures, developers can store and use JavaScript functions and values on the server side.

Aggregation
MongoDB supports MapReduce and other aggregation tools.

Fixed-size collections
Capped collections are fixed in size and are useful for certain types of data, such as logs.

File storage
MongoDB supports an easy-to-use protocol for storing large files and file metadata.
Source: [3]
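
Two of these features, MapReduce-style aggregation and fixed-size collections, are easy to illustrate in plain Python. The sketch below is only an analogy under my own assumptions: the `map_reduce` helper and the sample log documents are hypothetical, and a `deque` bounded by item count stands in for a capped collection, whereas a real capped collection is sized in bytes.

```python
from collections import defaultdict, deque

# Hypothetical log documents.
logs = [
    {"level": "info", "ms": 12},
    {"level": "error", "ms": 40},
    {"level": "info", "ms": 8},
]


def map_reduce(docs, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Total milliseconds per log level.
total_ms = map_reduce(
    logs,
    map_fn=lambda doc: [(doc["level"], doc["ms"])],
    reduce_fn=lambda key, values: sum(values),
)
print(total_ms)  # {'info': 20, 'error': 40}

# Fixed-size behaviour: once full, the oldest entry is dropped to make room.
capped = deque(maxlen=2)
for doc in logs:
    capped.append(doc)
print([d["ms"] for d in capped])  # [40, 8]
```

This is why capped collections suit logs: you only ever keep the most recent entries, and old ones age out without any cleanup job.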

Finally, if someone asks whether you have heard any criticism of MongoDB, here is a popular article that I keep bumping into in my reading:
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

Sarah writes well; I wish I could write like that. She makes her posts interesting with diagrams and pictures. Curiously, though the article is titled "Why you should never use MongoDB", it also contains advice on when to use MongoDB.

The crux of the article is captured in:
Whether you’re duplicating critical data (ugh), or using references and doing joins in your application code (double ugh), when you have links between documents, you’ve outgrown MongoDB. When the MongoDB folks say “documents,” in many ways, they mean things you can print out on a piece of paper and hold. A document may have internal structure — headings and subheadings and paragraphs and footers — but it doesn’t link to other documents. It’s a self-contained piece of semi-structured data.

If your data looks like that, you’ve got documents. Congratulations! It’s a good use case for Mongo. But if there’s value in the links between documents, then you don’t actually have documents. MongoDB is not the right solution for you. It’s certainly not the right solution for social data, where links between documents are actually the most critical data in the system.
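
The two options Sarah describes, duplicating data versus keeping references and joining in application code, can be sketched with plain Python dictionaries. The posts, the user "u1" and the `with_author` helper are hypothetical examples of mine, not taken from her article.

```python
# Option 1: embed (duplicate) the author inside every post.
posts_embedded = [
    {"_id": 1, "title": "Hello", "author": {"_id": "u1", "name": "Ann"}},
    {"_id": 2, "title": "Again", "author": {"_id": "u1", "name": "Ann"}},
]

# Option 2: store a reference and join in application code.
users = {"u1": {"_id": "u1", "name": "Ann"}}
posts_referenced = [
    {"_id": 1, "title": "Hello", "author_id": "u1"},
    {"_id": 2, "title": "Again", "author_id": "u1"},
]


def with_author(post):
    """Application-side join: look the author up for each post."""
    return {**post, "author": users[post["author_id"]]}


# The user changes her name in the users collection.
users["u1"]["name"] = "Anne"

# Embedded copies are now stale until each one is found and rewritten.
stale = [p["author"]["name"] for p in posts_embedded]
# Referenced posts pick up the change, at the cost of an extra lookup per post.
joined = [with_author(p)["author"]["name"] for p in posts_referenced]
print(stale, joined)  # ['Ann', 'Ann'] ['Anne', 'Anne']
```

Neither option is free: the first risks stale copies, the second moves the join into your own code, which is exactly the "ugh" and "double ugh" in the quote.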

Sources:
[1] DZone Refcardz, "Getting Started with NoSQL and Data Scalability" by Eugene Ciurana.
[2] https://keefcode.wordpress.com/2013/12/04/nosql-databases-how-to-choose/
[3] MongoDB: The Definitive Guide by Kristina Chodorow and Michael Dirolf, 2010. O'Reilly Media, Inc. ISBN: 978-1-449-38156-1.
Note - answers taken verbatim.
