NoSQL is not about object databases

Monday, March 29, 2010

(c) Bertrand Le Roy 2008

NoSQL as a movement is an interesting beast. I kinda like that it’s negatively defined (I happen to belong to at least one other such a-community myself). At its roots, it’s not about proposing one specific new silver bullet to kill an old problem; it’s about challenging the consensus.

Actually, blindly and systematically replacing relational databases with object databases would just replace one set of issues with another.

No, the point is to recognize that relational databases are not a universal answer (although they have been used as one for so long), and to recognize instead that there’s a whole spectrum of data storage solutions out there.

Why is it so hard to recognize, by the way? You are already using some of those other data storage solutions every day. Let me cite a few:

The file system
Active Directory
XML / JSON documents
The Web
e-mail
Logs
Excel files
EXIF blobs in your photos
Relational databases
And yes, object databases

It’s just a fact of modern life. Notice, by the way, that most of the data you use every day is unstructured, and thus mostly unsuitable for relational storage. It really is more a matter of recognizing it: you are already doing NoSQL.

So what happens when, for whatever reason, you need to query two or more of these heterogeneous data stores at the same time? Well, you build some sort of index that combines them, and you query that instead.

Of course, from there it’s a short step to realizing that querying is better done completely separately from storage.
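To make that "index of sorts" a little more concrete, here is a minimal Python sketch, assuming two made-up stores: a handful of plain-text "files" and a JSON document of e-mails. Everything in it (the paths, message ids, and the helpers iter_files, iter_emails, build_index and query) is hypothetical and only illustrates the idea; a real system would more likely rely on a full-text engine such as Lucene. Each source is read once into a word-to-location inverted index, and queries then hit the index rather than the stores.

```python
import json
from collections import defaultdict

# Hypothetical stand-ins for two heterogeneous stores:
# a few plain-text "files" keyed by path, and a JSON document of e-mails.
FILES = {
    "/docs/contract.txt": "Contract renewal for ACME, signed in March",
    "/docs/notes.txt": "Meeting notes: ACME wants a new CMS",
}
EMAILS_JSON = json.dumps([
    {"id": "msg-1", "subject": "ACME invoice", "body": "Invoice attached"},
    {"id": "msg-2", "subject": "Lunch?", "body": "Friday works for me"},
])


def iter_files():
    """Yield (store, key, text) triples from the hypothetical file store."""
    for path, text in FILES.items():
        yield ("file", path, text)


def iter_emails():
    """Yield (store, key, text) triples from the hypothetical JSON mailbox."""
    for msg in json.loads(EMAILS_JSON):
        yield ("email", msg["id"], msg["subject"] + " " + msg["body"])


def build_index(*sources):
    """Map each lowercase word to the set of (store, key) pairs containing it."""
    index = defaultdict(set)
    for source in sources:
        for store, key, text in source():
            for word in text.lower().split():
                index[word.strip(".,:?!")].add((store, key))
    return index


def query(index, *words):
    """Return the (store, key) pairs that contain every given word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


index = build_index(iter_files, iter_emails)
print(sorted(query(index, "acme")))
```

Note that the query function never touches FILES or EMAILS_JSON: once the index is built, querying is independent of how each source stores its data.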

So why am I writing about this today?

Well, that’s something I’ve been giving a lot of thought to, on and off, over the last ten years. When I built my first CMS all that time ago, one of the main problems my customers were facing was managing and making sense of the mountain of unstructured data that made up most of their business. The central entity of that system was the file system, because we were dealing with lots of Word documents, PDFs, OCR’d articles, photos and static web pages. We could have stored all that in SQL Server. It would have worked. Ew. I’m so glad we didn’t.

Today, I’m working on Orchard (another CMS ;). It’s a pretty young project, but one of the questions we already get most often is how to integrate existing data.

One of the ideas I’ll be trying hard to sell to the rest of the team in the next few months is to completely split the querying from the storage. Not only does this provide great opportunities for performance optimizations, it gives you homogeneous access to heterogeneous and existing data sources. For free.
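One way to picture "splitting the querying from the storage" is as two separate contracts: stores only know how to enumerate and fetch their own documents, while a query service owns the index and answers searches from it. The Python sketch below assumes that reading; DocumentStore, DictStore and QueryService are hypothetical names for illustration, not Orchard APIs. Plugging in an existing data source means implementing the small store contract, and the query side stays the same.

```python
from typing import Dict, Iterable, List, Protocol, Set, Tuple


class DocumentStore(Protocol):
    """Hypothetical storage-side contract: enumerate and fetch documents."""
    name: str

    def keys(self) -> Iterable[str]: ...

    def fetch(self, key: str) -> str: ...


class DictStore:
    """Trivial in-memory store standing in for a file system, mailbox, legacy DB, etc."""

    def __init__(self, name: str, docs: Dict[str, str]) -> None:
        self.name = name
        self._docs = dict(docs)

    def keys(self) -> Iterable[str]:
        return list(self._docs)

    def fetch(self, key: str) -> str:
        return self._docs[key]


class QueryService:
    """Query side: search() answers from its own index alone;
    load() goes back to the owning store only to retrieve content."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[Tuple[str, str]]] = {}
        self._stores: Dict[str, DocumentStore] = {}

    def register(self, store: DocumentStore) -> None:
        # Index every document of the store once, at registration time.
        self._stores[store.name] = store
        for key in store.keys():
            for word in store.fetch(key).lower().split():
                self._index.setdefault(word, set()).add((store.name, key))

    def search(self, word: str) -> List[Tuple[str, str]]:
        return sorted(self._index.get(word.lower(), set()))

    def load(self, hit: Tuple[str, str]) -> str:
        store_name, key = hit
        return self._stores[store_name].fetch(key)


svc = QueryService()
svc.register(DictStore("files", {"a.txt": "orchard content types"}))
svc.register(DictStore("legacy", {"row-7": "existing orchard customer data"}))
print(svc.search("Orchard"))
```

Both the new content and the "existing" legacy source come back from one search call, which is the homogeneous access described above; the storage side could later be swapped without changing how queries are written.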

12 Comments

Hey, you could always use Mongo and NoRM :). Shoot, I would be happy to commit if Bradley would let me :):). Oh, the irony...

You hit the nail on the head: NoSQL is about prying our brains off the DB as "where to put the data". SQL is great for answering questions, not so great at marshaling heavy-write environments.

NoSQL rocks at that; it's what Cassandra is really good at: a heavy-write environment that scales out almost literally with the flip of a power button.

I would give anything to be in the meeting when you and Renaud discuss this...

Rob Conery - Monday, March 29, 2010 8:59:30 AM

100% Bertrand, I couldn't agree more that querying and storage are two separate things, and should be treated as such. I think that in most cases, it's a convenience for folks to try and use one solution for both.

One approach, I think, could be to do something similar to the Command-Query Separation thing that has been getting a lot of press lately. If Orchard simply separated the interfaces for querying versus storing, then the 1.0 implementation could use a single solution, but consumers could use the IoC container to swap that out later for separate solutions, e.g. an object DB for storage and something like Lucene for indexing and querying.

paul - Monday, March 29, 2010 2:09:47 PM

Funny, we're precisely working on a CMS here, and this is one of the many open questions at this time :)

lucasbfr - Monday, March 29, 2010 2:29:26 PM

You're supposed to spell it "NOSQL" (O capitalized) because it's not about "no SQL" but really "not only SQL". Unfortunately, they chose an ambiguous abbreviation, and the "no" part really caught on.

Morten Nielsen - Monday, March 29, 2010 2:29:56 PM

ALL: I managed comments in my sleep this morning, and it's just possible that I deleted a few by mistake. Apologies, and don't hesitate to re-post if that's the case.

@Rob: yeah. There are two ways you can move a hive. You can kick it and run, or you can put a protective suit on, sedate the bees with some smoke, and then move the hive. The former approach sure is fast. If you follow my drift ;)

@Morten: I stand corrected. Thanks for the tip.

Bertrand Le Roy - Monday, March 29, 2010 4:51:31 PM

Funny to read about that, Bertrand! That's exactly what we did in our whohive project, which should be announced very soon. It uses a separation between the querying and the storage.

Laurent Kempé - Monday, March 29, 2010 8:53:24 PM

@bertrand unless there's an even bigger hive that is expecting you to move the hive in question for them...

Rob Conery - Monday, March 29, 2010 11:57:01 PM

@Rob: you seem to be deluded that a hive can expect anything. Oh well, I guess this great analogy just broke down.

Bertrand Le Roy - Tuesday, March 30, 2010 12:03:12 AM

All this talk of Hives reminded me that you missed the Windows Registry off your list :-)

rbirkby - Tuesday, March 30, 2010 9:25:12 AM

Add one more file type: the serialization dump. I recently saw a project where the authors take a very complex/composite object, serialize it, GZip it, and then store the resulting lump in one field of a SQL Server table.

John - Tuesday, March 30, 2010 1:27:13 PM

@rbirkby: yeah, the list wasn't supposed to be exhaustive. But yeah, you're right, we all use that every day without knowing it.

@John: ew. I hope they didn't have to access it frequently.

Bertrand Le Roy - Tuesday, March 30, 2010 5:42:06 PM

@dbj: It's unsubstantiated, generic comments like yours that really detract from the NOSQL ideas at the moment. And the fact that the comment is wrong doesn't help your case :P Relational databases that are properly constructed are just as scalable and capable of handling the data volume as any other solution. It's just that properly constructing a relational database is HARD, and requires that you are modeling relational data (not hammering some other form of data into a vaguely relational schema).

@bertrand: nice article. I've been really enjoying the current crop of NOSQL articles that are promoting hybrid systems and analysis of what a data store actually requires over the 'use NoSQL!1 RDBMSs aren't scalabel!!1!' form of comments that initially plagued the uptake :)

workmad3 - Thursday, April 1, 2010 7:38:12 AM
