This editorial was originally published on Sept 13, 2018. It is being republished as Steve is on vacation.
A few weeks ago I ran across an essay from Randolph West called Relational Databases Aren't the Problem. This was a response to another essay that made a case for relational databases being bad for many businesses. I thought that both pieces were interesting for different reasons. I don't believe the RDBMS is perfect, and it certainly can be hard for developers to build software that interfaces with a relational system.
The original complaint about the RDBMS is somewhat rambling and deceitful, in my opinion. It is an excellent study of how to use a few concepts to confuse and create doubt in a casual reader. If I weren't reading closely, I might have been persuaded by a number of the claimed problems with relational databases. However, in my mind, quite a few of the issues discussed aren't problems with relational databases at all, but the result of poorly developed software or poor design of the entities and relationships. I find myself even more disappointed that the author hasn't really addressed any comments, but rather just pasted a link to his followup article.
I do think that the defense from Mr. West does a good job, though it also misses some of the primary issues we struggle with in relational databases. There are gaps in the knowledge of how to build a well-performing database, both among application developers who view the database as a necessary evil and among experienced database developers who don't regularly improve their skills and try new design techniques.
I also think that both of the pieces fail to address the issues of gathering and working with multiple rows of data. The second discussion of "doing without databases" really implements its own database management structure, which may work well, but is fraught with problems, such as concurrency issues when multiple users search and scan through data without indexes. While indexes are overhead, they are necessary, as hash buckets aren't necessarily feasible for all the properties in a class. Also, if you end up building them for multiple properties, you're building an index. There's another good defense of some of the issues here.
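To make the point about hash buckets concrete, here is a minimal Python sketch. It is not code from either essay; the `customers` list and the function names are hypothetical, chosen only for illustration. It shows that once you group records by a property to avoid scanning, you have built an index and taken on the work of maintaining it.

```python
from collections import defaultdict

# Hypothetical in-memory "table": a list of plain records.
customers = [
    {"id": 1, "name": "Ann", "city": "Denver"},
    {"id": 2, "name": "Bob", "city": "Austin"},
    {"id": 3, "name": "Cy", "city": "Denver"},
]


def scan_by_city(rows, city):
    """Full scan: every row is examined on every search."""
    return [row for row in rows if row["city"] == city]


def build_index(rows, prop):
    """Group row positions into hash buckets keyed by one property."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[prop]].append(pos)
    return index


def lookup_by_city(rows, index, city):
    """Index lookup: go straight to the bucket for this value."""
    return [rows[pos] for pos in index.get(city, [])]


# One structure like this is needed per searchable property, and each one
# has to be updated on every insert, update, and delete.
city_index = build_index(customers, "city")
```

Both `scan_by_city(customers, "Denver")` and `lookup_by_city(customers, city_index, "Denver")` return the same rows; the difference is that the second pays for its speed with extra memory and maintenance, which is the same trade an RDBMS index makes.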
I do think that keeping more data in memory and synchronizing access to structures sounds great, but scaling that out to multiple systems and ensuring consistency at high volumes, not to mention potential data loss from crashes, is a problem. Having a write-ahead log in SQL Server does a wonderful job of ensuring we can handle redo/undo on system restart. The method presented doesn't necessarily ensure this, though perhaps accepting some data loss from high concurrency changes is OK for many applications.
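As a rough illustration of what write-ahead logging buys you, here is a small Python sketch of an in-memory key/value store that records each change on disk before applying it. This is an assumption-laden toy, not how SQL Server implements its log and not the method from either essay: the `TinyStore` class and its file format are hypothetical, it supports redo only (no undo, no transactions, no checkpoints), and it is single-threaded.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical in-memory store that logs each change before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # redo: rebuild memory from the log on startup
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        good_bytes = 0
        with open(self.log_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # incomplete last record, e.g. from a crash
                try:
                    rec = json.loads(raw)
                except ValueError:
                    break  # unreadable record: stop replaying here
                self.data[rec["key"]] = rec["value"]
                good_bytes += len(raw)
        # Drop anything after the last complete record so new writes
        # are appended to a clean log.
        with open(self.log_path, "r+b") as f:
            f.truncate(good_bytes)

    def put(self, key, value):
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # ask the OS to persist the record
        self.data[key] = value  # only now change the in-memory structure

    def close(self):
        self.log.close()


# Usage: write, then "restart" by opening the same log again.
path = os.path.join(tempfile.mkdtemp(), "store.log")
store = TinyStore(path)
store.put("order:1", "shipped")
store.close()
recovered = TinyStore(path)  # recovered.data is rebuilt from the log
```

Even this toy has to deal with half-written records and pays for a disk sync on every write. A real engine adds undo, checkpoints, and concurrency control on top, which is a lot to reimplement correctly inside an application.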
I will say that the idea of all data in memory is interesting. I had to stop and think about how many databases really have more than 1TB of data. If we throw out indexes, does this cover most data stores? I bet it does, though that doesn't mean that there aren't issues with using in-memory array structures with widely varying data sizes.
Would I use an in-memory data structure for software? It's tempting, but honestly, I wouldn't. The value of data is too high, with potential issues from poorly implemented ACID control structures. Plenty of issues have been found with different RDBMSs over the years, and even some in NoSQL systems. Thinking that I could avoid any issues and protect data is something I wouldn't even try. After all, if there is some error, I'd prefer it come from a system that many people use, rather than one I tried to emulate for no good reason.