The Age of Multiple Databases

Question

Steve Jones - SSC Editor

SSC Guru

Points: 734434
July 14, 2018 at 1:17 pm

#366137

Comments posted to this topic are about the item The Age of Multiple Databases

Jeff Moden SSC Guru Points: 1003857 · Answer 1

What I frequently find is that a lot of people use the latest new shiny object just to become a member of the too-cool-for-school crowd. I also find that, even in the absence of such "thinking", a lot of people don't actually spend the time to become proficient in the various technologies they've elected to use and so don't actually know which technology is actually the best for what they want to accomplish. Further, sometimes the "best" isn't actually the "best" simply because it requires such a deep understanding of all the technologies in place, and round-n-round we go just because someone may not know how to do something in a given technology.

An example of this is when it was all the rage (i.e., too-cool-for-school) to use PowerShell to control and execute all backups for all servers from a single point. None of the people pushing that even considered what would happen to those servers if that single point of failure actually did fail. Someone later came out with a method to use PowerShell to set up autonomous backups on each system and then centralize the success/failure reporting. Unlike the first renditions, THAT was a great idea, but I wonder how many people actually went back and made the change?

To summarize, I'm all for using the right tool for the right thing but a whole lot of people don't actually know what the right tool is because they don't know what the other tools can actually do. This is particularly true for SQL Server where a whole lot of people think that "it's just a place to store data" and that "SQL" stands for "Scarcely Qualifies as a Language". Now there's some serious limited thinking.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

xsevensinzx One Orange Chip Points: 25560 · Answer 2

Honestly, I've found it's not just about the size of the data with these systems. Mostly it's the utility and how they solve similar problems that traditional RDBMS ultimately solve, but differently. That's mainly why I won't move away from pairing data stores with the data warehouse from now on, because it's just too damn powerful not to use them, regardless of your feelings on adopting new tech or how poorly others have implemented them.

David.Poole SSC Guru Points: 75896 · Answer 3

I saw an interesting article calling SQL a narrow waist. Yes, we have all these database types, but what has been discovered is that each database type, with its own unique method of providing data access, caused problems similar to the ones the database type was trying to address. A developer using those database types had to learn many different query languages. We have seen a move to adopting some dialect of SQL in many of the NoSQL databases and even things that aren't databases, such as AWS Athena/Apache Presto.

A common query language is a useful thing to have.

Each database type requires time to learn what its strengths, weaknesses and appropriate usage may be. Some of the usages are blindingly obvious, whereas others are more subtle and require a Eureka moment. Those with a subtle use case risk having their reputation sullied by inappropriate application and, to be brutally honest, ignorance.

Eric M Russell SSC Guru Points: 125520 · Answer 4

I see the database platform debate as being similar to the construction industry's wood, brick, concrete, steel, glass debate. Most modern buildings incorporate all the mentioned construction materials, leveraging the strengths of each to create a solution that is practical, scalable, and cost effective.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

roger.plowman SSChampion Points: 10265 · Answer 5

How is a key-value pair database system all that different from the same technique performed in an RDBMS? After all, internally at its heart, SQL Server is exactly a key-value pair system too; it's just been extended with abstraction layers.

Besides, all these database techniques simply add to the complexity of the application without a huge benefit. Sure, horizontal scaling, blah, blah, blah, but when all is said and done horizontal scaling is just a band-aid to jury-rig a solution of "not enough computing power in a single box".

I suppose we're still in the "do it at all" stage of horizontal scaling, but it's yet more complexity on top of an insane spaghetti pile of mish-mashed software.

By the way, the 3 stages of tech are: 1) do it at all, 2) do it well, 3) do it RIGHT. 🙂

Jeff Moden SSC Guru Points: 1003857 · Answer 6

xsevensinzx - Sunday, July 15, 2018 8:35 PM
Honestly, I've found it's not just about the size of the data with these systems. Mostly it's the utility and how they solve similar problems that traditional RDBMS ultimately solve, but differently. That's mainly why I won't move away from pairing data stores with the data warehouse from now on, because it's just too damn powerful not to use them, regardless of your feelings on adopting new tech or how poorly others have implemented them.

Just curious because it's not clear to me... does that mean you're using SQL Server for this or something else? If something else, then what are you using? And, no... I'm not offering an opinion one way or the other. You're one of the good guys and I'm curious as to what you've actually done.

--Jeff Moden

Jeff Moden SSC Guru Points: 1003857 · Answer 7

roger.plowman - Monday, July 16, 2018 7:01 AM
By the way, the 3 stages of tech are: 1) do it at all, 2) do it well, 3) do it RIGHT. 🙂

Heh.... my development process is "Make it work, make it fast, make it pretty... and it ain't done until it's pretty".

In that same vein where people say "Good, Fast, and Cheap... pick two"... I always say you only need to pick "Good" (ie, RIGHT) because, if you know what you're doing, fast and cheap will come along for the ride and not doing it "Good" will cost you oodles later on and fixing that won't be fast.

--Jeff Moden

Jeff Moden SSC Guru Points: 1003857 · Answer 8

David.Poole - Monday, July 16, 2018 1:59 AM
I saw an interesting article calling SQL a narrow waist. Yes, we have all these database types, but what has been discovered is that each database type, with its own unique method of providing data access, caused problems similar to the ones the database type was trying to address. A developer using those database types had to learn many different query languages. We have seen a move to adopting some dialect of SQL in many of the NoSQL databases and even things that aren't databases, such as AWS Athena/Apache Presto.
A common query language is a useful thing to have.
Each database type requires time to learn what its strengths, weaknesses and appropriate usage may be. Some of the usages are blindingly obvious, whereas others are more subtle and require a Eureka moment. Those with a subtle use case risk having their reputation sullied by inappropriate application and, to be brutally honest, ignorance.

Spot on.

--Jeff Moden

Steve Jones - SSC Editor SSC Guru Points: 734434 · Answer 9

roger.plowman - Monday, July 16, 2018 7:01 AM
How is a key-value pair database system all that different from the same technique performed in an RDBMS? After all, internally at its heart, SQL Server is exactly a key-value pair system too; it's just been extended with abstraction layers.
Besides, all these database techniques simply add to the complexity of the application without a huge benefit. Sure, horizontal scaling, blah, blah, blah, but when all is said and done horizontal scaling is just a band-aid to jury-rig a solution of "not enough computing power in a single box".
I suppose we're still in the "do it at all" stage of horizontal scaling, but it's yet more complexity on top of an insane spaghetti pile of mish-mashed software.
By the way, the 3 stages of tech are: 1) do it at all, 2) do it well, 3) do it RIGHT. 🙂

Key-value stores, like Redis, are way faster for some applications. As you scale, you can find that a separate store acts almost like a cache that can reduce workloads in some cases. Same with graph queries. While it can be a challenge to manage updates, it really depends on your application. For some systems, even ones nowhere near AMZN scale, adding a Redis/key-value lookup server can dramatically speed up an application and reduce resource requirements on the RDBMS.

In short, they're not different, but they can change how your application performs. And yes, you could use a second SQL Server as a key-value lookup, but a few customers have found Redis faster and cheaper.

Rod at work SSC-Dedicated Points: 33898 · Answer 10

Challenging article, Steve. For almost 2 decades I've worked with only relational databases, primarily SQL Server. I've heard of things like DocumentDB, CosmosDB, NoSQL, etc. But have never had a chance to work with any of them. At this point I'd have to say that those I work with and I have probably gotten to the point of seeing all data store problems as nails and we'll just automatically pick our hammer, SQL Server. You're probably right, in that we should try to use something more appropriate for the data storage job, but I think we're too blinded to know of anything else. It might take a little playing around with other data storage systems and paradigms before we can realize, at a practical level, what they have to offer.

Kindest Regards, Rod

Eric M Russell SSC Guru Points: 125520 · Answer 11

RowStore tables and B-Tree indexes are not part of the SQL standard. SQL is a high-level abstraction and integration layer that can be stacked on top of a wide range of data structures: anything from tabular to columnar, key-value, OLAP, JSON documents, and flat files.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Jeff Moden SSC Guru Points: 1003857 · Answer 12

Rod at work - Monday, July 16, 2018 8:41 AM
Challenging article, Steve. For almost 2 decades I've worked with only relational databases, primarily SQL Server. I've heard of things like DocumentDB, CosmosDB, NoSQL, etc. But have never had a chance to work with any of them. At this point I'd have to say that those I work with and I have probably gotten to the point of seeing all data store problems as nails and we'll just automatically pick our hammer, SQL Server. You're probably right, in that we should try to use something more appropriate for the data storage job, but I think we're too blinded to know of anything else. It might take a little playing around with other data storage systems and paradigms before we can realize, at a practical level, what they have to offer.

Oddly enough, that's the same advice that I sometimes give to folks looking to use something other than SQL Server except the advice applies to SQL Server. 😀

--Jeff Moden

xsevensinzx One Orange Chip Points: 25560 · Answer 13

Jeff Moden - Monday, July 16, 2018 7:43 AM
Just curious because it's not clear to me... does that mean you're using SQL Server for this or something else? If something else, then what are you using? And, no... I'm not offering an opinion one way or the other. You're one of the good guys and I'm curious as to what you've actually done.

Nope, I mean I am NOT JUST using SQL Server to solve everything. I'm a data architect. I think beyond a single service and what that service can do. Lots of people are always trying to pile as much work as they can into their one little bucket until it overflows or the business has to buy a bigger bucket. This is what happens to SQL Server: always having to scale up and up, as well as constantly trying to jam everything into it, versus taking the rather large, complex problems SQL Server is solving and distributing the load across other services such as data stores.

I rely pretty heavily on the traditional RDBMS concept in the sense of having a data warehouse and data marts. The thing is, I'm using Azure Data Warehouse as opposed to SQL Server for that warehouse, which is MPP. This means I have more stock in large column stores that are populating Azure DBs. Where the data store comes into play is landing and storing all the raw data; ground zero, I call it. The data warehouse can now fall back on a service that it can be completely rebuilt from, even if all backups are corrupted and/or lost. Why? Because technically the data store is being used to process the data with an analytics engine that sits between the raw data (i.e., the data store) and the processed data (i.e., the data warehouse).

Thanks to cool technologies like Azure Data Lake Analytics and Polybase, the magic all comes together where SQL Server (or Azure Data Warehouse in my case) is not trying to do everything; the workload is distributed across systems. It now has friends and they are all working together toward the same common goal. Not that this can't be solved with more than one instance of SQL Server. The point here really is that data stores are pretty cheap and can be extremely fast without having to constantly model, index, and think about the final form of the data. It feels very plug-and-play, to the point where, when things get serious, there is the data warehouse to help you make it serious.

P.S.

I really like data stores the most because they can bypass the data warehouse. It's direct access to the data before it's implemented and modeled into the data warehouse. This is by far the biggest bottleneck that pushes most users away from the concept of the data warehouse or schema-on-write systems.
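To make the raw-store-plus-warehouse split concrete, here is a small Python sketch under loud assumptions: a newline-delimited JSON string stands in for the raw landing zone, an in-memory SQLite table stands in for the modeled warehouse, and the `fact_order` table, field names, and functions are all hypothetical. It is not how Azure Data Lake Analytics or Polybase work; it only shows the schema-on-read versus schema-on-write idea.

```python
import json
import sqlite3

# Hypothetical raw landing zone: newline-delimited JSON kept as received.
RAW_EVENTS = "\n".join([
    '{"order_id": 1, "customer": "acme", "amount": "19.99"}',
    '{"order_id": 2, "customer": "acme", "amount": "5.01", "coupon": "X1"}',
    '{"order_id": 3, "customer": "zenith", "amount": "40.00"}',
])


def read_raw(raw_text):
    """Schema-on-read: parse each raw record only when it is queried."""
    return [json.loads(line) for line in raw_text.splitlines() if line.strip()]


def rebuild_warehouse(raw_text):
    """Schema-on-write: model the raw records into a typed table from scratch."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE fact_order ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    rows = [
        (r["order_id"], r["customer"], round(float(r["amount"]) * 100))
        for r in read_raw(raw_text)
    ]
    wh.executemany("INSERT INTO fact_order VALUES (?, ?, ?)", rows)
    return wh


# Direct access to the raw data, bypassing the warehouse model entirely.
# The "coupon" field was never modeled, yet it is still queryable here.
orders_with_coupon = [r["order_id"] for r in read_raw(RAW_EVENTS) if "coupon" in r]

# The warehouse can be thrown away and rebuilt from the raw data at any time.
warehouse = rebuild_warehouse(RAW_EVENTS)
totals = warehouse.execute(
    "SELECT customer, SUM(amount_cents) FROM fact_order "
    "GROUP BY customer ORDER BY customer"
).fetchall()
```

The raw records are the source of truth, so losing the warehouse (or its backups) only costs a rebuild. The trade-off is that the modeling work in `rebuild_warehouse` is exactly the step that slows users down when they have to wait for it.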

Jeff Moden SSC Guru Points: 1003857 · Answer 14

xsevensinzx - Monday, July 16, 2018 6:28 PM
Nope, I mean I am NOT JUST using SQL Server to solve everything. I'm a data architect. I think beyond a single service and what that service can do. Lots of people are always trying to pile as much work as they can into their one little bucket until it overflows or the business has to buy a bigger bucket. This is what happens to SQL Server: always having to scale up and up, as well as constantly trying to jam everything into it, versus taking the rather large, complex problems SQL Server is solving and distributing the load across other services such as data stores.
I rely pretty heavily on the traditional RDBMS concept in the sense of having a data warehouse and data marts. The thing is, I'm using Azure Data Warehouse as opposed to SQL Server for that warehouse, which is MPP. This means I have more stock in large column stores that are populating Azure DBs. Where the data store comes into play is landing and storing all the raw data; ground zero, I call it. The data warehouse can now fall back on a service that it can be completely rebuilt from, even if all backups are corrupted and/or lost. Why? Because technically the data store is being used to process the data with an analytics engine that sits between the raw data (i.e., the data store) and the processed data (i.e., the data warehouse).
Thanks to cool technologies like Azure Data Lake Analytics and Polybase, the magic all comes together where SQL Server (or Azure Data Warehouse in my case) is not trying to do everything; the workload is distributed across systems. It now has friends and they are all working together toward the same common goal. Not that this can't be solved with more than one instance of SQL Server. The point here really is that data stores are pretty cheap and can be extremely fast without having to constantly model, index, and think about the final form of the data. It feels very plug-and-play, to the point where, when things get serious, there is the data warehouse to help you make it serious.
P.S.
I really like data stores the most because they can bypass the data warehouse. It's direct access to the data before it's implemented and modeled into the data warehouse. This is by far the biggest bottleneck that pushes most users away from the concept of the data warehouse or schema-on-write systems.

Thanks for the info. You must handle a wad more data than I do. I've had to scale neither up nor out. Of course, that may be because my databases are chump change to some folks.

--Jeff Moden
