SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


The RDBMS is Often the Right Tool


The RDBMS is Often the Right Tool

Author
Message
Steve Jones
Steve Jones
SSC Guru
SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)

Group: Administrators
Points: 682214 Visits: 21588
Comments posted to this topic are about the item The RDBMS is Often the Right Tool

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Aaron Cutshall
Aaron Cutshall
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8279 Visits: 1361
What really gets me is that so many of the NoSQL databases now offer an SQL interface. I have seen folks go to great lengths to avoid using a relational database yet the end result could be easily served by one. One project I was involved with was on AWS with highly structured delimited files in S3 and used Glue and SparkSQL on top of that to enable them to "query" the data. The source data were mostly from other relational databases (SQL Server, Oracle and DB2) which they dumped into S3 with a file per table. They wrote Python scripts to ETL the data into other S3 files that were also highly structured and documented with Glue so SparkSQL could query them. They did everything to avoid using PostgreSQL and RedShift yet wound up creating a very complex and convoluted system that approximated a database. Their arguments for not using an RDBMS was that it needed to be "scalable and fast" yet was really neither.


...when ye are in the service of your fellow beings ye are only in the service of your God. -- Mosiah 2:17
Steve Jones
Steve Jones
SSC Guru
SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)SSC Guru (682K reputation)

Group: Administrators
Points: 682214 Visits: 21588
They do have a SQL interface, and I'm not sure if this is good or bad. It might be that SQL is just a better way of querying, or at least, more familiar. If they continue to use the underlying graph query, then things might perform better. Though, if we're querying via SQL, the complexity returns.

Many have added additional indexing as well, to get around the problems of querying multiple entities and instances. That's where NoSQL type structures struggled. They were much, much better for singleton type queries, but struggled with aggregation. I think some of the indexing now reduces scalability, which to me, means the developers ought to have learned more about RDBMS work in the first place rather than struggling against it.

I do know this might be less about tech and more about people (data professionals) being difficult with rapidly requested changes.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Aaron Cutshall
Aaron Cutshall
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8279 Visits: 1361
I agree that NoSQL environments have their niches and work well within those, I believe that too many problems arise when folks try to use NoSQL environments for problems that are easily solved with an RDBMS. My dad always taught me to "use the right tool for the job" and I am convinced that applies to data projects as well. We should use NoSQL only where appropriate just as we should use RDBMS only where appropriate. I have seen folks use RDBMS systems for tasks that should have been handled by NoSQL platforms. Just because you CAN use a tool for something doesn't mean it's best for it. Using an RDBMS to track documents is like using a screwdriver for a chisel. Not really the best tool because if it works at all it will be sub-par at best.


...when ye are in the service of your fellow beings ye are only in the service of your God. -- Mosiah 2:17
kiwood
kiwood
SSC Eights!
SSC Eights! (983 reputation)SSC Eights! (983 reputation)SSC Eights! (983 reputation)SSC Eights! (983 reputation)SSC Eights! (983 reputation)SSC Eights! (983 reputation)SSC Eights! (983 reputation)SSC Eights! (983 reputation)

Group: General Forum Members
Points: 983 Visits: 138
I think the first thing to consider is that as Steve pointed out most of us aren't operating at the Google/Facebook/Spotify/Netflix scale. It might also help to realize that it isn't always scale. For instance, web pages have only a "sort of" schema. What schema is defined isn't enforced with browsers being extremely adept at displaying in spite of violation of what should (or shouldn't) be present. In the document world, the lack of hard schema makes a lot of sense. For that matter, if I were dealing with something like Facebook I also might look for something outside the traditional SQL database.

Probably one should start in thinking about how their data relates to other data. Most situations have some defined relationships that follow reasonably easy rules. Further, most of the time the easier on developers is only true at the early stages. Later, the lack of structure comes home to roost and the developer moves to another area pretending they didn't leave a mess behind. They then get to complain about the mess the other developer (just like them) left behind.
xsevensinzx
xsevensinzx
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24831 Visits: 6224
Aaron N. Cutshall - Wednesday, February 27, 2019 8:22 AM
What really gets me is that so many of the NoSQL databases now offer an SQL interface. I have seen folks go to great lengths to avoid using a relational database yet the end result could be easily served by one. One project I was involved with was on AWS with highly structured delimited files in S3 and used Glue and SparkSQL on top of that to enable them to "query" the data. The source data were mostly from other relational databases (SQL Server, Oracle and DB2) which they dumped into S3 with a file per table. They wrote Python scripts to ETL the data into other S3 files that were also highly structured and documented with Glue so SparkSQL could query them. They did everything to avoid using PostgreSQL and RedShift yet wound up creating a very complex and convoluted system that approximated a database. Their arguments for not using an RDBMS was that it needed to be "scalable and fast" yet was really neither.

Sigh, not necessarily. Having that SQL interface on top of the document stores allows us to have full transparency of the data as it lands on disk on the way up to a gold standard like the data warehouse. It's insanely powerful and something some RDBMS cannot offer in terms of opening doorways into the raw data BEFORE it's cleaned and processed by the ETL regardless if it's Python or SSIS. If you want to talk about being lean, this is what helps you get there.

It also offers a way to use the data BEFORE it's completely processed by the SQL/DBA/Whatever team, which can often have a long turnaround time. For example, all of my data is ingested using Python as documents to a document store. As soon as the data lands on disk, it's accessible to the end user via these SQL Interfaces you're smashing. It's also accessible to the DBA team via Polybase with EXTERNAL TABLES both with the data warehouse and all the data marts using Azure DB (which also supports EXTERNAL QUERIES to the data warehouse, which allows you to bridge the data lake). This is insanely powerful and allows users to start working with the data before the technical teams can fully integrate said data into a proper data warehouse.

I can't say all of this is due to scalability. It's more due to accessibility and being lean with the data. One of the most common pain points with the RDBMS is it's too restricted, and for good reason. With solutions above, it's often opening up the floodgates to the rest of the organization to start getting dirty with the data in order to help the business understand what they need to do with the data BEFORE it goes to the data warehouse. Now, I can't speak to the reasons why some organizations skip using a data warehouse versus others, that's a bit meh, but smashing the idea of exposing the data outside of the data warehouse is a bit well, silly if you ask me.

Aaron Cutshall
Aaron Cutshall
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8279 Visits: 1361
xsevensinzx - Wednesday, February 27, 2019 6:28 PM

Sigh, not necessarily. Having that SQL interface on top of the document stores allows us to have full transparency of the data as it lands on disk on the way up to a gold standard like the data warehouse. It's insanely powerful and something some RDBMS cannot offer in terms of opening doorways into the raw data BEFORE it's cleaned and processed by the ETL regardless if it's Python or SSIS. If you want to talk about being lean, this is what helps you get there.

It also offers a way to use the data BEFORE it's completely processed by the SQL/DBA/Whatever team, which can often have a long turnaround time. For example, all of my data is ingested using Python as documents to a document store. As soon as the data lands on disk, it's accessible to the end user via these SQL Interfaces you're smashing. It's also accessible to the DBA team via Polybase with EXTERNAL TABLES both with the data warehouse and all the data marts using Azure DB (which also supports EXTERNAL QUERIES to the data warehouse, which allows you to bridge the data lake). This is insanely powerful and allows users to start working with the data before the technical teams can fully integrate said data into a proper data warehouse.

I can't say all of this is due to scalability. It's more due to accessibility and being lean with the data. One of the most common pain points with the RDBMS is it's too restricted, and for good reason. With solutions above, it's often opening up the floodgates to the rest of the organization to start getting dirty with the data in order to help the business understand what they need to do with the data BEFORE it goes to the data warehouse. Now, I can't speak to the reasons why some organizations skip using a data warehouse versus others, that's a bit meh, but smashing the idea of exposing the data outside of the data warehouse is a bit well, silly if you ask me.

Please don't misunderstand me. I have no problem with the use cases you mentioned nor do I have an issue with an SQL interface on these environments. The value in getting the data loaded and investigating what's there and what can be done is quite invaluable. The issue I have is when those techniques are implemented in a production environment in lieu of a relational database when the data being loaded is relational (most come from another RDBMS) and the tools are implemented to operate as though it were an RDBMS. It was set up like a Rube-Goldberg machine with a complex set of moving parts all to avoid using a database. That appeared to be a requirement as the conversations kept circling around their "Big Data" solution. The volume of data was not that large that it couldn't have easily set in a Postgres database (assuming they had to stay in the AWS space). The approach was decided upon long before any data analysis was performed. When the Enterprise Architect and I kept asking why we couldn't just use a database instead of the complicated process and we were told that our suggestion wasn't the "Big Data" way. Time to implementation wasn't a concern because it's taken over 2 years to implement the solution using their framework.

Clearly the platform was chosen long before any analysis was done to determine the correct solution. This leads to all sorts of problems. I have witnessed a project that failed miserably because SQL Server was used for document management instead of a more appropriate platform like MongoDB. Each platform has its strengths and weaknesses, what it does well and what it does not do well. Using the wrong tool for the job only invites mediocre results at best.



...when ye are in the service of your fellow beings ye are only in the service of your God. -- Mosiah 2:17
xsevensinzx
xsevensinzx
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24831 Visits: 6224
Aaron N. Cutshall - Wednesday, February 27, 2019 7:58 PM
The issue I have is when those techniques are implemented in a production environment in lieu of a relational database when the data being loaded is relational (most come from another RDBMS) and the tools are implemented to operate as though it were an RDBMS.

I understand that, but you have mentioned this numerous times in similar threads. The techniques has nothing to do with how good or bad the technology is nor does it mean it's the wrong tool for the job. Anyone can misuse a hammer when trying to hammer in a nail to a board. It does not mean the hammer is not a good tool or the right tool for the job.

The same also applies for other tools. I have seen plenty of people misuse their RDBMS. Does not mean it's a bad tool or the wrong tool.

I know you followed up saying similar to Steve, but just have to point that out because it seems to be a constant theme here on how people use these tools. Yes, people use them differently. Some good, some bad. The RDBMS is not the end all be all solution nor is it the only tool in the toolbox. And with that, I would go on to say that the RDBMS is still here to stay and has a critical part to these other tools for the sheer fact it does what it was designed to do so well IF the operators behind the tool know what they are doing.

crmitchell
crmitchell
SSCarpal Tunnel
SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)

Group: General Forum Members
Points: 4285 Visits: 2403
Aaron N. Cutshall - Wednesday, February 27, 2019 8:22 AM
What really gets me is that so many of the NoSQL databases now offer an SQL interface. I have seen folks go to great lengths to avoid using a relational database yet the end result could be easily served by one. One project I was involved with was on AWS with highly structured delimited files in S3 and used Glue and SparkSQL on top of that to enable them to "query" the data. The source data were mostly from other relational databases (SQL Server, Oracle and DB2) which they dumped into S3 with a file per table. They wrote Python scripts to ETL the data into other S3 files that were also highly structured and documented with Glue so SparkSQL could query them. They did everything to avoid using PostgreSQL and RedShift yet wound up creating a very complex and convoluted system that approximated a database. Their arguments for not using an RDBMS was that it needed to be "scalable and fast" yet was really neither.

Why does it surprise you that they "now" offer an SQL interface? - remember NOSQL is short for Not Only SQL - that should make it clear why they not only may but rather must offer an SQL interface.

patrickmcginnis59 10839
patrickmcginnis59 10839
SSC-Insane
SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)

Group: General Forum Members
Points: 22609 Visits: 8784

It was set up like a Rube-Goldberg machine with a complex set of moving parts all to avoid using a database.


Theres the subtle implication that you can't build rube Goldberg machines with SQL and I think that's a bit of a stretch. (source, personal experience)

to properly post on a forum:http://www.sqlservercentral.com/articles/61537/
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum









































































































































































SQLServerCentral


Search