NOSQL and RDBMSs

  • David.Poole (8/27/2015)


    It's survived and thrived so long for the simple reason that there is a solid mathematical foundation underlying relational databases and SQL.

    That is why I'm optimistic about graph databases.

    I do agree with you; in my opinion graph databases are solid as a rock in terms of foundations. Used in the right place they're extremely powerful.

  • maximilianorios (8/27/2015)


    The problem sometimes is that people become fans of something and try to force that solution onto everything, when it should be the opposite: we need to understand the problem to find the solution.

    How very true this is! Use the right tool for the right job, but you need to understand the requirements of the job BEFORE you select a tool! Of course, to do that effectively, you need to understand the functions and applications of the tool. :-)

  • I do take exception to the comment made in the article:

    While I understand the pressure to adopt a SQL-like language, I do think it is a mistake to base that language too closely on SQL. I feel that basing their query language on SQL, while aiding adoption, will lead to lazy thinking by architects and developers. It will prevent the exploration of key strengths of the NOSQL platform that do not fit easily with a language based on SQL.

    I don't feel that employing an SQL-like language on NoSQL is necessarily a bad thing in and of itself. If the data is organized in a relational model and lends itself to set-based operations, then it doesn't necessarily matter how the functions are performed under the hood. A NoSQL database will not be as optimal as an RDBMS for relational data, just as an RDBMS is not optimal for non-relational data, but that does not undermine the concept and approach. Again, it all comes down to using the right tool for the right job.

    The point I was trying to make was that by using an extremely close relation of SQL on the NOSQL store you almost encourage people to use the wrong tool for the wrong job.

  • David.Poole (8/27/2015)


    I do take exception to the comment made in the article:

    While I understand the pressure to adopt a SQL-like language, I do think it is a mistake to base that language too closely on SQL. I feel that basing their query language on SQL, while aiding adoption, will lead to lazy thinking by architects and developers. It will prevent the exploration of key strengths of the NOSQL platform that do not fit easily with a language based on SQL.

    I don't feel that employing an SQL-like language on NoSQL is necessarily a bad thing in and of itself. If the data is organized in a relational model and lends itself to set-based operations, then it doesn't necessarily matter how the functions are performed under the hood. A NoSQL database will not be as optimal as an RDBMS for relational data, just as an RDBMS is not optimal for non-relational data, but that does not undermine the concept and approach. Again, it all comes down to using the right tool for the right job.

    The point I was trying to make was that by using an extremely close relation of SQL on the NOSQL store you almost encourage people to use the wrong tool for the wrong job.

    My understanding is that Hadoop's implementation of SQL (HiveQL / HCatalog) is only an abstraction layer that translates to MapReduce and physical file names at runtime. MapReduce itself fits the HDFS model perfectly, that being a loosely associated archive of non-conformed and unstructured data files. However, MapReduce is not an innovation in terms of usability for business users or anyone else who is not a Java / data science geek. It's sort of like doctors and pharmacists using Latin terminology, and then having another set of "regular" terms for lay people to easily grasp and use in their native language.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
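
    The map-then-reduce model described above can be sketched in a few lines of plain Python. This is a hypothetical word-count illustration of the programming model only, not Hadoop's actual Java API, and the function names are invented for the example:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (key, value) pair for every word in every line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key (Hadoop's framework
    # performs this step between the map and reduce phases).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: fold each key's list of values into a single result.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data is big", "data is data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 3, 'is': 2}
```

    A HiveQL `SELECT word, COUNT(*) ... GROUP BY word` compiles down to essentially these three phases, which is the sense in which it is "only an abstraction layer."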

  • In the RDBMS camp I can understand the attraction of being able to accept and process JSON and absorb other NOSQL type functionality but isn’t this precisely the one-size-fits-all approach to solutions that NOSQL solutions originally identified as a mistake?

    I believe it's a matter of choosing the right tool for the right job.

    JSON and XML are essential data formats for the exchange of information via web services. If an application must consume a web service, but the end-point data repository is an RDBMS, then the web service (or some piece of middleware) must itself either translate between relational and hierarchic representations of the data -- or the RDBMS must do so. The attraction to the developer of NoSQL solutions in this particular use case becomes readily apparent, I would suggest.
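
    The relational-to-hierarchic translation described above is easy to sketch in Python. The order document here is entirely hypothetical; the point is only that one nested JSON document shreds into rows for two relational tables (a parent and a child keyed on the parent's id):

```python
import json

doc = json.loads("""
{"order_id": 101,
 "customer": "Acme",
 "lines": [
   {"sku": "A1", "qty": 2},
   {"sku": "B7", "qty": 5}
 ]}
""")

# Shred the hierarchy into flat rows suitable for relational tables:
# one parent row per order, one child row per order line.
order_row = (doc["order_id"], doc["customer"])
line_rows = [(doc["order_id"], l["sku"], l["qty"]) for l in doc["lines"]]

print(order_row)  # (101, 'Acme')
print(line_rows)  # [(101, 'A1', 2), (101, 'B7', 5)]
```

    Either the middleware or the RDBMS must do this shredding (and the reverse nesting on the way out), which is exactly the translation cost that makes document stores attractive for this use case.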

  • David.Poole (8/27/2015)


    The point I was trying to make was that by using an extremely close relation of SQL on the NOSQL store you almost encourage people to use the wrong tool for the wrong job.

    I can understand the concern, but it's more of an issue with the application of the tool than the tool itself. The same could be said for MDX queries as they're similar enough to be either comforting or confusing depending upon your perspective. The tool should reflect the intended purpose and application. In similar situations, function dictates form and often times similar solutions will arise. As a paradigm for handling relational data, SQL has functioned well for decades. I foresee some extensions to it to handle unique processes within NoSQL environments, but don't see a need for a complete paradigm shift.

  • Hive and Pig both generate MapReduce code under the hood. Where optimisation is required you can grab the generated code and adjust it to your needs.

    Hive has something called SerDes (Serializers/Deserializers). These tell it how to talk to different file formats. In theory you could write a SerDe that would enable you to use SQL to query a JPEG file for its EXIF information.

    I believe that a standard installation of Hive comes with SerDes for delimited files, XML and JSON.

    The use of Sqoop enables an RDBMS to be read and the data distributed on HDFS, with the structural metadata captured as Hive metadata. To all intents and purposes it provides a means of dumping data from traditional SQL sources while maintaining the structure and the familiar language.

    SparkSQL is probably a better bet. I believe it can use the Hive metadata but does not rely on MapReduce. In effect it builds an execution plan and then chooses how to execute it, which gives a big performance boost. A lot of work has gone into making SparkSQL ANSI-compliant.
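
    The SerDe idea (many file formats presented to the query engine as one uniform table of rows) can be sketched in Python. The class names and the single `deserialize` method here are hypothetical, not Hive's actual Java SerDe interface:

```python
import json

class DelimitedSerDe:
    """Deserialise a pipe-delimited line into a row dict."""
    def __init__(self, columns, sep="|"):
        self.columns, self.sep = columns, sep
    def deserialize(self, line):
        return dict(zip(self.columns, line.split(self.sep)))

class JsonSerDe:
    """Deserialise a JSON document into the same row shape."""
    def deserialize(self, line):
        return json.loads(line)

# Two very different file formats, one uniform row format -- the
# query engine sitting above the SerDe never sees the difference.
sources = [
    (DelimitedSerDe(["name", "city"]), "Ada|London"),
    (JsonSerDe(), '{"name": "Ada", "city": "London"}'),
]
rows = [serde.deserialize(line) for serde, line in sources]
print(rows)
```

    Both lines deserialise to the same `{'name': 'Ada', 'city': 'London'}` row, which is why, in principle, a custom SerDe could present even a JPEG's EXIF block as a queryable table.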

  • David.Poole (8/27/2015)


    Hive and Pig both generate MapReduce code under the hood. Where optimisation is required you can grab the generated code and adjust it to your needs.

    Hive has something called SerDes (Serializers/Deserializers). These tell it how to talk to different file formats. In theory you could write a SerDe that would enable you to use SQL to query a JPEG file for its EXIF information.

    I believe that a standard installation of Hive comes with SerDes for delimited files, XML and JSON.

    The use of Sqoop enables an RDBMS to be read and the data distributed on HDFS, with the structural metadata captured as Hive metadata. To all intents and purposes it provides a means of dumping data from traditional SQL sources while maintaining the structure and the familiar language.

    SparkSQL is probably a better bet. I believe it can use the Hive metadata but does not rely on MapReduce. In effect it builds an execution plan and then chooses how to execute it, which gives a big performance boost. A lot of work has gone into making SparkSQL ANSI-compliant.

    That's precisely the hurdle I feel many RDBMS people face: having to write complex Java in order to absorb a JSON file into Hive. It's the same as writing complex SQL to do the same, but the point is, RDBMS people already know SQL. Having to learn an object-oriented language just to get data from point A to point B is a daunting task that requires retraining or new hires.

    I personally enjoy working with NoSQL (Hadoop). But coming from using primarily SQL Server to jumping right into Hadoop has been a daunting task, due to all the components currently available and coming down the pike, as well as the expansion of knowledge you have to undertake just to leverage the toolset correctly.

  • xsevensinzx (8/28/2015)


    David.Poole (8/27/2015)


    Hive and Pig both generate MapReduce code under the hood. Where optimisation is required you can grab the generated code and adjust it to your needs.

    Hive has something called SerDes (Serializers/Deserializers). These tell it how to talk to different file formats. In theory you could write a SerDe that would enable you to use SQL to query a JPEG file for its EXIF information.

    I believe that a standard installation of Hive comes with SerDes for delimited files, XML and JSON.

    The use of Sqoop enables an RDBMS to be read and the data distributed on HDFS, with the structural metadata captured as Hive metadata. To all intents and purposes it provides a means of dumping data from traditional SQL sources while maintaining the structure and the familiar language.

    SparkSQL is probably a better bet. I believe it can use the Hive metadata but does not rely on MapReduce. In effect it builds an execution plan and then chooses how to execute it, which gives a big performance boost. A lot of work has gone into making SparkSQL ANSI-compliant.

    That's precisely the hurdle I feel many RDBMS people face: having to write complex Java in order to absorb a JSON file into Hive. It's the same as writing complex SQL to do the same, but the point is, RDBMS people already know SQL. Having to learn an object-oriented language just to get data from point A to point B is a daunting task that requires retraining or new hires.

    SQL Server 2014 introduced Polybase just for reasons like that

    Gerald Britton, Pluralsight courses

  • g.britton (8/28/2015)


    xsevensinzx (8/28/2015)


    ...

    SQL Server 2014 introduced Polybase just for reasons like that

    SQL Server has been tightly integrated with various forms of NoSQL data storage and query access for decades. Analysis Services (OLAP Cubes), Text Search, XML datatype with XPath query, FileStream, BLOB storage are all non-relational forms of storage with alternative non-SQL query access methods.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (8/28/2015)


    g.britton (8/28/2015)


    xsevensinzx (8/28/2015)


    ...

    SQL Server 2014 introduced Polybase just for reasons like that

    SQL Server has been tightly integrated with various forms of NoSQL data storage and query access for decades. Analysis Services (OLAP Cubes), Text Search, XML datatype with XPath query, FileStream, BLOB storage are all non-relational forms of storage with alternative non-SQL query access methods.

    True, but not at all on topic. Polybase allows you to write standard SQL to access non-relational data of all types. SQL programmers don't have to learn or use Java, MapReduce and the rest of the Hadoop stack to get at the non-relational data and combine it with RDBMS data.

    Gerald Britton, Pluralsight courses

  • g.britton (8/28/2015)


    Eric M Russell (8/28/2015)


    g.britton (8/28/2015)


    xsevensinzx (8/28/2015)


    ...

    SQL Server 2014 introduced Polybase just for reasons like that

    SQL Server has been tightly integrated with various forms of NoSQL data storage and query access for decades. Analysis Services (OLAP Cubes), Text Search, XML datatype with XPath query, FileStream, BLOB storage are all non-relational forms of storage with alternative non-SQL query access methods.

    True, but not at all on topic. Polybase allows you to write standard SQL to access non-relational data of all types. SQL programmers don't have to learn or use Java, MapReduce and the rest of the Hadoop stack to get at the non-relational data and combine it with RDBMS data.

    Can Polybase allow SQL access to Analysis Services (as opposed to MDX) and XML documents (as opposed to XPath/XQuery)?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (8/28/2015)


    g.britton (8/28/2015)


    Eric M Russell (8/28/2015)


    g.britton (8/28/2015)


    xsevensinzx (8/28/2015)


    ...

    SQL Server 2014 introduced Polybase just for reasons like that

    SQL Server has been tightly integrated with various forms of NoSQL data storage and query access for decades. Analysis Services (OLAP Cubes), Text Search, XML datatype with XPath query, FileStream, BLOB storage are all non-relational forms of storage with alternative non-SQL query access methods.

    True, but not at all on topic. Polybase allows you to write standard SQL to access non-relational data of all types. SQL programmers don't have to learn or use Java, MapReduce and the rest of the Hadoop stack to get at the non-relational data and combine it with RDBMS data.

    Can Polybase allow SQL access to Analysis Services (as opposed to MDX) and XML documents (as opposed to XPath/XQuery)?

    Not the purpose. Polybase is for querying non-relational data stored by other systems (e.g. Hadoop) as linked servers using T-SQL. However, since Hadoop can store XML documents, the real answer is "yes" for XML.

    Gerald Britton, Pluralsight courses

  • There is a balance to be struck between using the right tool for the job and ensuring supportability by a finite team. Expand the tool set too much and you can lose the ability to support everything; expand it too little and you start to employ technology inappropriately.

    Quite often there is a compromise to be made: technologies can be phased out, support can be outsourced, or an existing technology that is good enough but not ideal can be pressed into service.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • A better term for "NoSQL" databases would be "ACID-ish" databases.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 15 posts - 16 through 29 (of 29 total)
