Hadoop and SQL Server

  • Eric M Russell (1/15/2015)


    My point is that the fundamental difference between relational databases and Big Data NoSQL databases is actually not technical or even philosophical; Microsoft and Oracle crossed that bridge years ago. The difference boils down to marketing and economics, which can change overnight.

    yep

  • Steve Jones - SSC Editor (1/15/2015)


    Eric M Russell (1/15/2015)


    My point is that the fundamental difference between relational databases and Big Data NoSQL databases is actually not technical or even philosophical; Microsoft and Oracle crossed that bridge years ago. The difference boils down to marketing and economics, which can change overnight.

    yep

    I think some of us might be missing a point here. We tend to look at data from the relational perspective, but the reality is that data from web traffic, live chat, tweets, and IM, not to mention non-textual data, is growing much faster than RDBMS data. Querying and mining that type of data is, I believe, the area where Hadoop will shine.

  • Hommer (1/15/2015)


    I think some of us might be missing a point here. We tend to look at data from the relational perspective, but the reality is that data from web traffic, live chat, tweets, and IM, not to mention non-textual data, is growing much faster than RDBMS data. Querying and mining that type of data is, I believe, the area where Hadoop will shine.

    OK, if your organization's business model is data mining web traffic, chats, and tweets, then you should definitely choose Hadoop over any variation of SQL Server, but even with zero licensing costs, you may struggle to turn that endeavor into profit. Perhaps part of the confusion is that the NoSQL vs. relational debate has spilled over into industries where it's not really applicable.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I have been investigating this for some time to create a system that includes both SQL Server and Hadoop.

    The idea is not to remove SQL Server from the picture. It's a great and powerful set of tools and services for your structured relational data. Hadoop is great for data that has no structure, is very chaotic in nature, and has the potential to eventually fit into the bigger picture within SQL Server (e.g., ETL development times can be dramatically shorter with NoSQL than with SQL).

    In come the two models: your data warehouse and your data lake. Hadoop is great for creating a data lake of dirty, chaotic data that your data analysts can investigate without tying up your SQL guys. They can chop it up, aggregate it, and everything else right in the box. Then, when they discover something you can use, you can start formulating the plan to clean the data and move it into your SQL Server data warehouse.

    Even better, offloading the majority of your ETL from SQL Server to Hadoop has interesting benefits as well. But I have not dived that deep down the rabbit hole with Hadoop and SQL Server just yet. I can just see the benefit of NoSQL and a traditional RDBMS working together for data warehousing and deep analytics.
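
    A minimal sketch of that lake-to-warehouse flow, under loud assumptions: the raw records, the field names (user, page, ms), and the page_hits table are all made up for illustration, and an in-memory SQLite database stands in for the SQL Server warehouse. Nothing here uses Hadoop itself; it only shows the shape of "explore the dirty data first, then clean and load what turns out to be useful".

```python
import json
import sqlite3
from collections import Counter

# Hypothetical raw "data lake" content: one record per line, no enforced schema.
RAW_LINES = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/home"}',            # missing the "ms" field
    'corrupted line',                             # not JSON at all
    '{"user": "a", "page": "/cart", "ms": "95"}', # number stored as text
]

def parse(line):
    """Return a dict for a parseable line, or None for junk."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        return None
    return rec if isinstance(rec, dict) else None

def explore(lines):
    """Analyst-style exploration: count hits per page straight off the raw data."""
    return Counter(rec["page"] for rec in map(parse, lines) if rec and "page" in rec)

def clean(lines):
    """Keep only rows that fit the (assumed) warehouse schema."""
    rows = []
    for rec in map(parse, lines):
        if not rec:
            continue
        try:
            rows.append((rec["user"], rec["page"], int(rec["ms"])))
        except (KeyError, TypeError, ValueError):
            continue  # incomplete or malformed records stay in the lake only
    return rows

def load(rows):
    """Load cleaned rows into a relational table (SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_hits (user_name TEXT, page TEXT, ms INTEGER)")
    conn.executemany("INSERT INTO page_hits VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

if __name__ == "__main__":
    print(explore(RAW_LINES))
    warehouse = load(clean(RAW_LINES))
    print(warehouse.execute("SELECT page, AVG(ms) FROM page_hits GROUP BY page").fetchall())
```

    The exploration step tolerates incomplete records, while the load step rejects anything that does not fit the table, which is roughly the split between the lake and the warehouse described above.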

  • While I was doing a project at Yahoo (2009), I was able to spend some time better understanding Hadoop, and the key, IMHO, is that Hadoop is a tool designed to capture and redundantly store lots of data, mostly unstructured, as fast as possible, and then create structure after the fact. Basically, what you would need if you were building a search engine.
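
    To make "create structure after the fact" concrete, here is a toy sketch in the MapReduce style that Hadoop popularized. It is plain Python, not the Hadoop API, and the function names and sample lines are invented for illustration: raw text is kept exactly as captured, and a structure (word counts) is derived only when the data is read.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (word, 1) pairs; structure is imposed only now, at read time."""
    for word in line.lower().split():
        yield word.strip(".,!?"), 1

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        if key:  # skip tokens that were pure punctuation
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle(pairs))

if __name__ == "__main__":
    captured = ["Hadoop stores data.", "Data first, structure later!"]
    print(word_count(captured))
```

    On a real cluster the map and reduce steps would run in parallel across many machines; this single-process version only illustrates the order of operations.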

    Many years ago, Klout used Hadoop to capture the data, then used it to organize the data and push it into an MS SSAS cube for analysis. Again, IMHO, a wise use of the tool.

    Hadoop has great promise, but for many it is just the current sexy tool, oversold like so many tools before it as the latest RONCO multi-use gadget. But it will survive once it comes back to what it was designed to do: capture lots of unstructured data and then create organization from the chaos, which you then put into a proper BI/DW.

    The more you are prepared, the less you need it.

  • Oh, and I have seen some demos where MS has done a great deal of work integrating Hadoop into the Parallel Data Warehouse (old name). I would like to see them do that with the standard SQL Server.

    But I fear that MS has lost interest in doing any real, useful enhancements to SQL Server (just my opinion). They are killing off mirroring (bad move), have failed to enhance replication to where it is the go-to tool for redundant scalability (Cassandra is built around it), and have focused just on raising the price. And with the 2012 and later releases, the "BI" edition does not even have all of the BI features.

    They are approaching SQL Server as a cash cow, with no apparent interest in making it better.

    (again, just my opinion)

    The more you are prepared, the less you need it.

  • Andrew..Peterson (1/16/2015)


    Oh, and I have seen some demos where MS has done a great deal of work integrating Hadoop into the Parallel Data Warehouse (old name). I would like to see them do that with the standard SQL Server.

    But I fear that MS has lost interest in doing any real, useful enhancements to SQL Server (just my opinion). They are killing off mirroring (bad move), have failed to enhance replication to where it is the go-to tool for redundant scalability (Cassandra is built around it), and have focused just on raising the price. And with the 2012 and later releases, the "BI" edition does not even have all of the BI features.

    They are approaching SQL Server as a cash cow, with no apparent interest in making it better.

    (again, just my opinion)

    2012 was a so-so release, but 2014 introduced In-Memory OLTP (Hekaton) and clustered columnstore indexes, which are game changers. When it comes to bundled BI / ETL tools, there is not much competition. Microsoft can either match or assimilate anyone in that arena.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I really enjoyed everyone's views on Hadoop. From my development perspective, this has yet to come up on any project that I have worked on or that I have been concretely aware of. For me, Hadoop comes across as a popular tool for a niche requirement that is also being employed inappropriately by a small minority who believe ridiculously hyped statements.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Gary Varga (2/9/2015)


    I really enjoyed everyone's views on Hadoop. From my development perspective, this has yet to come up on any project that I have worked on or that I have been concretely aware of. For me, Hadoop comes across as a popular tool for a niche requirement that is also being employed inappropriately by a small minority who believe ridiculously hyped statements.

    It seems to me that analyzing mega-terabytes of unstructured data is not something that the vast majority of large corporations and local governments have a need for. Even if a need (or just a curious interest) does arise for analyzing Twitter feeds or clickstream traffic, it makes sense to farm the work out to some startup that specializes in that sort of thing.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
