The New Data Warehouse Choice

  • Comments posted to this topic are about the item The New Data Warehouse Choice

  • Data warehouse in the cloud has to be thought through very carefully.

    I think we've done the security concerns to death so enough said on that subject.

    Redshift is a column-store database, as is HP Vertica. These are fantastic for Kimball-style data models but struggle with the highly normalised model that you would use to marshal data from many disparate sources.

    Data lakes allow you to pump stuff in at huge scale and speed. They don't necessarily have the facilities to enforce good data practices. They facilitate the "chuck it over the wall" mentality. You have to do a lot of work that you probably didn't expect if you are going down the data lake route.

    Then you discover that data has gravity and you've just put the data equivalent of Jupiter in the cloud. Pretty soon your BI solution will get sucked into the cloud due to data gravity. Your ETL solution will be torn in two. A whole plethora of other satellite stuff will become necessary. Much of this will come as no surprise to an infrastructure engineer. There is a huge amount of infrastructure behind day to day IT operations and unless your cloud provider supplies the whole kit and caboodle on a SaaS basis you are going to discover precisely what it is that infrastructure guys do.

  • My back of the napkin calculations always show cloud solutions as waaaaay more expensive than on-premise solutions (if you're going to use the systems for at least a year). I don't get how the cloud works financially for any organization except large enterprises who can afford multi-thousand dollar a month bills. If I can put SQL Server on a box for say $40k (hardware and software), it will last years. If I'm paying $4k a month for the same compute, my ROI is 10 months. Sure, there is some staffing to support it, but those same staff would support a cloud installation. What am I missing (aside from the cloud kool-aid)?

  • The advantage of the cloud is twofold. First, you move from capex to opex. Does that matter? Ask your financial people. Sometimes it does. The ROI isn't 10 months, but something longer.

    The second thing is that you need to have a variable workload. If I need to spend $40k on hardware to handle peak load, but my average load could be handled by $20k, then I've spent $20k unnecessarily.

    However, maybe that lower level is $2k in the cloud per month, with some bursts to $2200 since I only go up for days at a time. In that case, my payback is longer. Plus, the $40k has more admin costs, replacement costs, power, etc. that I don't spend. That's the payback.

    That $40k also gets spent again when you upgrade.

    If you need a steady state of performance, then I think the cloud is really, really hard to justify. The exception might be if you're starting from scratch and just expense everything; then maybe it makes sense.
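
A rough sketch of the payback arithmetic from the posts above, in Python. The $40k, $4k, $2k and $2,200 figures come from the thread; the function names and the number of burst months per year are hypothetical, chosen only to illustrate how a variable workload stretches the payback period.

```python
def average_monthly_bill(base, burst, burst_months_per_year):
    """Average cloud bill when some months per year run at the burst rate."""
    normal_months = 12 - burst_months_per_year
    return (base * normal_months + burst * burst_months_per_year) / 12


def payback_months(upfront_cost, cloud_monthly, onprem_monthly=0.0):
    """Months until buying hardware up front beats renting in the cloud.

    onprem_monthly stands for power, admin and similar running costs
    (an assumption; the thread mentions them but gives no figures).
    Returns None if the cloud is not more expensive per month.
    """
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        return None
    return upfront_cost / monthly_saving


# Steady workload: $40k box versus $4k a month for the same compute.
steady = payback_months(40_000, 4_000)  # 10.0 months

# Variable workload: $2k a month, rising to $2,200 in burst months.
# Three burst months per year is an illustrative guess.
bill = average_monthly_bill(2_000, 2_200, burst_months_per_year=3)
variable = payback_months(40_000, bill)

print(f"steady: {steady:.1f} months, variable: {variable:.1f} months")
```

Adding a non-zero `onprem_monthly` lengthens the payback further, which is the point made about admin, replacement and power costs.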

  • dadavis (7/15/2016)


    My back of the napkin calculations always show cloud solutions as waaaaay more expensive than on-premise solutions (if you're going to use the systems for at least a year). I don't get how the cloud works financially for any organization except large enterprises who can afford multi-thousand dollar a month bills. If I can put SQL Server on a box for say $40k (hardware and software), it will last years. If I'm paying $4k a month for the same compute, my ROI is 10 months. Sure, there is some staffing to support it, but those same staff would support a cloud installation. What am I missing (aside from the cloud kool-aid)?

    I agree with Steve's points (above) that address buying excess hardware to cover an occasional spike in demand - that does make sense for cloud. On the other hand, I'd have to say that the general thought about cloud costs is that they are too high. At least for Azure. Amazon might be less expensive.

    And the infrastructure points were on target. Integration can be difficult with on-premises systems. And now you need to add a layer of on-prem to cloud. Unless you are all cloud.

    The more you are prepared, the less you need it.

  • In the Amazon world if you are prepared to commit to a time period you can reserve instances at a substantial reduction.

    With Amazon RDS instances the operational DBA work goes away. Resilience is a check box item as is backup.

    You can volunteer for DBA work by not using RDS instances but that would seem to be missing the point.

    If you have rack space in your data centre then it may be cheaper to host your own. I have seen situations where installing a new DB server meant a major reorganisation in the data centre, with new switches and cabling added to the labour bill and a SAN upgrade threatened.

  • The way I look at it, ease of use is greater in the cloud than on-premises. It's extremely easy to launch and go in the cloud. Not so much on-premises in certain situations, like a larger organization where you have to wait for instances, services and so forth to be carved out, spun up, configured and approved for use.

    The other thing that bothers me is how far removed we are becoming from the hardware. Virtualized disk, virtualized processing, virtualized machines. Sometimes we cannot actually see everything behind the curtains. What's causing the LUN to have high latency? Who else is on the host? Is it the database? Is it you? Etc...

  • xsevensinzx (7/16/2016)


    The way I look at it, ease of use is greater in the cloud than on-premises. It's extremely easy to launch and go in the cloud. Not so much on-premises in certain situations, like a larger organization where you have to wait for instances, services and so forth to be carved out, spun up, configured and approved for use.

    That's a business issue, not on-prem or cloud. We (Redgate) have tools to spin up some instances ourselves, without IT involvement. It's not ubiquitous, but it's very nice. I worked in a company in 2001 that had this. Salespeople could spin up instances (really a 6 instance group) without IT. IT had to monitor and make sure the teardown worked, but that was a cost audit item, not necessarily something we couldn't have automated. Or maybe they did after I left.

  • That's a dilemma. On one hand you want people empowered by technology, on the other hand it's the mother of all security nightmares.

    It is orders of magnitude worse than the invisible mission-critical spreadsheet that really runs your organisation.

  • David.Poole (7/18/2016)


    That's a dilemma. On one hand you want people empowered by technology, on the other hand it's the mother of all security nightmares.

    It is orders of magnitude worse than the invisible mission-critical spreadsheet that really runs your organisation.

    For Azure security, I was reading some posts that indicated that there was no way to keep a system in Azure separate, in a network sense, from all the other Azure systems. They suggested that a VM in Azure could target all the other VMs. [I'm not a network guy, so forgive the lack of correct terms.]


  • Steve Jones - SSC Editor (7/17/2016)


    xsevensinzx (7/16/2016)


    The way I look at it, ease of use is greater in the cloud than on-premises. It's extremely easy to launch and go in the cloud. Not so much on-premises in certain situations, like a larger organization where you have to wait for instances, services and so forth to be carved out, spun up, configured and approved for use.

    That's a business issue, not on-prem or cloud. We (Redgate) have tools to spin up some instances ourselves, without IT involvement. It's not ubiquitous, but it's very nice. I worked in a company in 2001 that had this. Salespeople could spin up instances (really a 6 instance group) without IT. IT had to monitor and make sure the teardown worked, but that was a cost audit item, not necessarily something we couldn't have automated. Or maybe they did after I left.

    Oh, I don't know why it's not automated, but then again maybe it is. Either way, most of the time we are reconfiguring, especially when it comes to SQL Server.

    That doesn't change the concerns of virtualization with databases. Not being able to optimize the disk and so forth is a true pain.

  • I keep hearing that so many folks are putting stuff in the cloud. Here they have just approved moving our Oracle data warehouse to Redshift. Many other similar apps will also be moving to Amazon and Azure. I still question how our DWH on Exadata is going to perform in Redshift, but they did a POC and it is lightning fast. Time will tell.

  • Markus (7/19/2016)


    I keep hearing that so many folks are putting stuff in the cloud. Here they have just approved moving our Oracle data warehouse to Redshift. Many other similar apps will also be moving to Amazon and Azure. I still question how our DWH on Exadata is going to perform in Redshift, but they did a POC and it is lightning fast. Time will tell.

    I'd be interested to know how things work out, especially whether the POC was realistic in data size.

    Also, I think it's performance + cost, not just performance. There are tradeoffs to be made, and depending on your workload and users, this may or may not make sense.

  • Steve Jones - SSC Editor (7/19/2016)


    Markus (7/19/2016)


    I keep hearing that so many folks are putting stuff in the cloud. Here they have just approved moving our Oracle data warehouse to Redshift. Many other similar apps will also be moving to Amazon and Azure. I still question how our DWH on Exadata is going to perform in Redshift, but they did a POC and it is lightning fast. Time will tell.

    I'd be interested to know how things work out, especially whether the POC was realistic in data size.

    Also, I think it's performance + cost, not just performance. There are tradeoffs to be made, and depending on your workload and users, this may or may not make sense.

    Steve, the benefit they see is that it is easy to scale up when demand requires it: acquiring another brand, a new large app coming online, etc. The 'cost savings' will come from eliminating an older Oracle system and its licenses, plus eliminating 8 SQL Servers. They say they did a realistic POC with a good chunk of data. The other benefit is taking the data directly from our POS systems in the stores and putting it straight into Redshift. Today there is a delay of about 2 hours in that process before the data arrives in our DWH systems. That will be cut to 15 minutes or less by loading it in directly, which also eliminates a lot of hardware and licenses.

  • Also, it does seem that with the cloud some, but only some, expertise can be outsourced. This is particularly helpful for smaller enterprises.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!
