datawarehouse questions

  • We’re going to build out a data warehouse. My boss has been researching for a month or so and passed along literature on “data warehouse on ADW” amazon web services as well as “Data warehouse fast track sql server 2017”. Have any of you used either of these? Pros and cons? Are there other solutions I should look into? Thanks in advance.

    Details:

    1. We have roughly 700 6 to 7 hundred gigs or source that will eventually go to the data warehouse. Of that 70 percent comes from out internal OLTP and the other 30 from salesforce data. We are in the process of moving everything to salesforce.
    2. We want to house the DW in the cloud

    t

  • Just curious since you didn't give a lot of background...I see that your question is focused on implementation.  What is your experience level in data warehousing?

  • Well, I moved to Azure Data Warehouse over Amazon Redshift. Main reasons for that was because most of our shop is Microsoft based in meaning, we were migrating from SQL Server to whatever. Azure Data Warehouse still had the look and feel of SQL Server and pretty much used similar tools minus SSIS and so forth.

    The big thing that got me to Azure Data Warehouse was the fact it was MPP and when in paused state, retained all the data. That means, we did not have to constantly rip in the TB's of data just to analyze it like we were seeing with Redshift at the time. While paused, we only paid for storage not compute. That was a big pro for us because compute was pretty expensive if you ran it 24/7. Outside of that, running queries across N number of computers with TRUE columnstores was a huge pro too. When you get into large datasets with large computational needs, Azure DW like Redshift shine the most because you can scale up within minutes to push out a query. It's pretty nice.

    Some cons though is it does not have all the functionality of our traditional SQL Server. Mostly due to the MPP nature of the design. For example, Azure DW has limited UDF support so a lot of our string splitters were not supported. But this is a common MPP issue, not Azure specific. Same with MERGE statements and so forth when it comes to UPDATING the tables. But again, a lot of data warehouses of the size you need Azure DW for are typically not optimized or have NO UPDATE policies due to the overhead it creates. There is also no Machine Learning support like SQL Server 2017 has. This was almost a killer for us being we use our DW for data science. Then of course no real out-the-box ETL tools outside of hosting your own SSIS instance or using Data Factory, which is more of a copying tool than transformation tool. They now have SSIS on demand I do believe, but it's still in development.

    We stuck with it because there are always workarounds. You can take a different approach to splitting strings in MPP as well use other ETL tools. I just created my own entirely in Python for example. Other services like Azure Data Lake Analytics came in handy for landing data and doing a lot of the ETL too, which fed directly into Azure DW. It also supports Machine Learning scripts in Python and R. Thus far, pretty happy with the move to Azure and using Azure DW over Redshift.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply