Migrating Data from SQL server 2012 to Data Warehouse

  • Hi,

    What do you think is the easiest way to migrate data from Sql server to data warehouse

  • That's a gigantically open-ended question. To answer it faithfully you have to know if we're changing the data as well as moving it. Are there transformations to the structure or shape of the data as part of the move? If so, SQL Server Integration Services (or a 3rd party equivalent) is probably best. If it's just a straight copy, is it from a single source or multiple sources? Does it act as an archive of cleansed data or is it just a copy of prod? Understanding all these choices (and more) could lead to different solutions from a backup & restore to availability groups to a bulk load process. It's hard to know without details.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I won't be changing the data that will be moved.

    It will be just a straight copy from SQL Server to a single source.

    It will be just a copy of production

  • juniorDBA13 (11/7/2016)


    I won't be changing the data that will be moved.

    It will be just a straight copy from SQL Server to a single source.

    It will be just a copy of production

    Then why not just restore the production on a different server and call it DW?

    😎

  • I agree with Eirikur. Just a restore will get the job done. If you need it near real time, I'd suggest looking at Availability Groups.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • By 'data warehouse', do you mean Azure SQL Data Warehouse, or just a generic, on-my-own-servers 'data warehouse'

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I mean Azure Data Warehouse

  • Then you have a hell of a lot of work to do first, and a lot of reading, and I can almost guarantee you that putting the production DB into Azure SQLDW without any design changes (other than the ones you'll have to make because it's slightly different to normal SQL DBs) will be an unmitigated disaster (I have a client who did that. It was.)

    Why are you looking at Azure SQLDW? How many TB does your database take up? How many billions of rows are you aggregating?

    You can't backup/restore to an Azure SQL DW. You will need to make design changes (eg identity columns). You will need to decide on the partitioning for each table (hash or round robin) and getting that wrong will make the simplest query slow. You will have to change stored procedures and queries that use T-SQL constructs that don't work in SQLDW

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Whoa!

    I'm with Gail. That brings up a WHOLE different discussion. First, you need all the information she has brought up.

    For loading an Azure SQL Data Warehouse, currently, the best approach, is to use Redgate Data Platform Studio (dataplatformstudio.com). It's actually built right into the Azure Portal. Disclaimer: I work for Redgate. It is the right way to get this done and that's supported by Microsoft. Other options are to use SSIS, although that's very slow, bcp & polybase, very fast but a lot of work, you can also use Azure Data Factory, but it's extremely painful.

    The other stuff brought up before, backup/restore, availability groups, etc., are not available because of what you're proposing with Azure SQL Data Warehouse.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • a DW is not designed to be a complete replica of all your data, well at least it shouldn't be anyway!

    The warehouse is used for storing and presenting data to reporting and analytical processes to help the business make important decisions.

    I would suggest you look into data warehousing a little more and fully understand the concepts before going any further.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Make sure you understand the current concurrency and workload limits before you jump in.

    sql-data-warehouse-develop-concurrency[/url]

    At just DW300, it only supports 12 Max concurrent queries. The highest at the maximum DW power is only 32.

    When one of these thresholds is met, new queries are queued and executed on a first-in, first-out basis. As a queries finishes and the number of queries and slots falls below the limits, queued queries are released.

  • Grant Fritchey (11/7/2016)


    Whoa!

    I'm with Gail. That brings up a WHOLE different discussion. First, you need all the information she has brought up.

    For loading an Azure SQL Data Warehouse, currently, the best approach, is to use Redgate Data Platform Studio (dataplatformstudio.com). It's actually built right into the Azure Portal. Disclaimer: I work for Redgate. It is the right way to get this done and that's supported by Microsoft. Other options are to use SSIS, although that's very slow, bcp & polybase, very fast but a lot of work, you can also use Azure Data Factory, but it's extremely painful.

    The other stuff brought up before, backup/restore, availability groups, etc., are not available because of what you're proposing with Azure SQL Data Warehouse.

    Is there actual pricing and licensing information about this product? I would be interested in this too for similar to the OP. Unfortunately, the product sites I saw were very marketing heavy in feature text and whatnot.

  • xsevensinzx (11/8/2016)


    Is there actual pricing and licensing information about this product?

    Yes. It's Azure so it's cloud 'pay as you use' model. You don't buy licenses, you just pay for usage.

    https://azure.microsoft.com/en-us/pricing/details/sql-data-warehouse/

    I would caution you, it's not a straightforward data warehouse system. Azure SQLDW is the cloud version of the Parallel Data Warehouse, the distributed warehouse appliance that MS and hardware dealers sell together. The PDW is priced in the millions of dollars and used for multi-terabyte, billions of rows analytics systems.

    The cloud one can scale from a fair bit smaller than PDW up to (I believe) beyond a full-rack PDW's performance.

    The T-SQL language for Azure SQLDW/PDW is slightly different to normal SQL, and the distributed nature of the system is critical. The datawarehouse design needs to work with the distributed nature of the hardware.

    It's not just a 'big SQL Server'

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • xsevensinzx (11/8/2016)


    At just DW300, it only supports 12 Max concurrent queries. The highest at the maximum DW power is only 32.

    When one of these thresholds is met, new queries are queued and executed on a first-in, first-out basis. As a queries finishes and the number of queries and slots falls below the limits, queued queries are released.

    Yup. The PDW and now SQLDW are designed for small numbers of really large queries crunching huge amounts of data.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Oh, I was referring to the Redgate Data Platform Studio. Wasn't sure if that's fully integrated into the same consumption based model or something else. The Redgate Data Platform Studio website has very limiting information. Seems that the only way to find out more is by logging into Azure through Redgate.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply