Why PolyBase matters - Part 2

  • APSolutely

    Old Hand

    Points: 390

    Comments posted to this topic are about the item Why PolyBase matters - Part 2

  • curious_sqldba

    SSC-Dedicated

    Points: 36266

    Nice article :).

    You mentioned that your data transfer from PolyBase to Azure DW ran at 300GB/min? That is insane! I am very curious to know how you moved data from on-premises to the cloud at that rate.

  • APSolutely

    Old Hand

    Points: 390

    Thanks curious_sqldba.

    That 300GB/min refers to the speed at which I was able to load data that was already in Azure Blob storage, as "unstructured" delimited text documents, into the distributed, schema-bound tables of the Azure Data Warehouse using PolyBase. It is not the speed at which I was able to upload my data from on-prem to Azure.

    This comes down to the nature of the Azure Data Warehouse and how workers are assigned to groups of distributions (always 60 storage distributions for Azure Data Warehouse). The more DWUs you scale your ADW out to, the faster you can load, up to a maximum of 6000 DWU, which is essentially 1 worker per storage distribution. The parallelized loading mechanism of PolyBase, for both the on-premises appliance and Azure Data Warehouse, means that the more compute you have per storage distribution, the faster you can theoretically load data into the appliance or Azure Data Warehouse.
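
    As a rough illustration of that scaling, the sketch below spreads the 60 distributions evenly over the workers. The figure of one worker per 100 DWU is an assumption chosen so that 6000 DWU gives 1 worker per distribution, as described above; the function and constant names are made up for this example.

```python
TOTAL_DISTRIBUTIONS = 60  # fixed for Azure Data Warehouse, per the post above


def distributions_per_worker(dwu: int, dwu_per_worker: int = 100) -> float:
    """Return how many storage distributions each worker has to serve.

    ASSUMPTION: one worker per `dwu_per_worker` DWU (100 by default), which
    makes 6000 DWU map to 60 workers, i.e. 1 worker per distribution.
    """
    if dwu <= 0 or dwu % dwu_per_worker != 0:
        raise ValueError("dwu must be a positive multiple of dwu_per_worker")
    workers = dwu // dwu_per_worker
    if workers > TOTAL_DISTRIBUTIONS:
        raise ValueError("more workers than distributions")
    return TOTAL_DISTRIBUTIONS / workers


if __name__ == "__main__":
    # Fewer distributions per worker means more compute per distribution.
    for dwu in (100, 600, 1000, 6000):
        print(dwu, "DWU ->", distributions_per_worker(dwu), "distributions per worker")
```

    The fewer distributions each worker has to serve, the more parallel the load, which is why the theoretical load speed tops out at 6000 DWU.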

    As for the actual testing we did, we were running on a 1GB/s fibre connection to the internet, so our upload speed from on-prem to Azure was still very impressive. These loads were mostly done late at night, and I was able to transfer a ~250GB (clustered columnstore compressed) database from on-prem to Azure in under an hour. That was perfectly acceptable: once a day, during quiet time, we could schedule an upload to ensure all our data in Azure Blob storage was up to date.
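
    To put the two figures side by side, here is a small back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB) and treats "under an hour" as exactly 3600 seconds, so the upload number is only a lower bound on the average rate; the helper names are made up for this example.

```python
def gb_per_min_to_gb_per_s(gb_per_min: float) -> float:
    """Convert a rate in GB/min to GB/s."""
    return gb_per_min / 60.0


def avg_rate_mb_per_s(size_gb: float, seconds: float) -> float:
    """Average rate in MB/s for moving size_gb in the given time.

    ASSUMPTION: decimal units, 1 GB = 1000 MB.
    """
    return size_gb * 1000.0 / seconds


if __name__ == "__main__":
    # PolyBase load from Azure Blob storage into Azure Data Warehouse.
    load_gb_s = gb_per_min_to_gb_per_s(300)

    # Upload from on-prem to Azure: ~250GB in (at most) one hour.
    upload_mb_s = avg_rate_mb_per_s(250, 3600)
    upload_mbit_s = upload_mb_s * 8  # 8 bits per byte

    print(f"PolyBase load: {load_gb_s:.1f} GB/s")
    print(f"Upload (lower bound): {upload_mb_s:.1f} MB/s, about {upload_mbit_s:.0f} Mbit/s")
```

    So the in-cloud load rate and the on-prem upload rate differ by well over an order of magnitude, which is why the two should not be confused.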

    This methodology of using PolyBase to convert our databases from structured tables to "unstructured" text files in Azure also has the added benefit that it does not make our databases inaccessible while it runs, the way backups in APS do.

    Hope this clears that up for you.

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Thanks for the two good articles.

