Stairway to U-SQL Level 21: Exporting Data with Azure Data Factory

Question

Mike McQuillan

SSCertifiable

Points: 6020
March 28, 2018 at 12:00 am

#410369

Comments posted to this topic are about the item Stairway to U-SQL Level 21: Exporting Data with Azure Data Factory

xsevensinzx One Orange Chip Points: 25560 · Answer 1

Let me re-edit this after going back and re-reading the article. I think the big issue for me with this solution is that you're reading everything from the data store into the SQL database. The data store is billed based on reads and writes, and I believe bandwidth too. If you're using Data Lake Analytics, you also get the extra bill for processing the data so it's ready for Data Factory to copy. This is a huge problem for anyone with large datasets, because you're reading data that has already been processed and being billed for it again. For reference, I know it's going to invalidate data that already exists, but it still has to read that data to invalidate it.

In my example, data is split YYYY/MM/DD for every data source in the data store. Thus, Data Factory will need to base its reads on the correct YYYY/MM/DD path in the store to prevent unnecessary reads, and thus unnecessary charges that could hurt your pocket over time. Can this solution address that in any way? Otherwise, this will work great for small data or dimension/match tables, but not so much for large datasets, which is likely what most people use Data Lake Store for.
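
To make the YYYY/MM/DD idea concrete, here is a minimal Python sketch of working out which daily folders actually need to be read since the last load. It is only an illustration, not anything from the article or from Data Factory itself: the root folder `/datastore`, the source name `sales`, and the helper names `partition_path` and `paths_to_read` are all hypothetical.

```python
from datetime import date, timedelta


def partition_path(root: str, source: str, day: date) -> str:
    """Build the YYYY/MM/DD folder path for one source and one day.

    The root/source/YYYY/MM/DD layout is an assumption for illustration.
    """
    return f"{root.rstrip('/')}/{source}/{day:%Y/%m/%d}"


def paths_to_read(root: str, source: str, last_loaded: date, today: date):
    """Yield only the daily folders after last_loaded, up to and including today."""
    day = last_loaded + timedelta(days=1)
    while day <= today:
        yield partition_path(root, source, day)
        day += timedelta(days=1)


# Hypothetical example: the last successful load covered 26 March 2018.
paths = list(paths_to_read("/datastore", "sales", date(2018, 3, 26), date(2018, 3, 28)))
for p in paths:
    print(p)
```

Only the folders returned here would be handed to the copy step, so days that were already loaded are never read (or billed) again.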

Mike McQuillan SSCertifiable Points: 6020 · Answer 2

Hello xsevensinzx (is that a Spectrum reference?!)

You make a fair point - indeed, I'm working on a Pluralsight course at the time of writing which expounds the virtue of splitting your data into the correct grains to reduce data reads.

The point of the article was just to demonstrate that ADF can be used to export data, but I appreciate your concerns. I will modify the article to highlight that data separation is recommended to keep your costs down.

Thanks,
Mike.

xsevensinzx One Orange Chip Points: 25560 · Answer 3

mike.mcquillan - Thursday, March 29, 2018 4:03 AM
Hello xsevensinzx (is that a Spectrum reference?!)
You make a fair point - indeed, I'm working on a Pluralsight course at the time of writing which expounds the virtue of splitting your data into the correct grains to reduce data reads.
The point of the article was just to demonstrate that ADF can be used to export data, but I appreciate your concerns. I will modify the article to highlight that data separation is recommended to keep your costs down.
Thanks,
Mike.

Thanks!

Have you tried the above with Data Warehouse yet, using the same basic approach?