Comments posted to this topic are about the item Stairway to U-SQL Level 21: Exporting Data with Azure Data Factory
One Orange Chip
Let me re-edit this after going back and re-reading the article. The big issue for me with this solution is that you're reading everything from the data store into the SQL database. The data store is billed based on reads, writes and, I believe, bandwidth too. If you're using Data Lake Analytics, you also get the extra bill for processing the data when it's ready for Data Factory to copy. This is a huge problem for those using large datasets, because you're reading data that has already been processed and being billed for it. For reference, I know it's going to invalidate data that already exists, but it still has to read that data in order to invalidate it.
In my example, data is split YYYY/MM/DD for every data source in the data store. Thus, Data Factory will need to base its reads on the correct YYYY/MM/DD path in the store to prevent unnecessary reads, and therefore unnecessary charges that could kill your pocket over time. Can this solution address that in any way? Otherwise, this will work great for small data or dimension/match tables, but not so much for large datasets, which are likely what most people use the data lake store for.
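A sketch of what a partition-aligned read could look like on the U-SQL side. The folder layout and schema below are hypothetical, but U-SQL file sets do support date parts in the input path, exposed as a virtual column; a predicate on that column lets the compiler prune folders, so only the matching day's files are read.

```sql
// Hypothetical path and schema; the {date:...} parts map folder names
// onto a virtual "date" column.
DECLARE @input string = "/mydata/{date:yyyy}/{date:MM}/{date:dd}/{*}.csv";

@rows =
    EXTRACT id int,
            value string,
            date DateTime   // virtual column derived from the folder path
    FROM @input
    USING Extractors.Csv();

// Filtering on the virtual column prunes partitions before any data is read,
// so unmatched YYYY/MM/DD folders incur no read charges.
@oneDay =
    SELECT id, value
    FROM @rows
    WHERE date == DateTime.Parse("2018-03-29");

OUTPUT @oneDay
TO "/output/oneday.csv"
USING Outputters.Csv();
```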
Hello xsevensinzx (is that a Spectrum reference?!)
You make a fair point - indeed, I'm working on a Pluralsight course at the time of writing which expounds the virtue of splitting your data into the correct grains to reduce data reads.
The point of the article was just to demonstrate that ADF can be used to export data, but I appreciate your concerns. I will modify the article to highlight that data separation is recommended to keep your costs down.
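As an illustration of that data separation, here is a hedged sketch of a Data Factory (v1, as used in the article's timeframe) input dataset whose `folderPath` is built from the slice date, so each copy activity only reads the one YYYY/MM/DD folder for its slice rather than the whole store. The dataset and linked service names are placeholders.

```json
{
  "name": "PartitionedInputDataset",
  "properties": {
    "type": "AzureDataLakeStore",
    "linkedServiceName": "DataLakeStoreLinkedService",
    "typeProperties": {
      "folderPath": "mydata/{Year}/{Month}/{Day}",
      "partitionedBy": [
        { "name": "Year",  "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
        { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
        { "name": "Day",   "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } }
      ],
      "format": { "type": "TextFormat" }
    },
    "availability": { "frequency": "Day", "interval": 1 },
    "external": true
  }
}
```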
mike.mcquillan - Thursday, March 29, 2018 4:03 AM
Have you tried the above with Data Warehouse yet, using the same basic approach?