Stairway to U-SQL Level 21: Exporting Data with Azure Data Factory

  • Mike McQuillan

    SSCertifiable

    Points: 5986

    Comments posted to this topic are about the item Stairway to U-SQL Level 21: Exporting Data with Azure Data Factory

  • xsevensinzx

    One Orange Chip

    Points: 25551

    Let me re-edit this after going back and re-reading the article. The big issue for me with this solution is that you're reading everything from the data store into the SQL database. The data store is billed on reads and writes, and I believe bandwidth too. If you're using Data Lake Analytics, you also get the extra bill for processing the data so that it's ready for Data Factory to copy. This is a huge problem for anyone with large datasets, because you're re-reading data that has already been processed and being billed for it each time. I know it's going to invalidate data that already exists, but it still has to read that data to invalidate it.

    In my example, data is split YYYY/MM/DD for every data source in the data store. Data Factory would therefore need to base its reads on the correct YYYY/MM/DD path in the store to prevent unnecessary reads, and with them unnecessary charges that could hurt your pocket over time. Can this solution address that in any way? Otherwise, this will work great for small data or dimension/match tables, but not so much for large datasets, which is likely what most people use Data Lake Store for.
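
    To make the YYYY/MM/DD idea concrete, here is a minimal Python sketch of working out which folders a copy would need to touch for a given date window. The root folder name `/datasource` and the helper `partition_paths` are hypothetical, made up for illustration; this is not something from the article or a Data Factory API, just the path arithmetic.

```python
from datetime import date, timedelta
from typing import List


def partition_paths(source_root: str, start: date, end: date) -> List[str]:
    """Return one YYYY/MM/DD folder path per day in [start, end], inclusive.

    source_root is a hypothetical root folder in the data store.
    """
    if end < start:
        raise ValueError("end must not be before start")
    root = source_root.rstrip("/")
    day_count = (end - start).days + 1
    # Build the path for each day in the window; nothing outside it is listed.
    return [
        root + "/" + (start + timedelta(days=i)).strftime("%Y/%m/%d")
        for i in range(day_count)
    ]


# Only these folders would be handed to the copy step, rather than the whole store.
for path in partition_paths("/datasource", date(2018, 3, 28), date(2018, 3, 30)):
    print(path)
```

    The point is simply that a copy driven by a list like this reads only the days that have not been exported yet, instead of re-reading every file in the store.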

  • Mike McQuillan

    SSCertifiable

    Points: 5986

    Hello xsevensinzx (is that a Spectrum reference?!)

    You make a fair point - indeed, I'm working on a Pluralsight course at the time of writing which expounds the virtue of splitting your data into the correct grains to reduce data reads.

    The point of the article was just to demonstrate that ADF can be used to export data, but I appreciate your concerns. I will modify the article to highlight that data separation is recommended to keep your costs down.

    Thanks,
    Mike.

  • xsevensinzx

    One Orange Chip

    Points: 25551

    mike.mcquillan - Thursday, March 29, 2018 4:03 AM

    Hello xsevensinzx (is that a Spectrum reference?!)

    You make a fair point - indeed, I'm working on a Pluralsight course at the time of writing which expounds the virtue of splitting your data into the correct grains to reduce data reads.

    The point of the article was just to demonstrate that ADF can be used to export data, but I appreciate your concerns. I will modify the article to highlight that data separation is recommended to keep your costs down.

    Thanks,
    Mike.

    Thanks!

    Have you tried the above with Data Warehouse yet, using the same basic approach?

