Iterating Over Calculated Tables using Power Query

  • Gouri Shankar

    SSC-Addicted

    Points: 414

    Comments posted to this topic are about the item Iterating Over Calculated Tables using Power Query

  • Jeff Moden

    SSC Guru

    Points: 996452

    I can't comment much about your good article other than it seems well laid out and produces the correct order of revelation.  So nice job there.

    I'll also state that I don't know much about Power Query but this process seems awfully complicated to produce the end result.  It seems to me that doing such a thing wouldn't take much in the form of some well written T-SQL.  In this case, what is the advantage of using Power Query over such a bit of T-SQL?

    And, no... I'm not busting chops here or trolling to start a feudal war.  I don't know much about Power Query outside of what I've gleened from this article and I really would like to know what the advantage is because I'm just not seeing it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Forum FAQ

  • Jeff Moden

    SSC Guru

    Points: 996452

    Ok... I got cricket's on that question... Here's what I'm talking about...

    In the article, it first shows Step 1 as a T-SQL query to get the data "into" the Power BI environment.

    Next, it shows how to create a couple of columns, one of them being a ranking column and you need to know another "language" to do it, not to mention a whole bunch of clicking around in menus to get the job done.  I can understand the power provided in being able to easily filter things but why not go "full monty" on the SQL side of things?  Like this...

    WITH
    ctePreAgg AS
    (
    SELECT terr.SalesTerritoryRegion
    ,pcat.EnglishProductCategoryName
    ,SalesAmount = SUM(SalesAmount)
    FROM dbo.FactResellerSales sale
    JOIN dbo.DimProduct prod ON prod.ProductKey = sale.ProductKey
    JOIN dbo.DimProductSubCategory scat ON scat.ProductSubcategoryKey = prod.ProductSubcategoryKey
    JOIN dbo.DimProductCategory pcat ON pcat.ProductCategoryKey = scat.ProductCategoryKey
    JOIN dbo.DimSalesTerritory terr ON terr.SalesTerritoryKey = sale.SalesTerritoryKey
    GROUP BY terr.SalesTerritoryRegion
    ,pcat.EnglishProductCategoryName
    )
    SELECT *
    ,RegionProductBestRank = RANK() OVER (PARTITION BY SalesTerritoryRegion ORDER BY SalesAmount DESC)
    ,RegionProductWorstRank = RANK() OVER (PARTITION BY SalesTerritoryRegion ORDER BY SalesAmount ASC)
    ,RegionProductPctSales = CONVERT(DECIMAL(5,2),SalesAmount / SUM(SalesAmount) OVER (PARTITION BY SalesTerritoryRegion) * 100)
    ,RegionTotalSales = SUM(SalesAmount) OVER (PARTITION BY SalesTerritoryRegion)
    ,RegionProductPctTotalSales = CONVERT(DECIMAL(5,2),SalesAmount / SUM(SalesAmount) OVER () * 100.0)
    ,ProductPctTotalSales = CONVERT(DECIMAL(5,2),SUM(SalesAmount) OVER (PARTITION BY EnglishProductCategoryName) / SUM(SalesAmount) OVER () * 100)
    ,TotalSales = SUM(SalesAmount) OVER ()
    FROM ctePreAgg
    ;

    ...which you could even dump into a "preaggregated" table during whatever ETL process you use to keep your DW up to date.

    That's the other thing I'll never understand about DWs... people spend a shedload of data creating denormalized data for reporting but the preaggregation done to support the denormalization is paltry compared to the needs of the reporting and a ton of people end up running the same reports against some pretty long tables instead of hitting some thoughtfully constructed preaggregated tables.

    Yep... people will say that you cannot anticipate what people want to seen until they see it but, lordy, once they do see it, why not do the preaggregation so that when a thousand people are all banging on the same reports with different criteria so that poor little DW box doesn't have to work so hard and people can get their reports more quickly?

    If you went all the way and only supplied preaggregated tables, think of how little data you might actually have to transfer during your DW ETL processes.

     

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Forum FAQ

  • Gouri Shankar

    SSC-Addicted

    Points: 414

    @jeff-moden, Yes, I agree with you that the steps I'm trying to get through Power Query can also be done with T-SQL. The main purpose of this article was to throw some light on Power Query techniques and how to implement the use case using Power Query. Of course, there are people who might not find the article interesting, but that is not the goal. And, there is no harm if we learn a new language, for our interests, whatever small a use case it might be.

    Thanks for the comments, anyway.

  • Jeff Moden

    SSC Guru

    Points: 996452

    Gouri Shankar wrote:

    @jeff-moden, Yes, I agree with you that the steps I'm trying to get through Power Query can also be done with T-SQL. The main purpose of this article was to throw some light on Power Query techniques and how to implement the use case using Power Query. Of course, there are people who might not find the article interesting, but that is not the goal. And, there is no harm if we learn a new language, for our interests, whatever small a use case it might be.

    Thanks for the comments, anyway.

    Oh... not to worry... I absolutely understand and agree with the purpose of the article.  Your article on the subject was good and provided a good introduction.  My comments were a bit of a hi-jack rant based on personal observations of what people end up doing in general.

    I also wanted to express to people that learning or using the old ways can quickly and greatly augment the new ways especially if you need something quickly and don't have the time to actually learn a "new language" to get the job done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Forum FAQ

  • Jeffrey Williams

    SSC Guru

    Points: 88448

    Jeff Moden wrote:

    Gouri Shankar wrote:

    @jeff-moden, Yes, I agree with you that the steps I'm trying to get through Power Query can also be done with T-SQL. The main purpose of this article was to throw some light on Power Query techniques and how to implement the use case using Power Query. Of course, there are people who might not find the article interesting, but that is not the goal. And, there is no harm if we learn a new language, for our interests, whatever small a use case it might be.

    Thanks for the comments, anyway.

    Oh... not to worry... I absolutely understand and agree with the purpose of the article.  Your article on the subject was good and provided a good introduction.  My comments were a bit of a hi-jack rant based on personal observations of what people end up doing in general.

    I also wanted to express to people that learning or using the old ways can quickly and greatly augment the new ways especially if you need something quickly and don't have the time to actually learn a "new language" to get the job done.

    One of the benefits to pulling the non-aggregated data into Power BI is that Power BI can then generate the analytics for you based on the relationships you define in Power BI.  This can be much simpler than building the code in SQL to accommodate every level of summary data needed.  This allows for drill-down and drill-through reporting without additional coding in simple to implement charts.

    Of course - you do have to be mindful of how much data you are including...and whether that data is necessary.  I had one report in Power BI where the original developer was pulling in the full list of all providers from the NPI data feed (over 6 million rows) that caused the file to bloat out to several hundred MBs.  Power BI then attempted to create analytics across all 6 million providers...

    Reducing the provider data to only those providers included in the system - the file was reduced to less than 100KB and performance was improved drastically.

     

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply