How can I improve performance of this query further?

  • Hi,

    I'm a DW/ETL developer and I've created a data load process to load an aggregated table in my data warehouse. One of the joins in the query contains a 'BETWEEN' operator on DateKey (an integer representation of date in format YYYYMMDD) and according to my execution plan, the clustered index seek takes up 43% of the query cost. I don't usually use a BETWEEN in a join if I can avoid it, so I'm not sure if this is actually the issue or not, but I'm keen to improve the performance of this query as it's causing my aggregated table load to run very slowly.

    I've attached the execution plan.

    The table with the slow index is defined as below.
    CREATE TABLE [dbo].[tblCostPrices](
        [StoreKey] [INT] NOT NULL,
        [ProductKey] [INT] NOT NULL,
        [FromDateKey] [INT] NOT NULL,
        [ToDateKey] [INT] NOT NULL,
        [PriceLevel] [INT] NOT NULL,
        [CostPrice] [FLOAT] NOT NULL
    ) ON [PRIMARY]

    GO

    CREATE UNIQUE CLUSTERED INDEX [CL_CostPrices] ON [dbo].[tblCostPrices]
    (
        [StoreKey] ASC,
        [ProductKey] ASC,
        [PriceLevel] ASC,
        [FromDateKey] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

    Anybody got any advice?

    Thanks.

  • The issue is that you're selecting all the rows. To be able to take full advantage of an index, you need to limit the results.
    You're obfuscating the query by using a view and you attached an estimated plan instead of an actual plan.
    The BETWEEN is usually not a performance problem.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • What is the query?  In your execution plan all I see is this:

    select vss.* from aggr.vfactSaleSummary as vss

  • Sorry I'd forgotten I'd used views. The vfactSalesSummary view is:
    SELECT  StoreKey
         , EmployeeKey
         , EmployeeClassKey
         , TillKey
         , DateKey
         , Hour
         , Quarter
         , SUM(TransactionCount) AS TransactionCount
         , SUM(Quantity) AS Quantity
         , SUM(SalesInclVAT) AS SalesInclVAT
         , SUM(SalesVAT) AS SalesVAT
         , SUM(SalesExclVAT) AS SalesExclVAT
         , SUM(Cost) AS Cost
         , SUM(DiscountInclVAT) AS DiscountInclVAT
         , SUM(DiscountExclVAT) AS DiscountExclVAT
         , SUM(LoyaltyCardQuantity) AS LoyaltyCardQuantity
         , SUM(LoyaltyCardAmount) AS LoyaltyCardAmount
         , SUM(DrinksLoyaltyAmount) AS DrinksLoyaltyAmount
         , SUM(NewspapersLoyaltyAmount) AS NewspapersLoyaltyAmount
         , SUM(ExtrasLoyaltyAmount) AS ExtrasLoyaltyAmount
         , SUM(NegativeTransactionCount) AS NegativeTransactionCount
         , SUM(ZeroValueTransactionCount) AS ZeroValueTransactionCount
         , SUM(UnderFiftyPenceTransactionCount) AS UnderFiftyPenceTransactionCount
         , SUM(UnderOnePoundTransactionCount) AS UnderOnePoundTransactionCount
         , SUM(UnderFivePoundsTransactionCount) AS UnderFivePoundsTransactionCount
         , SUM(UnderTenPoundsTransactionCount) AS UnderTenPoundsTransactionCount
         , SUM(UnderFifteenPoundsTransactionCount) AS UnderFifteenPoundsTransactionCount
         , SUM(UnderTwentyPoundsTransactionCount) AS UnderTwentyPoundsTransactionCount
         , SUM(OverTwentyPoundsTransactionCount) AS OverTwentyPoundsTransactionCount
         , SUM(UnderOneDollarFiftyTransactionCount) AS UnderOneDollarFiftyTransactionCount
         , SUM(UnderTwoTransactionCount) AS UnderTwoTransactionCount
         , SUM(ReturnsQuantity) AS ReturnsQuantity
         , SUM(ReturnSalesInclVAT) AS ReturnSalesInclVAT
         , SUM(ReturnSaleExclVAT) AS ReturnSaleExclVAT
         , SUM(FoodOnlyTransaction) AS FoodOnlyTransaction
         , SUM(DrinksAmount) AS DrinksAmount
         , SUM(HotAndIcedDrinksAmount) AS HotAndIcedDrinksAmount
         , SUM(NewspapersAmount) AS NewspapersAmount
         , SUM(FoodAmount) AS FoodAmount
         , SUM(FoodQuantity) AS FoodQuantity
         , SUM(CustomerEatIn) AS CustomerEatIn
         , SUM(CustomerTakeaway) AS CustomerTakeaway
         , SUM(CustomerCount) AS CustomerCount
      FROM  aggr.vfactSalesSummaryStage
      GROUP BY StoreKey
         , EmployeeKey
         , EmployeeClassKey
         , TillKey
         , DateKey
         , Hour
         , Quarter;

    and that refers to another view where the meat happens:

    SELECT  fsd.TransactionReference
         , fsd.StoreKey
         , fsd.EmployeeKey
         , fsd.EmployeeClassKey
         , fsd.TillKey
         , fsd.DateKey
         , DATEPART(hh, fsd.TransactionTime) AS Hour
         , FLOOR(DATEPART(mi, fsd.TransactionTime) / 15) * 15 AS Quarter
         , 1 AS TransactionCount
         , SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
             ELSE 1
           END * CASE WHEN fsd.TransactionTypeKey = 1
                AND fsd.ReturnFlag = 0 THEN 1
               ELSE 0
             END) AS Quantity
         , SUM(ISNULL(fsd.SalesInclVAT, 0)) AS SalesInclVAT
         , SUM(ISNULL(fsd.SalesVAT, 0)) AS SalesVAT
         , SUM(ISNULL(fsd.SalesInclVAT, 0) - ISNULL(fsd.SalesVAT, 0)) AS SalesExclVAT
         , SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
             ELSE 1
           END
           * CASE WHEN fsd.TransactionTypeKey = 1
               AND fsd.ReturnFlag = 0
             THEN ISNULL(tcp.CostPrice, 0)
             ELSE 0
            END) AS Cost
         , SUM(CASE WHEN fsd.TransactionTypeKey = 2
             THEN ISNULL(fsd.SalesInclVAT, 0) * -1
             ELSE 0
           END) AS DiscountInclVAT
         , SUM(CASE WHEN fsd.TransactionTypeKey = 2
             THEN (ISNULL(fsd.SalesInclVAT, 0)
               - ISNULL(fsd.SalesVAT, 0)) * -1
             ELSE 0
           END) AS DiscountExclVAT
                    -- Loyalty Cards
         , SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
              AND dp.MajorGroupName <> 'Drink Extra' THEN 1
             ELSE 0
           END) AS LoyaltyCardQuantity
         , SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
             THEN ISNULL(fsd.SalesInclVAT, 0) * -1
             ELSE 0
           END) AS LoyaltyCardAmount
                    -- Loyalty amounts
         , SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
              AND LEFT(dp.MajorGroupName, 5) = 'Drink'
              AND dp.MajorGroupName <> 'Drink Extra'
             THEN ISNULL(fsd.SalesInclVAT, 0) * -1
             ELSE 0
           END) AS DrinksLoyaltyAmount
         , SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
              AND dp.FamilyGroupName = 'Newspapers'
             THEN ISNULL(fsd.SalesInclVAT, 0) * -1
             ELSE 0
           END) AS NewspapersLoyaltyAmount
         , SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
              AND LEFT(dp.MajorGroupName, 5) = 'Drink'
              AND dp.MajorGroupName = 'Drink Extra'
             THEN ISNULL(fsd.SalesInclVAT, 0) * -1
             ELSE 0
           END) AS ExtrasLoyaltyAmount
                    -- Transaction Flags
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) < 0 THEN 1
           ELSE 0
          END AS NegativeTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) = 0 THEN 1
           ELSE 0
          END AS ZeroValueTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 0.0001 AND 0.5
           THEN 1
           ELSE 0
          END AS UnderFiftyPenceTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 0.0001 AND 1
           THEN 1
           ELSE 0
          END AS UnderOnePoundTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 0 AND 4.9999
           THEN 1
           ELSE 0
          END AS UnderFivePoundsTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 5 AND 9.9999
           THEN 1
           ELSE 0
          END AS UnderTenPoundsTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 10 AND 14.9999
           THEN 1
           ELSE 0
          END AS UnderFifteenPoundsTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 15 AND 19.9999
           THEN 1
           ELSE 0
          END AS UnderTwentyPoundsTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) >= 20 THEN 1
           ELSE 0
          END AS OverTwentyPoundsTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)
              - ISNULL(fsd.SalesVAT, 0)) < 1.5 THEN 1
           ELSE 0
          END AS UnderOneDollarFiftyTransactionCount
         , CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 0.0001 AND 2
           THEN 1
           ELSE 0
          END AS UnderTwoTransactionCount            
                    -- Returns (Refunds)
         , SUM(CASE WHEN fsd.TransactionTypeKey = 1
              AND fsd.ReturnFlag = 1 THEN 1
             ELSE 0
           END) AS ReturnsQuantity
         , SUM(CASE WHEN fsd.TransactionTypeKey = 1
              AND fsd.ReturnFlag = 1
             THEN ISNULL(fsd.SalesInclVAT, 0)
             ELSE 0
           END) AS ReturnSalesInclVAT
         , SUM(CASE WHEN fsd.TransactionTypeKey = 1
              AND fsd.ReturnFlag = 1
             THEN ISNULL(fsd.SalesInclVAT, 0)
              - ISNULL(fsd.SalesVAT, 0)
             ELSE 0
           END) AS ReturnSaleExclVAT
         , CASE WHEN SUM(CASE WHEN fsd.TransactionTypeKey = 1
                  AND LEFT(MajorGroupName, 4) = 'Food'
                THEN 1
                ELSE 0
              END) = SUM(CASE WHEN fsd.TransactionTypeKey = 1
                    THEN 1
                    ELSE 0
                  END) THEN 1
           ELSE 0
          END AS FoodOnlyTransaction
                    -- Product Amounts
         , SUM(CASE WHEN LEFT(dp.MajorGroupName, 5) = 'Drink'
              AND dp.MajorGroupName <> 'Drink Extra'
             THEN ISNULL(fsd.SalesInclVAT, 0)
             ELSE 0
           END) AS DrinksAmount
         , SUM(CASE WHEN dp.MajorGroupName IN ('Drink Hot', 'Drink Iced')
             THEN ISNULL(fsd.SalesInclVAT, 0)
             ELSE 0
           END) AS HotAndIcedDrinksAmount
         , SUM(CASE WHEN dp.FamilyGroupName = 'Newspapers'
             THEN ISNULL(fsd.SalesInclVAT, 0)
             ELSE 0
           END) AS NewspapersAmount
         , SUM(CASE WHEN LEFT(dp.MajorGroupName, 4) = 'Food'
             THEN ISNULL(fsd.SalesInclVAT, 0)
             ELSE 0
           END) AS FoodAmount
         , SUM(CASE WHEN LEFT(dp.MajorGroupName, 4) = 'Food' THEN 1
             ELSE 0
           END) AS FoodQuantity
                    -- Customers  
         , SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
             ELSE 1
           END * CASE WHEN LEFT(dp.MajorGroupName, 5) = 'Drink'
                AND dp.MajorGroupName <> 'Drink Extra'
                AND fsd.OrderTypeKey IN (1, 3)
                AND fsd.TransactionTypeKey = 1
                AND fsd.SalesInclVAT <> 0
                AND ISNULL(tcp.CostPrice, 0) <> 0
                AND fsd.ReturnFlag = 0 THEN 1
               ELSE 0
             END) AS CustomerEatIn
         , SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
             ELSE 1
           END * CASE WHEN LEFT(dp.MajorGroupName, 5) = 'Drink'
                AND dp.MajorGroupName <> 'Drink Extra'
                AND fsd.OrderTypeKey IN (2, 4)
                AND fsd.TransactionTypeKey = 1
                AND fsd.SalesInclVAT <> 0
                AND ISNULL(tcp.CostPrice, 0) <> 0
                AND fsd.ReturnFlag = 0 THEN 1
               ELSE 0
             END) AS CustomerTakeaway
         , SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
             ELSE 1
           END * CASE WHEN LEFT(dp.MajorGroupName, 5) = 'Drink'
                AND dp.MajorGroupName <> 'Drink Extra'
                AND fsd.OrderTypeKey IN (1, 2, 3, 4)
                AND fsd.TransactionTypeKey = 1
                AND fsd.SalesInclVAT <> 0
                AND ISNULL(tcp.CostPrice, 0) <> 0
                AND fsd.ReturnFlag = 0 THEN 1
               ELSE 0
             END) AS CustomerCount
      FROM  syn.factSalesDetail AS fsd
      JOIN  syn.dimProduct AS dp
          ON fsd.ProductKey = dp.ProductKey
      JOIN  syn.dimDiscount AS dd
          ON dd.DiscountKey = fsd.DiscountKey
      LEFT JOIN syn.tblCostPrices AS tcp
          ON fsd.PriceLevel = tcp.PriceLevel
           AND fsd.ProductKey = tcp.ProductKey
           AND fsd.StoreKey = tcp.StoreKey
           AND fsd.DateKey BETWEEN tcp.FromDateKey
                AND  tcp.ToDateKey
      JOIN  control.tblTransactionSummaryReloadDates AS ttsrd
          ON ttsrd.dateKey = fsd.DateKey
      GROUP BY fsd.TransactionReference
         , fsd.StoreKey
         , fsd.EmployeeKey
         , fsd.EmployeeClassKey
         , fsd.TillKey
         , fsd.DateKey
         , DATEPART(hh, fsd.TransactionTime)
         , FLOOR(DATEPART(mi, fsd.TransactionTime) / 15) * 15;

    I just thought - would adding the ToDateKey into the clustered index help do you think?

  • I'll repeat it.
    You're querying too many rows. The estimates show about 12M rows.
    That said, I wonder what would happen if you add a JOIN hint. Try adding the missing words to leave it like this:
    LEFT OUTER HASH JOIN syn.tblCostPrices AS tcp

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Unfortunately everything I do for this client is 'too many rows' - they are a large international retail chain and have enormous data quantities coming into their DW on a daily basis. The query I've posted here was 30 days of sales transactions which I'm attempting to aggregate into a table to use for reporting purposes. The 30 day window can possibly be narrowed to reduce the amount of data, but business practices dictate that sometimes we will need to refresh the data that far back on a daily basis so I'd rather not change that.

    I'll give the HASH a try and see if that makes any difference. I've currently got the query running in around 20 mins, but I'd love to shave some more time off that if I can.

  • Looking at the queries I see nothing to filter the data to 30 days worth of data unless the tables you are running the queries against only have 30 days worth of data.  There is no where clause in either query that you have posted.

  • IMHO I would try removing the SUM and group by from the 'meat' SQL, since you are summing the data in the other SQL.  I'm not saying it will help but that is what I would try..I know I've had SQL perform poorly when I have a lot of case statements and I'm trying to sum the data at the same time.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • The 30 day filtering is done using the control.tblTransactionSummaryReloadDates table. This has 30 date key values loaded into it in a previous step of the ETL and so the inner join to this table effectively filters the query. This allows us to control how many days are loaded without altering the SQL code.

    The complexity of the meat query comes from the fact I'm calculating a lot of KPIs at a transaction level. The raw data in factSalesDetail is at line level so I effectively have to group twice - once to calculate many of the measures by transaction and then again to 'count' those up to the level of aggregation I actually want.

    I'm starting to think 20 mins might not be too bad based on how complex this is and how much data....

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply