poor performing report query against table without a PK

  • The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs. 

    Whenever I run my query the execution plan shows that the clustered index is always used (all 32 million rows scanned everytime), nevermind my query doesn't even search against the clustered index column, and despite the presence of non clustered indexes for each of the filtered columns. 

    During daytime my report query takes 3-4 minutes to return 9 rows and at night 5 seconds.  Business needs it to return in 1 second at all times.

    Clustered Index: OfferID

    Non Clustered Index1: Account#
    Non Clustered Index3:  BulkId
    Non Clustered Index4: IMEI
    Non Clustered Index2: Date

    My query:SELECT <25 columns> WHERE Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate AND Date <= @enddate.

    Execution plan used shows three operations: Clustered Index Scan, Parallelism, and SELECT.

    Please suggest how to optimize.  Can the table be optimized for my query if it has no PK, and clustered index permits NULLs, and all columns being filtered also allow nulls?

    --Quote me

  • polkadot - Sunday, November 25, 2018 11:02 PM

    The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs. 

    Whenever I run my query the execution plan shows that the clustered index is always used (all 32 million rows scanned everytime), nevermind my query doesn't even search against the clustered index column, and despite the presence of non clustered indexes for each of the filtered columns. 

    During daytime my report query takes 3-4 minutes to return 9 rows and at night 5 seconds.  Business needs it to return in 1 second at all times.

    Clustered Index: OfferID

    Non Clustered Index1: Account#
    Non Clustered Index3:  BulkId
    Non Clustered Index4: IMEI
    Non Clustered Index2: Date

    My query:SELECT <25 columns> WHERE Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate AND Date <= @enddate.

    Execution plan used shows three operations: Clustered Index Scan, Parallelism, and SELECT.

    Please suggest how to optimize.  Can the table be optimized for my query if it has no PK, and clustered index permits NULLs, and all columns being filtered also allow nulls?

    How many columns are in the table?
    Which of the filter columns is most selective?
    How many rows are returned by the query? Obviously this will vary - a low and high figure will be useful.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Can you share the execution plan. PK and CI is different. You can also check the fragmentation, since it's a heap.

    Muthukkumaran Kaliyamoorthy
    https://www.sqlserverblogforum.com/

  • muthukkumaran Kaliyamoorthy - Monday, November 26, 2018 5:05 AM

    Can you share the execution plan. PK and CI is different. You can also check the fragmentation, since it's a heap.

    One, it's not a heap if it has a clustered index (however bad the clustered index may be). Two, fragmentation won't affect plan creation and index choices (there's an exception to this, but not worth getting into here). Be cautious on advice along these lines.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • polkadot - Sunday, November 25, 2018 11:02 PM

    The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs. 

    Whenever I run my query the execution plan shows that the clustered index is always used (all 32 million rows scanned everytime), nevermind my query doesn't even search against the clustered index column, and despite the presence of non clustered indexes for each of the filtered columns. 

    During daytime my report query takes 3-4 minutes to return 9 rows and at night 5 seconds.  Business needs it to return in 1 second at all times.

    Clustered Index: OfferID

    Non Clustered Index1: Account#
    Non Clustered Index3:  BulkId
    Non Clustered Index4: IMEI
    Non Clustered Index2: Date

    My query:SELECT <25 columns> WHERE Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate AND Date <= @enddate.

    Execution plan used shows three operations: Clustered Index Scan, Parallelism, and SELECT.

    Please suggest how to optimize.  Can the table be optimized for my query if it has no PK, and clustered index permits NULLs, and all columns being filtered also allow nulls?

    Without the plan and the structure, any advice is just guesses. So, all those OR clauses make this look like a catch-all query. You should read this article by Gail Shaw on how to better deal with this.

    One question on the logic there. Is it this:  (Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate) AND Date <= @enddate

    Or, is it this: Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR (Date >= @startdate AND Date <= @enddate)

    Or what you already have?

    I'd suggest using parenthesis to quantify that for the optimizer. It will make a difference in the indexes you build.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • polkadot - Sunday, November 25, 2018 11:02 PM

    The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs. 

    Whenever I run my query the execution plan shows that the clustered index is always used (all 32 million rows scanned everytime), nevermind my query doesn't even search against the clustered index column, and despite the presence of non clustered indexes for each of the filtered columns. 

    During daytime my report query takes 3-4 minutes to return 9 rows and at night 5 seconds.  Business needs it to return in 1 second at all times.

    Clustered Index: OfferID

    Non Clustered Index1: Account#
    Non Clustered Index3:  BulkId
    Non Clustered Index4: IMEI
    Non Clustered Index2: Date

    My query:SELECT <25 columns> WHERE Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate AND Date <= @enddate.

    Execution plan used shows three operations: Clustered Index Scan, Parallelism, and SELECT.

    Please suggest how to optimize.  Can the table be optimized for my query if it has no PK, and clustered index permits NULLs, and all columns being filtered also allow nulls?

    Post the DDL for the table and all indices please, makes it easier to contribute to the problem's solution.
    😎
    Suggest you also post the execution plan😉

  • polkadot - Sunday, November 25, 2018 11:02 PM

    Non Clustered Index4: IMEI

    Is this mobile CDR data?
    😎

  • Your query is equivalent to this:
    SELECT *
      FROM myTable
     WHERE Account# = @account#
     UNION
    SELECT *
      FROM myTable
     WHERE BulkID = @BulkID
     UNION
    SELECT *
      FROM myTable
     WHERE IMEI = @IMEI
     UNION
    SELECT *
      FROM myTable
     WHERE Date BETWEEN @startdate AND Date @enddate

    You might be able to get it to use the indexes if you rewrite the query like that.
    The indexes should also include the 25 columns you are selecting.

  • Grant Fritchey - Monday, November 26, 2018 6:44 AM

    muthukkumaran Kaliyamoorthy - Monday, November 26, 2018 5:05 AM

    Can you share the execution plan. PK and CI is different. You can also check the fragmentation, since it's a heap.

    One, it's not a heap if it has a clustered index (however bad the clustered index may be). Two, fragmentation won't affect plan creation and index choices (there's an exception to this, but not worth getting into here). Be cautious on advice along these lines.

    The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs. 


    Oh bad, I thought it was a heap. ya, right fragmentation will not affect the plan creation and it slow the data fetch.
    Sure. I will read carefully.

    Muthukkumaran Kaliyamoorthy
    https://www.sqlserverblogforum.com/

  • Jonathan AC Roberts - Monday, November 26, 2018 7:52 AM

    Your query is equivalent to this:
    SELECT *
      FROM myTable
     WHERE Account# = @account#
     UNION
    SELECT *
      FROM myTable
     WHERE BulkID = @BulkID
     UNION
    SELECT *
      FROM myTable
     WHERE IMEI = @IMEI
      UNION
    SELECT *
      FROM myTable
     WHERE Date >= @startdate A
       AND Date <= @enddat

    You might be able to get it to use the indexes if you rewrite the query like that.
    The indexes should also include the 25 columns you are selecting.

    25 columns is likely to be most or all of the columns (guessing of course).
    Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
    Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work - Monday, November 26, 2018 8:06 AM

    25 columns is likely to be most or all of the columns (guessing of course).
    Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
    Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]

    Which does bring up the question, are those variables we're seeing or parameters? That too could affect index choice based on the differences in how the row estimates are arrived at. There just are no easy answers without all the details.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Monday, November 26, 2018 8:11 AM

    ChrisM@Work - Monday, November 26, 2018 8:06 AM

    25 columns is likely to be most or all of the columns (guessing of course).
    Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
    Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]

    Which does bring up the question, are those variables we're seeing or parameters? That too could affect index choice based on the differences in how the row estimates are arrived at. There just are no easy answers without all the details.

    Can I run this again in case it was missed?

    "There just are no easy answers without all the details."
    Guessing can be amusing, however a faster (and probably better) solution will be arrived at with all of the necessary details up front.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Grant Fritchey - Monday, November 26, 2018 8:11 AM

    ChrisM@Work - Monday, November 26, 2018 8:06 AM

    25 columns is likely to be most or all of the columns (guessing of course).
    Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
    Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]

    Which does bring up the question, are those variables we're seeing or parameters? That too could affect index choice based on the differences in how the row estimates are arrived at. There just are no easy answers without all the details.

    The OP did say the query returns just 9 rows. So they should be able to get it to use an index.

  • Jonathan AC Roberts - Monday, November 26, 2018 8:47 AM

    Grant Fritchey - Monday, November 26, 2018 8:11 AM

    ChrisM@Work - Monday, November 26, 2018 8:06 AM

    25 columns is likely to be most or all of the columns (guessing of course).
    Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
    Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]

    Which does bring up the question, are those variables we're seeing or parameters? That too could affect index choice based on the differences in how the row estimates are arrived at. There just are no easy answers without all the details.

    OP did say the query returns just 9 rows. So they should be able to get it to use an index.

    No arguments there. We just don't have enough data to do more than speculate... somewhat wildly.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Answers to Qs:
    Grant Fritchey: (1) I added parenthesis around Date.  (2) using parameters. Attached query.
    ChrisM: (1) table has 98 columns (2) selectivity of columns will change over time (3) rows returned will be between 5 to 50
    Erik Erikson:  (1) IMEI is mobile. (2) you'll see from DDL that some columns are quite huge and all but one, Bill_Process_Failure_Reason @ varchar(4000) is justified. 
    Jonathan AC Roberts: I don't think I will take UNION route because report query actually has a number of JOINS that I have commented out for sake of addressing bottlenecks one step at a time 

    attached exec plan  /  inserted DDL, and report query.

    DDL


    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    SET ANSI_PADDING ON
    GO

    CREATE TABLE [dbo].[QR_Device_Trade_In_Master](


    [Offer_ID] [varchar](500) NULL,


    [Offer_Generate_Date] [datetime] NULL,


    [Offer_Update_Date] [datetime] NULL,


    [Manufacturer] [varchar](300) NOT NULL,


    [Model] [varchar](300) NOT NULL,


    [Serial_Number] [varchar](100) NOT NULL,


    [Serial_Number_Type] [varchar](50) NULL,


    [Application_User_ID] [varchar](500) NULL,


    [Application_ID] [varchar](500) NULL,


    [device_Evaluation_Date] [datetime] NULL,


    [Store_ID] [varchar](100) NULL,


    [Purchase_Device_Sku] [varchar](100) NULL,


    [BAN] [numeric](9, 0) NULL,


    [Subscriber_No] [varchar](20) NULL,


    [Offer_Amount] [numeric](8, 4) NULL,


    [RMA_Number] [varchar](100) NULL,


    [Offer_Expiration_Date] [datetime] NULL,


    [Customer_Email_Address] [varchar](100) NULL,


    [Customer_Phone_Number] [varchar](30) NULL,


    [Offer_Status] [varchar](100) NULL,


    [Rejection_Reason] [varchar](300) NULL,


    [RMA_Status] [varchar](100) NULL,


    [RMA_Expiration_Date] [datetime] NULL,


    [Settlement_Amount] [numeric](8, 4) NULL,


    [Network_Operator] [varchar](100) NOT NULL,


    [Shipping_Label_URL] [varchar](2000) NULL,


    [Eligibility_Status] [varchar](100) NULL,


    [Eligibility_Reason] [varchar](1000) NULL,


    [Is_Customer_Billed] [varchar](1) NULL,


    [Adjustment_Reason] [varchar](2000) NULL,


    [Bill_Status] [varchar](1) NULL,


    [Bill_Process_Failure_Reason] [varchar](4000) NULL,


    [Eligibility_Reason_Code] [varchar](100) NULL,


    [Sender_ID] [varchar](500) NULL,


    [Channel_ID] [varchar](500) NULL,


    [Session_ID] [varchar](500) NULL,


    [Work_Flow_ID] [varchar](500) NULL,


    [Activity_ID] [varchar](500) NULL,


    [Dealer_Code] [varchar](500) NULL,


    [Credit_Issue_Date] [datetime] NULL,


    [Create_Date] [datetime] NOT NULL,


    [Created_By] [varchar](500) NULL,


    [Update_Date] [datetime] NULL,


    [Updated_By] [varchar](500) NULL,


    [Image_Url] [varchar](4000) NULL,


    [Sales_Document_Number] [varchar](100) NULL,


    [Purchase_Order_Number] [varchar](100) NULL,


    [Sap_Channel] [varchar](50) NULL,


    [Sap_Return_Type] [varchar](10) NULL,


    [Sap_Return_Message] [varchar](1000) NULL,


    [Transaction_Type] [varchar](100) NULL,


    [ProcessLogID] [int] NOT NULL,


    [ImportFileDate] [date] NOT NULL,


    [Event_Type] [varchar](100) NULL,


    [Retail_Store_Type] [varchar](50) NULL,


    [Assurant_Model] [varchar](300) NULL,


    [Assurant_Manufacturer] [varchar](300) NULL,


    [Cancel_Reason] [varchar](500) NULL,


    [Program_ID] [varchar](10) NOT NULL,


    [Fulfillment_Ord_Number_Forward] [varchar](30) NULL,


    [EIP_Balance] [numeric](8, 4) NULL,


    [EIP_Plan_ID] [varchar](50) NULL,


    [EIP_Device_ID] [varchar](50) NULL,


    [Claim_Auth_Code] [varchar](50) NULL,


    [Claim_Req] [varchar](2) NULL,


    [Claim_Auth_Date] [date] NULL,


    [Claim_Auth_Available] [varchar](2) NULL,


    [Deductible_Amount] [numeric](8, 4) NULL,


    [Device_Collection_Required] [varchar](2) NULL,


    [Claimed_Device_Collected] [varchar](1) NULL,


    [Jump_TradeIn_Value] [numeric](8, 4) NULL,


    [Claim_Submitted_By] [varchar](100) NULL,


    [TOTAL_DAMAGE_AMOUNT] [numeric](10, 4) NULL,


    [INSURANCE_DEDUCTABLE_AMOUNT] [numeric](10, 4) NULL,


    [DEVICE_SHIPMENT_DATE] [date] NULL,


    [DEVICE_RETURN_EXPIRATION_DATE] [date] NULL,


    [DEVICE_RETURN_DATE] [date] NULL,


    [DEVICE_RETURN_SESSION_ID] [varchar](100) NULL,


    [ORIGINAL_RMA_NUMBER] [varchar](100) NULL,


    [UPDATED_RMA_EXPIRATION_DATE] [date] NULL,


    [Adjustment_Id] [varchar](50) NULL,


    [Tracking_Carrier] [varchar](100) NULL,


    [Tracking_Number] [varchar](50) NULL,


    [Return_Tracking_Number] [varchar](100) NULL,


    [Trade_In_Sub_Status] [varchar](50) NULL,


    [Trade_In_Status] [varchar](50) NULL,


    [Return_Tracking_Carrier] [varchar](100) NULL,


    [Bill_Credit_Balance] [numeric](8, 4) NULL,


    [Device_Received_Date] [date] NULL,


    [Trade_In_Credit_Date] [date] NULL,


    [Credited_Down_Payment] [numeric](8, 4) NULL,


    [Returned_Date] [date] NULL,


    [TRADEIN_INTENT] [varchar](50) NULL,


    [ORDER_LINE_ID] [varchar](50) NULL,


    [BulkID] [varchar](50) NULL


    ) ON [PRIMARY]
    GO

    SET ANSI_PADDING

    OFF

    GO

    Report Query 
    In report query you'll notice I am pulling from table with '
    _2016onward' which is a copy of the table, containing records from year 2016 onward, so as not to interfere with production table while testing new indexes in production.  
    Have commented out the JOINS to focus on primary table


    DECLARE


    @BULKID varchar(50) = 5006,


    @BANvarchar(50) = null,-- '955932237',


    @FROM_DATEdate = NULL,


    @TO_DATEdate = NULL,


    @IMEI varchar(100) = NULL


    SELECT QR.[Offer_ID]


    ,QR.[Offer_Generate_Date]


    ,QR.[Offer_Update_Date]


    ,QR.[Manufacturer]


    ,QR.[Model]


    ,QR.[Serial_Number]


    ,QR.[Application_User_ID]


    ,QR.[Application_ID]


    ,QR.[device_Evaluation_Date]


    ,QR.[Store_ID]


    ,QR.[BAN]


    ,QR.[Subscriber_No]


    ,QR.[Offer_Amount]


    ,QR.[RMA_Number]


    ,QR.[Offer_Status]


    ,QR.[Rejection_Reason]


    ,QR.[RMA_Status]


    ,QR.[Settlement_Amount]


    ,QR.[Eligibility_Status]


    ,QR.[Eligibility_Reason]


    ,QR.[Adjustment_Reason]


    ,QR.[Bill_Status]


    ,QR.[Channel_ID]


    ,QR.[Credit_Issue_Date]


    ,QR.[Program_ID]


    ,QR.[BulkID]


    --,QP.MARKET_ASSESSED_AMOUNT as [Fair Market Value]


    --,QP.PRM_NAME AS Promo


    ,QR.Assurant_Manufacturer as [Assurant Manufacturer Received]


    ,QR.Assurant_Model as [Assurant Model Received]


    --,COALESCE(Q1.Answer_Name,'N/A') AS [FMiP OFF]


    --,COALESCE(Q6.Answer_Name,'N/A') AS [AntiTheft OFF]


    --,COALESCE(Q2.Answer_Name,'N/A') AS [Liquid Damage]


    --,COALESCE(Q3.Answer_Name,'N/A') AS [Promo Accepted]


    --,COALESCE(Q4.Answer_Name,'N/A') AS [LCD Acceptable]


    --,COALESCE(Q5.Answer_Name,'N/A') AS [Powers On]


    FROM DistributionCenterFeed.[dbo].[QR_Device_Trade_In_Master_2016onward] QR


    --LEFT JOIN ODS.[dbo].[QR_DR_TradeIn_Promo] QP ON QR.Offer_ID = QP.Offer_ID


    --LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q1 ON QR.Offer_ID = Q1.Offer_ID AND Q1.Question_Name = 'Find My Iphone'--


    --LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q2 ON QR.Offer_ID = Q2.Offer_ID AND Q2.Question_Name = 'Liquid Damage'


    --LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q3 ON QR.Offer_ID = Q3.Offer_ID AND Q3.Question_Name = 'ATLANTIS_PROMO'


    --LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q4 ON QR.Offer_ID = Q4.Offer_ID AND Q4.Question_Name = 'LCD Issue'


    --LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q5 ON QR.Offer_ID = Q5.Offer_ID AND Q5.Question_Name = 'Powers On'


    --LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q6 ON QR.Offer_ID = Q6.Offer_ID AND Q6.Question_Name = 'Anti Theft Disabled'


    WHERE


    BAN = @BAN


    OR BulkID = @BULKID


    OR Serial_Number= @IMEI


    OR ([Offer_Generate_Date] >= @FROM_DATE AND [Offer_Generate_Date] <= @TO_DATE)


    GO


    --Quote me

Viewing 15 posts - 1 through 15 (of 54 total)

You must be logged in to reply to this topic. Login to reply