Query took a nose dive

  • Suppose I have a table with the following structure.

    CREATE TABLE [TAB].[tblWorkOrderSegInvoices](
        [CustomerNumber] [varchar](7) NOT NULL,
        [WorkOrderNumber] [varchar](10) NOT NULL,
        [SegmentNumber] [char](2) NOT NULL,
        [PartsInvoiceAmount] [decimal](13, 2) NOT NULL,
        [LaborInvoiceAmount] [decimal](13, 2) NOT NULL,
        [MiscInvoiceAmount] [decimal](13, 2) NOT NULL,
        [ETLDate] [datetime] NULL
    ) ON [PRIMARY]
    GO

    ALTER TABLE [TAB].[tblWorkOrderSegInvoices] ADD CONSTRAINT [DF_tblWorkOrderSegInvoices_ETLDate] DEFAULT (getdate()) FOR [ETLDate]
    GO

    There is a stored procedure that runs daily that does the following with this table.

    TRUNCATE TABLE TAB.tblWorkOrderSegInvoices

    IF EXISTS (SELECT name FROM sys.indexes WHERE name = N'idxCustWOSeg')
        DROP INDEX idxCustWOSeg ON TAB.tblWorkOrderSegInvoices WITH ( ONLINE = OFF )

    --do inserts into table; around 450,000 records each day
    INSERT INTO TAB.tblWorkOrderSegInvoices WITH (TABLOCK)
        (CustomerNumber, WorkOrderNumber, SegmentNumber, PartsInvoiceAmount, LaborInvoiceAmount, MiscInvoiceAmount)
    SELECT .....

    CREATE CLUSTERED INDEX idxCustWOSeg ON TAB.tblWorkOrderSegInvoices
    (
        CustomerNumber ASC,
        WorkOrderNumber ASC,
        SegmentNumber ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    Then, after the truncate, drop index, insert, and create index steps, I have a SELECT query that joins to this table on CustomerNumber, WorkOrderNumber, and SegmentNumber.

    All of last month, the query that joins to this table ran within an acceptable timeframe. But starting this month, it has been taking 4 times longer. The only difference I can detect is that the number of records being inserted daily has dropped by around 140,000. Also, the number of records being joined to it has decreased by 100,000 or so.

    In trying to troubleshoot this, I've run the process without creating the index, and the query runs within the normal, acceptable timeframe. But when I add the index, it takes 4x longer to run.

    As a side note, we are about to move this database to a new server. We have it over on the new server and everything about this process performs as expected.

    All of this leads me to believe this is some kind of index/plan/statistics problem. I'm not a DBA and don't claim to be, but I am responsible for this process and for making sure it runs well. Any help would be great.

  • Can you post, as a .sqlplan attachment, the actual (as opposed to estimated) execution plan for "SELECT query that joins to this table", please? Thanks.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you, Chris, for offering to look at this. I'm running it now. I'm warning you now that it's not pretty. I'll upload as soon as it completes.

  • When is the last time you rebuilt statistics on the other tables that you're joining to this one?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It looks like it may have been a while.

    Attached is the query plan. The main query that is the trouble is Query 3. As you can see, it is a monster. In order to put the necessary data together, we have to do some interesting things. I know it isn't the prettiest. Let me know if you have any questions.

    Update: I've also attached the plan from after I dropped the index.

  • LeeFAR (1/5/2017)


    It looks like it may have been a while.

    Attached is the query plan. The main query that is the trouble is Query 3. As you can see, it is a monster. In order to put the necessary data together, we have to do some interesting things. I know it isn't the prettiest. Let me know if you have any questions.

    Update: I've also attached the plan from after I dropped the index.

    In that case, before you continue troubleshooting/tuning, you really should rebuild stats on the tables involved, which will also cause the procs that use them to recompile. You might find there's nothing else to do after that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • LeeFAR (1/5/2017)


    It looks like it may have been a while.

    Attached is the query plan. The main query that is the trouble is Query 3. As you can see, it is a monster. In order to put the necessary data together, we have to do some interesting things. I know it isn't the prettiest. Let me know if you have any questions.

    Update: I've also attached the plan from after I dropped the index.

    Thanks Lee. Here are the steps I'd take next:

    1. Update stats, as Jeff recommends.

    2. Change the two table variables to #temp tables. They are seriously screwing up the rowcount estimates (there's a sketch of this change after these steps).

    3. Grab the remote data into a #temp table; it's also screwing up estimates.

    4. Post the actual plan, and also the modified query 3 (it's truncated in the plan file).

    5. Some of the source tables are read multiple times - customers/stores/county as a group is read five times. If the query cannot be tweaked to reduce the duplication of effort, then you might get some good mileage from putting this section of the query into a #temp table too.
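
    To illustrate step 2, here's a minimal sketch; the #temp table name and columns are made up, since the table variable definitions are truncated in the plan file:

    -- Instead of DECLARE @SomeTableVar TABLE (...), load a #temp table.
    -- #temp tables carry column statistics, so the optimizer gets real
    -- rowcount estimates rather than the fixed guess it uses for table variables.
    CREATE TABLE #SomeWork (                -- hypothetical name and columns
        CustomerNumber  varchar(7)  NOT NULL,
        WorkOrderNumber varchar(10) NOT NULL,
        SegmentNumber   char(2)     NOT NULL
    );

    INSERT INTO #SomeWork (CustomerNumber, WorkOrderNumber, SegmentNumber)
    SELECT CustomerNumber, WorkOrderNumber, SegmentNumber
    FROM TAB.tblWorkOrderSegInvoices;       -- whatever currently fills the table variable

    -- Then reference #SomeWork in the problem query in place of the table variable.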

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks Jeff and Chris. I cannot thank you enough for taking your time to help me.

    Chris, I did see that the table vars are causing the estimated rows to be stupid crazy. I don't know if you looked at the other execution plan, but that is from when I dropped the index on the one table. In that plan the estimated rows are in line. It is also where the query runs within its normal amount of time. It's odd to me that dropping the index would cause this behavior.

    I will start with updating stats. Would I be better off running sp_updatestats in the database or UPDATE STATISTICS on each table involved with the query?

  • One other thing I noticed. I went back to see when the stats were last built on the TAB.tblWorkOrderSegInvoices table, and it was today. I guess when I looked before, I didn't have the right object. The stats being built today makes sense based on my understanding of when stats are automatically updated: because I'm dropping the index, inserting a large amount, and adding the index back, the stats are current. My thinking now is that perhaps the stats are getting created out of whack.

    SELECT name AS stats_name,
           STATS_DATE(object_id, stats_id) AS statistics_update_date
    FROM sys.stats
    WHERE object_id = OBJECT_ID('TAB.tblWorkOrderSegInvoices')
    ORDER BY statistics_update_date;
    GO

    stats_name                 statistics_update_date
    -------------------------  ----------------------
    idxCustWOSeg               1/6/17 4:09 AM
    _WA_Sys_00000003_6991A7CB  1/6/17 4:09 AM
    _WA_Sys_00000002_6991A7CB  1/6/17 4:09 AM
    _WA_Sys_00000001_6991A7CB  1/6/17 4:09 AM
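
    To dig into that theory, I could look at sys.dm_db_stats_properties (available from SQL Server 2008 R2 SP2 / 2012 SP1 onward) to see how many rows each stats object actually sampled - something like:

    SELECT s.name AS stats_name,
           sp.last_updated,
           sp.rows,
           sp.rows_sampled,         -- much lower than rows => sampled, not FULLSCAN
           sp.modification_counter  -- changes since the stats were last updated
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('TAB.tblWorkOrderSegInvoices');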

  • I don't think you've posted the SQL SELECT statement, but even though the final returned result is only 300k rows, just looking at the execution plan I'm seeing table spool operations with row counts in the hundreds of millions. Is it by chance doing some type of cross join between tables and then filtering the result?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Here is the T-SQL in the stored procedure that is running the query that is taking the time. The bad execution plan is produced from this. If you do a find for "--primary query; this is the one that is having the performance problem", you will find the query that is taking the time. Again, if I don't create idxCustWOSeg, it seems to run in the 12 minutes it has been. But if I have the index, it takes over an hour. And prior to Jan 1, it was running with the index at an average of 12 minutes.

  • LeeFAR (1/6/2017)


    Here is the T-SQL in the stored procedure that is running the query that is taking the time. The bad execution plan is produced from this. If you do a find for "--primary query; this is the one that is having the performance problem", you will find the query that is taking the time. Again, if I don't create idxCustWOSeg, it seems to run in the 12 minutes it has been. But if I have the index, it takes over an hour. And prior to Jan 1, it was running with the index at an average of 12 minutes.

    Get this...

    FROM OPENQUERY(AS400,'

    into a #temp table for starters, and time it. Then replace the OPENQUERY reference in your query with the #temp table.

    Edit: also, to vastly simplify your problem query, fold whatever processing you can of the result set into the load - this, for instance:

    ISNULL(W.MachineManufacture,'Unknown') AS MachineManufacture, ISNULL(W.MachineFamily,'') AS MachineFamily, ISNULL(W.MachineModel,'') AS MachineModel,
    ISNULL(W.MachineDisplayModel,'') AS MachineDisplayModel, ISNULL(W.ForeCastModel,'') AS ForeCastModel, ISNULL(W.ProductFamilyGroup,'') AS ProductFamilyGroup,
    ISNULL(W.MachineIDNumber,'') AS MachineIDNumber, ISNULL(W.MachineManufactureYear,'') AS MachineManufactureYear,
    ISNULL(W.AttachmentManufacture,'Unknown') AS AttachmentManufacture, ISNULL(W.AttachmentModel,'') AS AttachmentModel, ISNULL(W.AttachmentDisplayModel,'') AS AttachmentDisplayModel,
    ISNULL(W.AttachmentDescription,'') AS AttachmentDescription, ISNULL(W.AttachmentIDNumber,'') AS AttachmentIDNumber, ISNULL(W.AttachmentManufactureYear,'') AS AttachmentManufactureYear,
    ISNULL(SCA.Salesman1Name,'Unknown') AS Salesman1Name, ISNULL(SCA.Salesman1TypeDescription,'Unknown') AS Salesman1TypeDescription, ISNULL(SCA.Salesman1CategoryDescription,'Unknown') AS Salesman1CategoryDescription, SCA.Salesman1AssignDate,
    ISNULL(SCA.Salesman2Name,'Unknown') AS Salesman2Name, ISNULL(SCA.Salesman2TypeDescription,'Unknown') AS Salesman2TypeDescription, ISNULL(SCA.Salesman2CategoryDescription,'Unknown') AS Salesman2CategoryDescription, SCA.Salesman2AssignDate,

    can all be done at the #temp table creation stage.
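
    Purely as a sketch of the shape (the AS400 linked server name comes from your proc; the #RemoteWO name and the remote-query placeholder are made up):

    SELECT ISNULL(rq.MachineManufacture, 'Unknown') AS MachineManufacture,
           ISNULL(rq.MachineFamily, '')             AS MachineFamily,
           ISNULL(rq.MachineModel, '')              AS MachineModel
           -- ...do the rest of the ISNULL cleanup here, once, at load time
    INTO #RemoteWO                                  -- hypothetical #temp table name
    FROM OPENQUERY(AS400, '<the remote query from your proc goes here>') AS rq;

    -- The problem query then joins to #RemoteWO instead of the OPENQUERY,
    -- giving the optimizer local statistics to work with.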

    Edit: as an aside, scalar functions such as dbo.fnFormatDBSToSQLDate will inhibit parallelism in all or part of the plan (all, in this case).
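
    For example - and this is a guess at what the function does, assuming dbo.fnFormatDBSToSQLDate just converts an 8-character yyyymmdd string - an inline table-valued function used via CROSS APPLY avoids the parallelism penalty:

    -- Hypothetical inline TVF replacement; adjust the conversion to match the real function.
    CREATE FUNCTION dbo.itvfFormatDBSToSQLDate (@DBSDate varchar(8))
    RETURNS TABLE
    AS
    RETURN
        SELECT TRY_CONVERT(date, @DBSDate, 112) AS SQLDate;  -- style 112 = yyyymmdd; TRY_CONVERT needs SQL 2012+
    GO

    -- Usage: CROSS APPLY in place of the scalar call in the SELECT list, e.g.
    -- SELECT d.SQLDate
    -- FROM #RemoteWO AS w
    -- CROSS APPLY dbo.itvfFormatDBSToSQLDate(w.HeaderOpenDate) AS d;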

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • LeeFAR (1/6/2017)


    Thanks Jeff and Chris. I cannot thank you enough for taking your time to help me.

    Chris, I did see that the table vars are causing the estimated rows to be stupid crazy. I don't know if you looked at the other execution plan, but that is from when I dropped the index on the one table. In that plan the estimated rows are in line. It is also where the query runs within its normal amount of time. It's odd to me that dropping the index would cause this behavior.

    I will start with updating stats. Would I be better off running sp_updatestats in the database or UPDATE STATISTICS on each table involved with the query?

    Update statistics on each table.
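
    Something like this for each table referenced in the query (TAB.tblWorkOrderSegInvoices is yours; the second table name is a placeholder):

    -- Per table, with FULLSCAN so the histograms are built from every row.
    UPDATE STATISTICS TAB.tblWorkOrderSegInvoices WITH FULLSCAN;
    UPDATE STATISTICS dbo.SomeOtherJoinedTable WITH FULLSCAN;  -- hypothetical; repeat for each table

    -- sp_updatestats, by contrast, touches every table in the database and uses
    -- the default sampling rate, which may not be enough to correct a skewed histogram.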

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • An update. The first thing I did was to change @CODA and @WarrPolCusts to temp tables instead of table vars. The query then took right at 10 minutes to complete, and that was with the index still in place on TAB.tblWorkOrderSegInvoices. So it does appear these table vars had something to do with the problem.

    Now, I'm certainly not discounting the fact there are some crappy things going on in this overall process. I've been trying to address a lot of it over time. And based on what you guys have been recommending, I have some good places to go. I welcome any additional advice you may have.

  • One thing I would look at is splitting the date-function calls out into a temp table.

    E.g. get the distinct date values that are being passed to the functions, then call the functions once on that list and insert into a table holding the date itself and the result of the function. This would get rid of both the len(..) = 8 check and the repeated function calls.

    Something like this. First insert the OPENQUERY result into a temp table as suggested - that one is a killer for sure. Then create a table as:

    SELECT processdate
         , CASE WHEN LEN(processdate) = 8 THEN dbo.fnFormatDBSToSQLDate(processdate) ELSE NULL END AS dbtosqldate
         , CASE WHEN LEN(processdate) = 7 THEN dbo.fnFormatYearDaysToSQLDate(processdate) ELSE NULL END AS yeardaystosqldate
    INTO #dates
    FROM (
        SELECT DISTINCT dates.processdate
        FROM newopenquerytable
        OUTER APPLY (SELECT headeropendate AS processdate
                     UNION
                     SELECT headerclosedate
                     UNION
                     SELECT invoicedate
                     UNION
                     SELECT reopendate
                     -- ... add all required
                    ) dates
        WHERE dates.processdate IS NOT NULL
    ) t;
    -- Eventually collapse the two columns above into a single one if the output of the
    -- two functions is the same data type - it will make the remaining code cleaner and
    -- the temp table smaller. The above also assumes the data types of the input fields
    -- are the same - adapt as needed.

    CREATE CLUSTERED INDEX dates_ix1 ON #dates (processdate);

    and then use it in the remaining code:

    SELECT ...
         , d1.dbtosqldate AS headeropendate  -- the converted dates come from #dates
         , d2.dbtosqldate AS invoicedate
         ...
    FROM newopenquerytable w
    INNER JOIN #dates d1  -- or LEFT JOIN if NULLs are allowed on the dates
            ON d1.processdate = w.headeropendate
    INNER JOIN #dates d2
            ON d2.processdate = w.invoicedate
