Help Optimizing Query?

  • We have a workflow management app, which tracks the status of insurance applications, and records the step each case currently is at. Most of the case data is stored in the audit_trail table below. I need to collect a snapshot of the data (which constantly changes) at hourly intervals. A new record is added for each step the case goes through. Besides the 'audit_date' field which stores the timestamp it enters the stepname, the 'at_id' field indicates the most recent step for each case, since the number is incremental. We are only interested in certain stepnames, which can be seen in the query. In addition, I need to find the Case's StartDate, so we can calculate how old the Case is. The Query I have returns accurate information but is terribly slow. Hopefully, some of our experts here can show me a better technique to make this query fly. I tried to follow Jeff Moden's forum etiquette, but if i missed any important details, just let me know. Thanks!

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#audit_trail','U') IS NOT NULL

    DROP TABLE #audit_trail

    --===== Create the test table with

    CREATE TABLE #audit_trail

    (

    at_id numeric(20, 0) NOT NULL,

    node_id int NOT NULL,

    proc_id int NOT NULL,

    casenum numeric(10, 0) NOT NULL,

    type_id int NOT NULL,

    audit_date datetime NOT NULL,

    stepdesc varchar(24) NULL,

    user_name varchar(255) NULL,

    stepname varchar(8) NULL,

    CONSTRAINT pk_audit_trail PRIMARY KEY CLUSTERED

    (

    at_id ASC,

    node_id ASC

    )

    )

    --===== Create Non-Clustered Index

    CREATE NONCLUSTERED INDEX [i_audit_trail] ON #audit_trail

    (

    casenum ASC,

    proc_id ASC

    )

    --===== Insert the test data into the test table

    INSERT INTO #audit_trail

    (at_id,node_id,proc_id,casenum,type_id,audit_date,stepdesc,user_name,stepname)

    SELECT '28600708','1','10','399716','0','Jun 23 2008 2:34PM','','swadmin@STAFFVIEW','' UNION ALL

    SELECT '28600709','1','10','399716','1','Jun 23 2008 2:34PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL

    SELECT '28600710','1','10','399716','2','Jun 23 2008 2:34PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL

    SELECT '28600711','1','10','399716','1','Jun 23 2008 2:34PM','Data Review Step','REVIEW@STAFFVIEW','REVIEW' UNION ALL

    SELECT '28650700','1','10','399716','2','Jun 27 2008 12:35PM','Data Review Step','simmons@STAFFVIEW','REVIEW' UNION ALL

    SELECT '28650701','1','10','399716','1','Jun 27 2008 12:35PM','DATA ENTRY STEP','DATAENTRY@STAFFVIEW','DATAENTY' UNION ALL

    SELECT '28664580','1','10','399716','2','Jun 30 2008 11:10AM','DATA ENTRY STEP','melzer@STAFFVIEW','DATAENTY' UNION ALL

    SELECT '28664581','1','10','399716','1','Jun 30 2008 11:10AM','Send Back Queue','SENDBACK@STAFFVIEW','SENDBACK' UNION ALL

    SELECT '28631292','1','10','400455','0','Jun 25 2008 3:27PM','','swadmin@STAFFVIEW','' UNION ALL

    SELECT '28631293','1','10','400455','1','Jun 25 2008 3:27PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL

    SELECT '28631294','1','10','400455','2','Jun 25 2008 3:27PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL

    SELECT '28631295','1','10','400455','1','Jun 25 2008 3:27PM','Data Review Step','REVIEW@STAFFVIEW','REVIEW' UNION ALL

    SELECT '28682496','1','10','400455','2','Jul 1 2008 11:08AM','Data Review Step','simmons@STAFFVIEW','REVIEW' UNION ALL

    SELECT '28682497','1','10','400455','1','Jul 1 2008 11:09AM','DATA ENTRY STEP','DATAENTRY@STAFFVIEW','DATAENTY'

    SELECT * FROM #audit_trail

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#audit_trail','U') IS NOT NULL

    DROP TABLE #audit_trail

    Here is the Query I am using. It returns accurate results, but very long to do so in the 12 million record DB.

    SELECT getdate() as RefreshDate

    ,SUM(CASE WHEN stepname = 'CALLBACK' THEN 1 ELSE 0 END) AS CALLBACK

    ,SUM(CASE WHEN stepname = 'DATAENTY' THEN 1 ELSE 0 END) AS DATAENTY

    ,SUM(CASE WHEN stepname = 'DATAREV' THEN 1 ELSE 0 END) AS DATAREV

    ,SUM(CASE WHEN stepname = 'PARTIAL' THEN 1 ELSE 0 END) AS PARTIAL

    ,SUM(CASE WHEN stepname = 'REVIEW' THEN 1 ELSE 0 END) AS REVIEW

    ,SUM(CASE WHEN stepname = 'SENDBACK' THEN 1 ELSE 0 END) AS SENDBACK

    ,SUM(CASE WHEN stepname = 'SUPREV' THEN 1 ELSE 0 END) AS SUPREV

    ,SUM(CASE WHEN stepname = 'WBREVIEW' THEN 1 ELSE 0 END) AS WBREVIEW

    ,SUM(Over7Days) AS Over7Days

    ,SUM(Over10Days) AS Over10Days

    ,MIN(StartDate) AS OldestDate

    FROM

    (

    SELECTa.casenum,

    b.StartDate,

    a.audit_date,

    case when datediff(d,b.StartDate,a.audit_date) >= 7 then 1 end as Over7Days,

    case when datediff(d,b.StartDate,a.audit_date) >= 10 then 1 end as Over10Days,

    a.stepname

    FROM

    (

    SELECTa.casenum,

    a.audit_date,

    a.stepname

    FROM#audit_trail a

    WHEREa.at_id =

    (

    SELECT MAX(at_id)

    FROM#audit_trail b

    WHEREa.casenum = b.casenum

    )

    AND a.type_id = 1

    AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')

    ) a

    INNER JOIN

    (

    SELECTcasenum

    ,audit_date as StartDate

    FROM#audit_trail

    WHEREstepname = 'START'

    AND type_id = 1

    ) b

    ON a.casenum = b.casenum

    ) A

  • Create indexes on

    CaseNUm

    at_ID

    If it is still slow, add anothert 2 index on TypeID

    and StepName

  • It's a 3rd party vendor app, unfortunately, so I can't modify it.

  • Pop

    Your sample data doesn't return any rows. This chunk...

    SELECT a.casenum,

    a.audit_date,

    a.stepname

    FROM #audit_trail a

    WHERE a.at_id =

    (

    SELECT MAX(at_id)

    FROM #audit_trail b

    WHERE a.casenum = b.casenum

    )

    AND a.type_id = 1

    AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')

    ... comes back empty. Any chance of rejigging the data please?

    I reckon there's scope to halve the run time by playing with the query.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I take it the task names you have given (and are reporting) are a subset of all of the available ones.

    Hence the IN clause in the query.

    It might be better to have another temporary table and have an inner join rather than that IN.

    That way you will get better index hits rather than scans.

  • Thanks Chris, I changed the sample data to return results. (can't figure out how to get that one 'wink' out of the code though.)

    John, you are correct, I am only interested in a subset of the possible steps. Haven't tried an additional temp table yet.

    Thanks guys.

  • Cool, many thanks Pop.

    Got a question for you:

    SELECT a.casenum,

    a.audit_date,

    a.stepname

    FROM #audit_trail a

    WHERE a.at_id =

    (

    SELECT MAX(at_id)

    FROM #audit_trail b

    WHERE a.casenum = b.casenum

    )

    AND a.type_id = 1

    AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')

    Are you sure about this bit:

    SELECT MAX(at_id)

    FROM #audit_trail b

    WHERE a.casenum = b.casenum

    Because it translates as "pick up MAX(at_id) for each casenum. Rows which don't match our where clause are then discarded". This means that for a casenum which has matches on the where clause, unless the most recent row is a matching row, it will be omitted.

    Are you sure you don't mean "pick up MAX(at_id) for each casenum where rows match our where clause"?

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • for each casenum, the 'at_id' field increments. So the record with the highest 'at_id' value for each casenum will contain the most recent step.

    What I wanted to do was find out what is the most recent step for each case when I run the Query... and only return those that match my Search Condition (PARTIAL,CALLBACK,etc.), since those are the only steps I'm interested in measuring.

    Hope this makes sense. Feel free to correct me, if I'm wrong.:)

  • So you want the cases having their most recent step in your list of stepnames? Sorry to be a pain!

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • yep, you're exactly right.

    it's measuring how many cases are in each of those listed steps, at a point in time when I run the query.

  • Hi Pop

    SELECT a.casenum, a.audit_date, c.audit_date as StartDate, a.stepname

    FROM #audit_trail a

    INNER JOIN (SELECT MAX(at_id) AS at_id, casenum FROM #audit_trail GROUP BY casenum) b

    ON b.casenum = a.casenum and b.at_id = a.at_id

    INNER JOIN #audit_trail c

    ON c.casenum = a.casenum AND c.type_id = 1 AND c.stepname = 'START'

    WHERE a.type_id = 1

    AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')

    This is doing things a little differently, worth a try.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Interesting, def a little shorter code-wise....no performance increase though...

  • pop (7/1/2008)


    Interesting, def a little shorter code-wise....no performance increase though...

    How many rows are returned by this...

    SELECT a.casenum, a.at_id

    FROM #audit_trail a

    INNER JOIN (SELECT MAX(at_id) AS at_id, casenum FROM #audit_trail GROUP BY casenum) b

    ON b.casenum = a.casenum AND b.at_id = a.at_id

    WHERE a.type_id = 1

    AND a.stepname IN ('START','CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')

    ...and how long does it take to run?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply