Help Optimizing Query?

Question

Post reply

Help Optimizing Query?

pop

SSCrazy

Points: 2130
More actions
June 29, 2008 at 1:33 pm

#184071

We have a workflow management app, which tracks the status of insurance applications, and records the step each case currently is at. Most of the case data is stored in the audit_trail table below. I need to collect a snapshot of the data (which constantly changes) at hourly intervals. A new record is added for each step the case goes through. Besides the 'audit_date' field which stores the timestamp it enters the stepname, the 'at_id' field indicates the most recent step for each case, since the number is incremental. We are only interested in certain stepnames, which can be seen in the query. In addition, I need to find the Case's StartDate, so we can calculate how old the Case is. The Query I have returns accurate information but is terribly slow. Hopefully, some of our experts here can show me a better technique to make this query fly. I tried to follow Jeff Moden's forum etiquette, but if i missed any important details, just let me know. Thanks!
--===== If the test table already exists, drop it
IF OBJECT_ID('TempDB..#audit_trail','U') IS NOT NULL
DROP TABLE #audit_trail
--===== Create the test table with
CREATE TABLE #audit_trail
(
at_id numeric(20, 0) NOT NULL,
node_id int NOT NULL,
proc_id int NOT NULL,
casenum numeric(10, 0) NOT NULL,
type_id int NOT NULL,
audit_date datetime NOT NULL,
stepdesc varchar(24) NULL,
user_name varchar(255) NULL,
stepname varchar(8) NULL,
CONSTRAINT pk_audit_trail PRIMARY KEY CLUSTERED
(
at_id ASC,
node_id ASC
)
)
--===== Create Non-Clustered Index
CREATE NONCLUSTERED INDEX [i_audit_trail] ON #audit_trail
(
casenum ASC,
proc_id ASC
)
--===== Insert the test data into the test table
INSERT INTO #audit_trail
(at_id,node_id,proc_id,casenum,type_id,audit_date,stepdesc,user_name,stepname)
SELECT '28600708','1','10','399716','0','Jun 23 2008 2:34PM','','swadmin@STAFFVIEW','' UNION ALL
SELECT '28600709','1','10','399716','1','Jun 23 2008 2:34PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL
SELECT '28600710','1','10','399716','2','Jun 23 2008 2:34PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL
SELECT '28600711','1','10','399716','1','Jun 23 2008 2:34PM','Data Review Step','REVIEW@STAFFVIEW','REVIEW' UNION ALL
SELECT '28650700','1','10','399716','2','Jun 27 2008 12:35PM','Data Review Step','simmons@STAFFVIEW','REVIEW' UNION ALL
SELECT '28650701','1','10','399716','1','Jun 27 2008 12:35PM','DATA ENTRY STEP','DATAENTRY@STAFFVIEW','DATAENTY' UNION ALL
SELECT '28664580','1','10','399716','2','Jun 30 2008 11:10AM','DATA ENTRY STEP','melzer@STAFFVIEW','DATAENTY' UNION ALL
SELECT '28664581','1','10','399716','1','Jun 30 2008 11:10AM','Send Back Queue','SENDBACK@STAFFVIEW','SENDBACK' UNION ALL
SELECT '28631292','1','10','400455','0','Jun 25 2008 3:27PM','','swadmin@STAFFVIEW','' UNION ALL
SELECT '28631293','1','10','400455','1','Jun 25 2008 3:27PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL
SELECT '28631294','1','10','400455','2','Jun 25 2008 3:27PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL
SELECT '28631295','1','10','400455','1','Jun 25 2008 3:27PM','Data Review Step','REVIEW@STAFFVIEW','REVIEW' UNION ALL
SELECT '28682496','1','10','400455','2','Jul 1 2008 11:08AM','Data Review Step','simmons@STAFFVIEW','REVIEW' UNION ALL
SELECT '28682497','1','10','400455','1','Jul 1 2008 11:09AM','DATA ENTRY STEP','DATAENTRY@STAFFVIEW','DATAENTY'
SELECT * FROM #audit_trail
--===== If the test table already exists, drop it
IF OBJECT_ID('TempDB..#audit_trail','U') IS NOT NULL
DROP TABLE #audit_trail
Here is the Query I am using. It returns accurate results, but very long to do so in the 12 million record DB.
SELECT getdate() as RefreshDate
,SUM(CASE WHEN stepname = 'CALLBACK' THEN 1 ELSE 0 END) AS CALLBACK
,SUM(CASE WHEN stepname = 'DATAENTY' THEN 1 ELSE 0 END) AS DATAENTY
,SUM(CASE WHEN stepname = 'DATAREV' THEN 1 ELSE 0 END) AS DATAREV
,SUM(CASE WHEN stepname = 'PARTIAL' THEN 1 ELSE 0 END) AS PARTIAL
,SUM(CASE WHEN stepname = 'REVIEW' THEN 1 ELSE 0 END) AS REVIEW
,SUM(CASE WHEN stepname = 'SENDBACK' THEN 1 ELSE 0 END) AS SENDBACK
,SUM(CASE WHEN stepname = 'SUPREV' THEN 1 ELSE 0 END) AS SUPREV
,SUM(CASE WHEN stepname = 'WBREVIEW' THEN 1 ELSE 0 END) AS WBREVIEW
,SUM(Over7Days) AS Over7Days
,SUM(Over10Days) AS Over10Days
,MIN(StartDate) AS OldestDate
FROM
(
SELECTa.casenum,
b.StartDate,
a.audit_date,
case when datediff(d,b.StartDate,a.audit_date) >= 7 then 1 end as Over7Days,
case when datediff(d,b.StartDate,a.audit_date) >= 10 then 1 end as Over10Days,
a.stepname
FROM
(
SELECTa.casenum,
a.audit_date,
a.stepname
FROM#audit_trail a
WHEREa.at_id =
(
SELECT MAX(at_id)
FROM#audit_trail b
WHEREa.casenum = b.casenum
)
AND a.type_id = 1
AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')
) a
INNER JOIN
(
SELECTcasenum
,audit_date as StartDate
FROM#audit_trail
WHEREstepname = 'START'
AND type_id = 1
) b
ON a.casenum = b.casenum
) A

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

ZA_Crafty Ten Centuries Points: 1228 More actions · Answer 1

Create indexes on

CaseNUm

at_ID

If it is still slow, add anothert 2 index on TypeID

and StepName

pop SSCrazy Points: 2130 More actions · Answer 2

It's a 3rd party vendor app, unfortunately, so I can't modify it.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

Pop

Your sample data doesn't return any rows. This chunk...

SELECT a.casenum,

a.audit_date,

a.stepname

FROM #audit_trail a

WHERE a.at_id =

(

SELECT MAX(at_id)

FROM #audit_trail b

WHERE a.casenum = b.casenum

)

AND a.type_id = 1

AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')

... comes back empty. Any chance of rejigging the data please?

I reckon there's scope to halve the run time by playing with the query.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

john.henderson-787186 SSC-Addicted Points: 427 More actions · Answer 4

I take it the task names you have given (and are reporting) are a subset of all of the available ones.

Hence the IN clause in the query.

It might be better to have another temporary table and have an inner join rather than that IN.

That way you will get better index hits rather than scans.

pop SSCrazy Points: 2130 More actions · Answer 5

Thanks Chris, I changed the sample data to return results. (can't figure out how to get that one 'wink' out of the code though.)

John, you are correct, I am only interested in a subset of the possible steps. Haven't tried an additional temp table yet.

Thanks guys.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

Cool, many thanks Pop.

Got a question for you:

SELECT a.casenum,

a.audit_date,

a.stepname

FROM #audit_trail a

WHERE a.at_id =

(

SELECT MAX(at_id)

FROM #audit_trail b

WHERE a.casenum = b.casenum

)

AND a.type_id = 1

AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')

Are you sure about this bit:

SELECT MAX(at_id)

FROM #audit_trail b

WHERE a.casenum = b.casenum

Because it translates as "pick up MAX(at_id) for each casenum. Rows which don't match our where clause are then discarded". This means that for a casenum which has matches on the where clause, unless the most recent row is a matching row, it will be omitted.

Are you sure you don't mean "pick up MAX(at_id) for each casenum where rows match our where clause"?

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

pop SSCrazy Points: 2130 More actions · Answer 7

for each casenum, the 'at_id' field increments. So the record with the highest 'at_id' value for each casenum will contain the most recent step.

What I wanted to do was find out what is the most recent step for each case when I run the Query... and only return those that match my Search Condition (PARTIAL,CALLBACK,etc.), since those are the only steps I'm interested in measuring.

Hope this makes sense. Feel free to correct me, if I'm wrong.:)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

So you want the cases having their most recent step in your list of stepnames? Sorry to be a pain!

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

pop SSCrazy Points: 2130 More actions · Answer 9

yep, you're exactly right.

it's measuring how many cases are in each of those listed steps, at a point in time when I run the query.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 10

Hi Pop

SELECT a.casenum, a.audit_date, c.audit_date as StartDate, a.stepname

FROM #audit_trail a

INNER JOIN (SELECT MAX(at_id) AS at_id, casenum FROM #audit_trail GROUP BY casenum) b

ON b.casenum = a.casenum and b.at_id = a.at_id

INNER JOIN #audit_trail c

ON c.casenum = a.casenum AND c.type_id = 1 AND c.stepname = 'START'

WHERE a.type_id = 1

AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')

This is doing things a little differently, worth a try.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

pop SSCrazy Points: 2130 More actions · Answer 11

Interesting, def a little shorter code-wise....no performance increase though...

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

pop (7/1/2008)
Interesting, def a little shorter code-wise....no performance increase though...

How many rows are returned by this...

SELECT a.casenum, a.at_id

FROM #audit_trail a

INNER JOIN (SELECT MAX(at_id) AS at_id, casenum FROM #audit_trail GROUP BY casenum) b

ON b.casenum = a.casenum AND b.at_id = a.at_id

WHERE a.type_id = 1

AND a.stepname IN ('START','CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')

...and how long does it take to run?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden