February 18, 2011 at 4:37 am
Hi,
I am having some serious performance issues when trying to represent time and attendance data from our Access control system. The system keeps a archive table which logs all In and Out (This includes internal doors, which is not of interest currently) occurances for all staff members on site.
The table structure is as follows:
CREATE TABLE #myTable(
[x_hist_type] [int] NULL,
[x_panel_name] [char](32) NULL,
[x_term_name] [char](32) NULL,
[x_badge_number] [char](20) NULL,
[x_timestamp] [datetime] NOT NULL,
[x_guid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
PRIMARY KEY NONCLUSTERED
(
[x_guid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Sample Data:
INSERT INTO #myTable
(x_hist_type,x_panel_name,x_term_name,x_badge_number,x_timestamp,x_guid)
SELECT '68','Turnstiles ','Turnstile In ','508 ','Aug 2 2010 7:20AM','0D9D5265-85BE-46A9-96BC-359F997BF493' UNION ALL
SELECT '68','Turnstiles ','Turnstile Out ','508 ','Aug 2 2010 8:47AM','83395CFE-2BC9-4192-8CE2-7193CF3996DF' UNION ALL
SELECT '68','Turnstiles ','Turnstile In ','508 ','Aug 2 2010 8:50AM','D60A9DD6-5312-428B-ADA7-4FDAB9866DF8' UNION ALL
SELECT '68','Turnstiles ','Turnstile Out ','508 ','Aug 2 2010 4:13PM','BC5ADC5A-48A2-463F-BC68-BAA670ACC712' UNION ALL
SELECT '68','Turnstiles ','Turnstile In ','508 ','Aug 3 2010 7:13AM','CD18BA98-EA51-4B8E-BDD7-BD96CA5BC924' UNION ALL
SELECT '68','Turnstiles ','Turnstile Out ','508 ','Aug 3 2010 9:41AM','BF7F845F-E931-4FAC-8181-D62C51D898F3' UNION ALL
SELECT '68','Turnstiles ','Turnstile Out ','508 ','Aug 3 2010 9:41AM','F57CAA2D-0539-41F4-A489-3B13BB37B7C5' UNION ALL
SELECT '68','Turnstiles ','Turnstile In ','508 ','Aug 3 2010 11:34AM','0CB79832-0A36-4F2D-96D2-FEF85CD15992' UNION ALL
SELECT '68','Turnstiles ','Turnstile Out ','508 ','Aug 3 2010 5:15PM','C5A93520-408E-4B95-B63A-953FC9999578' UNION ALL
SELECT '68','Turnstiles ','Turnstile In ','508 ','Aug 4 2010 7:21AM','891DD8F0-8099-46EF-B4B5-9AEBB90C1981' UNION ALL
SELECT '68','Turnstiles ','Turnstile Out ','508 ','Aug 4 2010 9:50AM','898AF8EC-2C52-4D1C-85E7-ADEDDFACE325' UNION ALL
SELECT '68','Turnstiles ','Turnstile In ','508 ','Aug 4 2010 12:17PM','6BC94FDF-D15F-4410-BF0B-CA8EC218596B' UNION ALL
SELECT '68','Turnstiles ','Turnstile Out ','508 ','Aug 4 2010 4:40PM','10052249-BD44-499C-BCED-0669E2319B88'
This sample data represents a common problem which we have. Sometimes when a terminal is scanned it doesn't release the door so the terminal is scanned a second time which causes two entries with a couple of seconds between them.
The code I have generated to handle this data is as follows, it gives the correct results but takes a long time to complete for 1 user nevermind the entire factory.
WITH cte_1 -- order by time and spilt to InTime, OutTime
AS (SELECT
x_badge_number
,case WHEN rtrim(x_term_name) LIKE '%IN' THEN x_timestamp ELSE NULL END AS InTime
,case WHEN rtrim(x_term_name) LIKE '%OUT' THEN x_timestamp ELSE NULL END AS OutTime
,case when rtrim(x_term_name) LIKE '%IN' THEN 'IN' ELSE 'OUT' END AS Terminal
,row_number() OVER (PARTITION BY x_badge_number ORDER BY x_timestamp ASC ) AS rn
FROM
#myTable
WHERE x_panel_name = 'turnstiles' AND x_hist_type = 68 -- and x_badge_number IN (3518,506)
)
,
cte_2 -- mark those that repeat
AS (SELECT
t.x_badge_number
,t.InTime
,t.OutTime
,t.Terminal
,t.rn
,case( SELECT Terminal FROM cte_1 AS x WHERE x.rn = t.rn - 1 AND x.x_badge_number = t.x_badge_number) WHEN t.Terminal THEN 1 ELSE 0 END AS mrk
FROM cte_1 AS t
)
,
cte_3 --extract non repeats and group
AS (SELECT
x_badge_number, InTime, OutTime, Terminal, rn, mrk
,row_number() OVER ( PARTITION BY Terminal ORDER BY x_badge_number,rn ASC ) AS rn2
FROM cte_2
WHERE mrk = 0
)
SELECT
t1.x_badge_number
,t1.InTime
,t2.Outtime
,datediff(MINUTE, t1.InTime, t2.OutTime) AS Duration
FROM
cte_3 AS t1
INNER JOIN cte_3 AS t2 ON t1.rn2 = t2.rn2
WHERE
t1.Intime IS NOT NULL
AND t2.Outtime IS NOT NULL
ORDER BY
t1.rn
This is my first post so i hope I gave included all the required information.
Oh, the output from this is like as follows:
3518 2010-07-06 07:56:12.0002010-07-06 16:51:21.000535
3518 2010-07-08 08:02:50.0002010-07-08 16:50:33.000528
3518 2010-07-09 08:11:32.0002010-07-09 16:47:36.000516
3518 2010-07-12 08:02:27.0002010-07-12 16:40:16.000518
3518 2010-07-13 07:51:40.0002010-07-13 16:46:05.000535
3518 2010-07-14 08:06:37.0002010-07-14 17:03:59.000537
3518 2010-07-15 08:09:20.0002010-07-15 16:44:18.000515
Is there anything that I could do too improve the speed of this code?
Thanks,
Anthony
February 18, 2011 at 6:55 am
I've found through a lot of testing that the use of ROW_NUMBER when returning more than a few rows has a tendency to degrade faster than using a TOP with an ORDER BY, assuming you have a good index for ordering the data. I posted the tests here at the PASS web site (it requires a login, sorry, but it's free to set up). I've updated the article and it'll be coming out in Simple-Talk within a week or two. Anyway, I'd try rewriting with the TOP statement. Other than that, your approach looks good to me.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 2 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply