February 20, 2006 at 8:23 am
if the default is 0, then myFlag will never be set to 1 with your query. It is dependent on at least one barcode containing a 1 to start.
Let's wait for John to reply on his needs.
February 20, 2006 at 4:29 pm
how about:
update mytable set myflag = 1
from mytable a
where not exists (select * from mytable where barcode = a.barcode
and datetimestamp = dateadd(hh, -6, a.datetimestamp))
The problem with this and any other solution is that the very nature of the task will require Order n(squared) comparisons, no matter how things are range limited and indexed.
If you can live with a good approximation, then a simple:
select count(distinct barcode) where datetimestamp between X and Y
would probably suffice.
February 20, 2006 at 8:57 pm
I agree with Fred. If you want an approximation, there's a few quick fixes. I just figure, if you're going to do it only one time, then do mantenance after that, you should do it right. Despite it being 20 mil rcds, I don't think it will take much time (during down time in the evening). Especially if it's on a server made to handle 20 mil records.
February 20, 2006 at 9:25 pm
Wow, I'm overwhelmed by the response here. Thanks much. I was out of the office today and will be again tomorrow but I'll try to clear up some questions now.
I think an approximation will be okay. I really don't want to loop if I don't have to and I'm sure I can make it understood what the approximation means if needed. I haven't tried all of the suggestions out on my database yet but I will review them in detail when I get a chance - hopefully on Wednesday.
The approximation I'm referring to is to divide the day into 6 hour chunks and essentially counting the distinct card swipes within each chunk. It's not exactly what the users have asked for but, I expect, at a lot lower cost in terms of my time to implement as well as time to execute.
Thanks again,
John
February 21, 2006 at 1:40 am
Kevin, I don't agree. Enter the test data below and run my code - you will see that myflag is set to 1 for 5:59 and 12:00. If you remove the 5:59 record (and start over again), then myflag will be set to 1 for 11:58 and not for 12:00.
The point is the "and t2.datetimestamp is null" in connection with the left join. This can be rewritten as a "not exists" query (similar to what you suggested in your first query). I think these queries perform equally fast, so it's a matter of taste which one you prefer. You may also compare our suggestions (left join/not exists) in the thread
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=259829
create table [mytable] (
[barcode] varchar(10),
[datetimestamp] datetime,
[myflag] bit
)
go
insert into mytable select 1, '20050101 5:59', 0
insert into mytable select 1, '20050101 11:58', 0
insert into mytable select 1, '20050101 12:00', 0
February 21, 2006 at 7:31 am
Sorry Jesper...I didn't notice the Left Join in your Sub Query. I think that does the same thing in a loop that I did in my first post (which splits the day into 4 6-hour periods)
I know it's a repeat, but it beats going back a page.
UPDATE myTable
SET myFlag = 1
FROM myTable mt1
WHERE NOT EXISTS
(
SELECT *
FROM myTable mt2
WHERE mt1.barcode = mt2.barcode
AND DATEPART(yy,mt1.dateTimeStamp) = DATEPART(yy,mt2.dateTimeStamp)
AND DATEPART(dy,mt1.dateTimeStamp) = DATEPART(dy,mt2.dateTimeStamp)
AND DATEPART(hh,mt1.dateTimeStamp)/4 = DATEPART(hh,mt2.dateTimeStamp)/4
AND mt1.dateTimeStamp < mt2.dateTimeStamp
)
February 22, 2006 at 1:19 am
It's not the same. Run your code on the example I posted before
insert into mytable select 1, '20050101 5:59', 0
insert into mytable select 1, '20050101 11:58', 0
insert into mytable select 1, '20050101 12:00', 0
It flags all three rows. My code only flags rows 1 and 3.
The difference is that my code doesn't flag a row if the same user has been flagged during the last 6 hours (at least, that's my intention  ). Your code doesn't flag a row if the same user has been flagged during the same time interval (you should probably divide by 6 instead of 4 to obtain 6 hour intervals). The advantage of your suggestion, however, is that it is much faster - no doubt about that
 ). Your code doesn't flag a row if the same user has been flagged during the same time interval (you should probably divide by 6 instead of 4 to obtain 6 hour intervals). The advantage of your suggestion, however, is that it is much faster - no doubt about that 
February 22, 2006 at 8:22 am
24/4 = 6
0:00-5:59 = period 1
6:00-11:59 = period 2
etc.
try on your cursor loop...
10:58 <--gets flag
12:01 <-- min in time group, but no flag
17:58 <--should get flag but doesn't
February 23, 2006 at 4:59 am
Kevin, I am not sure I understand your first comment, but let me explain in greater detail why I think you should divide by 6 rather than 4.
When you divide numbers 0 through 23 by 4, you have 6 possible results (numbers 0 through 5). When you divide by 6, you have 4 possible results. Division by 4 will result in the following time periods:
0:00-3:59 = period 1
4:00-7:59 = period 2
etc.
That's why I suggested you might want to divide by 6 instead (Sergiy also divides by 6 in his second post).
Regarding your second comment. When I test my loop (which isn't a cursor loop) on your example above (10:58, 12:01 and 17:58), it correctly flags the first and last record. Assuming that you always test other people's code before claiming that it is wrong, I conclude that one of us must have mistyped the data.... 
February 24, 2006 at 9:44 am
sorry dude...I was in a bit of a hurry yesterday, so I wasn't able to get all my thoughts clear or do any testing. I try to understand code before I run it (bad habit I guess). I am seing the light on your loop...not cursor loop. only prob I see is the >= in...
t2.datetimestamp >= dateadd(hh, -6, t1.datetimestamp)
0:00 - 6:0 = 6 hrs. 6:00 should be a new flag. But that's minor and I digress....
I created 200K rows with 1000 barcodes and 3 hour increments (20020101 to 20020125) with a nested loop.
The DIV 6 method took 6 seconds.
(I was thinking 24/4 = 4 periods of the day...24/4 = 6 4 hour periods and we want 4! my bad!!!!!)
Your Loop took 4:52 and produced the following pattern...
barcodedatetimestampmyflag
2002002-01-01 00:001
2002002-01-01 03:000
2002002-01-01 06:000
2002002-01-01 09:001
2002002-01-01 12:000
2002002-01-01 15:000
2002002-01-01 18:001
2002002-01-01 21:000
Changing >= to > will clean that up...
barcodedatetimestampmyflag
2002002-01-01 00:001
2002002-01-01 03:000
2002002-01-01 06:001
2002002-01-01 09:000
2002002-01-01 12:001
2002002-01-01 15:000
2002002-01-01 18:001
2002002-01-01 21:000
My cursor loop was at around row 50,000 at 15 min... I'm unsure if the log had anything to do with it. It looked good on paper at first.
So I came up with this in the spirit of both our ideas....
DECLARE @retCnt int
SET @retCnt = 1
WHILE @retCnt > 0
BEGIN
UPDATE t
SET myFlag=1
FROM MyTable t
INNER JOIN
(
SELECT t1.barcode, MIN(datetimestamp) minTime
FROM myTable t1
INNER JOIN
(
SELECT barcode, MAX(t2.datetimestamp) maxFlag
FROM myTable t2
WHERE t2.myFlag = 1
GROUP BY t2.barcode
) t2
ON t1.barcode = t2.barcode
AND t1.datetimestamp > t2.maxFlag
AND DATEADD(hh,-6, t1.datetimestamp) >= t2.maxFlag
AND t1.myFlag = 0
GROUP BY t1.barcode
) dt
ONt.barcode = dt.barcode
ANDt.datetimestamp = dt.minTime
SELECT @retCnt = @@ROWCOUNT
END
...28 seconds!!!!!
February 24, 2006 at 12:36 pm
--this code will set up the test data set
DROP TABLE [mytable]
GO
CREATE TABLE [mytable] (
[barcode] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[datetimestamp] [datetime] NOT NULL ,
[myflag] [bit] NULL ,
CONSTRAINT [PK_mytable] PRIMARY KEY CLUSTERED
(
[barcode],
[datetimestamp]
) ON [PRIMARY]
) ON [PRIMARY]
GO
declare @cnt bigint
declare @cnt2 bigint
set @cnt = 0
WHILE @cnt < 1000 --barcodes
BEGIN
set @cnt2 = 0
WHILE @cnt2 < 200 --swipes
BEGIN
insert mytable select @cnt, DATEADD(hh, @cnt2 * 3, '20020101'), 0
SET @cnt2 = @cnt2+1
END --inner loop
SET @cnt = @cnt+1
END --outer loop
select count(*) from mytable --200,000
February 27, 2006 at 2:53 am
Kevin, very interesting...  I couldn't get your loop started, so I pre-added some code... Also, I think "AND t1.datetimestamp > t2.maxFlag" is redundant in view of the next line of code: "AND DATEADD(hh,-6, t1.datetimestamp) >= t2.maxFlag". It might also be faster to replace this by "AND t1.datetimestamp >= dateadd(hh, 6, t2.maxFlag)", as t1 has more rows than t2.
 I couldn't get your loop started, so I pre-added some code... Also, I think "AND t1.datetimestamp > t2.maxFlag" is redundant in view of the next line of code: "AND DATEADD(hh,-6, t1.datetimestamp) >= t2.maxFlag". It might also be faster to replace this by "AND t1.datetimestamp >= dateadd(hh, 6, t2.maxFlag)", as t1 has more rows than t2.
So after having messed up your code a bit I ended up with the following:
UPDATE t
SET myFlag=1
FROM MyTable t
INNER JOIN
(
SELECT t1.barcode, MIN(t1.datetimestamp) minTime
FROM myTable t1
GROUP BY t1.barcode
) dt
ON t.barcode = dt.barcode
AND t.datetimestamp = dt.minTime
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE t
SET myFlag=1
FROM MyTable t
INNER JOIN
(
SELECT t1.barcode, MIN(t1.datetimestamp) minTime
FROM myTable t1
INNER JOIN
(
SELECT barcode, MAX(t2.datetimestamp) maxFlag
FROM myTable t2
WHERE t2.myFlag = 1
GROUP BY t2.barcode
) t2
ON t1.barcode = t2.barcode
AND t1.datetimestamp >= dateadd(hh, 6, t2.maxFlag)
AND t1.myFlag = 0
GROUP BY t1.barcode
) dt
ON t.barcode = dt.barcode
AND t.datetimestamp = dt.minTime
END
I had my doubts about the performance of this triple join, but it turns out that it is faster than my code... And not only a little faster, it's 8-10 times faster. I think this is because my join involves larger tables. The second (I would normally say inner, but that's a bad word in this context  ) join in your code reduces the size of the right table in your first (outer
 ) join in your code reduces the size of the right table in your first (outer  ) join... Also, I think that inner joins are in general faster than left joins.
 ) join... Also, I think that inner joins are in general faster than left joins.
Whether it should be > or >=... Well, it matters with your testdata, but I guess it matters very little with "real" data... >= is OK with me.
February 27, 2006 at 3:03 am
A side remark: Perfromance while setting up test data is probably not something one worries much about... But it is faster (and easier, I think) to use a Numbers table:
SELECT TOP 10000 n = IDENTITY(INT, 1, 1) INTO Numbers
FROM
sysobjects a1
CROSS JOIN
sysobjects a2
CROSS JOIN
sysobjects a3
ALTER TABLE Numbers
ADD CONSTRAINT Index_Numbers PRIMARY KEY CLUSTERED(n)
GO
After constructing this table and mytable, run this:
insert mytable select t1.n, DATEADD(hh, t2.n * 3, '20020101'), 0
from Numbers t1 cross join Numbers t2
where t1.n <= 1000 and t2.n <= 200
February 27, 2006 at 7:50 am
>= vs > ...
the only difference was the pattern 100100 and 101010 respectively.
not sure why you couldn't get it started. I cut and pasted the query.
thank you for the number generator code. I'll check that out sometime. Looks like a good way to time/calendar table on the fly as well.
cheers, Kevin
Viewing 14 posts - 16 through 29 (of 29 total)
You must be logged in to reply to this topic. Login to reply