help needed with a set based update

Question

help needed with a set based update

Viewing 14 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply

kevin mann Hall of Fame Points: 3894 More actions · Answer 1

if the default is 0, then myFlag will never be set to 1 with your query. It is dependent on at least one barcode containing a 1 to start.

Let's wait for John to reply on his needs.

Fred Williams-134192 SSCrazy Points: 2532 More actions · Answer 2

how about:

update mytable set myflag = 1

from mytable a

where not exists (select * from mytable where barcode = a.barcode

and datetimestamp = dateadd(hh, -6, a.datetimestamp))

The problem with this and any other solution is that the very nature of the task will require Order n(squared) comparisons, no matter how things are range limited and indexed.

If you can live with a good approximation, then a simple:

select count(distinct barcode) where datetimestamp between X and Y

would probably suffice.

kevin mann Hall of Fame Points: 3894 More actions · Answer 3

I agree with Fred. If you want an approximation, there's a few quick fixes. I just figure, if you're going to do it only one time, then do mantenance after that, you should do it right. Despite it being 20 mil rcds, I don't think it will take much time (during down time in the evening). Especially if it's on a server made to handle 20 mil records.

John Rempel SSCommitted Points: 1955 More actions · Answer 4

Wow, I'm overwhelmed by the response here. Thanks much. I was out of the office today and will be again tomorrow but I'll try to clear up some questions now.

I think an approximation will be okay. I really don't want to loop if I don't have to and I'm sure I can make it understood what the approximation means if needed. I haven't tried all of the suggestions out on my database yet but I will review them in detail when I get a chance - hopefully on Wednesday.

The approximation I'm referring to is to divide the day into 6 hour chunks and essentially counting the distinct card swipes within each chunk. It's not exactly what the users have asked for but, I expect, at a lot lower cost in terms of my time to implement as well as time to execute.

Thanks again,

John

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 5

Kevin, I don't agree. Enter the test data below and run my code - you will see that myflag is set to 1 for 5:59 and 12:00. If you remove the 5:59 record (and start over again), then myflag will be set to 1 for 11:58 and not for 12:00.

The point is the "and t2.datetimestamp is null" in connection with the left join. This can be rewritten as a "not exists" query (similar to what you suggested in your first query). I think these queries perform equally fast, so it's a matter of taste which one you prefer. You may also compare our suggestions (left join/not exists) in the thread

http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=259829

create table [mytable] (

[barcode] varchar(10),

[datetimestamp] datetime,

[myflag] bit

)

go

insert into mytable select 1, '20050101 5:59', 0

insert into mytable select 1, '20050101 11:58', 0

insert into mytable select 1, '20050101 12:00', 0

kevin mann Hall of Fame Points: 3894 More actions · Answer 6

Sorry Jesper...I didn't notice the Left Join in your Sub Query. I think that does the same thing in a loop that I did in my first post (which splits the day into 4 6-hour periods)

I know it's a repeat, but it beats going back a page.

UPDATE myTable

SET myFlag = 1

FROM myTable mt1

WHERE NOT EXISTS

(

SELECT *

FROM myTable mt2

WHERE mt1.barcode = mt2.barcode

AND DATEPART(yy,mt1.dateTimeStamp) = DATEPART(yy,mt2.dateTimeStamp)

AND DATEPART(dy,mt1.dateTimeStamp) = DATEPART(dy,mt2.dateTimeStamp)

AND DATEPART(hh,mt1.dateTimeStamp)/4 = DATEPART(hh,mt2.dateTimeStamp)/4

AND mt1.dateTimeStamp < mt2.dateTimeStamp

)

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 7

It's not the same. Run your code on the example I posted before

insert into mytable select 1, '20050101 5:59', 0

insert into mytable select 1, '20050101 11:58', 0

insert into mytable select 1, '20050101 12:00', 0

It flags all three rows. My code only flags rows 1 and 3.

The difference is that my code doesn't flag a row if the same user has been flagged during the last 6 hours (at least, that's my intention ). Your code doesn't flag a row if the same user has been flagged during the same time interval (you should probably divide by 6 instead of 4 to obtain 6 hour intervals). The advantage of your suggestion, however, is that it is much faster - no doubt about that

kevin mann Hall of Fame Points: 3894 More actions · Answer 8

24/4 = 6

0:00-5:59 = period 1

6:00-11:59 = period 2

etc.

try on your cursor loop...

10:58 <--gets flag

12:01 <-- min in time group, but no flag

17:58 <--should get flag but doesn't

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 9

Kevin, I am not sure I understand your first comment, but let me explain in greater detail why I think you should divide by 6 rather than 4.

When you divide numbers 0 through 23 by 4, you have 6 possible results (numbers 0 through 5). When you divide by 6, you have 4 possible results. Division by 4 will result in the following time periods:

0:00-3:59 = period 1

4:00-7:59 = period 2

etc.

That's why I suggested you might want to divide by 6 instead (Sergiy also divides by 6 in his second post).

Regarding your second comment. When I test my loop (which isn't a cursor loop) on your example above (10:58, 12:01 and 17:58), it correctly flags the first and last record. Assuming that you always test other people's code before claiming that it is wrong, I conclude that one of us must have mistyped the data....

kevin mann Hall of Fame Points: 3894 More actions · Answer 10

sorry dude...I was in a bit of a hurry yesterday, so I wasn't able to get all my thoughts clear or do any testing. I try to understand code before I run it (bad habit I guess). I am seing the light on your loop...not cursor loop. only prob I see is the >= in...

t2.datetimestamp >= dateadd(hh, -6, t1.datetimestamp)

0:00 - 6:0 = 6 hrs. 6:00 should be a new flag. But that's minor and I digress....

I created 200K rows with 1000 barcodes and 3 hour increments (20020101 to 20020125) with a nested loop.

The DIV 6 method took 6 seconds.

(I was thinking 24/4 = 4 periods of the day...24/4 = 6 4 hour periods and we want 4! my bad!!!!!)

Your Loop took 4:52 and produced the following pattern...

barcodedatetimestampmyflag

2002002-01-01 00:001

2002002-01-01 03:000

2002002-01-01 06:000

2002002-01-01 09:001

2002002-01-01 12:000

2002002-01-01 15:000

2002002-01-01 18:001

2002002-01-01 21:000

Changing >= to > will clean that up...

barcodedatetimestampmyflag

2002002-01-01 00:001

2002002-01-01 03:000

2002002-01-01 06:001

2002002-01-01 09:000

2002002-01-01 12:001

2002002-01-01 15:000

2002002-01-01 18:001

2002002-01-01 21:000

My cursor loop was at around row 50,000 at 15 min... I'm unsure if the log had anything to do with it. It looked good on paper at first.

So I came up with this in the spirit of both our ideas....

DECLARE @retCnt int

SET @retCnt = 1

WHILE @retCnt > 0

BEGIN

UPDATE t

SET myFlag=1

FROM MyTable t

INNER JOIN

(

SELECT t1.barcode, MIN(datetimestamp) minTime

FROM myTable t1

INNER JOIN

(

SELECT barcode, MAX(t2.datetimestamp) maxFlag

FROM myTable t2

WHERE t2.myFlag = 1

GROUP BY t2.barcode

) t2

ON t1.barcode = t2.barcode

AND t1.datetimestamp > t2.maxFlag

AND DATEADD(hh,-6, t1.datetimestamp) >= t2.maxFlag

AND t1.myFlag = 0

GROUP BY t1.barcode

) dt

ONt.barcode = dt.barcode

ANDt.datetimestamp = dt.minTime

SELECT @retCnt = @@ROWCOUNT

END

...28 seconds!!!!!

kevin mann Hall of Fame Points: 3894 More actions · Answer 11

--this code will set up the test data set

DROP TABLE [mytable]

GO

CREATE TABLE [mytable] (

[barcode] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[datetimestamp] [datetime] NOT NULL ,

[myflag] [bit] NULL ,

CONSTRAINT [PK_mytable] PRIMARY KEY CLUSTERED

(

[barcode],

[datetimestamp]

) ON [PRIMARY]

GO

declare @cnt bigint

declare @cnt2 bigint

set @cnt = 0

WHILE @cnt < 1000 --barcodes

BEGIN

set @cnt2 = 0

WHILE @cnt2 < 200 --swipes

BEGIN

insert mytable select @cnt, DATEADD(hh, @cnt2 * 3, '20020101'), 0

SET @cnt2 = @cnt2+1

END --inner loop

SET @cnt = @cnt+1

END --outer loop

select count(*) from mytable --200,000

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 12

Kevin, very interesting... I couldn't get your loop started, so I pre-added some code... Also, I think "AND t1.datetimestamp > t2.maxFlag" is redundant in view of the next line of code: "AND DATEADD(hh,-6, t1.datetimestamp) >= t2.maxFlag". It might also be faster to replace this by "AND t1.datetimestamp >= dateadd(hh, 6, t2.maxFlag)", as t1 has more rows than t2.

So after having messed up your code a bit I ended up with the following:

UPDATE t

SET myFlag=1

FROM MyTable t

INNER JOIN

(

SELECT t1.barcode, MIN(t1.datetimestamp) minTime

FROM myTable t1

GROUP BY t1.barcode

) dt

ON t.barcode = dt.barcode

AND t.datetimestamp = dt.minTime

WHILE @@ROWCOUNT > 0

BEGIN

UPDATE t

SET myFlag=1

FROM MyTable t

INNER JOIN

(

SELECT t1.barcode, MIN(t1.datetimestamp) minTime

FROM myTable t1

INNER JOIN

(

SELECT barcode, MAX(t2.datetimestamp) maxFlag

FROM myTable t2

WHERE t2.myFlag = 1

GROUP BY t2.barcode

) t2

ON t1.barcode = t2.barcode

AND t1.datetimestamp >= dateadd(hh, 6, t2.maxFlag)

AND t1.myFlag = 0

GROUP BY t1.barcode

) dt

ON t.barcode = dt.barcode

AND t.datetimestamp = dt.minTime

END

I had my doubts about the performance of this triple join, but it turns out that it is faster than my code... And not only a little faster, it's 8-10 times faster. I think this is because my join involves larger tables. The second (I would normally say inner, but that's a bad word in this context ) join in your code reduces the size of the right table in your first (outer ) join... Also, I think that inner joins are in general faster than left joins.

Whether it should be > or >=... Well, it matters with your testdata, but I guess it matters very little with "real" data... >= is OK with me.

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 13

A side remark: Perfromance while setting up test data is probably not something one worries much about... But it is faster (and easier, I think) to use a Numbers table:

SELECT TOP 10000 n = IDENTITY(INT, 1, 1) INTO Numbers

FROM

sysobjects a1

CROSS JOIN

sysobjects a2

CROSS JOIN

sysobjects a3

ALTER TABLE Numbers

ADD CONSTRAINT Index_Numbers PRIMARY KEY CLUSTERED(n)

GO

After constructing this table and mytable, run this:

insert mytable select t1.n, DATEADD(hh, t2.n * 3, '20020101'), 0

from Numbers t1 cross join Numbers t2

where t1.n <= 1000 and t2.n <= 200

kevin mann Hall of Fame Points: 3894 More actions · Answer 14

>= vs > ...

the only difference was the pattern 100100 and 101010 respectively.

not sure why you couldn't get it started. I cut and pasted the query.

thank you for the number generator code. I'll check that out sometime. Looks like a good way to time/calendar table on the fly as well.

cheers, Kevin

help needed with a set based update

Cookies on SQLServerCentral