help needed with a set based update

  • if the default is 0, then myFlag will never be set to 1 with your query. It is dependent on at least one barcode containing a 1 to start.

    Let's wait for John to reply on his needs.

  • I agree with Fred. If you want an approximation, there's a few quick fixes. I just figure, if you're going to do it only one time, then do mantenance after that, you should do it right. Despite it being 20 mil rcds, I don't think it will take much time (during down time in the evening). Especially if it's on a server made to handle 20 mil records.

  • Wow, I'm overwhelmed by the response here. Thanks much. I was out of the office today and will be again tomorrow but I'll try to clear up some questions now.

    I think an approximation will be okay. I really don't want to loop if I don't have to and I'm sure I can make it understood what the approximation means if needed. I haven't tried all of the suggestions out on my database yet but I will review them in detail when I get a chance - hopefully on Wednesday.

    The approximation I'm referring to is to divide the day into 6 hour chunks and essentially counting the distinct card swipes within each chunk. It's not exactly what the users have asked for but, I expect, at a lot lower cost in terms of my time to implement as well as time to execute.

    Thanks again,

    John

  • Kevin, I don't agree. Enter the test data below and run my code - you will see that myflag is set to 1 for 5:59 and 12:00. If you remove the 5:59 record (and start over again), then myflag will be set to 1 for 11:58 and not for 12:00.

    The point is the "and t2.datetimestamp is null" in connection with the left join. This can be rewritten as a "not exists" query (similar to what you suggested in your first query). I think these queries perform equally fast, so it's a matter of taste which one you prefer. You may also compare our suggestions (left join/not exists) in the thread

    http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=259829

     

    create table [mytable] (

    [barcode] varchar(10),

    [datetimestamp] datetime,

    [myflag] bit

    )

    go

    insert into mytable select 1, '20050101 5:59', 0

    insert into mytable select 1, '20050101 11:58', 0

    insert into mytable select 1, '20050101 12:00', 0

  • Sorry Jesper...I didn't notice the Left Join in your Sub Query. I think that does the same thing in a loop that I did in my first post (which splits the day into 4 6-hour periods)

    I know it's a repeat, but it beats going back a page.

    UPDATE myTable

    SET myFlag = 1

    FROM myTable mt1

    WHERE NOT EXISTS

    (

    SELECT *

    FROM myTable mt2

    WHERE mt1.barcode = mt2.barcode

    AND DATEPART(yy,mt1.dateTimeStamp) = DATEPART(yy,mt2.dateTimeStamp)

    AND DATEPART(dy,mt1.dateTimeStamp) = DATEPART(dy,mt2.dateTimeStamp)

    AND DATEPART(hh,mt1.dateTimeStamp)/4 = DATEPART(hh,mt2.dateTimeStamp)/4

    AND mt1.dateTimeStamp < mt2.dateTimeStamp

    )

  • It's not the same. Run your code on the example I posted before

    insert into mytable select 1, '20050101 5:59', 0

    insert into mytable select 1, '20050101 11:58', 0

    insert into mytable select 1, '20050101 12:00', 0

    It flags all three rows. My code only flags rows 1 and 3.

    The difference is that my code doesn't flag a row if the same user has been flagged during the last 6 hours (at least, that's my intention ). Your code doesn't flag a row if the same user has been flagged during the same time interval (you should probably divide by 6 instead of 4 to obtain 6 hour intervals). The advantage of your suggestion, however, is that it is much faster - no doubt about that

     

  • 24/4 = 6

    0:00-5:59 = period 1

    6:00-11:59 = period 2

    etc.

    try on your cursor loop...

    10:58 <--gets flag

    12:01 <-- min in time group, but no flag

    17:58 <--should get flag but doesn't

  • Kevin, I am not sure I understand your first comment, but let me explain in greater detail why I think you should divide by 6 rather than 4.

    When you divide numbers 0 through 23 by 4, you have 6 possible results (numbers 0 through 5). When you divide by 6, you have 4 possible results. Division by 4 will result in the following time periods:

    0:00-3:59 = period 1

    4:00-7:59 = period 2

    etc.

    That's why I suggested you might want to divide by 6 instead (Sergiy also divides by 6 in his second post).

    Regarding your second comment. When I test my loop (which isn't a cursor loop) on your example above (10:58, 12:01 and 17:58), it correctly flags the first and last record. Assuming that you always test other people's code before claiming that it is wrong, I conclude that one of us must have mistyped the data....

     

  • sorry dude...I was in a bit of a hurry yesterday, so I wasn't able to get all my thoughts clear or do any testing. I try to understand code before I run it (bad habit I guess). I am seing the light on your loop...not cursor loop. only prob I see is the >= in...

    t2.datetimestamp >= dateadd(hh, -6, t1.datetimestamp)

    0:00 - 6:0 = 6 hrs. 6:00 should be a new flag. But that's minor and I digress....

    I created 200K rows with 1000 barcodes and 3 hour increments (20020101 to 20020125) with a nested loop.

    The DIV 6 method took 6 seconds.

    (I was thinking 24/4 = 4 periods of the day...24/4 = 6 4 hour periods and we want 4! my bad!!!!!)

    Your Loop took 4:52 and produced the following pattern...

    barcodedatetimestampmyflag

    2002002-01-01 00:001

    2002002-01-01 03:000

    2002002-01-01 06:000

    2002002-01-01 09:001

    2002002-01-01 12:000

    2002002-01-01 15:000

    2002002-01-01 18:001

    2002002-01-01 21:000

    Changing >= to > will clean that up...

    barcodedatetimestampmyflag

    2002002-01-01 00:001

    2002002-01-01 03:000

    2002002-01-01 06:001

    2002002-01-01 09:000

    2002002-01-01 12:001

    2002002-01-01 15:000

    2002002-01-01 18:001

    2002002-01-01 21:000

    My cursor loop was at around row 50,000 at 15 min... I'm unsure if the log had anything to do with it. It looked good on paper at first.

    So I came up with this in the spirit of both our ideas....

    DECLARE @retCnt int

    SET @retCnt = 1

    WHILE @retCnt > 0

    BEGIN

    UPDATE t

    SET myFlag=1

    FROM MyTable t

    INNER JOIN

    (

    SELECT t1.barcode, MIN(datetimestamp) minTime

    FROM myTable t1

    INNER JOIN

    (

    SELECT barcode, MAX(t2.datetimestamp) maxFlag

    FROM myTable t2

    WHERE t2.myFlag = 1

    GROUP BY t2.barcode

    ) t2

    ON t1.barcode = t2.barcode

    AND t1.datetimestamp > t2.maxFlag

    AND DATEADD(hh,-6, t1.datetimestamp) >= t2.maxFlag

    AND t1.myFlag = 0

    GROUP BY t1.barcode

    ) dt

    ONt.barcode = dt.barcode

    ANDt.datetimestamp = dt.minTime

    SELECT @retCnt = @@ROWCOUNT

    END

    ...28 seconds!!!!!

  • --this code will set up the test data set

    DROP TABLE [mytable]

    GO

    CREATE TABLE [mytable] (

    [barcode] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [datetimestamp] [datetime] NOT NULL ,

    [myflag] [bit] NULL ,

    CONSTRAINT [PK_mytable] PRIMARY KEY CLUSTERED

    (

    [barcode],

    [datetimestamp]

    ) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    declare @cnt bigint

    declare @cnt2 bigint

    set @cnt = 0

    WHILE @cnt < 1000 --barcodes

    BEGIN

    set @cnt2 = 0

    WHILE @cnt2 < 200 --swipes

    BEGIN

    insert mytable select @cnt, DATEADD(hh, @cnt2 * 3, '20020101'), 0

    SET @cnt2 = @cnt2+1

    END --inner loop

    SET @cnt = @cnt+1

    END --outer loop

    select count(*) from mytable --200,000

  • Kevin, very interesting... I couldn't get your loop started, so I pre-added some code... Also, I think "AND t1.datetimestamp > t2.maxFlag" is redundant in view of the next line of code: "AND DATEADD(hh,-6, t1.datetimestamp) >= t2.maxFlag". It might also be faster to replace this by "AND t1.datetimestamp >= dateadd(hh, 6, t2.maxFlag)", as t1 has more rows than t2.

    So after having messed up your code a bit I ended up with the following:

    UPDATE t

    SET myFlag=1

    FROM MyTable t

    INNER JOIN

    (

    SELECT t1.barcode, MIN(t1.datetimestamp) minTime

    FROM myTable t1

    GROUP BY t1.barcode

    ) dt

    ON t.barcode = dt.barcode

    AND t.datetimestamp = dt.minTime

    WHILE @@ROWCOUNT > 0

    BEGIN

    UPDATE t

    SET myFlag=1

    FROM MyTable t

    INNER JOIN

    (

    SELECT t1.barcode, MIN(t1.datetimestamp) minTime

    FROM myTable t1

    INNER JOIN

    (

    SELECT barcode, MAX(t2.datetimestamp) maxFlag

    FROM myTable t2

    WHERE t2.myFlag = 1

    GROUP BY t2.barcode

    ) t2

    ON t1.barcode = t2.barcode

    AND t1.datetimestamp >= dateadd(hh, 6, t2.maxFlag)

    AND t1.myFlag = 0

    GROUP BY t1.barcode

    ) dt

    ON t.barcode = dt.barcode

    AND t.datetimestamp = dt.minTime

    END

    I had my doubts about the performance of this triple join, but it turns out that it is faster than my code... And not only a little faster, it's 8-10 times faster. I think this is because my join involves larger tables. The second (I would normally say inner, but that's a bad word in this context ) join in your code reduces the size of the right table in your first (outer ) join... Also, I think that inner joins are in general faster than left joins.

    Whether it should be > or >=... Well, it matters with your testdata, but I guess it matters very little with "real" data... >= is OK with me.

  • A side remark: Perfromance while setting up test data is probably not something one worries much about... But it is faster (and easier, I think) to use a Numbers table:

    SELECT TOP 10000 n = IDENTITY(INT, 1, 1) INTO Numbers

    FROM

        sysobjects a1

      CROSS JOIN

        sysobjects a2

      CROSS JOIN

        sysobjects a3

    ALTER TABLE Numbers

            ADD CONSTRAINT Index_Numbers PRIMARY KEY CLUSTERED(n)

    GO

    After constructing this table and mytable, run this:

    insert mytable select t1.n, DATEADD(hh, t2.n * 3, '20020101'), 0

    from Numbers t1 cross join Numbers t2

    where t1.n <= 1000 and t2.n <= 200

     

  • >= vs > ...

    the only difference was the pattern 100100 and 101010 respectively.

    not sure why you couldn't get it started. I cut and pasted the query.

    thank you for the number generator code. I'll check that out sometime. Looks like a good way to time/calendar table on the fly as well.

    cheers, Kevin

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply