help needed with a set based update

  • I'm trying to write a set based update that sets a bit flag to 1 but ensuring that only one row can be flagged per distinct barcode per a 6 hour time span. A simplified version of the table in question is defined as [mytable] below.

    create table [mytable] {

    [barcode] varchar(10),

    [datetimestamp] datetime,

    [myflag] bit

    )

    The actual table has about 20 million rows and 640 thousand distinct barcodes. Oh, and it's running on SQL Server 2005.

    The table is essentially a log of membership card swipes on entry to our facilities and we want to count how many people were admitted during various time spans but we do not want to count multiple swipes of the same card within a 6 hour period. So if someone was to swipe in and then leave the building to pick up a forgotten item in the car and then come in again we want to count the first swipe only. Or if someone does a double swipe on a single entry - that counts as one.

    My idea is to create a bit column ([myflag] above) that we could periodically (probably nightly) update to identify which swipes would actually count in that type of query while leaving the raw data intact. The problem is that everything I come up with turns into a cursor process and I'd rather avoid that. I'm new to SQL Server 2005 so there might be a new easy way to accomplish something like this that I'm unaware of. If not, there's probably still a way to avoid cursors.

    Thanks in advance,

    John

  • Actually you don't need any flag.

    Select count(T1.barcode) as NumberOfMembers, dateadd(dd, T2.DayNumber, 0) as DateRecorded

    FROM MyTable T1

    INNER JOIN (

            select barcode, datediff(dd, 0, datetimestamp) DayNumber -- if your time span is 1 day

            from MyTable

            group by barcode, datediff(dd, 0, datetimestamp)

             ) T2 on T1.barcode = T2.Barcode and datediff(dd, 0, T1.datetimestamp) = T2.DayNumber

    It's good to have computed column datediff(dd, 0, datetimestamp) and set an index on it. This will make respond from your query really fast.

    You may set as many additional columns as types of periods you are using - days, weeks, decades, months, etc. Becuse your field datetimestamp not to be updated it will not affect performance.

    _____________
    Code for TallyGenerator

  • Sergiy query looks more plausible.

    *edit just a test-query

    This code should be tested first

    DECLARE @begin datetime

    DECLARE @end datetime

    select ALLSWEEPS.barcode,ALLSWEEPS.ALL_COUNTS-ISNULL(SWEEPDOUBLES.DOUBLES,0)

    FROM

    (

    select barcode,count_BIG(*) AS ALL_COUNTS all /*all counts*/

    from mytable

    where datetimestamp between @begin and @end /*within period*/

    group by barcode

    ) AS ALLSWEEPS

    LEFT join

    (

    select barcode, count_BIG(*) AS DOUBLES

    from mytable t1

    inner join mytable t2

    on t1.barcode=t2.barcode

    /*everything that should be discarded*/

    and t1.datetimestamp >= t2.datetimestamp and

    t2.datetimestamp<dateadd(h,t1.datetimestamp,6)

    where t1.datetimestamp BETWEEN @begin AND @end /*within period*/

    group by t1.barcode

    having count_BIG(*)>1 /*doubles*/

    ) AS SWEEPDOUBLES

    ON ALLSWEEPS.barcode=SWEEPDOUBLES.barcode

  • Thanks Sergiy. I tried your query and get "Column 'T2.DayNumber' is invalid in the select list because it is not contained in an aggregate function and there is no GROUP BY clause."

    Btw, in your approach how would you implement a 6 hour time span instead of 1 day?

    John

  • Add

    GROUP BY T2.DayNumber

    at the end of my script.

    For 6 hours you can use

    datediff(hh, 0, datetimestamp) / 6

    Because all values are int it will return number of 6 hours periods passed.

    For dateadd you need to add * 6.

    It's better to have it in UDF with 3 parameters:

    @datetimestamp datetime, @PeriodType varchar, @PeriodDuration int

    _____________
    Code for TallyGenerator

  • UPDATE myTable

    SET myFlag = 1

    FROM myTable mt1

    WHERE NOT EXISTS

    (

    SELECT *

    FROM myTable mt2

    WHERE mt1.marcode

    AND DATEPART(yy,mt1.dateTimeStamp) = DATEPART(yy,mt2.dateTimeStamp)

    AND DATEPART(dy,mt1.dateTimeStamp) = DATEPART(dy,mt2.dateTimeStamp)

    AND DATEPART(hh,mt1.dateTimeStamp)/4 = DATEPART(hh,mt2.dateTimeStamp)/4

    AND mt1.dateTimeStamp < mt2.dateTimeStamp

    )

  • OK...just thinking about this a little more. If the above isn't what you had in mind, the next course of action gets trickier if you want to go rhetro-active.

    CREATE TRIGGER tr_MyFlag ON myTable FOR INSERT, UPDATE

    UPDATE myTable

    SET myFlag = 1

    FROM INSERTED mt1

    WHERE myFlag = 0

    AND NOT EXISTS

    (

    SELECT *

    FROM myTable mt2

    WHERE mt2.mFlag = 1

    AND mt1.barcode = mt2.barcode

    AND DATEDIFF(hh, mt2.dateTimeStamp, m1.dateTimeStamp) BETWEEN 0 AND 5

    )

    GO

    --This will run through each record, start-finnish populating the correct value (using the trigger)....

    --Run this code just one time for rhetro-active

    DECLARE curMT CURSOR FOR

    SELECT barcode, dateTimeStamp

    FROM myTable

    ORDER BY barcode, dateTimeStamp

    OPEN curMT

    DECLARE @bc varchar(10)

    DECLARE @dts smalldatetime

    FETCH NEXT FROM curMT INTO @bc, @dts

    WHILE @@FETCH_STATUS = 0

    BEGIN

    UPDATE myTable

    SET myFlag = 0 --default

    WHERE barcode = @bc

    AND dateTimeStamp = @dts

    FETCH NEXT FROM curMT INTO @bc, @dts

    END

    CLOSE curMT

    DEALLOCATE curMT

    GO

    --

  • Can you make an estimation how long it will take to run a cursor over 20 million rows firing trigger doing update with derived table on the same table with 20 mil rows?

    I afraid they need for such task more computing power than Peter Jackson.

    _____________
    Code for TallyGenerator

  • I thought about it for a while and you either do it like the first post I made for rhetro or you do it with a cursor. The cursor is more acurate. Post back if you need an explantation on either method.

  • You could also try the script below.

    I think what Sergiy suggests is that you make timeinterval (see below) a calculated and indexed column on mytable and then run the last query below. That's probably the best solution.

     

    create table [mytable] (

    [barcode] varchar(10),

    [datetimestamp] datetime,

    timeinterval int

    )

    go

    declare @InitDate datetime

    select @InitDate = '20051201'

    insert mytable select 1, '20060101 4:00', 0

    insert mytable select 1, '20060101 5:00', 0

    insert mytable select 1, '20060101 8:00', 0

    insert mytable select 2, '20060101 9:00', 0

    insert mytable select 1, '20060101 11:00', 0

    update mytable set timeinterval = datediff(hh, @InitDate, datetimestamp)/6

    select

    count(distinct barcode) as distinctEntries,

    dateadd(hh, timeinterval*6, @InitDate) as FromDate,

    dateadd(hh, (timeinterval + 1)*6, @InitDate) as ToDate

    from mytable

    group by timeinterval

    drop table mytable

    go

  • this still splits the day up into time sections. Here's the scenerio...

    swipe at 11:58 -> myFlag = 1

    12:00 -> 6 hour period elapses.

    swipe at 12:01

    My original post flags both with less overhead.

    The cursor solution will flag the first swipe move to the second and realize the "original" (it knows, because there is a flag on the first) is less than 6 hrs. A batch update command simply cannot acheive this, because there is no moving frame of reference. You only need to run the cursor one time, then you rely on the Trigger.

  • I understand what you are saying. I cannot read from John's post whether he wants your example to count as a swipe during each period or whether he wants to ignore one of them (could be the first one if there is a swipe at 5:59). If this is the case, you will have to use a loop, that's right. But looping through 20 million rows, that's going to take a lot of time... One might accept that your user is included in both time periods if data can be found in a (relatively) fast and straight-forward way.

     

  • If you insist on solving this, it is probably better to loop over time than over each row in the table

    declare @time datetime

    select @time = '20050101' -- Your starting point

    while @time < getdate()

    begin

      update t set t.myflag = 1

      from mytable t inner join

      (

        select t1.barcode, min(t1.datetimestamp) as mintime

        from mytable t1 left join mytable t2

        on t1.barcode = t2.barcode and t2.myflag = 1 and t2.datetimestamp >= dateadd(hh, -6, t1.datetimestamp)

        where

        t1.datetimestamp >= @time

        and t2.datetimestamp is null

        group by t1.barcode

      )

      dt

      on t.barcode = dt.barcode and t.datetimestamp = dt.mintime

      select @time = dateadd(hh, 6, @time)

    end

  • "on..and t2.myflag = 1"

    at what point does myFlag = 1?

  • Previous iteration of the loop.

Viewing 15 posts - 1 through 15 (of 28 total)

You must be logged in to reply to this topic. Login to reply