Linking to the Previous Row

  • Thanks for your post. i am facing a similar issue. i have a table which is having the following fields and samle data

    empid Transacttiondate transactiontype

    1001 2010-06-18 07:33:17 IN

    1001 2010-06-18 07:40:18 OUT

    1001 2010-06-18 10:40:10 IN

    1001 2010-06-18 10:45:17 IN

    1001 2010-06-18 12:45:17 OUT

    i need the result as

    empid intime outtime

    1001 2010-06-18 07:33:17 2010-06-18 07:40:18

    1001 2010-06-18 10:40:10

    1001 2010-06-18 10:45:17 2010-06-18 12:45:17

    can i get any help to solve this?

  • anudata (8/31/2010)


    Thanks for your post. i am facing a similar issue. i have a table which is having the following fields and samle data

    empid Transacttiondate transactiontype

    1001 2010-06-18 07:33:17 IN

    1001 2010-06-18 07:40:18 OUT

    1001 2010-06-18 10:40:10 IN

    1001 2010-06-18 10:45:17 IN

    1001 2010-06-18 12:45:17 OUT

    i need the result as

    empid intime outtime

    1001 2010-06-18 07:33:17 2010-06-18 07:40:18

    1001 2010-06-18 10:40:10

    1001 2010-06-18 10:45:17 2010-06-18 12:45:17

    can i get any help to solve this?

    A version that uses outer apply might be faster, but functional speaking this should work from sql server 2000 onwards. Just make sure you got at least an index on empID, Transacttiondate for performance reasons.

    select

    inData.empID

    , inData.Transacttiondate as inTime

    , (

    select top 1

    case

    when outData.transactiontype = 'IN'

    then null

    else outData.Transacttiondate

    end

    from

    sometable as outData

    where

    outData.empID = inData.empID and

    outData.Transacttiondate > inData.Transacttiondate

    order by

    outData.Transacttiondate

    )

    as outTime

    from

    sometable as inData

    where

    inData.transactiontype = 'IN'

    order by

    inData.empID

    , inData.Transacttiondate

    ;

  • anudata (8/31/2010)


    Thanks for your post. i am facing a similar issue. i have a table which is having the following fields and samle data

    empid Transacttiondate transactiontype

    1001 2010-06-18 07:33:17 IN

    1001 2010-06-18 07:40:18 OUT

    1001 2010-06-18 10:40:10 IN

    1001 2010-06-18 10:45:17 IN

    1001 2010-06-18 12:45:17 OUT

    i need the result as

    empid intime outtime

    1001 2010-06-18 07:33:17 2010-06-18 07:40:18

    1001 2010-06-18 10:40:10

    1001 2010-06-18 10:45:17 2010-06-18 12:45:17

    can i get any help to solve this?

    Here is one solution.

    create table timecard

    (

    primary key ( empid, Transacttiondate, transactiontype ),

    empid int not null,

    Transacttiondate smalldatetime not null,

    transactiontype char(1) not null

    );

    insert into timecard

    select 1001, '2010-06-18 07:33:17', 'I' union all

    select 1001, '2010-06-18 07:40:18', 'O' union all

    select 1001, '2010-06-18 10:40:10', 'I' union all

    select 1001, '2010-06-18 10:45:17', 'I' union all

    select 1001, '2010-06-18 12:45:17', 'O';

    select i.empid, i.transacttiondate intime, min(o.transacttiondate) outtime

    from timecard i

    left join timecard o on o.empid = i.empid and o.transacttiondate >= i.transacttiondate and o.transactiontype = 'O'

    where i.transactiontype = 'I'

    group by i.empid, i.transacttiondate

    order by i.empid, i.transacttiondate;

    empid intime outtime

    ----------- ----------------------- -----------------------

    1001 2010-06-18 07:33:00 2010-06-18 07:40:00

    1001 2010-06-18 10:40:00 2010-06-18 12:45:00

    1001 2010-06-18 10:45:00 2010-06-18 12:45:00

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thank u for your Help. This query helps me lot. but still have some problem.

    for example i have an output with above query

    EMPIDInTime OutTime

    218512010-04-20 07:52:07.0002010-04-20 16:32:13.000

    218512010-04-21 19:20:37.0002010-04-28 16:41:10.000

    218512010-04-25 18:24:56.0002010-04-28 16:41:10.000

    218512010-04-26 06:31:03.0002010-04-28 16:41:10.000

    218512010-04-28 08:03:28.0002010-04-28 16:41:10.000

    218512010-04-28 16:40:52.0002010-04-28 16:41:10.000

    218512010-05-02 08:19:48.0002010-05-02 15:34:10.000

    Here the employee have IN transaction on 21,25,26 but no OUT on these dates. So it's taking the OUT Time for those IN as 2010-04-28 16:41:10.000. how to avoid this. i need blank or null for these out.:w00t:

  • anudata...first point is that a lot of people won't answer your post because you haven't made the effort to script the data.

    Look at how Eric did it above. If you do that once, then we don't have to. If you don't do it, then each of us has to. Get it?

    So thanks Eric for scripting a table - if he hadn't done so I wouldn't be answering your post!

    Below is a CTE approach, akin to that shown in the article. I'll pass no comment in this post on performance pros and cons, that's been done more than adequately in the previous pages. I've included a second employee, as it's I suppose essential to any solution that you don't mix up INs and OUTs for different employees. This being so, its essential also to include the second employee in your test data.

    if object_id('tempdb.dbo.#timecard') is not null

    drop table #timecard

    go

    create table #timecard

    (

    primary key ( empid, Transactiondate, transactionType ) ,

    empid int not null ,

    TransactionDate smalldatetime not null ,

    transactionType char(3) not null

    ) ;

    insert into #timecard

    select 1001 ,

    '2010-06-18 07:33:17' ,

    'IN'

    union all

    select 1001 ,

    '2010-06-18 07:40:18' ,

    'OUT'

    union all

    select 1001 ,

    '2010-06-18 10:40:10' ,

    'IN'

    union all

    select 1001 ,

    '2010-06-18 10:45:17' ,

    'IN'

    union all

    select 1001 ,

    '2010-06-18 12:45:17' ,

    'OUT'

    union all

    select 9999 ,

    '2010-06-18 10:14:10' ,

    'IN'

    union all

    select 9999 ,

    '2010-06-18 12:40:10' ,

    'OUT' ;

    with numberedset

    as ( select row_number() OVER ( partition by empid order by TransactionDate asc ) as rownum ,

    empid ,

    TransactionDate ,

    transactionType

    FROM #timecard t

    ),

    outs

    as ( select *

    FROM numberedset

    where transactionType = 'OUT'

    ),

    pairs

    as ( SELECT therow.rownum ,

    therow.empid ,

    therow.TransactionDate as TransactionDate ,

    therow.transactionType as TransactionType ,

    thenextrow.TransactionDate as NextTransactionDate ,

    thenextrow.transactionType as NextTransactionType

    FROM numberedset therow

    left join outs thenextrow

    on therow.empid = thenextrow.empid

    and therow.rownum + 1 = thenextrow.rownum

    )

    SELECT *

    FROM pairs

    where TransactionType = 'IN'

  • @ peter SSC Enthusiastic

    I apologise for the upper case letters in my last post,I was multitasking it wasn't meant to be rude

  • Thank you David for your quick replay. Your query solved my problem.:-)

  • LIYA (9/1/2010)


    Thank u for your Help. This query helps me lot. but still have some problem.

    for example i have an output with above query

    EMPIDInTime OutTime

    218512010-04-20 07:52:07.0002010-04-20 16:32:13.000

    218512010-04-21 19:20:37.0002010-04-28 16:41:10.000

    218512010-04-25 18:24:56.0002010-04-28 16:41:10.000

    218512010-04-26 06:31:03.0002010-04-28 16:41:10.000

    218512010-04-28 08:03:28.0002010-04-28 16:41:10.000

    218512010-04-28 16:40:52.0002010-04-28 16:41:10.000

    218512010-05-02 08:19:48.0002010-05-02 15:34:10.000

    Here the employee have IN transaction on 21,25,26 but no OUT on these dates. So it's taking the OUT Time for those IN as 2010-04-28 16:41:10.000. how to avoid this. i need blank or null for these out.:w00t:

    Look at the version I posted, that should not have this problem as I anticipated it from the data you presented.

  • Hi Peter,

    The query is working fine; the only disadvantage is its execution time. It took long time to execute.

    David's solution is much faster as compared to this. Thank you very much both of you.

  • LIYA (9/1/2010)


    Hi Peter,

    The query is working fine; the only disadvantage is its execution time. It took long time to execute.

    David's solution is much faster as compared to this. Thank you very much both of you.

    Odd, it should not work that slow given you added the index I wrote about. The index should speed up any well written query for this particular task as it indexes the precise fields being searched on.

    I will try out both solutions myself, maybe I do learn something new myself 😉

    Can you have any indication on how large your dataset is (in records)?

  • Hi Peter,

    At present the testing table have 1628430 records. its' only a 5 month data. but in production it will be 5 time higher. It's a access control and attendance database, the application is having it's own interface and reports. the client wants to generate some custmised reports. the existing application is not directly accessing sql server. it's using BDE (Bolrland Database Engine] to access the database. so i can't modify anything on existing database. it will affect the existing application. i just use a view with above specified fileds to generate the report. 🙂

  • Hi Liya (aka anudata!),

    Given the volumes of data involved, I suggest you get familiar with the other approaches available in tackling this problem. If you leaf through the discussion pages, you can see that there are many. Temp tables with identity fields come particularly to mind (instead of the rownumber.) The advantage of the CTE approach is its elegance and its ease of use in building up the layers of a solution. However it will not be the fastest solution over large datasets. (Except maybe if you're always reporting on one employee at a time.)

    I'm glad the solution I provided is working for you, but make sure it performs well with the 5* data load, and if it does make sure it works with 50* data load to give it some future proofing.

    Regards,

    David.

  • LIYA (9/1/2010)


    Hi Peter,

    At present the testing table have 1628430 records. its' only a 5 month data. but in production it will be 5 time higher. It's a access control and attendance database, the application is having it's own interface and reports. the client wants to generate some custmised reports. the existing application is not directly accessing sql server. it's using BDE (Bolrland Database Engine] to access the database. so i can't modify anything on existing database. it will affect the existing application. i just use a view with above specified fileds to generate the report. 🙂

    It seems that this type of punch in/out report is something that would only be segmented on a daily or weekly basis. Since you only need the 3 columns, for performance reasons you may want to create a compound index on (transacttiondate, empid, transactiontype), so you can easily filter on a specific date range while covering the query. Actually, I'd expect that column combination to be the primary key.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • David,

    I have used Table Variables to accomplish the same thing. Are there a drawbacks to using table variables instead of CTEs? Are there advantages to using CTEs?

  • Robert Carretta (9/9/2010)


    David,

    I have used Table Variables to accomplish the same thing. Are there a drawbacks to using table variables instead of CTEs? Are there advantages to using CTEs?

    Table variables do not and cannot be made to generate any statistics. They are always evaluated like any other variable... one row... no matter how many they have. Table variables are NOT memory only like a lot of folks think. When they get too big, the flip over to Temp DB (same as a Temp Table). The only advantage a table variable may have is providing an interim result set to split larger queries. I'd rather use a Temp table (which starts in memory just like a table variable) just because they persist for troubleshooting in SSMS and can be a LOT faster if SELECT/INTO is used.

    Non recursive CTE's are no different than derived tables or views except they're more like a view in that they can be self joined and joined more than once (with the same performance problems when that happens).

    There's ton's of other information about the differences. Is there something specific you wanted to know about Table Variables and "other" options?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 106 through 120 (of 147 total)

You must be logged in to reply to this topic. Login to reply