Help with sql statement to delete old records for user if multiple records found

  • Here is how the data looks like for UserSession Table

    UserName LastUpdated sessionID

    --------- ------------- ---------

    User1 1/1/2013 10AM SESS01

    User1 1/1/2013 11AM SESS02

    User1 1/1/2013 12PM SESS03

    User2 1/1/2013 10AM SESS04

    User3 1/1/2013 10AM SESS05

    User3 1/1/2013 11AM SESS06

    I would like to find users with multiple sessions and delete the old sessions and just keep the only one latest session for user.

    In above example it should delete following records

    UserName LastUpdated sessionID

    --------- ------------- ---------

    User1 1/1/2013 10AM SESS01

    User1 1/1/2013 11AM SESS02

    User3 1/1/2013 10AM SESS05

    and keep following session

    UserName LastUpdated sessionID

    --------- ------------- ---------

    User1 1/1/2013 12PM SESS03

    User2 1/1/2013 10AM SESS04

    User3 1/1/2013 11AM SESS06

    Thanks.

  • Can you post ddl and sample data? It will help greatly if we know what datatypes we are dealing with here. Take a few minutes and read the article at the first link in my signature if you are unsure about what to post and how to put it together.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I took a few liberties and created my own version of your data. I made the assumption that LastUpdated is a datetime datatype. If it is not, it will need to converted to a datetime because you can't count on string values sorting the way you think they should with the dateformat you posted.

    Notice that the first thing is a consumable table with data. This is what I meant by my first post.

    if OBJECT_ID('tempdb..#UserSession') is not null

    drop table #UserSession

    create table #UserSession

    (

    UserName varchar(25),

    LastUpdated datetime,

    SessionID char(6)

    )

    insert #UserSession

    select 'User1', '2013-01-01 10:00:00.000', 'SESS01' union all

    select 'User1', '2013-01-01 11:00:00.000', 'SESS02' union all

    select 'User1', '2013-01-01 12:00:00.000', 'SESS03' union all

    select 'User2', '2013-01-01 10:00:00.000', 'SESS04' union all

    select 'User3', '2013-01-01 10:00:00.000', 'SESS05' union all

    select 'User3', '2013-01-01 11:00:00.000', 'SESS06';

    --Use ROW_NUMBER to isolate the most current values for each UserName.

    --The most current row will have a RowNum of 1

    with myCTE as

    (

    select UserName,

    LastUpdated,

    SessionID,

    ROW_NUMBER() over (PARTITION BY UserName order by LastUpdated desc) as RowNum

    from #UserSession

    )

    --Now we can use the cte as our driver for deleting.

    delete myCTE where RowNum > 1

    select * from #UserSession

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • That was great. Worked out perfect.

  • Here's two other ways of doing it. First, write a query returning the rows you want to keep:

    -- Query 1

    select *

    from #UserSession us

    where us.LastUpdated =(

    select Max( LastUpdated )

    from #UserSession

    where UserName = us.UserName );

    -- Query 2

    select us.*

    from #UserSession us

    left join #UserSession us2

    on us2.UserName = us.Username

    and us2.LastUpdated > us.LastUpdated

    where us2.UserName is null;

    For those leery of subqueries, Query 1 will be impressively fast if the table is large, assuming Username and LastUpdated are properly indexed.

    To delete, just invert the filter criteria and add to a DELETE statement. In this case, Query 2 is easier to invert so:

    delete

    from #UserSession us

    from #UserSession us

    left join #UserSession us2

    on us2.UserName = us.Username

    and us2.LastUpdated > us.LastUpdated

    where us2.UserName is not null;

    -- Verify

    select *

    from #UserSession;

    Will this be faster than using analytics? Probably not. But it never hurts to have alternate ways to get to where you want to go.

    Note: this code is written in SybaseIQ, which is the only system I have available to me at the moment. Some slight modification will be needed for Sql Server.

    TommCatt
    In theory, there is no difference between theory and practice. In practice, there is.

  • Or

    delete a

    FROM #UserSession a

    WHERE EXISTS (SELECT NULL FROM #UserSession WHERE UserName = a.UserName AND LastUpdated > a.LastUpdated)

  • I guess the original sin here is that some valuable statistical information has been deleted forever. I would never delete such data until I had squirreled it away somewhere. No telling when some manager is going to ask questions like "I need to know what the hourly average of online sessions was per hour per day for the last 6 months with a comparison of that same time frame from a year ago so we can show our stockholders how business has improved from an online perspective." The look on his/her face will be extra special when you tell him/her that you deleted it even if it were because that same manager told you to do the deletes.

    Protect the less informed and yourself... never delete data until you know you can get it back and have tested it. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply