Match vertical records to next greater date to find datediff

  • I need to find the amount of time a user spent watching videos.  I have play/pause actions recorded with time stamps as shown below:


    In this case, the textAuditID is effectively the userID so this is only showing logs for that single user...

    I need to tie together the play action and the pause action and find the datediff between. It is possible that there isn't a completing pause action (if they closed the browser before it could send back an action)
    What I am just not sure of is how to efficiently know which play goes with which pause.  This table will have a ton of records when in use so I want to do this one right.
    Any pointers or thoughts? 
    I have been writing SQL for 20 years so it isn't new (but it isn't my main job either) so giving ideas should get me there hopefully.
    Thanks!
    Brandon

  • You can use LEAD() to get the next viewChecked value - calculate the difference between the current viewChecked and the next viewChecked.  Add up the times associated with the play action to get the total play time and total pause time.

    If you don't have something to identify the end of the video - then you would have to assume the last play action completed to the end of the video.  That is going to require that you have the total video time available which you can then use to determine the last play time (video_length - total_play_time = last_play_time).

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks. I have never used LEAD().  Looking into it now. For simplicity of the question, I do have other actions recorded like Close, Ended & Back but all of them amount to the same thing - the video was stopped (just like pause).   And, good news, I do have the total vid length.  So, I will play with this and get back - thank you for your fast reply.

  • Jeffrey Williams 3188 - Wednesday, January 16, 2019 3:23 PM

    You can use LEAD() to get the next viewChecked value - calculate the difference between the current viewChecked and the next viewChecked.  Add up the times associated with the play action to get the total play time and total pause time.

    If you don't have something to identify the end of the video - then you would have to assume the last play action completed to the end of the video.  That is going to require that you have the total video time available which you can then use to determine the last play time (video_length - total_play_time = last_play_time).

    So the easier answer was using lag as follows (p.s. - I know lots of peeps HATE temp tables - hate away - this is about as efficient as I need)
    select viewUniqueID, viewChecked,action,
      (datediff(minute,lag(viewChecked) over (order by viewChecked),viewChecked) ) as diff
        into #tmp
    from viewHistory where TextAuditID = @textAuditID

    select sum(diff) from #tmp where action in ('pause','closed')

  • brandon 94807 - Wednesday, January 16, 2019 8:35 PM

    Jeffrey Williams 3188 - Wednesday, January 16, 2019 3:23 PM

    You can use LEAD() to get the next viewChecked value - calculate the difference between the current viewChecked and the next viewChecked.  Add up the times associated with the play action to get the total play time and total pause time.

    If you don't have something to identify the end of the video - then you would have to assume the last play action completed to the end of the video.  That is going to require that you have the total video time available which you can then use to determine the last play time (video_length - total_play_time = last_play_time).

    So the easier answer was using lag as follows (p.s. - I know lots of peeps HATE temp tables - hate away - this is about as efficient as I need)
    select viewUniqueID, viewChecked,action,
      (datediff(minute,lag(viewChecked) over (order by viewChecked),viewChecked) ) as diff
        into #tmp
    from viewHistory where TextAuditID = @textAuditID

    select sum(diff) from #tmp where action in ('pause','closed')

    You don't need a temp table with your method, you can use a CTE instead:
    ;with CTE AS
    (
      select viewUniqueID, viewChecked,action,
              (datediff(minute,lag(viewChecked) over (order by viewChecked),viewChecked) ) as diff
        from viewHistory where TextAuditID = @textAuditID
    )
    select sum(diff) from CTE where action in ('pause','closed')

  • Jonathan AC Roberts - Thursday, January 17, 2019 7:45 AM

    brandon 94807 - Wednesday, January 16, 2019 8:35 PM

    Jeffrey Williams 3188 - Wednesday, January 16, 2019 3:23 PM

    You can use LEAD() to get the next viewChecked value - calculate the difference between the current viewChecked and the next viewChecked.  Add up the times associated with the play action to get the total play time and total pause time.

    If you don't have something to identify the end of the video - then you would have to assume the last play action completed to the end of the video.  That is going to require that you have the total video time available which you can then use to determine the last play time (video_length - total_play_time = last_play_time).

    So the easier answer was using lag as follows (p.s. - I know lots of peeps HATE temp tables - hate away - this is about as efficient as I need)
    select viewUniqueID, viewChecked,action,
      (datediff(minute,lag(viewChecked) over (order by viewChecked),viewChecked) ) as diff
        into #tmp
    from viewHistory where TextAuditID = @textAuditID

    select sum(diff) from #tmp where action in ('pause','closed')

    You don't need a temp table with your method, you can use a CTE instead:
    ;with CTE AS
    (
      select viewUniqueID, viewChecked,action,
              (datediff(minute,lag(viewChecked) over (order by viewChecked),viewChecked) ) as diff
        from viewHistory where TextAuditID = @textAuditID
    )
    select sum(diff) from CTE where action in ('pause','closed')

    I see I have been programming too long.  When I started this didn't exist and I guess I never learned about CTE.  That sounds nuts since it came out with SQL 2005, but I haven't ever run into CTE in examples or in my programming.  They look very handy (said the old man in the programming business - and when did that happen?).  Thank you for your help.

  • brandon 94807 - Wednesday, January 16, 2019 2:58 PM

    I would do this differently. The viewing is an event and should be the table. What you've done is a design flaw called attribute splitting. You put the play and pause as values in your "<something>_action" , instead of showing them as the defining points in time of what ISO calls an interval data type. They are part of an attribute. Also, are you still using the old Sybase datetime data type? We now have DATETIME2(n) which conforms ANSI/ISO standards and let you be as accurate as you need to be.

    I'm also curious about what you are calling a "vid_id" since we now have an industry-standard for audiovisual material called ISAN. I wish it had existed a few decades ago, when I did some work in Belgium for their national television network. But more than that, I'm bothered by your formatting of the data element names; I spent some years doing research at AIRMICS and we found that those embedded capitals really screw up maintenance. Basically if you grow up of the alphabet (Latin, Greek or Cyrillic) that has an uppercase, then your eye is taught to jump to it since they always begin a special word or the start of a sentence or the start of a paragraph. We found out that this kind of formatting added about 8% more time to doing simple maintenance.

    CREATE TABLE Views
    (isan CHAR(..) NOT NULL
    viewing_start_timestamp DATETIME2(0) NOT NULL,
    viewing_end_timestamp DATETIME2(0), --- null is not finished view CHECK ( CHECK(viewing_start_timestamp <= viewing_end_timestamp),
    ...);

    I recommend that you download the PDF from the University of Arizona of Rick Snodgrass is book on temporal queries in SQL. It's a classic that should be part of everybody's library.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 - Friday, January 18, 2019 3:16 PM

    brandon 94807 - Wednesday, January 16, 2019 2:58 PM

    I would do this differently. The viewing is an event and should be the table. What you've done is a design flaw called attribute splitting. You put the play and pause as values in your "<something>_action" , instead of showing them as the defining points in time of what ISO calls an interval data type. They are part of an attribute. Also, are you still using the old Sybase datetime data type? We now have DATETIME2(n) which conforms ANSI/ISO standards and let you be as accurate as you need to be.

    I'm also curious about what you are calling a "vid_id" since we now have an industry-standard for audiovisual material called ISAN. I wish it had existed a few decades ago, when I did some work in Belgium for their national television network. But more than that, I'm bothered by your formatting of the data element names; I spent some years doing research at AIRMICS and we found that those embedded capitals really screw up maintenance. Basically if you grow up of the alphabet (Latin, Greek or Cyrillic) that has an uppercase, then your eye is taught to jump to it since they always begin a special word or the start of a sentence or the start of a paragraph. We found out that this kind of formatting added about 8% more time to doing simple maintenance.

    CREATE TABLE Views
    (isan CHAR(..) NOT NULL
    viewing_start_timestamp DATETIME2(0) NOT NULL,
    viewing_end_timestamp DATETIME2(0), --- null is not finished view CHECK ( CHECK(viewing_start_timestamp <= viewing_end_timestamp),
    ...);

    I recommend that you download the PDF from the University of Arizona of Rick Snodgrass is book on temporal queries in SQL. It's a classic that should be part of everybody's library.

    If that's true, you should be able to provide the link, Joe.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jcelko212 32090 - Friday, January 18, 2019 3:16 PM

    brandon 94807 - Wednesday, January 16, 2019 2:58 PM

    I would do this differently. The viewing is an event and should be the table. What you've done is a design flaw called attribute splitting. You put the play and pause as values in your "<something>_action" , instead of showing them as the defining points in time of what ISO calls an interval data type. They are part of an attribute. Also, are you still using the old Sybase datetime data type? We now have DATETIME2(n) which conforms ANSI/ISO standards and let you be as accurate as you need to be.

    I'm also curious about what you are calling a "vid_id" since we now have an industry-standard for audiovisual material called ISAN. I wish it had existed a few decades ago, when I did some work in Belgium for their national television network. But more than that, I'm bothered by your formatting of the data element names; I spent some years doing research at AIRMICS and we found that those embedded capitals really screw up maintenance. Basically if you grow up of the alphabet (Latin, Greek or Cyrillic) that has an uppercase, then your eye is taught to jump to it since they always begin a special word or the start of a sentence or the start of a paragraph. We found out that this kind of formatting added about 8% more time to doing simple maintenance.

    CREATE TABLE Views
    (isan CHAR(..) NOT NULL
    viewing_start_timestamp DATETIME2(0) NOT NULL,
    viewing_end_timestamp DATETIME2(0), --- null is not finished view CHECK ( CHECK(viewing_start_timestamp <= viewing_end_timestamp),
    ...);

    I recommend that you download the PDF from the University of Arizona of Rick Snodgrass is book on temporal queries in SQL. It's a classic that should be part of everybody's library.

    This has to be the definition of nitpicking. There is a trade off of table proliferation and usability (assuming I understood the first pick)  and I often make the choice for usability. But I don't think I did understand, because it was really something.  

    VidID is an internal ID, because they are the IDs assigned by vimeo. I bet I could use ISAN then have a lookup table and then we are right back to take proliferation...

    I shall continue to use caps cause I am a rebel without a cause that anyone cares about. But, from now on, anytime I do I will think of you.
    Thanks!

  • One thing when using the LEAD or LAG functions, in your OVER clause, you may also want to add a "PARTITION BY vidID, textAuditID" statement before ORDER BY. So the complete syntax would be as follows:

    ;with CTE AS
    (
    select viewUniqueID, viewChecked,action,
        (datediff(minute,lag(viewChecked) OVER (PARTITION BY vidID, textAuditID ORDER BY viewChecked),viewChecked) ) as diff
      from viewHistory where TextAuditID = @textAuditID
    )
    select sum(diff) from CTE where action in ('pause','closed')

    The reason you would do this is to make sure your next row in the windowing function belongs to the same video and same user. Otherwise you could be picking the viewChecked record for a different video/user combination.

  • jcelko212 32090 - Friday, January 18, 2019 3:16 PM

    brandon 94807 - Wednesday, January 16, 2019 2:58 PM

    ...
    I recommend that you download the PDF from the University of Arizona of Rick Snodgrass is book on temporal queries in SQL. It's a classic that should be part of everybody's library.

    From https://www2.cs.arizona.edu/~rts/publications.html:

    Developing Time-Oriented Database Applications in SQL: The PDF of this book is here (5MB) (which looks a little fuzzy but prints fine, except for pages 30–31, which are here) and its associated CD-ROM in zip (59MB) or gzipped tar (57MB) formats.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply