Help with grouping by datetime interval.

  • Hello,

    I've been struggling with a bit of a challenge to do with grouping visits based on the gap between the previous visit.

    I've been using the article by Dave McKinney "Linking to the Previous Row" http://www.sqlservercentral.com/articles/T-SQL/62159/ as the base for my code, but I've run into a bit of a problem now.

    Basically, I have a table containing visits

    CREATE TABLE [dbo].[current_visit_test](

    [visit_id] [bigint] NOT NULL,

    [log_file_id] [int] NULL,

    [site_id] [int] NOT NULL,

    [user_id] [bigint] NOT NULL,

    [user_visit_id] [int] NULL,

    [user_source_id] [int] NULL,

    [user_traffic_source_id] [bigint] NULL,

    [traffic_source_id] [int] NULL,

    [visit_start_date_gmt] [datetime] NULL,

    [visit_end_date_gmt] [datetime] NULL,

    [visit_duration] [int] NULL,

    [visit_date_time_id] [int] NULL,

    [last_changed_date] [datetime] NULL,

    [mod_ctr] [smallint] NULL,

    [visit_depth] [smallint] NULL,

    [first_page_name_id] [bigint] NULL,

    [last_page_name_id] [bigint] NULL,

    [javascript_version] [smallint] NOT NULL,

    [is_first_user_visit] [bit] NULL,

    [is_first_traffic_source_visit] [bit] NULL,

    [is_consistent] [bit] NULL,

    [hierarchy_id] [int] NULL

    )

    and I want to group these visits and assign the groups a unique 'session' identifier so i can perform some session-level analytics. A session is defined as a group of visits where the gap between consecutive visits is no more than 30 minutes. I used the code from the article to get a CTE that partitions the data:

    ;WITH SessionAlloc AS

    (

    SELECTvisit_id,

    user_id,

    site_id,

    user_visit_id,

    user_source_id,

    visit_start_date_gmt,

    visit_end_date_gmt,

    ROW_NUMBER() OVER (Partition BY user_id,site_id/ ORDER BY visit_start_date_gmt) AS rownum

    FROMcurrent_visit_test

    Then I'm testing a statement that puts a new guid on the visits if its more than 30 minutes since the previous vists for the user/site combo:

    SELECTcurrow.visit_id,

    currow.site_id,

    currow.user_id,

    currow.rownum,

    currow.visit_start_date_gmt as current_start_datetime,

    prevrow.visit_end_date_gmt as previous_end_datetime,

    DATEDIFF(mi,prevrow.visit_end_date_gmt,currow.visit_start_date_gmt) as minutes_since_prev_visit,

    CASE WHEN DATEDIFF(mi,prevrow.visit_end_date_gmt,currow.visit_start_date_gmt)IS NULL THEN NEWID()

    WHEN DATEDIFF(mi,prevrow.visit_end_date_gmt,currow.visit_start_date_gmt)> 30THEN NEWID()

    ELSE NULL --prevrow.engagement_guid

    END AS engagement_guid

    FROMSessionAlloc currow

    LEFT JOIN

    SessionAlloc prevrow

    ONcurrow.user_id= prevrow.user_id

    ANDcurrow.site_id= prevrow.site_id

    ANDcurrow.rownum= prevrow.rownum + 1

    ORDER BY

    currow.user_id,

    currow.site_id,

    currow.visit_start_date_gmt

    The problem I have is: This puts a new guid on the first visit of each new session, but the other visits that belong to that session have null in their guid column, and I'm stuck when it comes to an efficient way of adding the relevant guid for that session to those visits that belong to it.

    I tried creating the select statement as a view and then reffereing back to itself, but the NEWID function means that the visits have different guids in each version of the view I refer to!!!

    I hope that makes some sense - I've been staring at this for hours now and I'm getting the SQL fog decending...

    Thanks in advance for any advice/help you can lend me....

  • Please provide some ready to use sample data for your current_visit_test table in the form of

    INSERT INTO...

    SELECT... UNION ALL

    SELECT...

    together with your expected result based on those sample data.

    You might want to limit the number of columns to the ones referenced in the query and I also strongly recommend to provide fake data.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Seem like you need another CTE. Something like (since no data is provided this is untested):

    ;WITH SessionAlloc AS

    (

    SELECT

    visit_id,

    user_id,

    site_id,

    user_visit_id,

    user_source_id,

    visit_start_date_gmt,

    visit_end_date_gmt,

    ROW_NUMBER() OVER (Partition BY user_id, site_id ORDER BY visit_start_date_gmt) AS rownum

    FROM

    current_visit_test

    ),

    Uniqify

    (

    SELECT

    currow.visit_id,

    currow.site_id,

    currow.user_id,

    currow.rownum,

    currow.visit_start_date_gmt as current_start_datetime,

    prevrow.visit_end_date_gmt as previous_end_datetime,

    DATEDIFF(mi, prevrow.visit_end_date_gmt, currow.visit_start_date_gmt) as minutes_since_prev_visit,

    CASE WHEN DATEDIFF(mi, prevrow.visit_end_date_gmt, currow.visit_start_date_gmt) IS NULL THEN NEWID()

    WHEN DATEDIFF(mi, prevrow.visit_end_date_gmt, currow.visit_start_date_gmt) > 30 THEN NEWID()

    ELSE NULL --prevrow.engagement_guid

    END AS engagement_guid

    FROM

    SessionAlloc currow LEFT JOIN

    SessionAlloc prevrow

    ON currow.user_id = prevrow.user_id AND

    currow.site_id = prevrow.site_id AND

    currow.rownum = prevrow.rownum + 1

    )

    SELECT

    * -- put your desired query here

    FROM

    uniqify

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply