September 20, 2010 at 5:24 am
Hello,
I've been struggling with a bit of a challenge to do with grouping visits based on the gap between the previous visit.
I've been using the article by Dave McKinney "Linking to the Previous Row" http://www.sqlservercentral.com/articles/T-SQL/62159/ as the base for my code, but I've run into a bit of a problem now.
Basically, I have a table containing visits
CREATE TABLE [dbo].[current_visit_test](
[visit_id] [bigint] NOT NULL,
[log_file_id] [int] NULL,
[site_id] [int] NOT NULL,
[user_id] [bigint] NOT NULL,
[user_visit_id] [int] NULL,
[user_source_id] [int] NULL,
[user_traffic_source_id] [bigint] NULL,
[traffic_source_id] [int] NULL,
[visit_start_date_gmt] [datetime] NULL,
[visit_end_date_gmt] [datetime] NULL,
[visit_duration] [int] NULL,
[visit_date_time_id] [int] NULL,
[last_changed_date] [datetime] NULL,
[mod_ctr] [smallint] NULL,
[visit_depth] [smallint] NULL,
[first_page_name_id] [bigint] NULL,
[last_page_name_id] [bigint] NULL,
[javascript_version] [smallint] NOT NULL,
[is_first_user_visit] [bit] NULL,
[is_first_traffic_source_visit] [bit] NULL,
[is_consistent] [bit] NULL,
[hierarchy_id] [int] NULL
)
and I want to group these visits and assign the groups a unique 'session' identifier so i can perform some session-level analytics. A session is defined as a group of visits where the gap between consecutive visits is no more than 30 minutes. I used the code from the article to get a CTE that partitions the data:
;WITH SessionAlloc AS
(
SELECTvisit_id,
user_id,
site_id,
user_visit_id,
user_source_id,
visit_start_date_gmt,
visit_end_date_gmt,
ROW_NUMBER() OVER (Partition BY user_id,site_id/ ORDER BY visit_start_date_gmt) AS rownum
FROMcurrent_visit_test
Then I'm testing a statement that puts a new guid on the visits if its more than 30 minutes since the previous vists for the user/site combo:
SELECTcurrow.visit_id,
currow.site_id,
currow.user_id,
currow.rownum,
currow.visit_start_date_gmt as current_start_datetime,
prevrow.visit_end_date_gmt as previous_end_datetime,
DATEDIFF(mi,prevrow.visit_end_date_gmt,currow.visit_start_date_gmt) as minutes_since_prev_visit,
CASE WHEN DATEDIFF(mi,prevrow.visit_end_date_gmt,currow.visit_start_date_gmt)IS NULL THEN NEWID()
WHEN DATEDIFF(mi,prevrow.visit_end_date_gmt,currow.visit_start_date_gmt)> 30THEN NEWID()
ELSE NULL --prevrow.engagement_guid
END AS engagement_guid
FROMSessionAlloc currow
LEFT JOIN
SessionAlloc prevrow
ONcurrow.user_id= prevrow.user_id
ANDcurrow.site_id= prevrow.site_id
ANDcurrow.rownum= prevrow.rownum + 1
ORDER BY
currow.user_id,
currow.site_id,
currow.visit_start_date_gmt
The problem I have is: This puts a new guid on the first visit of each new session, but the other visits that belong to that session have null in their guid column, and I'm stuck when it comes to an efficient way of adding the relevant guid for that session to those visits that belong to it.
I tried creating the select statement as a view and then reffereing back to itself, but the NEWID function means that the visits have different guids in each version of the view I refer to!!!
I hope that makes some sense - I've been staring at this for hours now and I'm getting the SQL fog decending...
Thanks in advance for any advice/help you can lend me....
September 20, 2010 at 1:01 pm
Please provide some ready to use sample data for your current_visit_test table in the form of
INSERT INTO...
SELECT... UNION ALL
SELECT...
together with your expected result based on those sample data.
You might want to limit the number of columns to the ones referenced in the query and I also strongly recommend to provide fake data.
September 20, 2010 at 6:44 pm
Seem like you need another CTE. Something like (since no data is provided this is untested):
;WITH SessionAlloc AS
(
SELECT
visit_id,
user_id,
site_id,
user_visit_id,
user_source_id,
visit_start_date_gmt,
visit_end_date_gmt,
ROW_NUMBER() OVER (Partition BY user_id, site_id ORDER BY visit_start_date_gmt) AS rownum
FROM
current_visit_test
),
Uniqify
(
SELECT
currow.visit_id,
currow.site_id,
currow.user_id,
currow.rownum,
currow.visit_start_date_gmt as current_start_datetime,
prevrow.visit_end_date_gmt as previous_end_datetime,
DATEDIFF(mi, prevrow.visit_end_date_gmt, currow.visit_start_date_gmt) as minutes_since_prev_visit,
CASE WHEN DATEDIFF(mi, prevrow.visit_end_date_gmt, currow.visit_start_date_gmt) IS NULL THEN NEWID()
WHEN DATEDIFF(mi, prevrow.visit_end_date_gmt, currow.visit_start_date_gmt) > 30 THEN NEWID()
ELSE NULL --prevrow.engagement_guid
END AS engagement_guid
FROM
SessionAlloc currow LEFT JOIN
SessionAlloc prevrow
ON currow.user_id = prevrow.user_id AND
currow.site_id = prevrow.site_id AND
currow.rownum = prevrow.rownum + 1
)
SELECT
* -- put your desired query here
FROM
uniqify
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply