Cursors Be Gone!

  • And, just to be 100% clear... would you give a couple of actual examples of the various cookie formats you're expecting so I can make a good cross section of cookie types, as well? Thanks.

    Last but not least, would you explain to folks why you're adding two days to the date/time of each cookie?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/2/2009)


    I agree... JacRoberts, would you provide the CREATE TABLE statement for the dbo.WebLogEvents table, please? From that, I'll make a million-row table over a period of 10 years for 50,000 different session IDs... unless you have a different "spread" that more closely approximates your data.

    The sessionId is hidden in the [cs(Cookie)] column.

    An example row is:

    perm_track=9be22f85-0aeb-492e-a1cb-0ef74043bf41;+__utma=131431056.2740713761679478300.1230240763.1230240763.1230240763.1;+__utmz=131431056.1230240763.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=trains%20to%20leeds;+sess_track=20c12566-eddb-43cf-a9a7-974b027a3834;+ASP.NET_SessionId=iyjuwx45r2rl5455fmdqpp55

    And the sessionId for that value is iyjuwx45r2rl5455fmdqpp55, which can be located anywhere in the string, hence all the CharIndex stuff in the SQL.
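    For anyone following along, the extraction logic being described can be sketched roughly like this. The cookie value below is a shortened, hypothetical one; the CASE guard mirrors the pattern used in the functions later in the thread, covering both a trailing semicolon and the id running to the end of the string:

    ```sql
    -- Sketch: pull the ASP.NET_SessionId value out of a cookie string.
    -- The key can appear anywhere, and may or may not be followed by ';'.
    DECLARE @Cookie varchar(2048)
    DECLARE @Key    varchar(50)
    DECLARE @Start  int
    DECLARE @End    int

    SET @Cookie = '+sess_track=abc;+ASP.NET_SessionId=iyjuwx45r2rl5455fmdqpp55'
    SET @Key    = 'ASP.NET_SessionId='
    SET @Start  = CHARINDEX(@Key, @Cookie) + LEN(@Key)   -- first char of the id
    SET @End    = CHARINDEX(';', @Cookie, @Start)        -- 0 if no trailing ';'

    SELECT SessionId = CASE WHEN CHARINDEX(@Key, @Cookie) = 0 THEN NULL  -- no session id present
                            WHEN @End > 0 THEN SUBSTRING(@Cookie, @Start, @End - @Start)
                            ELSE SUBSTRING(@Cookie, @Start, 100)         -- id runs to end of string
                       END
    -- For this sample value, returns: iyjuwx45r2rl5455fmdqpp55
    ```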

    The table has about 10,000,000 rows added per day and about 60,000 site visits per day. It stores just 8 days' worth of data, as all data over 8 days old is deleted each night, and there is only 1 non-unique index, on the [c-ip] column. There are about 80 million rows on that table all together.

    The 'date' column is just the date without the time part, and the 'time' column is just a time of day stored against the base date '1899-12-30'.
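    A quick illustration of why that '1899-12-30' base matters when the two columns are recombined (this is the source of the 2-day correction that comes up elsewhere in the thread; it's SQL Server datetime arithmetic, not anything specific to this table):

    ```sql
    -- SQL Server's datetime base date is 1900-01-01 (internally day 0).
    -- A time-of-day stored against 1899-12-30 therefore carries a value of
    -- -2 days plus the time fraction, so Date + [Time] lands two days early.
    DECLARE @Date datetime
    DECLARE @Time datetime

    SET @Date = '2008-12-20'
    SET @Time = '1899-12-30 14:35:00'

    SELECT [DatePlusTime] = @Date + @Time,                 -- 2008-12-18 14:35 (two days early)
           [Corrected]    = DATEADD(dd, 2, @Date + @Time)  -- 2008-12-20 14:35
    ```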

    You probably don't need all the columns but here is the actual table definition:

    CREATE TABLE [dbo].[WebLogEvents](
        [date] [datetime] NULL,
        [time] [datetime] NULL,
        [s-sitename] [varchar](255) NULL,
        [s-ip] [varchar](50) NULL,
        [cs-method] [varchar](50) NULL,
        [cs-uri-stem] [varchar](255) NULL,
        [cs-uri-query] [varchar](2048) NULL,
        [s-port] [varchar](50) NULL,
        [cs-username] [varchar](50) NULL,
        [c-ip] [varchar](50) NULL,
        [cs(User-Agent)] [varchar](255) NULL,
        [cs(Cookie)] [varchar](2048) NULL,
        [cs(Referer)] [varchar](2048) NULL,
        [sc-status] [int] NULL,
        [sc-substatus] [int] NULL,
        [sc-win32-status] [int] NULL,
        [sc-bytes] [int] NULL,
        [cs-bytes] [int] NULL,
        [time-taken] [int] NULL
    ) ON [PRIMARY]

  • Cool... that 1899-12-30 thingy about the time also explains why you add 2 days to the combination of date and time. Thanks, Jac.

    --Jeff Moden



  • TheSQLGuru (1/2/2009)

    1) Tom, can you or anyone else give an example of UPDATEs that cannot be done without using the FROM clause that can be done with its use?

    2) I interacted with Hugo on a number of occasions at last year's MVP summit and I can tell you from first-hand experience he is a very smart (and thorough) guy. I, for one, will never dismiss out of hand anything that he blogs about.

    Clearly it isn't possible to do anything with the clause that is impossible without it if one ignores performance: SQL is Turing-complete. There may be examples where the FROM clause can't be replaced by a string of select subclauses so that the same effect (ignoring legibility and performance) can be achieved in a single SQL statement, but I doubt it. If you want an example where the cost in legibility and performance is unacceptably high, look in HK's blog, where he gives a fine example. The performance issue is that the optimiser can't easily see what's going on - the FROM clause is much easier to understand mechanically. The clarity/legibility issue is a language expressiveness issue - there are other ways of getting clarity. HK suggests using row assignments, which I first came across in discussions in the 1980s about what we (ICL, Bull, Siemens, various academic institutions) hoped "SQL3" might contain (and what we might include in our joint massively parallel SQL implementation experiment - but we left this one out); in my view this is the correct approach in many cases where "FROM" is used in UPDATE, but no MS implementation - not even SQLS 2008 - supports it.
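    For readers who haven't seen the two forms side by side, here's a minimal sketch (Target/Source are hypothetical tables, and the standard rewrite shown is just one of several possible):

    ```sql
    -- Proprietary T-SQL form: UPDATE with a FROM clause and a join.
    UPDATE t
       SET t.Price = s.Price
      FROM dbo.Target t
     INNER JOIN dbo.Source s ON s.Id = t.Id

    -- One ANSI-standard equivalent: correlated subqueries.
    -- The EXISTS guard stops unmatched rows being set to NULL.
    UPDATE dbo.Target
       SET Price = (SELECT s.Price FROM dbo.Source s WHERE s.Id = dbo.Target.Id)
     WHERE EXISTS (SELECT 1 FROM dbo.Source s WHERE s.Id = dbo.Target.Id)
    ```

    The standard form repeats the correlation predicate and gives the optimiser less to work with, which is the legibility/performance trade-off being discussed.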

    HK himself gives a nice example where there is a vast performance difference in the very blog entry that deprecates the FROM clause. The blog entry does not say starkly "let's abolish FROM in UPDATE in TSQL", it says something like "Remember that FROM in UPDATE is not part of the SQL Standard and use it only when gains in performance or clarity are such as to offset the resulting non-portability" - something with which I'm wholly in agreement. HK himself says he would use the FROM clause in the example he gives in TSQL releases that don't have the MERGE feature ("For that performance gain, I will gladly choose the proprietary syntax over the standard!" are the words he uses). So while I maintain that anyone who said "let's abolish this feature" would be a nut, and the impression I got from Jeff's post was that Hugo was saying that, I certainly don't think that what he actually does say is the slightest bit nutty.

    Tom


    Heh... and I'll maintain that what he said is nutty for more than 1 reason. Consider the title and the first sentence...

    Let's deprecate UPDATE FROM!

    I guess that many people using UPDATE … FROM on a daily basis do so without being aware that they are violating all SQL standards.

    Now, if that doesn't sound like he'd like to get rid of it, then, perhaps this other thing he said does...

    With this alternative available, I fail to see any reason why the proprietary UPDATE FROM syntax should be maintained. In my opinion, it can safely be marked as deprecated in SQL Server 2008.

    ... and, maybe even that doesn't do it for you. But this sure as hell does for me...

    but it should be marked as deprecated, and it should eventually be removed from the product.

    My answer to all of that is that code portability is a myth (and a nutty one, at that). Every RDBMS has its own extensions to the cruddy and way-behind-the-real-world ANSI standards. In order to have true code portability, every database vendor would have to meet all of the standards (which are pretty useless for some things) and everyone would have to ignore some of the better features the various vendors have offered in the form of extensions to the language. If that's the case, then why even bother having different database vendors?

    On the fringe of chaos lives innovation and new technology... the myth of code portability being realized for SQL would destroy both because there would be no need for any form of competition between vendors. We'd all end up using the same crap and, yes, it would all be crap.

    That would be like saying there shall be no Java or C# anymore... instead, we'll only have one language called Cava so that everyone can write portable code. BWAAA-HAAA!!!! That's when I'll go back to writing code in PowerBASIC and blow everyone's doors off... 😛

    I'd also like Microsoft to stop deprecating stuff... they're taking away stuff that's not broken and is frequently better than what it's being replaced by. :w00t:

    --Jeff Moden



    Ok... from JacRoberts' description, CREATE TABLE statement, and a couple of good guesses (like the fact that, since all the columns are nullable, there's no primary key), here's a chunk of code that'll build a million-row test table.

    Now, based on what I saw in JacRoberts' code, I've changed the scope of the test code generator just a tiny bit. Instead of a million rows across ten years, it's a million rows across a single month for 50,000 session IDs, to simulate a million "hits" on a web site in a month. Here's the code... lemme know if we need any changes so that we can all work from the same thing. As always, details are in the comments...

    --=================================================================================================
    -- Create and populate a 1,000,000 row WebLogEvents test table.
    -- Jeff Moden
    --=================================================================================================
    --===== First, change to a nice safe place that we all have...
    USE TempDB
    --===== If the test table already exists here, drop it so we can rebuild it for reruns.
    IF OBJECT_ID('TempDB.dbo.WebLogEvents','U') IS NOT NULL
        DROP TABLE TempDB.dbo.WebLogEvents
    GO
    --===== Declare some local variables to handle all the presets we'll need...
    DECLARE @StartDate    DATETIME,      --First date in events table
            @NumberOfDays INT,           --Max number of different days in the events table
            @TotalRows    INT,           --Total number of rows in the events table
            @CookieMain   VARCHAR(2048), --All the same... will have a session number appended
            @Sessions     INT            --The max number of unique sessions in the events table
    --===== ... and then assign them the presets we want.
    SELECT @StartDate    = '2008-12-01',
           @NumberOfDays = 31,
           @TotalRows    = 1000000,
           @CookieMain   = 'perm_track=9be22f85-0aeb-492e-a1cb-0ef74043bf41;'
                         + '+__utma=131431056.2740713761679478300.1230240763.1230240763.1230240763.1;'
                         + '+__utmz=131431056.1230240763.1.1.utmcsr=google|utmccn=(organic)'
                         + '|utmcmd=organic|utmctr=trains%20to%20leeds;'
                         + '+sess_track=20c12566-eddb-43cf-a9a7-974b027a3834;'
                         + '+ASP.NET_SessionId=iyjuwx45r2rl5455fmdqpp55',
           @Sessions     = 50000
    --===== All set... build the table and populate it with random data according to the presets.
         -- We'll also measure how long this takes, just for grins... see the "Messages" tab.
    SET STATISTICS TIME ON
     SELECT TOP (@TotalRows)
            [Date]       = @StartDate + ABS(CHECKSUM(NEWID())) % @NumberOfDays,
            [Time]       = CAST('1899-12-30' AS DATETIME) + RAND(CHECKSUM(NEWID())),
            [cs(Cookie)] = CAST(@CookieMain + CAST(ABS(CHECKSUM(NEWID()))%@Sessions+1 AS VARCHAR(10)) AS VARCHAR(2048))
       INTO dbo.WebLogEvents
       FROM Master.dbo.SysColumns t1
      CROSS JOIN Master.dbo.SysColumns t2
    SET STATISTICS TIME OFF
    GO
    --===== And, finally, let's see what some of the data looks like and what
         -- the min and max for each column is as a sanity check. This takes some time.
    SELECT TOP 100 * FROM dbo.WebLogEvents
    SELECT MIN(Date) AS MinDate,
           MAX(Date) AS MaxDate,
           MIN(Time) AS MinTime,
           MAX(Time) AS MaxTime,
           MIN([cs(Cookie)]) AS MinCookie,
           MAX([cs(Cookie)]) AS MaxCookie
      FROM dbo.WebLogEvents

    Hey, Phil Factor... do you want to do the honors on the "quirky update" method, or do you want me to?

    --Jeff Moden



  • Jeff Moden (1/2/2009)


    My answer to all of that is that code portability is a myth (and a nutty one, at that). ...snip... If that's the case, then why even bother having different database vendors?

    Funny - that was in the long post I lost. Agree 100%.

    Jeff Moden (1/2/2009)


    I'd also like Microsoft to stop deprecating stuff... they're taking away stuff that's not broken and is frequently better than what it's being replaced by. :w00t:

    I don't agree here. Yes - when they're deprecating stuff that's better than the replacement, then that sucks, and deprecation for deprecation's sake is also lame... but...

    I have to say I am a big fan of the breaking change in the right places. Some stuff from .NET 1.1 was just poorly thought out, for example. I am glad that someone at MS thought 'you know what guys, let's not live with this forever - let's deprecate, accept our mistakes and move on'.

    Similarly - I think the idea of defined rules and defaults, and having them bound to columns / data types, sucks in SQL Server. One of the things I am doing is a schema replication program, and working with those types doesn't do it for me. I think CHECK / DEFAULT constraints are a definite step forward in that respect.

    And lastly - one of the reasons Linux hacks me off is that compatibility with the 1970s seems to be an important factor. I know it's a contentious point, criticising Linux. But really, there are some amazingly smart people working on/with that OS, and I just wonder what they would come up with if they were given a clean slate and put usability at the core of their architecture... I think it could be brilliant. Similarly, I think Singularity has the promise of a great OS - but some marketing dude at MS is going to have a benny when they realise it's not binary compatible, and that will be where the layers of mediocrity creep in.

    Semi-thinking out loud here.

    WRT the script - top job. I'll get on with the set-based method later on - I'm pretty sure it will come in second place, but I want to do it anyway for completeness and as an exercise. For now I'm just putting the nippers to bed...

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Hey, Phil Factor... do you want to do the honors on the "quirky update" method, or do you want me to?

    --Jeff Moden

    Over to you Jeff. I had to nip off to do DBW. I'm now off to bed! I'll see how you got on tomorrow. Your build script looks very cool!

    Best wishes,
    Phil Factor

  • Jeff Moden (1/2/2009)


    Ok... from JacRoberts' description, CREATE TABLE statement, and a couple of good guesses (like the fact that, since all the columns are nullable, there's no primary key), here's a chunk of code that'll build a million-row test table.

    Now, based on what I saw in JacRoberts' code, I've changed the scope of the test code generator just a tiny bit. Instead of a million rows across ten years, it's a million rows across a single month for 50,000 session IDs, to simulate a million "hits" on a web site in a month. Here's the code... lemme know if we need any changes so that we can all work from the same thing. As always, details are in the comments...

    The table has about 10,000,000 rows added per day and about 60,000 site visits per day. It stores just 8 days' worth of data, as all data over 8 days old is deleted each night, and there is only 1 non-unique index, on the [c-ip] column; this column is not used in the TVF. There are about 80 million rows on that table all together. The average time a user spends on the site is about 20 minutes, with about 2 percent spending over 30 minutes. For the test it probably doesn't matter too much as long as there are a large number of rows.

    The process to count site visits takes place once a day, for the previous day, so you only need to aggregate the data for 1 day.

    I can test the end result on real data if you make the input and output of the table valued function the same as the one I pasted in earlier.

  • Ok - here's one for you then...

    Before I start - a couple of things. Given the volume of data you're talking about, I'm not confident my first method will be quicker. Because it's a function, I was forced to use table variables, and not temp tables as I'd have preferred for such a large data set.

    Also, I'm not convinced the logic is right in your function. I think the datediff(minute, @CurrDateTime, @PrevDateTime) should be datediff(minute, @PrevDateTime, @CurrDateTime). As the code you pasted stands, if @CurrDateTime is later than @PrevDateTime (which it always would be) then the value returned is negative in the parameter order specified. That means it would never be > 30, though it may be < -30. So it could well be that you are undercounting there?
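    A minimal illustration of the argument-order point, with hypothetical timestamps:

    ```sql
    -- DATEDIFF(datepart, startdate, enddate) effectively returns
    -- enddate - startdate, so reversing the arguments flips the sign.
    DECLARE @Prev datetime
    DECLARE @Curr datetime

    SET @Prev = '2008-12-20 10:00:00'
    SET @Curr = '2008-12-20 10:45:00'  -- 45 minutes later

    SELECT DATEDIFF(minute, @Prev, @Curr) AS CorrectOrder,  --  45: can exceed the 30-minute test
           DATEDIFF(minute, @Curr, @Prev) AS Reversed       -- -45: never > 30, so never counted
    ```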

    Another thing I found was that, because of the lack of indexing, reading from the table was more efficient in two parts. I altered the cursor declaration (pasted below) and found that on my machine the existing way took 31.828 seconds and the new way took 4.437. Anyway, here's some code.

    Direct functional replacement - the best I could get on Jeff's test data.

    Exec time: 7.046 seconds

    -- *******************************************************************
    -- FUNCTION SiteVisits
    -- Description: This function creates a table with a row for each site
    --              visit for a day.
    -- *******************************************************************
    CREATE FUNCTION [dbo].[SiteVisitsMattTestData]
    (
        @Date datetime
    )
    RETURNS @Table TABLE
    (
        [Hour] tinyint,      -- 0 to 23
        [FiveMinute] tinyint -- 0 to 11
    )
    AS
    BEGIN
        DECLARE @ASPString nvarchar(100)
        DECLARE @Len int
        DECLARE @DateMinus1 datetime
        DECLARE @ElevenThirty datetime

        SET @ElevenThirty = '1899-12-30 23:30:00.000'
        SET @ASPString    = 'ASP.NET_SessionId='
        SET @Len          = Len(@ASPString)
        SET @Date         = Convert(varchar, @Date, 112)
        SET @DateMinus1   = DateAdd(dd, -1, @Date)

        -- ok - so get the data for today and yesterday out into separate table variables
        -- - the query processor and OR are not best of friends
        DECLARE @sessionstoday     TABLE (EventDateTime datetime, ASPSessionID varchar(40))
        DECLARE @sessionsyesterday TABLE (EventDateTime datetime, ASPSessionID varchar(40))

        -- today's
        INSERT INTO @sessionstoday (EventDateTime, ASPSessionID)
        SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
               CASE WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])) > 1
                    THEN SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len,
                         CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
                         - CharIndex(@ASPString, [cs(Cookie)]) - @Len)
                    ELSE SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len, 100)
               END ASPSessionId
          FROM dbo.WebLogEvents WITH (NOLOCK)
         WHERE Date = @Date
           AND CharIndex(@ASPString, [cs(Cookie)]) > 0

        -- yesterday's
        INSERT INTO @sessionsyesterday (EventDateTime, ASPSessionID)
        SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
               CASE WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])) > 1
                    THEN SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len,
                         CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
                         - CharIndex(@ASPString, [cs(Cookie)]) - @Len)
                    ELSE SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len, 100)
               END ASPSessionId
          FROM dbo.WebLogEvents WITH (NOLOCK)
         WHERE Date = @DateMinus1 AND [Time] >= @ElevenThirty
           AND CharIndex(@ASPString, [cs(Cookie)]) > 0

        -- now create a unified table with a clustered PK for both sets of data
        DECLARE @sessions TABLE (ID int identity(1,1) primary key clustered, EventDateTime datetime, ASPSessionID varchar(40))

        -- and insert in the right order
        INSERT INTO @sessions (EventDateTime, ASPSessionID)
        SELECT EventDateTime, ASPSessionID
          FROM (SELECT EventDateTime, ASPSessionID FROM @sessionstoday
                UNION ALL
                SELECT EventDateTime, ASPSessionID FROM @sessionsyesterday) itbl
         ORDER BY ASPSessionID, EventDateTime

        -- now insert into the function's result table...
        INSERT INTO @Table ([Hour], [FiveMinute])
        -- ...the hour and five-minute marker
        SELECT datepart(hour, eventdatetime), datepart(minute, eventdatetime) / 5
          FROM (
                -- from a subquery that finds the previous hit by inner joining onto the previous ID
                -- and selects out the current hit's ID, eventdatetime, the minutes between them,
                -- and the session IDs from each
                SELECT s1.id, s1.eventdatetime,
                       datediff(minute, s2.eventdatetime, s1.eventdatetime) as timediff,
                       s1.aspsessionid as sid1, s2.aspsessionid as sid2
                  FROM @sessions s1
                 INNER JOIN @sessions s2 ON s1.ID = s2.ID + 1
                UNION ALL
                -- union the first row in, as the above join will never pick it up
                SELECT s1.id, s1.eventdatetime, 999 as timediff, s1.aspsessionid as sid1, '' as sid2
                  FROM @sessions s1 WHERE id = 1
               ) itbl
         -- where the hit is on the right day
         WHERE convert(datetime, convert(varchar, eventdatetime, 112)) = @Date
           -- and the session id is not the same as the previous one
           AND ((sid1 <> sid2) OR
                -- or the time differential is > 30 for the same session id
                (timediff > 30))
         -- order by ID because we want the result sets to be *exactly* the same
         ORDER BY id

        RETURN
    END

    Temp table version of the above, outside the scope of a function.

    Exec time: 3.843 seconds

    DECLARE @Date datetime
    SET @Date = '20081220'

    DECLARE @Table TABLE
    (
        [Hour] tinyint,      -- 0 to 23
        [FiveMinute] tinyint -- 0 to 11
    )

    BEGIN
        DECLARE @ASPString nvarchar(100)
        DECLARE @Len int
        DECLARE @DateMinus1 datetime
        DECLARE @ElevenThirty datetime

        SET @ElevenThirty = '1899-12-30 23:30:00.000'
        SET @ASPString    = 'ASP.NET_SessionId='
        SET @Len          = Len(@ASPString)
        SET @Date         = Convert(varchar, @Date, 112)
        SET @DateMinus1   = DateAdd(dd, -1, @Date)

        CREATE TABLE #sessions (ID int IDENTITY(1,1) primary key clustered, EventDateTime datetime, ASPSessionID varchar(40))

        INSERT INTO #sessions (EventDateTime, ASPSessionID)
        SELECT EventDateTime, ASPSessionID
          FROM (SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
                       CASE WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])) > 1
                            THEN SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len,
                                 CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
                                 - CharIndex(@ASPString, [cs(Cookie)]) - @Len)
                            ELSE SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len, 100)
                       END ASPSessionId
                  FROM dbo.WebLogEvents WITH (NOLOCK)
                 WHERE Date = @Date
                   AND CharIndex(@ASPString, [cs(Cookie)]) > 0
                UNION ALL
                SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
                       CASE WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])) > 1
                            THEN SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len,
                                 CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
                                 - CharIndex(@ASPString, [cs(Cookie)]) - @Len)
                            ELSE SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len, 100)
                       END ASPSessionId
                  FROM dbo.WebLogEvents WITH (NOLOCK)
                 WHERE Date = @DateMinus1 AND [Time] >= @ElevenThirty
                   AND CharIndex(@ASPString, [cs(Cookie)]) > 0
               ) itbl
         ORDER BY ASPSessionID, EventDateTime

        INSERT INTO @Table ([Hour], [FiveMinute])
        SELECT datepart(hour, eventdatetime), datepart(minute, eventdatetime) / 5
          FROM (SELECT s1.id, s1.eventdatetime,
                       datediff(minute, s2.eventdatetime, s1.eventdatetime) as timediff,
                       s1.aspsessionid as sid1, s2.aspsessionid as sid2
                  FROM #sessions s1
                 INNER JOIN #sessions s2 ON s1.ID = s2.ID + 1
                UNION ALL
                -- union the first row in, as the above join will never pick it up
                SELECT s1.id, s1.eventdatetime, 999 as timediff, s1.aspsessionid as sid1, '' as sid2
                  FROM #sessions s1 WHERE id = 1
               ) itbl
         WHERE convert(datetime, convert(varchar, eventdatetime, 112)) = @Date
           AND ((sid1 <> sid2) OR (timediff > 30))
         ORDER BY id
    END

    Revised cursor declaration - reduced the exec time of the cursor method from 31.828 seconds to 4.437.

    /****** Object: UserDefinedFunction [dbo].[SiteVisits] Script Date: 01/03/2009 00:06:48 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    -- *******************************************************************
    -- FUNCTION SiteVisits
    -- Description: This function creates a table with a row for each site
    --              visit for a day.
    -- *******************************************************************
    CREATE FUNCTION [dbo].[SiteVisitsNewCursor]
    (
        @Date datetime
    )
    RETURNS @Table TABLE
    (
        [Hour] tinyint,      -- 0 to 23
        [FiveMinute] tinyint -- 0 to 11
    )
    AS
    BEGIN
        DECLARE @ASPString nvarchar(100)
        DECLARE @Len int
        DECLARE @DateMinus1 datetime
        DECLARE @ElevenThirty datetime

        SET @ElevenThirty = '1899-12-30 23:30:00.000'
        SET @ASPString    = 'ASP.NET_SessionId='
        SET @Len          = Len(@ASPString)
        SET @Date         = Convert(varchar, @Date, 112)
        SET @DateMinus1   = DateAdd(dd, -1, @Date)

        DECLARE @PrevDateTime datetime
        DECLARE @PrevSessionId nvarchar(50)
        SET @PrevDateTime  = '1999-01-01' --Initialise
        SET @PrevSessionId = 'xxxxxxxx'   --Initialise

        DECLARE @CurrDateTime datetime
        DECLARE @CurrSessionId nvarchar(50)
        DECLARE @NewVisit bit

        DECLARE SessionCursor CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR
        SELECT EventDateTime, ASPSessionID
          FROM (SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
                       CASE WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])) > 1
                            THEN SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len,
                                 CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
                                 - CharIndex(@ASPString, [cs(Cookie)]) - @Len)
                            ELSE SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len, 100)
                       END ASPSessionId
                  FROM dbo.WebLogEvents WITH (NOLOCK)
                 WHERE Date = @DateMinus1
                   AND [Time] >= @ElevenThirty
                   AND CharIndex(@ASPString, [cs(Cookie)]) > 0
                UNION ALL
                SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
                       CASE WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])) > 1
                            THEN SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len,
                                 CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
                                 - CharIndex(@ASPString, [cs(Cookie)]) - @Len)
                            ELSE SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]) + @Len, 100)
                       END ASPSessionId
                  FROM dbo.WebLogEvents WITH (NOLOCK)
                 WHERE Date = @Date
                   AND CharIndex(@ASPString, [cs(Cookie)]) > 0
               ) itbl
         ORDER BY ASPSessionID, EventDateTime

        OPEN SessionCursor

        FETCH NEXT -- Get the first values into the @Curr variables
         FROM SessionCursor
         INTO @CurrDateTime, @CurrSessionId

        WHILE (@@Fetch_Status = 0)
        BEGIN
            SET @NewVisit = 0 --Initialise
            IF @CurrSessionId <> @PrevSessionId -- New session id
            BEGIN
                IF Convert(varchar, @CurrDateTime, 112) = @Date -- It's for the day we are looking at
                BEGIN
                    SET @NewVisit = 1
                END
            END
            ELSE -- It is the same session id, so test whether at least 30 mins since last logged
            BEGIN
                IF Convert(varchar, @CurrDateTime, 112) = @Date -- It's for the day we are looking at
                BEGIN
                    IF DateDiff(minute, @CurrDateTime, @PrevDateTime) > 30
                    BEGIN
                        SET @NewVisit = 1
                    END
                END
            END

            IF (@NewVisit = 1) -- Insert a row into the table
            BEGIN
                INSERT INTO @Table ([Hour], [FiveMinute])
                VALUES (Cast(DatePart(hh, @CurrDateTime) as tinyint),
                        Cast(DatePart(minute, @CurrDateTime)/5 as tinyint))
            END

            SET @PrevDateTime  = @CurrDateTime
            SET @PrevSessionId = @CurrSessionId

            FETCH NEXT -- Get the next values into the @Curr variables
             FROM SessionCursor
             INTO @CurrDateTime, @CurrSessionId
        END

        CLOSE SessionCursor
        DEALLOCATE SessionCursor

        RETURN
    END
    GO

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Matt Whitfield (1/2/2009)


    Before I start - a couple of things. Given the volume of data you're talking about, I'm not confident my first method will be quicker. Because it's a function, I was forced to use table variables, and not temp tables as I'd have preferred for such a large data set.

    I am not able to test the performance of the different methods until Monday.

    The stored procedure that calls the function just takes the result table from the TVF and inserts the number of new site visits for every 5-minute period in a day into a reporting table; it does this by inserting:

    SELECT @Date, [Hour], [FiveMinute], Count(*)
      FROM SiteVisits(@Date)
     GROUP BY [Hour], [FiveMinute]

    There is no reason why it shouldn't all be done inside one stored procedure without a TVF.

    Also, I'm not convinced the logic is right in your function. I think the datediff(minute, @CurrDateTime, @PrevDateTime) should be datediff(minute, @PrevDateTime, @CurrDateTime). As the code you pasted stands, if @CurrDateTime is later than @PrevDateTime (which it always would be) then the value returned is negative in the parameter order specified. That means it would never be > 30, though it may be < -30. So it could well be that you are undercounting there?

    Thanks, looks like you've spotted a bug!

    Another thing I found was that, because of the lack of indexing, reading from the table was more efficient in two parts. I altered the cursor declaration (pasted below) and found that on my machine the existing way took 31.828 seconds and the new way took 4.437. Anyway, here's some code.

    That's interesting. I can't see why it should take longer, as your method requires 2 full table scans as opposed to just 1 table scan if an OR were used in the WHERE clause. Maybe it's calculating the CharIndex(@ASPString, [cs(Cookie)]) > 0 predicate for the whole table first, rather than first evaluating:

    Date = @Date OR (Date = @DateMinus1 AND [Time] >= @ElevenThirty)

    and only afterwards evaluating CharIndex(@ASPString, [cs(Cookie)]) > 0.

  • Matt Whitfield (1/2/2009)


    Temp table version of the above, outside the scope of a function

    Exec time 3.843

    Oh, now I know I need to upgrade my poor ol' six-year-old workhorse... that little slice of computational heaven took over 28 seconds on my machine with no index on the test table.

    Still, if I can get close to your time on my box, it should absolutely wail on yours... I vote you be the final tester for this soiree. 😛

    --Jeff Moden



    Flipping between watching Alabama get stomped, playing with my 3yo daughter, and surfing, so not much detailed review. But is it possible that some good ol' fashioned indexing will help out here?

    Look forward to the continuation of this thread!

    Hey, good job to whoever found the bug in the logic, too!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • You seem pretty sure. Have you tested it?

    Tested / Worked with = Yes

    Imagination is more important than knowledge.

    – Albert Einstein

  • jacroberts (1/2/2009)


    The stored procedure that calls the function just takes the result table from the TVF and inserts the number of new site visits for every 5-minute period in a day into a reporting table; it does this by inserting:

    SELECT @Date, [Hour], [FiveMinute], Count(*)
      FROM SiteVisits(@Date)
     GROUP BY [Hour], [FiveMinute]

    There is no reason why it shouldn't all be done inside one stored procedure without a TVF.

    Then there are certain huge performance benefits to NOT using a TVF in this case... I'm just gonna do it as a sproc if no one has any serious objections.

    --Jeff Moden



