Cursors Be Gone!

  • Jeff Moden (1/3/2009)


    Then, if no one has any objections, there are certain huge peformance benefits to NOT using a TVF in this case... I'm just gonna do it as a sproc if no one has any serious objections.

    Not from me - i'm just doing the same 🙂

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • You know, I'm not sure how much point there is to this particular exercise. 90% of the time is in the intiial scanning of the WebLogEvents table and as there is not clustered index and indeed, no relevant index at all, there is no way to avoid scanning the entire table every time.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Ok then - my entry, given the size of the dataset will be the code below.

    I'm not sure if the UNION approach to getting the data out of the table will work better on jacroberts' database - because a table scan of that magnitude would be considerably expensive.

    Anyway - this on my machine weighs in at 3.562.

    DECLARE @Date datetime

    SET @Date = '20081220'

    DECLARE @ASPString nvarchar(100)

    DECLARE @Len int

    DECLARE @DateMinus1 datetime

    DECLARE @ElevenThirty datetime

    SET @ElevenThirty = '1899-12-30 23:30:00.000'

    SET @ASPString = 'ASP.NET_SessionId='

    SET @Len = Len(@ASPString)

    SET @Date = Convert(varchar, @Date, 112)

    SET @DateMinus1 = DateAdd(dd, -1, @Date)

    CREATE TABLE #sessions(ID int IDENTITY(1,1) primary key clustered, EventMinuteIndex smallint, ASPSessionID char(40))

    INSERT INTO #sessions (EventMinuteIndex, ASPSessionID)

    SELECT EventMinuteIndex, ASPSessionID FROM

    (

    SELECT datediff(minute, '18991230', [Time]) as EventMinuteIndex,

    CASE

    WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN

    SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])

    + @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))

    - CharIndex(@ASPString, [cs(Cookie)])- @Len)

    ELSE

    SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)

    END ASPSessionId

    FROM dbo.WebLogEvents WITH (NOLOCK)

    WHERE Date = @date

    AND CharIndex(@ASPString , [cs(Cookie)]) > 0

    UNION ALL

    SELECT datediff(minute, '18991230', [Time]) - 1440 as EventMinuteIndex,

    CASE

    WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN

    SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])

    + @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))

    - CharIndex(@ASPString, [cs(Cookie)])- @Len)

    ELSE

    SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)

    END ASPSessionId

    FROM dbo.WebLogEvents WITH (NOLOCK)

    WHERE Date = @DateMinus1 AND [Time] >= @ElevenThirty

    AND CharIndex(@ASPString , [cs(Cookie)]) > 0

    ) itbl

    ORDER BY ASPSessionID, EventMinuteIndex

    SELECT @date as Date, hr, fm, count(*) FROM

    (

    select eventminuteindex / 60 as hr, (eventminuteindex / 5) % 12 as fm FROM

    (

    SELECT s1.eventminuteindex,

    CASE WHEN s1.eventminuteindex - s2.eventminuteindex > 30

    THEN 1

    ELSE 0

    END as enoughTimePassed,

    CASE WHEN s1.aspsessionid = s2.aspsessionid

    THEN 0

    ELSE 1

    END as SessionDifferent

    FROM #sessions s1 INNER JOIN

    #sessions s2

    on s1.ID = s2.ID + 1

    UNION ALL

    -- union the first row in, as the above join will never pick it up

    SELECT s1.eventminuteindex, 1 as enoughTimePassed, 1 as SessionDifferent

    FROM #sessions s1 where id = 1

    ) itbl

    WHERE EventMinuteIndex >= 0

    and (EnoughTimePassed + SessionDifferent) > 0

    ) otbl

    GROUP BY hr, fm

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • RBarryYoung (1/3/2009)


    You know, I'm not sure how much point there is to this particular exercise. 90% of the time is in the intiial scanning of the WebLogEvents table and as there is not clustered index and indeed, no relevant index at all, there is no way to avoid scanning the entire table every time.

    I know where you're coming from.

    I did actually create a normalised data table whereby the aspsessionid was already extracted, and the clustered indexing was appropriate. The performance of the set based method there was quite good also - my best effort set based was 0.625 seconds, whereas the cursor method, refactored to use the new table, was 1.750.

    To the others - FYI the code I used to create the normalised table was

    DECLARE @ASPString nvarchar(100)

    DECLARE @Len int

    SET @ASPString = 'ASP.NET_SessionId='

    SET @Len = Len(@ASPString)

    CREATE TABLE [dbo].[WebLogEventsNormalised] (

    [ASPSessionID] varchar(40),

    [Date] datetime,

    [Time] datetime

    )

    CREATE CLUSTERED INDEX [IX_WebLogEventsNormalised] ON [dbo].[WebLogEventsNormalised] (ASPSessionID, Date, [Time])

    INSERT INTO [dbo].[WebLogEventsNormalised] ([Date], [Time], [ASPSessionID])

    SELECT [Date],

    [Time],

    CASE

    WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN

    SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])

    + @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))

    - CharIndex(@ASPString, [cs(Cookie)])- @Len)

    ELSE

    SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)

    END ASPSessionId

    FROM dbo.WebLogEvents WITH (NOLOCK)

    WHERE CharIndex(@ASPString , [cs(Cookie)]) > 0

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • repicurus (12/24/2008)


    I believe that Abed misses or eludes the entire point / need for the occasional use of cursors, bad though they may be:

    To loop through a record set and take some sort action(s) (possibly updating select columns) based on complex procedural logic, possibly involving other database access to adhere to business rules, that cannot be accomplished via set-at-a-time constructs.

    I would be happy to contribute TWO examples that I do not think could be accomplished WITHOUT cursors.

    Perhaps our resident 'expert' on bad cursors (as opposed to good cursors) would be so kind as to enlighten me as to the proper manner in which change my evil cursor ways.

    :rolleyes:

    DUDE! Bring It ON!

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • OK... here we go. First, I'm gonna recommend that you add the following index to the WebLogEvents table... it will NOT interfere with inserts or deletes, will not need any form of maintenance (no pages splits if inserts are always done in order by date and time) and it WILL make all date sensitive queries 30% faster (or more)...

    CREATE CLUSTERED INDEX IXC_WebLogEvents_Date_Time

    ON dbo.WebLogEvents (Date,Time) WITH FILLFACTOR = 100

    Now, based on the fact that JacRobert's said that he doesn't care whether or not the TVF is preserved or not, I went ahead and just did it as a stored procedure that returns a single result set.

    This proc returns the result set on my box in just over 6 seconds without the index above and in about 4 seconds with the recommended index above. Matt's fine code generally took between 18 and 28 seconds for the same day's testing. And, that's what I wanted to show folks... between using the Select Into (can't be done on table variables) and the awesome speed of the "pseudo cursor" (Phil Factor calls it the "quirky update"), [font="Arial Black"]you can do some trully awesome things that most people would use a cursor for.[/font]

    Here's the code and thanks for playing... as always, the details are commented in the code...

    CREATE PROCEDURE dbo.GetDailyVisitCount

    /*********************************************************************************************************************

    Purpose:

    Returns the visit count for each 5 minute period of each hour for the given day.

    Programmer notes and references:

    1. A "visit" is defined as a SessionId that hasn't been seen for over 30 minutes on the web site. So the data is

    processed sorted by SessionId and Datetime and the time between each session visit is measured to see if it is a

    new session or the same one.

    2. This sproc uses a "pseudo cursor update" which is discussed in detail in the following article:

    Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

    By Jeff Moden, 2008/01/31

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    Usage:

    EXEC dbo.GetDailyVisitCount @Date

    ... where @Date is anything that converts to a DATETIME data type.

    Revision History:

    Rev 00 - 03 Jan 2009 - Jeff Moden - Initial creation and test

    Reference: http://www.sqlservercentral.com/Forums/Topic625172-1373-2.aspx

    *********************************************************************************************************************/

    --===== Declare procedure I/O

    @pDate DATETIME

    AS

    --=====================================================================================================================

    -- Test Harness... uncomment this section and run from here down for on screen testing.

    -- See the final Select in this proc for addition field outputs

    --=====================================================================================================================

    --DECLARE @pDate DATETIME

    -- SET @pDate = '12/20/2008' --Must be a date that's actually in the dbo.WebLogEvents table at the time

    --=====================================================================================================================

    -- Presets

    --=====================================================================================================================

    --===== Declare and preset local variables

    -- Note... all of these variables are used with the "pseudo cursor update"

    -- to keep track of values from previous rows

    DECLARE @PrevSessionID VARCHAR(50),

    @PrevKeepDate DATETIME,

    @PrevKeep TINYINT

    SELECT @PrevSessionID = '',

    @PrevKeepDate = 0,

    @PrevKeep = 0

    --===== Ensure the date parameter is a whole date

    SELECT @pDate = DATEADD(dd,DATEDIFF(dd,0,@pDate),0)

    --===== Supress the autodisplay of rowcounts so only the result set will be returned

    SET NOCOUNT ON

    --=====================================================================================================================

    -- Copy data from the event table to a working table where we can work on it using a special index

    --=====================================================================================================================

    --===== Create and populate the working table on the fly

    SELECT CAST(SUBSTRING(Cookie,1,CHARINDEX(';',Cookie)-1) AS VARCHAR(50)) AS SessionID,

    TheDate+2.0 AS TheDate,

    CAST(0 AS TINYINT) AS KeepMe,

    Date AS DateOnly

    INTO #MyHead

    FROM (--==== Subquery does some of the calcs and isolates the start of the SessionID

    SELECT Date+Time AS TheDate,

    SUBSTRING([cs(Cookie)],CHARINDEX('ASP.NET_SessionId=',[cs(Cookie)])+18,8000)+';' AS Cookie,

    Date

    FROM dbo.WebLogEvents

    WHERE Date = @pDate

    OR (Date = @pDate-1 AND Time >= '1899-12-30 23:30')

    )p1

    --===== Add the necessary index to support the "pseudo cursor" update

    CREATE CLUSTERED INDEX IXC_#MyHead_SessionID_TheDate

    ON #MyHead (SessionID,TheDate) WITH FILLFACTOR = 100

    --=====================================================================================================================

    -- Mark "visits" according to the rules in "Programmer's notes #1" above

    --=====================================================================================================================

    --===== Do the "pseudo cursor" update to determine which rows to keep

    UPDATE #MyHead

    SET @PrevKeep = KeepMe = CASE WHEN SessionID = @PrevSessionID

    AND DATEDIFF(mi,@PrevKeepDate,TheDate)>30

    THEN 1

    WHEN SessionID <> @PrevSessionID

    THEN 1

    ELSE 0

    END,

    @PrevKeepDate = TheDate,

    @PrevSessionID = SessionID

    FROM #MyHead WITH(INDEX(0)) --Forces a clustered index scan to maintain the update order

    --=====================================================================================================================

    -- Return the required output as a single result set

    --=====================================================================================================================

    --===== Return the results

    SELECT @pDate AS Date,

    DATEPART(hh,TheDate) AS [Hour],

    DATEPART(mi,TheDate)/5 AS [FiveMinute],

    COUNT(*) AS TheCount

    FROM #MyHead

    WHERE KeepMe = 1

    AND DateOnly = @pDate

    GROUP BY DATEPART(hh,TheDate),

    DATEPART(mi,TheDate)/5

    --=====================================================================================================================

    -- Test Harness... uncomment this section for on screen testing.

    --=====================================================================================================================

    -- SELECT SessionID,

    -- TheDate,

    -- DateOnly,

    -- KeepMe,

    -- DATEPART(hh,TheDate) AS [Hour],

    -- DATEPART(mi,TheDate)/5 AS [FiveMinute]

    -- FROM #MyHead

    -- WHERE KeepMe = 1

    -- ORDER BY SessionID, TheDate

    GO

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/3/2009)


    jacroberts (1/2/2009)


    The stored procedure that calls the function just takes the results table from the TVF and inserts this into a reporting table number of new site visits for every 5 minute period in a day, it does this by inserting a:

    SELECT @Date, [Hour], [FiveMinute], Count(*)

    FROM SiteVisits(@Date)

    GROUP BY [Hour], [FiveMinute]

    There is no reason why it shouldn't all be done inside one stored procedure without a TVF.

    Then, if no one has any objections, there are certain huge peformance benefits to NOT using a TVF in this case... I'm just gonna do it as a sproc if no one has any serious objections.

    Yes, no need to be in a TVF.

  • Jeff,

    My solution was almost identical, except that I did an explicit create for the temporary table. (I'm showing my age here; It used to lock system tables for the whole implied transaction: i.e. the SELECT INTO, whereas the explicit creation of the temp table did it only for the CREATE)

    The other thing I missed was the WITH FILLFACTOR = 100. Neat!

    Goes like a rocket.

    Best wishes,
    Phil Factor

  • Phil Factor (1/3/2009)


    Jeff,

    My solution was almost identical, except that I did an explicit create for the temporary table. (I'm showing my age here; It used to lock system tables for the whole implied transaction: i.e. the SELECT INTO, whereas the explicit creation of the temp table did it only for the CREATE)

    The other thing I missed was the WITH FILLFACTOR = 100. Neat!

    Goes like a rocket.

    Heh... Guess I'll show my age, as well... I believe it was SP1 in SQL Server 6.5 that allowed for a fix using a startup parameter. When SQL Server 7.0 hit the streets, it had the fix built in from the git.

    Also, I thing it's absolutely hilarious that someone who uses the handle of "Phil Factor" missed the "Fill Factor" speed enhancement. 😛

    As always, Phil, I very much appreciate your feedback. It would be fun to work on a project with you, someday, even if it were remote.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just a followup folks... I tried the original cursor/While loop based function on my box... now, keep in mind that it doesn't produce the required output... it just creates a return that would be used to return the final output. It took over 3 minutes. Even if you didn't know how to do the ultra fast methods I used, the standard method of using a self-joined table like that in Matt's good code still blew the doors off of the cursor method.

    The other thing I want everyone to notice is how simple the code I wrote is... the comments are longer than the code and the code is very simple to understand. That kinda blows the doors off two of the biggest excuses for writing cursors... the cursor code turned out to be much longer and much more complex than the set based code did. The set based code took less time to write, produced shorter and MUCH faster code, and will be easier to troubleshoot or modify in the future because it's so simple.

    There's two things to be learned here...

    1. Like I said before, people give up too easily on finding a set based method. 99.9999% of the time in SQL Server 2000, there's a set based method for everything you can imagine. The number goes to 100% in SQL Server 2005. There is not now, nor will there ever be a trully good reason to use a cursor or a while loop in SQL Server. Learn your trade well enough to know the highspeed ways around a cursor so that you never ever have to make the trade off between using the proper set based technology or using a cursor because it's simple to do so you can meet a schedule.

    2. If you ever think there's something that can't be done without a cursor in a high performance manner in SQL Server or that it will be quicker and easier to write a cursor so you can meet a &^%$#! schedule, see item 1 above. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Incredible result, Jeff. About once a week you post something that just blows me away.

    And for the record, it wasn't the pseudo-cursor, the SELECT INTO or the FillFactor that got me on this one, those either I knew about and/or they didn't make much difference because the table load time was so long. What blew me away on this one was how you got the the table load time down to almost nothing.

    I tried everything that I could think of (including the Select Into & the source table index) and the table load still took 8-20 seconds on my laptop. Yours takes just over a second! And I honestly haven't a clue why.

    You're a freakin' Jedi Wizard, man.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Jeff Moden (1/3/2009)


    OK... here we go. First, I'm gonna recommend that you add the following index to the WebLogEvents table... it will NOT interfere with inserts or deletes, will not need any form of maintenance (no pages splits if inserts are always done in order by date and time) and it WILL make all date sensitive queries 30% faster (or more)...

    CREATE CLUSTERED INDEX IXC_WebLogEvents_Date_Time

    ON dbo.WebLogEvents (Date,Time) WITH FILLFACTOR = 100

    Now, based on the fact that JacRobert's said that he doesn't care whether or not the TVF is preserved or not, I went ahead and just did it as a stored procedure that returns a single result set.

    This proc returns the result set on my box in just over 6 seconds without the index above and in about 4 seconds with the recommended index above. Matt's fine code generally took between 18 and 28 seconds for the same day's testing. And, that's what I wanted to show folks... between using the Select Into (can't be done on table variables) and the awesome speed of the "pseudo cursor" (Phil Factor calls it the "quirky update"), [font="Arial Black"]you can do some trully awesome things that most people would use a cursor for.[/font]

    Here's the code and thanks for playing... as always, the details are commented in the code...

    CREATE PROCEDURE dbo.GetDailyVisitCount

    /*********************************************************************************************************************

    Purpose:

    Returns the visit count for each 5 minute period of each hour for the given day.

    Programmer notes and references:

    1. A "visit" is defined as a SessionId that hasn't been seen for over 30 minutes on the web site. So the data is

    processed sorted by SessionId and Datetime and the time between each session visit is measured to see if it is a

    new session or the same one.

    2. This sproc uses a "pseudo cursor update" which is discussed in detail in the following article:

    Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

    By Jeff Moden, 2008/01/31

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    Usage:

    EXEC dbo.GetDailyVisitCount @Date

    ... where @Date is anything that converts to a DATETIME data type.

    Revision History:

    Rev 00 - 03 Jan 2009 - Jeff Moden - Initial creation and test

    Reference: http://www.sqlservercentral.com/Forums/Topic625172-1373-2.aspx

    *********************************************************************************************************************/

    --===== Declare procedure I/O

    @pDate DATETIME

    AS

    --=====================================================================================================================

    -- Test Harness... uncomment this section and run from here down for on screen testing.

    -- See the final Select in this proc for addition field outputs

    --=====================================================================================================================

    --DECLARE @pDate DATETIME

    -- SET @pDate = '12/20/2008' --Must be a date that's actually in the dbo.WebLogEvents table at the time

    --=====================================================================================================================

    -- Presets

    --=====================================================================================================================

    --===== Declare and preset local variables

    -- Note... all of these variables are used with the "pseudo cursor update"

    -- to keep track of values from previous rows

    DECLARE @PrevSessionID VARCHAR(50),

    @PrevKeepDate DATETIME,

    @PrevKeep TINYINT

    SELECT @PrevSessionID = '',

    @PrevKeepDate = 0,

    @PrevKeep = 0

    --===== Ensure the date parameter is a whole date

    SELECT @pDate = DATEADD(dd,DATEDIFF(dd,0,@pDate),0)

    --===== Supress the autodisplay of rowcounts so only the result set will be returned

    SET NOCOUNT ON

    --=====================================================================================================================

    -- Copy data from the event table to a working table where we can work on it using a special index

    --=====================================================================================================================

    --===== Create and populate the working table on the fly

    SELECT CAST(SUBSTRING(Cookie,1,CHARINDEX(';',Cookie)-1) AS VARCHAR(50)) AS SessionID,

    TheDate+2.0 AS TheDate,

    CAST(0 AS TINYINT) AS KeepMe,

    Date AS DateOnly

    INTO #MyHead

    FROM (--==== Subquery does some of the calcs and isolates the start of the SessionID

    SELECT Date+Time AS TheDate,

    SUBSTRING([cs(Cookie)],CHARINDEX('ASP.NET_SessionId=',[cs(Cookie)])+18,8000)+';' AS Cookie,

    Date

    FROM dbo.WebLogEvents

    WHERE Date = @pDate

    OR (Date = @pDate-1 AND Time >= '1899-12-30 23:30')

    )p1

    --===== Add the necessary index to support the "pseudo cursor" update

    CREATE CLUSTERED INDEX IXC_#MyHead_SessionID_TheDate

    ON #MyHead (SessionID,TheDate) WITH FILLFACTOR = 100

    --=====================================================================================================================

    -- Mark "visits" according to the rules in "Programmer's notes #1" above

    --=====================================================================================================================

    --===== Do the "pseudo cursor" update to determine which rows to keep

    UPDATE #MyHead

    SET @PrevKeep = KeepMe = CASE WHEN SessionID = @PrevSessionID

    AND DATEDIFF(mi,@PrevKeepDate,TheDate)>30

    THEN 1

    WHEN SessionID <> @PrevSessionID

    THEN 1

    ELSE 0

    END,

    @PrevKeepDate = TheDate,

    @PrevSessionID = SessionID

    FROM #MyHead WITH(INDEX(0)) --Forces a clustered index scan to maintain the update order

    --=====================================================================================================================

    -- Return the required output as a single result set

    --=====================================================================================================================

    --===== Return the results

    SELECT @pDate AS Date,

    DATEPART(hh,TheDate) AS [Hour],

    DATEPART(mi,TheDate)/5 AS [FiveMinute],

    COUNT(*) AS TheCount

    FROM #MyHead

    WHERE KeepMe = 1

    AND DateOnly = @pDate

    GROUP BY DATEPART(hh,TheDate),

    DATEPART(mi,TheDate)/5

    --=====================================================================================================================

    -- Test Harness... uncomment this section for on screen testing.

    --=====================================================================================================================

    -- SELECT SessionID,

    -- TheDate,

    -- DateOnly,

    -- KeepMe,

    -- DATEPART(hh,TheDate) AS [Hour],

    -- DATEPART(mi,TheDate)/5 AS [FiveMinute]

    -- FROM #MyHead

    -- WHERE KeepMe = 1

    -- ORDER BY SessionID, TheDate

    GO

    Jeff, That's a neat way of doing it. I hadn't read up on setting working variables through update statement process so that's new to me. In fact combined with a CASE statement in the sql it's almost like using a cursor, it could be argued that this isn't set based logic, though I've no doubt it runs fast which is the aim of this excercise!

    I will test both your method and Matt's on Monday, time permitting, on some real data (10 million rows per day). I think at the moment the cursor driven procedure takes about 20 minutes to complete, a full table scan takes about 4 minutes.

    I take is that @PrevKeep variable is redundant as it doesn't seem to be used?

    Regarding adding the index on WeblogEvents table it would impact insert performance as there are 4 web servers each one generates a weblog file every hour which are imported into the database, the rows within each file are datetime ordered but there are 4 files all for the same datetime range so inserts are not always done in date and time order.

  • Barry, coming from the likes of you, that's a totally awesome compliment. Thanks, ol' friend.

    Just a hint about speed... I've found that SQL Server and I both have a couple of things in common... we're both lazy and neither of us can remember more than a couple of things at a time. 😛 I knew the split code to dig the SessionID out of the cookie was going to require multiple CharIndexes (ie, more than 2) and a whole bunch of other hooie if I tried to do it all in the same Select... WAY too much typing and thinking for me... it was shorter to do two Selects... one to find the beginning of the SessionID as a derived table and then a very simple outer select to get rid of the end. The outer select was also simple enough so I could do the mid-air conversion from VARCHAR(2048) to a nice little VARCHAR(50) in the process. It's "Divide and Conquer" at it's best, it usually works the best, requires the least amount of thinking, and usually requires the least amount of code. And, if you think about it... that's the same kind of stuff they use CTE's for. I just used the older technology so this code would work in 2k or 2k5.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jacroberts (1/3/2009)


    Jeff, That's a neat way of doing it. I hadn't read up on setting working variables through update statement process so that's new to me.

    I will test both your method and Matt's on Monday, time permitting, on some real data (10 million rows per day). I think at the moment the cursor driven procedure takes about 20 minutes to complete, a full table scan takes about 4 minutes.

    Read more about the method, here... many of us, including Phil Factor, have been using the method for many, many different things. But, if you don't follow the rules with the clustered index on the temp table or the index hint on the FROM temp table, you can and will get incorrect answers. Please take the time to study and try the examples in the following article before you try one of these bad boys on your own... there are some ver specific rules to follow...

    [font="Arial Black"]Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5[/font][/url]

    I take is that @PrevKeep variable is redundant as it doesn't seem to be used?

    BWAA-HAAA!!! Well, at least I know someone read the code. Yes sir, I made a final change to the code and forgot to delete it from the declaration section of the code.

    Regarding adding the index on WeblogEvents table it would impact insert performance as there are 4 web servers each one generates a weblog file every hour which are imported into the database, the rows within each file are datetime ordered but there are 4 files all for the same datetime range so inserts are not always done in date and time order.

    Ah... understood... I thought it was a single feed. You're absolutely correct about the clustered index on overlapping feeds. Thanks for the feedback.

    Just don't delete the clustered index from the temp table in the stored proc... that one is critical to the "pseudo cursor".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/3/2009)


    Just a hint about speed... I've found that SQL Server and I both have a couple of things in common... we're both lazy and neither of us can remember more than a couple of things at a time. 😛 I knew the split code to dig the SessionID out of the cookie was going to require multiple CharIndexes (ie, more than 2) and a whole bunch of other hooie if I tried to do it all in the same Select...

    Thanks, that does make sense. It just didn't occur to me that that was the source of the slowness.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 15 posts - 121 through 135 (of 272 total)

You must be logged in to reply to this topic. Login to reply