Help writing a SQL

  • Greg Snidow (9/17/2009)


    Ok, this might be a dumb question, but if I specify the physical order of the records on the insert, then why is the index so vital? If I select * from this table a billion times, is there ever a chance that it will return the records in any other order than the physical order, in the absence of an index? Is it that without the index, the update statement does not necessarily run by physical order? Sorry for asking so many questions on this one, but this little baby always intrigues me, and I always look for chances to use it. I can always manage to get it to work by trying everything until it does, but I still only "get it" just a little.

    SELECT operates differently than UPDATE does and you cannot trust any natural or index order with SELECT. If you want to guarantee SELECT will return something in the correctly, you MUST use ORDER BY even if the rows were originally inserted using an ORDER BY.

    The 3 part Quirky Update, when properly formed with the necessary clustered index and a couple of other things including the prevention of parallelism, will always do an update in the same order as the clustered index even if you fool around with CTE's to try to force an actual ORDER BY.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/17/2009)


    Another of those rules is that it must be a single table update. By that I mean, no joins... period.

    That was going to be my next question. I tried doing it without the temp table, and failed miserably. Now I know. Thanks, Jeff.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • Heh... I'm actually worse... I just don't let people put CLR's in my systems. It's not that I'm against them... it's that they're not worth having to go through two different languages and maintaining source control for both the source code and the DLL especially since the T-SQL solutions frequently beat them for speed.

    Yes, there are a few places where a CLR will beat T-SQL... RegEx replace is one of them and, if done correctly, splitting strings is another. But the differences in performance are so relatively small that I just can't justify the use of CLR's.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Garadin (9/17/2009)


    If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.

    How about a recursive CTE ? I think this has the correct output , if not let me know.

    Also i would be very interested in how this performs against a large record set

    using this index

    CREATE CLUSTERED INDEX idxmultirecs ON multrecs (acct,ndt)

    I see index seeks all the way through so it *should* be pretty damn fast

    with cteFirstDates(acct,ndt)

    as(

    Select acct,min(ndt)

    from multrecs

    group by acct

    )

    ,

    CteRecurs(acct,ndt)

    as

    (

    Select multrecs.acct,multrecs.ndt

    from cteFirstDates

    join multrecs

    on multrecs.acct = cteFirstDates.acct

    and multrecs.ndt = cteFirstDates.ndt

    union all

    Select acct,ndt

    from (

    Select multrecs.acct,multrecs.ndt,rown =row_number() over (partition by multrecs.acct order by multrecs.ndt)

    from CteRecurs

    join multrecs

    on multrecs.acct = CteRecurs.acct

    and multrecs.ndt >= CteRecurs.ndt + 38

    ) as x

    where rown = 1

    )

    select * from CteRecurs

    order by acct,ndt



    Clear Sky SQL
    My Blog[/url]

  • That works but be careful... recursive CTE's are as slow as a While Loop in most cases and are actually a form of RBAR.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/18/2009)


    That works but be careful... recursive CTE's are as slow as a While Loop in most cases and are actually a form of RBAR.

    1st off apologies to the OP only just noticed what forum this is in :doze:

    I agree that it isnt a perfect set based solution , but any set where you are wanting to include / exclude rows ( and to a certain extent the row data)

    based upon data from previous rows will have to be RBAR somewhere.

    My 'gut' feeling is it should be light on the system than a quirky update though.



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (9/18/2009)


    My 'gut' feeling is it should be light on the system than a quirky update though.

    My gut feeling is that you're wrong :-D. But that's what we have testing for. I'd test it, but first I have to make up a couple million rows of suitable test data. Got any scripts for that lying about by chance?

    Also, thank you for posting another method for this. Although it is a 2005+ only method, and is technically RBAR, it's one more thing to keep in mind when coming to these iterative logic problems, and is another use(although whether or not it is a 'good' use remains to be seen) for a recursive CTE.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Your wish is my command 🙂

    Happy to be proved wrong , evidence either way is good in my book.

    You just need to manually update the acct value ,

    and keep inserting values till happy 😀

    also note that this will produce increments of between 1 and 11 days

    so adjust your update to be +4 (or something else rather than the specified 38)

    Create View vwnewId

    as

    Select new_id = newId()

    go

    drop table #datelist

    go

    Create table #datelist

    (

    numskip integer,

    date datetime

    )

    go

    delete from #datelist

    ;with ctenums(number)

    as(

    select number from

    master..spt_values

    where

    TYPE='p' and number between 0 and 400

    )

    insert into #datelist(numskip)

    Select abs(checksum(new_id))%10

    from ctenums

    cross join vwnewid

    go

    declare @Date datetime

    Select @Date = '01jan1970'

    update #datelist

    set Date=@Date,

    @Date = @Date+numskip+1

    delete from multrecs where acct = 3

    insert into multrecs (acct,cdt,val,ndt)

    select 3,'01jan1970','test',date

    from #datelist



    Clear Sky SQL
    My Blog[/url]

  • Also, with the quirk update , although you have specified index(0) are the rows guaranteed to come back in the required order ? . If you joined out to another table , things could go wrong.

    (No criticism , just observation) 🙂



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (9/18/2009)


    Also, with the quirk update , although you have specified index(0) are the rows guaranteed to come back in the required order ? . If you joined out to another table , things could go wrong.

    (No criticism , just observation) 🙂

    There's a few posts earlier in the thread from Jeff directly explaining the index(0) thing and why we won't use it anymore, but the gist of it is that you are guaranteed the row order (assuming you've used the clustered index to create this order) on a single table.

    You're absolutely right that Joins present a problem, and AFAIK, you cannot use joins at all. I don't believe you can use a WHERE clause either, but I'm not sure on that. The logic for these updates should be in the case statement.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Dave Ballantyne (9/18/2009)


    Your wish is my command 🙂

    Happy to be proved wrong , evidence either way is good in my book.

    You just need to manually update the acct value ,

    and keep inserting values till happy 😀

    also note that this will produce increments of between 1 and 11 days

    so adjust your update to be +4 (or something else rather than the specified 38)

    Create View vwnewId

    as

    Select new_id = newId()

    go

    drop table #datelist

    go

    Create table #datelist

    (

    numskip integer,

    date datetime

    )

    go

    delete from #datelist

    ;with ctenums(number)

    as(

    select number from

    master..spt_values

    where

    TYPE='p' and number between 0 and 400

    )

    insert into #datelist(numskip)

    Select abs(checksum(new_id))%10

    from ctenums

    cross join vwnewid

    go

    declare @Date datetime

    Select @Date = '01jan1970'

    update #datelist

    set Date=@Date,

    @Date = @Date+numskip+1

    delete from multrecs where acct = 3

    insert into multrecs (acct,cdt,val,ndt)

    select 3,'01jan1970','test',date

    from #datelist

    Awesome, I appreciate the code. Let the battle of the guts commence!

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • And the results are in. It probably wasn't a fair contest to begin with. You were going up against a gut accustomed to eating 2 day old Taco Bell burritos without flinching.

    Quirky Method No DOP set, Messages:

    (2000000 row(s) affected)

    (1 row(s) affected)

    Update Done. Time Elapsed: 28 seconds

    (64202 row(s) affected)

    (1 row(s) affected)

    Select to Table Done. Total Time Elapsed: 30 seconds

    (1 row(s) affected)

    Final Row Count, Quirky Method: 64202

    Quirky Method MAXDOP 1, Messages:

    (2000000 row(s) affected)

    (1 row(s) affected)

    Update Done. Time Elapsed: 12 seconds

    (64202 row(s) affected)

    (1 row(s) affected)

    Select to Table Done. Total Time Elapsed: 15 seconds

    (1 row(s) affected)

    Final Row Count, Quirky Method:64202

    CTE Method Messages:

    (71705 row(s) affected)

    (1 row(s) affected)

    Select to Table Done. Total Time Elapsed: 170 seconds

    (1 row(s) affected)

    Final Row Count, CTE Method: 71705

    Notes:

    1. I simplified your test data script a bit. I created 2 Million rows of test data.

    2. In the event that there are two items with exactly the same date that meet the criteria, the Recursive CTE solution includes both, where as the Quirky update does not.

    3. I picked a dumb name for the Clustered index (Quirky), because it's used by both methods. Because I'm including the .sqlplans, I didn't rename it post testing.

    4. I select these into tables to take screen display times out of the equation.

    5. I haven't completely researched the difference in rows returned. I'm sure some is due to the duplicates, but I'm not sure it all is.

    6. It's possible that both methods could be optimized further. The Quirky update actually took a lot longer than I expected it to, which probably means I did something wrong.

    7. I forgot the MAXDOP the first time around. The second time I added it and re-ran. 15 seconds instead of 30.

    Test Data Setup

    IF OBJECT_ID('multrecs','u') IS NOT NULL DROP TABLE multrecs

    CREATE TABLE multrecs(

    acctint,

    ndtdatetime

    )

    DECLARE @i int,

    @Acct int

    SET @i = 0

    WHILE @i 400 records due to duplication.

    INSERT INTO multrecs(acct,ndt)

    SELECT @Acct, DATEADD(D,ABS(CHECKSUM(NEWID())%10000),'01/01/1970') -- Generates a number 0-10000 and adds this number of days to 1970. This is unsequenced data.

    FROM Tally

    WHERE N <=400

    SET @i = @i + 1

    CONTINUE

    END

    ALTER TABLE multrecs ADD display bit

    CREATE CLUSTERED INDEX Quirky ON multrecs (acct, ndt) -- Silly Name for the index.

    Quirky Update Code Used:

    -- Stop and Start SQL Server Service to completely clear memory.

    DECLARE @PrevAcctbigint,

    @LDNDT datetime, -- Last Displayed NDT

    @Display bit, -- Even though you don't use it, this method won't function without the 3 part update

    @timerdatetime,

    @FRCint

    SET @timer = GETDATE()

    UPDATE multrecs

    SET @Display = Display = CASE WHEN Acct = @PrevAcct

    AND DATEDIFF(d,@LDNDT,NDT) < 150

    THEN 0

    ELSE 1

    END,

    @LDNDT =CASE WHEN Acct = @PrevAcct

    AND DATEDIFF(d,@LDNDT,NDT) = CteRecurs.ndt + 150

    ) as x

    where rown = 1

    )

    select *

    INTO CTE

    from CteRecurs

    order by acct,ndt

    PRINT 'Select to Table Done. Total Time Elapsed: ' + CAST(DATEDIFF(s,@Timer,GETDATE()) AS varchar(10)) + ' seconds'

    SET @FRC = (SELECT COUNT(*) FROM CTE)

    PRINT 'Final Row Count, CTE Method:' + CAST(@FRC AS varchar(10))

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Taking this one step further, if anyone wants to supply the script for a CLR function(and possibly explain to me how to use it) or any other alternate method they think might beat this, I'd love to test them.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Turn off the execution plan and see what you get.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/18/2009)


    Turn off the execution plan and see what you get.

    Without the execution plan, 12 seconds for quirky with MAXDOP 1, 151 for the RCTE. I ran them a few times each with similar results. I also ran them without stopping/starting the server, which actually made almost no difference, probably because this is a workstation with only like 2 gigs of memory.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Viewing 15 posts - 16 through 30 (of 70 total)

You must be logged in to reply to this topic. Login to reply