Help writing a SQL

  • I have a situation described below - let me know if more information is needed.

    create table multrecs (

    acct bigint,

    cdt smalldatetime,

    val varchar(20),

    ndt smalldatetime

    )

    insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-04-09')

    insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-04-15')

    insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-04-22')

    insert into multrecs values (702, '2009-04-06', 'tstval1', '2009-05-26')

    insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-01-04')

    insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-01-15')

    insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-03-22')

    insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-03-30')

    insert into multrecs values (802, '2009-01-01', 'tstval2', '2009-05-15')

    insert into multrecs values (902, '2009-03-01', 'tstval3', '2009-03-04')

    insert into multrecs values (902, '2009-03-01', 'tstval3', '2009-04-14')

    insert into multrecs values (902, '2009-03-01', 'tstval3', '2009-05-22')

    insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-02-04')

    insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-04-04')

    insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-06-02')

    insert into multrecs values (602, '2009-02-01', 'tstval4', '2009-06-10')

    I need to pull the records for each ACCT, CDT, VAL combination

    such that I get the record that has minimum NDT and then

    the next record whch is 38 days after it and then

    the next record which is 38 days after it and so on

    so for the first set above I need

    702, '2009-04-06', 'tstval1', '2009-04-09' the minimum NDT

    (702, '2009-04-06', 'tstval1', '2009-05-26') the one 38 days after it

    For the second combo

    (802, '2009-01-01', 'tstval2', '2009-01-04') The minimum NDT

    (802, '2009-01-01', 'tstval2', '2009-03-22') 38 days after first one

    (802, '2009-01-01', 'tstval2', '2009-05-15') 38 days after one above - 03/22/09

    For the third combo

    I will need all three - as they are 38 days apart

    For the fourth

    I will need first three

    Any help will be greatly appreciated.

  • Would you mind sharing what you've tried so far and what you have trouble with?

    Btw: strange requirement though.... what's the business reason behind it?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • SQLMAIN, here is a temp table solution. It works by determining the previous ndt date for each record in the group, then keeping a running sum of the interval between the ndt and the previous ndt. It is powered by Jeff Moden's running sum solution using the "quirky update". Admitedly, I don't fully grasp how it works exactly, so when I am using it, I have to add columns to the temp table for each of the variables I use, and populate the columns with the current value of each variable at various point for each record. That way, I can move stuff around, and see how it affects the value of each variable, and when in the code it does it. Anyhow, I think this works as per your specs, but there are no guarantees of any data beyond that. Also, I tried to to all the updates in one fell swoop, but I could not get it to work.

    I can't find the link to Jeff's article, I have it on my work pc, but I am at home. Maybe someone could post it?

    IF OBJECT_ID('TempDB..#multrecs','u') IS NOT NULL

    DROP TABLE #multrecs

    SELECT

    ID = IDENTITY(INT,1,1),

    acct,

    cdt,

    val,

    ndt

    INTO #multrecs

    FROM multrecs

    ORDER BY acct,cdt,val,ndt

    --Add columns for later use

    ALTER TABLE #multrecs

    ADD MIN_ndt BIT,

    PREV_NDT SMALLDATETIME,

    INTERVAL INT,

    INTERVAL_RUN INT

    --Start by setting the MIN_ndt flag for the lowest ndt for each acct

    UPDATE #multrecs

    SET MIN_ndt = CASE WHEN t1.acct IS NOT NULL THEN 1 ELSE 0 END

    FROM #multrecs m LEFT OUTER JOIN

    (

    SELECT

    acct,

    cdt,

    val,

    MIN_ndt = MIN(ndt)

    FROM multrecs

    GROUP BY acct,cdt,val

    ) t1

    ON m.acct = t1.acct

    AND m.cdt = t1.cdt

    AND m.val = t1.val

    AND m.ndt = t1.MIN_ndt

    --Set the prev_ndt field = MIN_ndt for the first of each group

    --I am assuming if the ndt is lowest, there are no previous.

    --It will be use later for a datediff calculation to check for

    --the 38 day interval.

    UPDATE m

    SET PREV_NDT = ndt

    FROM #multrecs m

    WHERE MIN_ndt = 1

    --Declare the local variables for the updates

    DECLARE @acct VARCHAR(20)

    DECLARE @cdt SMALLDATETIME

    DECLARE @ndt SMALLDATETIME

    DECLARE @val VARCHAR(20)

    DECLARE @prev_ndt SMALLDATETIME

    DECLARE @interval_run INT

    --First, populate the prev_ndt field for the rest of the records

    UPDATE #multrecs

    SET @prev_ndt =

    CASE WHEN @acct = acct

    AND @cdt = cdt

    AND @val = val

    AND prev_ndt IS NULL

    THEN @ndt

    ELSE prev_ndt

    END,

    prev_ndt = @prev_ndt,

    @acct = acct,

    @cdt = cdt,

    @val = val,

    @ndt = ndt

    FROM #multrecs

    --Now populate the interval field, which will tell you the interval

    --between the ndt and the previous ndt

    UPDATE m

    SET interval = DATEDIFF(d,prev_ndt,ndt),

    interval_run = 0

    FROM #multrecs m

    --Finally, update the interval_run field. It will show 0 for the first record in each

    --acct. Then it will keep a running sum of the interval, and reset when it gets

    --to be >= 38

    UPDATE m

    SET @interval_run = interval_run =

    CASE WHEN acct = @acct AND @interval_run = 38

    ORDER BY acct,cdt,val,ndt

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • Here's the link to the article I referenced. http://www.sqlservercentral.com/articles/Advanced+Querying/61716/. However, it seems to be under construction now, but I am fairly certain it described pretty well how the quirky update works. Any how, the test scripts are still there, but be sure to be aware of Jeff's caveats.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • Greg Snidow (9/17/2009)


    Admitedly, I don't fully grasp how it works exactly

    The article is under construction atm, but I've prettymuch got it memorized :hehe:. The whole theory works off of using the clustered index to update the table. You *must* both create and specify this clustered index in the order that the rows should be read. It will usually work fine regardless, but without specifying the index, it *could* fail.

    Greg Snidow (9/17/2009)


    Also, I tried to to all the updates in one fell swoop, but I could not get it to work.

    Here's an example of it all in one pass. (I inserted the data into a temp table #multrecs instead of his default multrecs)

    Ugh, this thing murders my formatting :crying:

    -------------------- Solution -----------------------------

    ALTER TABLE #multrecs ADD Display bit

    CREATE CLUSTERED INDEX Quirky ON #multrecs (acct,cdt,val,ndt) -- <----- Note this clustered index. Very important.

    DECLARE @PrevAcctbigint,

    @PrevCDTdatetime,

    @PrevValvarchar(20),

    @LDNDTdatetime,--Last Displayed NDT

    @Displaybit-- Even though you don't use it, this won't function without the 3 part update

    UPDATE #multrecs

    SET @Display = Display = CASE WHENAcct = @PrevAcct

    AND CDT = @PrevCDT

    AND Val = @PrevVal

    AND DATEDIFF(d,@LDNDT,NDT) < 38

    THEN 0

    ELSE 1

    END,

    @LDNDT = CASE WHENAcct = @PrevAcct

    AND CDT = @PrevCDT

    AND Val = @PrevVal

    AND DATEDIFF(d,@LDNDT,NDT) < 38

    THEN @LDNDT

    ELSE NDT

    END,

    @PrevAcct = Acct,

    @PrevCDT = CDT,

    @PrevVal = Val

    FROM #Multrecs WITH (INDEX(0))

    SELECT * FROM #MultRecs

    WHERE Display = 1

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin the whole theory works off of using the clustered index to update the table. You *must* both create and specify this clustered index in the order that the rows should be read.

    The index! I knew I forgot something crutial to its working properly. I knew I had to specify the order on the insert into the table, but completely forgot about the blooming index. Thanks for the insight.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • It's ok. Everybody forgets the index :-P.

    I'm still looking for another way to do iterative logic that can compare with this method. I've heard CLR functions can beat it in some instances(per Adam Machanic), but the next fastest method for something like this that I know of is just a straight up loop/cursor.

    If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Ok, this might be a dumb question, but if I specify the physical order of the records on the insert, then why is the index so vital? If I select * from this table a billion times, is there ever a chance that it will return the records in any other order than the physical order, in the absence of an index? Is it that without the index, the update statement does not necessarily run by physical order? Sorry for asking so many questions on this one, but this little baby always intrigues me, and I always look for chances to use it. I can always manage to get it to work by trying everything until it does, but I still only "get it" just a little.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • I did try the solution given above (with clustered index) but somehow it was not working for me.

    I am not on my machine right now and do not have exactly what I was trying - but I will post the one I wrote and will appreciate if someone can point out the problem with my SQL.

    I finally ended up doing a while loop - picking up MIN date and then the {{next which is 38 days or more after the one picked last for the group until no more left for the group}}

    {{}} this statement above made my while loop.

    Thanks for the solutions. And I will post my clustered index SQL for your expert perusal.

  • First, apologies for not having the rewrite on the running total article done, yet.

    Part of the reason it's not done yet is I've been testing the heck out of some very large examples. In the process of doing that, I've determined that specifying the index hint is absolutely NOT necessary on single table Quirky Updates. You do, of course, need a clustered index for the update to follow and it must use a 3 part update as well as following some other simple rules. Another of those rules is that it must be a single table update. By that I mean, no joins... period.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Garadin (9/17/2009)


    It's ok. Everybody forgets the index :-P.

    I'm still looking for another way to do iterative logic that can compare with this method. I've heard CLR functions can beat it in some instances(per Adam Machanic), but the next fastest method for something like this that I know of is just a straight up loop/cursor.

    If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.

    It depends... if it's a running total, even CLR's have a tough time catching the Quirky Update because they have to go through an additional layer of API's where the Quirky Update doesn't. Cursors just don't stand a chance. There are some "data smears" (ie. special type of grouped numbering) that can be resolved using a double row number that will beat the Quirky Update.

    I'm actually splitting up the article to cover a lot of that. I may even do a comparison against the CLR method if I can find someone to write a good one... I'm just not a C programmer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Seth,

    I haven't executed your good code but it looks right. My only suggestion would be to add OPTION (MAXDOP 1) to the update to prevent any parallelism which will really mess up the works.

    The other thing is, I know I've recommended the used of WITH(INDEX(0)) in the past just to be on the safe side. In the last couple of months of testing, I've convinced myself that it's absolutely not necessary and considering that it will make a 3 second run on a million rows take 28 seconds instead, it's not worth it. I wouldn't use it any more. The important parts are the way you've used both the 3 part updates and the 2 part updates. That's the big key to making it work everytime.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Greg Snidow (9/17/2009)


    Ok, this might be a dumb question, but if I specify the physical order of the records on the insert, then why is the index so vital? If I select * from this table a billion times, is there ever a chance that it will return the records in any other order than the physical order, in the absence of an index? Is it that without the index, the update statement does not necessarily run by physical order? Sorry for asking so many questions on this one, but this little baby always intrigues me, and I always look for chances to use it. I can always manage to get it to work by trying everything until it does, but I still only "get it" just a little.

    [Edit] Blast. [/Edit]

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Jeff Moden (9/17/2009)


    Seth,

    I haven't executed your good code but it looks right. My only suggestion would be to add OPTION (MAXDOP 1) to the update to prevent any parallelism which will really mess up the works.

    The other thing is, I know I've recommended the used of WITH(INDEX(0)) in the past just to be on the safe side. In the last couple of months of testing, I've convinced myself that it's absolutely not necessary and considering that it will make a 3 second run on a million rows take 28 seconds instead, it's not worth it. I wouldn't use it any more. The important parts are the way you've used both the 3 part updates and the 2 part updates. That's the big key to making it work everytime.

    Interesting. I had no idea it slowed it down. Ok. I edited out that huge paragraph of reasoning above because you just made it all irrelevant with one sentence =).

    I'll start using the MAXDOP option immediately and scratch the index.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Jeff Moden (9/17/2009)


    Garadin (9/17/2009)


    It's ok. Everybody forgets the index :-P.

    I'm still looking for another way to do iterative logic that can compare with this method. I've heard CLR functions can beat it in some instances(per Adam Machanic), but the next fastest method for something like this that I know of is just a straight up loop/cursor.

    If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.

    It depends... if it's a running total, even CLR's have a tough time catching the Quirky Update because they have to go through an additional layer of API's where the Quirky Update doesn't. Cursors just don't stand a chance. There are some "data smears" (ie. special type of grouped numbering) that can be resolved using a double row number that will beat the Quirky Update.

    I'm actually splitting up the article to cover a lot of that. I may even do a comparison against the CLR method if I can find someone to write a good one... I'm just not a C programmer.

    The CLR examples I've seen only really start to win at a very large number of rows. As I said though, I've only "heard" this, I have zero experience with CLR's.

    Cursors stand no chance of beating this of course, but they're not as badly beaten as I'd have expected, and they're light years ahead of most "set based" solutions.

    The data smears I'm not familiar with (At least I don't think I am, the double rownumber thing sounds familiar though.) but I'm very interested to find out. I think I've been close to using something very similar in trying to figure out a way to do this in a single select query, but I've just never *quite* gotten it. Every time I come to one of these, my first instinct is to use this quirky update, as it's one of my favorites, but I always try to think about a different way to do it first, as I sometimes find myself applying a single technique to a bunch of things it's not really needed for :cool:.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Viewing 15 posts - 1 through 15 (of 70 total)

You must be logged in to reply to this topic. Login to reply