Exploring Recursive CTEs by Example

  • Peter H (7/17/2012)


    Great article, thanks. I really enjoyed going through that.

    I am looking for CTE's that will give me formulae for different predictive analysis routines.

    here is a slight coding problem, the first select in the following code should not have the "Union".

    -- Add an additional origin and destination node

    INSERT INTO #Edges

    UNION ALL SELECT '0', 'N', 15 -- ABS(CHECKSUM(NEWID())) % 100 + 25

    Peter - Another nice catch! Thanks for the correction.

    Hopefully its inclusion hasn't thrown too many people far off the track.

    I am happy to notice that someone is actually taking the time to try running some of my examples! 😀


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • SQL PROCtologist (7/17/2012)


    FINALLY!!! Someone who loves burning up the CPU as much as I do! 😀

    As a token of gratitude, feel free to add my recursive string parser to your excursions into recursions. 🙂

    [font="Courier New"]create function [sql].[StringToTable](

    @string nvarchar(max),

    @delimiter nvarchar(1))

    returns table

    /* t@sqlPROCtologist.com

    Usage:

    declare @STR nvarchar(max)='a,bc,def,ghij,klmno,pqrstu,vwxyz,0123,456,78,9';

    select * from [sql].[StringToTable](@str,',') option(maxrecursion 32767);

    */

    as return(

    with cte as(

    select

    convert(int,1) as [start],

    charindex(@delimiter, @string+@delimiter,1) as [end],

    substring(@string+@delimiter,1,charindex(@delimiter,@string+@delimiter,1)) as [string]

    union all select

    convert(int,[end]+1) as [start],

    charindex(@delimiter,@string+@delimiter,[end]+1) as [end],

    substring(@string+@delimiter,[end]+1,charindex(@delimiter,@string+@delimiter,[end]+1)-[end]) as [string]

    from cte where [end]+1<=len(@string+@delimiter)

    )

    select left(string,len(string)-1) as [String] from cte);

    go

    [/font]

    Procto - I come from a long line of CPU burners! My roots were in numerically intensive computing applications.

    That's not why I love rCTEs though. Not sure love is even the right word here. I just find that they add other solution opportunities that might be overlooked if people don't understand them well. I know I didn't when I started writing this article. I'm now better with them, but I'm sure there are others out there that are way better.

    And thanks for the excellent contribution! People need examples they can draw on to apply to other circumstances. Gathering many in one spot has got to be useful to someone.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Surely it's the Ark of the Covenant?

  • Nice Article. I am really impressed. Keep up the good work!!

  • Holy One (7/18/2012)


    Surely it's the Ark of the Covenant?

    I agree that the Ark of the Covenant should have surely been included in my haul. Alas, my fence didn't think he could unload it.

    Thanks to you Holy One and also to Krtyknm for stopping by and offering your opinions!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Please help how to solve the problem below using recursion..

    My problem here is how to identify and select a patients service date. A service date could be admission date(hospital), discharge date(hospital), CBC taken date, CBC result date, xray taken date, xray result read date etc.. . Each date is paired and are reconciled thru a similar account number. lets say an admission date is always paired with discharge date and so both dates will have similar account number, so is CBC taken date paired with CBC result date. This might happen in the same day but the time will be different. However we are receiving the data in a format where we don't know what kind of service was performed, we only have the service date. so we are assuming that a patients earliest date is an admission date and whatever date that has a similar account number is the discharge date. I want to select all patients record that contains admission date and discharge date base on the assumption that the earliest date is an admission date and the other date on that accountnumber is the discharge date. whatever dates that fall in between the admission date and discharge date will be ignored. However a patient could be admitted and discharge from the hospital multiple times a year. so the earliest date after the first discharge date will also be considered as a 2nd admission date.. so on and so forth the cycle continues..

    example data..

    memberid, accountnum, servicedate

    1 , ABC11, 10/24/2013

    1 , ABC12, 10/26/2013

    1 , ABC13, 10/28/2013

    1 , ABC12, 10/30/2013

    1 , ABC13, 11/2/2013

    1 , ABC11, 11/5/2013

    1 , ABC14, 11/30/2013

    1 , ABC15, 12/1/2013

    1 , ABC16, 12/3/2013

    1 , ABC17, 12/8/2013

    1 , ABC17, 12/10/2013

    1 , ABC16, 12/9/2013

    1 , ABC15, 12/12/2013

    1 , ABC14, 12/11/2013

    in these sample data the expected data to be return is:

    memberid, accountnumber, servicedate

    1, ABC11, 10/24/2013

    1, ABC11, 11/5/2013

    1, ABC14, 11/30/2013

    1, ABC14, 12/11/2013

    please help me . is this type of logic complex and be better implemented as a SQL CLR or recursive cte or tally table.. I could only manage to recover the first admission date and discharge date for each member...

    ;with cte1 as (

    select memberid, acctnum, servicedate,

    member_earliestdate = ROW_NUMBER()over(partition by memberid order by servicedate asc)

    from patient_admitlog)

    select * from

    (select * from cte1 where member_earliestdate = 1)t1 join

    cte1 on t1.acctnum = cte1. acctnum and t1.memberid = cte1.memberid

  • There is a problem in your problem definition.

    " whatever dates that fall in between the admission date and discharge date will be ignored. However a patient could be admitted and discharge from the hospital multiple times a year. so the earliest date after the first discharge date will also be considered as a 2nd admission date.. so on and so forth the cycle continues.."

    Unless we know the account code for the admission / discharge pair, how do we know what dates to ignore?

  • As ive said , the earliest service date is always assumed as an admission date, now that admission date will have a discharge date pair and you can find it by the account number since they would have similar account number... and any dates that fall in between the admission date and discharge date will be ignored.. got it?

  • Jeztagab (7/18/2014)


    Please help how to solve the problem below using recursion..

    My problem here is how to identify and select a patients service date. A service date could be admission date(hospital), discharge date(hospital), CBC taken date, CBC result date, xray taken date, xray result read date etc.. . Each date is paired and are reconciled thru a similar account number. lets say an admission date is always paired with discharge date and so both dates will have similar account number, so is CBC taken date paired with CBC result date. This might happen in the same day but the time will be different. However we are receiving the data in a format where we don't know what kind of service was performed, we only have the service date. so we are assuming that a patients earliest date is an admission date and whatever date that has a similar account number is the discharge date. I want to select all patients record that contains admission date and discharge date base on the assumption that the earliest date is an admission date and the other date on that accountnumber is the discharge date. whatever dates that fall in between the admission date and discharge date will be ignored. However a patient could be admitted and discharge from the hospital multiple times a year. so the earliest date after the first discharge date will also be considered as a 2nd admission date.. so on and so forth the cycle continues..

    example data..

    memberid, accountnum, servicedate

    1 , ABC11, 10/24/2013

    1 , ABC12, 10/26/2013

    1 , ABC13, 10/28/2013

    1 , ABC12, 10/30/2013

    1 , ABC13, 11/2/2013

    1 , ABC11, 11/5/2013

    1 , ABC14, 11/30/2013

    1 , ABC15, 12/1/2013

    1 , ABC16, 12/3/2013

    1 , ABC17, 12/8/2013

    1 , ABC17, 12/10/2013

    1 , ABC16, 12/9/2013

    1 , ABC15, 12/12/2013

    1 , ABC14, 12/11/2013

    in these sample data the expected data to be return is:

    memberid, accountnumber, servicedate

    1, ABC11, 10/24/2013

    1, ABC11, 11/5/2013

    1, ABC14, 11/30/2013

    1, ABC14, 12/11/2013

    please help me . is this type of logic complex and be better implemented as a SQL CLR or recursive cte or tally table.. I could only manage to recover the first admission date and discharge date for each member...

    ;with cte1 as (

    select memberid, acctnum, servicedate,

    member_earliestdate = ROW_NUMBER()over(partition by memberid order by servicedate asc)

    from patient_admitlog)

    select * from

    (select * from cte1 where member_earliestdate = 1)t1 join

    cte1 on t1.acctnum = cte1. acctnum and t1.memberid = cte1.memberid

    Let me start by mentioning a couple of points:

    1. This is not a forum for seeking help. SSC does have such forums. Technically this is a thread to discuss the article on Recursive CTEs.

    2. This does not sound to me like a rCTE is required to solve it, however see point #3.

    3. Your requirements sound inconsistent to me.

    Jeztagab (7/18/2014)


    so we are assuming that a patients earliest date is an admission date and whatever date that has a similar account number is the discharge date. I want to select all patients record that contains admission date and discharge date base on the assumption that the earliest date is an admission date and the other date on that accountnumber is the discharge date

    OK. I understand you want to pair the first date of an account number with the last. But I don't see that pairing in your results set (looks like just the min and max dates are chosen for the returned rows). Easy to do without a rCTE.

    Jeztagab (7/18/2014)


    whatever dates that fall in between the admission date and discharge date will be ignored. However a patient could be admitted and discharge from the hospital multiple times a year. so the earliest date after the first discharge date will also be considered as a 2nd admission date

    So which is it? Throw away the in-between dates and keep only the first and last, or assume each pairing of dates represents an admission/discharge?

    You'd probably be best advised to follow my advice in point #1, thus not limiting your audience.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Some further advice to get you a quicker answer:

    1. Provide DDL (CREATE TABLE) and consumable sample data. That way you're more likely to get a working query as a response.

    2. Your sample output looks incomplete. Always provide complete sample output that corresponds to the sample data.

    3. Do not presume to guide your helpers to a solution. Saying "you need a recursive" solution doesn't help to solve the problem at all. Like I said, it doesn't appear to require a recursive solution. Suggesting an approach that didn't work for you may just lead those willing to help down a dead end.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Since it is given that an admission date is always paired with a discharge date and it can be reconciled thru its account number ; and the earliest date (service date) is always considered an admission date but a person can have multiple admission and discharge dates in a given year. Aside from admission dates and discharge dates, when a person is admitted he could have different types of lab test and treatments, when those lab test and treatments are administered they would also appear as a service date but it will fall under a certain admission-discharge date, but since we dont need those lab test & treatment dates so Im excluding those data..

    the data below is an example the Patient with memberid = 1 has been admitted and discharge from the hospital 2x for year 2013. all the other dates are considered as lab & treatment dates which fall inside teh 1st admission or 2nd admission. you got it? or am i still not clear enough?

    memberid, accountnum, servicedate

    1 , ABC11, 10/24/2013

    1 , ABC12, 10/26/2013

    1 , ABC13, 10/28/2013

    1 , ABC12, 10/30/2013

    1 , ABC13, 11/2/2013

    1 , ABC11, 11/5/2013

    1 , ABC14, 11/30/2013

    1 , ABC15, 12/1/2013

    1 , ABC16, 12/3/2013

    1 , ABC17, 12/8/2013

    1 , ABC17, 12/10/2013

    1 , ABC16, 12/9/2013

    1 , ABC15, 12/12/2013

    1 , ABC14, 12/11/2013

    in these sample data the expected data to be return is:

    memberid, accountnumber, servicedate

    1, ABC11, 10/24/2013

    1, ABC11, 11/5/2013

    1, ABC14, 11/30/2013

    1, ABC14, 12/11/2013

    sample data below:

    create table test_01

    (

    memberid int,

    acctnum varchar(100),

    servicedate datetime,

    )

    insert test_01

    SELECT 1 , 'ABC11', '10/24/2013' UNION ALL

    SELECT 1 , 'ABC12', '10/26/2013' UNION ALL

    SELECT 1 , 'ABC13', '10/28/2013' UNION ALL

    SELECT 1 , 'ABC12', '10/30/2013' UNION ALL

    SELECT 1 , 'ABC13', '11/2/2013' UNION ALL

    SELECT 1 , 'ABC11', '11/5/2013' UNION ALL

    SELECT 1 , 'ABC14', '11/30/2013' UNION ALL

    SELECT 1 , 'ABC15', '12/1/2013' UNION ALL

    SELECT 1 , 'ABC16', '12/3/2013' UNION ALL

    SELECT 1 , 'ABC17', '12/8/2013' UNION ALL

    SELECT 1 , 'ABC17', '12/10/2013' UNION ALL

    SELECT 1 , 'ABC16', '12/9/2013' UNION ALL

    SELECT 1 , 'ABC15', '12/12/2013' UNION ALL

    SELECT 1 , 'ABC14', '12/11/2013'

  • Jeztagab (7/21/2014)


    the data below is an example the Patient with memberid = 1 has been admitted and discharge from the hospital 2x for year 2013. all the other dates are considered as lab & treatment dates which fall inside teh 1st admission or 2nd admission. you got it? or am i still not clear enough?

    ABC15 on 12/12 falls outside of the ABC14 admission (or in other words after the ABC14 discharge on 12/11). Please explain (because your statement which I noted in bold above seems to indicate that it should not).

    Can you assume that each admission (prior to a discharge) is followed by at least one treatment that starts and ends?

    SQL 2008 or SQL 2012?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • its ok i Got it. I was just trying to simplify the real scenario. In fact the real scenario, can have multiple lab & treatment dates and some of the lab dates will have an ENDDATE that extends beyond the discharge date .. but its ok i got it.. thanks for tying to help

  • CTE's and table variables have their place but beware of performance, too many people fall in love with this stuff without understanding how destructive they can be with large data sets in an OLTP. I have replaced both with #temp tables which people make fun of until you see the results, in some cases 90% more efficient. Most of it seems to be on the MS side, their is an issue with how the engine analyzes query plan, in one case I went from 27K logical reads to 10, hopefully in 2014 they will start fixing this.

  • Very, very cool article. Thanks for writing it.

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply