Flattening of hierarchy

  • I was wondering if anyone has any sql code samples where they have had to flatten a hierarchy of nodes without the use of a CTE.

    For example,  consider an organizational hierarchy tree with the CEO is at the top and all levels of management down to the lowest levels.

    So, if the structure of the data was like this:

    employeeid, bossid, employeename, employeetitle where the CEO would have bossid = null, but every other bossid would connect employee to the his/her boss,

    I would want the output to be:

    John Doe, CEO, ~ Jane Smith, VP Engineering ~ Jacob Crowley, Engineering Manager ~ Jen Stein, Engineer

    John Doe, CEO, ~ Don Young, VP Sales ~ Susan White, Sales Manager ~ Joyce Abram, Sales

    etc.

    Any ideas?

    Dan

     

  • This was removed by the editor as SPAM

  • Are you able to provide us with some sample data for your expected results in DDL and DML statements? That'll help us help you.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • A recursive CTE is likely to be your best option.  Why are you excluding CTEs?

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Yeah....I agree.  I didn't think we could get the resource issue fixed for running the CTE solution.  It's in Starburst, not SQL Server.  But, thank you.  This would be a real tough one today without CTEs.

  • Thom A:

    Yeah...see my other comment.  We are definitely trying to fix our resource problem in Starburst and use CTEs; however, this problem might be fun to try and solve.

    For the sample DDL I gave you above, the only other field I would add would be:

    branch: varchar(2000)

    It would contain something like this to show a branch of levels in the tree.

    John Doe, CEO ~ Jane Smith, VP Engineering ~ Jacob Crowley, Engineering Manager ~ Jen Stein, Engineer

    Thank you,

    Dan

  • danwellisch wrote:

    For the sample DDL I gave you above, the only other field I would add would be:

    branch: varchar(2000)

    What DDL? What about the DML? You havem't given any. DDL is things like CREATE and ALTER statements, and DML statements are INSERT, UPDATE, etc statements.  Provide DDL (CREATE) and DML (INSERT) statements for your data.

    Are you saying that you are storing delimited data in your database? If so that's your real problem; you need to fix that and normalise your design.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • danwellisch wrote:

    Yeah....I agree.  I didn't think we could get the resource issue fixed for running the CTE solution.  It's in Starburst, not SQL Server.  But, thank you.  This would be a real tough one today without CTEs.

    This is a SQL Server community, not Starburst, and you've posted in the SQL Server 2019 development forum; if you aren't using SQL Server you're in the wrong place I'm afraid. You'll want to use a community that is specific to Starburst, or isn't about a specific RDBMS (SQL Server in this case).

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Why did you post any DDL? I know this is not SQL Server, but you could've at least tried to help us a bit. If you have models or hierarchies using the nested sets model (if you don't know what that is. Google it), then this is trivial. If it is not you will simply imitate a linked list of a lot of recursive procedural code tracing the tree down. You start thinking, and sets. However, a lot of things like this just go away. I haven't got time to post the hundreds of words needed to explain this to you in any detail. You can get my book on treason hierarchies. If you want a lot more detail than you ever cared to see. 🙂

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Joe Celko's Trees and Hierarchies in SQL for Smarties -- Treason hierarchies is a whole different topic 🙂

    Auto-correct did a number on your post, Joe.

     

  • To be honest, CTEs can suck real bad.  They're not so bad at Hierarchies but, still, a well written While loop will be a little bit faster than a CTE and it will use about 9 time fewer logical reads.

    The really big question though is... how often does your hierarchy actually change?  If it not changing once a second or anything crazy like that, then it would be to your huge advantage to store a hierarchy in a Nested Set form.  Key your original Adjacency list form because hierarchies are a lot easier to maintain and repair there but, for "downline" reporting, Nested Sets are the berries.

    It used to be that people would convert their Adjacency Lists to Nested Sets and then throw away the Adjacency Lists because it used to take a month of Sundays to do the conversion.  For example, it used to take about 2.5 DAYS to do such a conversion of just a million node hierarchy.

    There's no need for that anymore.  Here's the link for how to do it that was written about a decade ago.  Back then, it got that 2.5 Days time down to 54 seconds on an old 32 bit box.  On today's machines, it only takes about 19 seconds.

    Here's the link to that article.  If you know me, you just had to know that a special-purpose Tally Table would be involved.  It's uses a recursive CTE in a slight different manner to do the conversion and it's fast enough to do the rebuild (for example) once every hour or half hour.  Using a "swap'n'drop" method, you could have virtually zero down time on top of it only taking 19 seconds for a million node hierarchy.  And pay attention to the "SortPath" column... it contains the IDs for the entire "upline" for every node.  Super simple to split (fixed width 4 byte elements) and use and nasty fast for "upline" queries".

    https://www.sqlservercentral.com/articles/hierarchies-on-steroids-1-convert-an-adjacency-list-to-nested-sets

    If you have "downline Aggregations" that are needed, then the second article in that short series may be the answer.  See the following article for that.

    https://www.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff and others that referred me to Nested Sets...thank you!  I did find an article on Nested Sets as well.  Jeff, I appreciate the links and explanation as well.

    Regards,

    Dan

  • Thanks for the feedback, Dan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • an aside: Azure Synapse cannot do recursive CTEs (or heirarchyid...), if you happen to be working in that realm.

    I settled for a few layers of nested CTEs down to the expected "depth", plus a little more, for something that would've been less painful to do with recursive CTEs.  ( "find earliest  qualifying event, get records for the ensuing 12 month span from that 1st date. then, find next qualifying event date after that span, and... lather-rinse-repeat. the event dates are ascending but definitely not continuous. and, lucky for me, was just working with a 5-year moving window of data, so at most 5 layers (plus a couple extras...) of those CTEs were needed.

    I did not see a way to use LAG/LEAD for this, either.

  • corey lawson wrote:

    an aside: Azure Synapse cannot do recursive CTEs (or heirarchyid...), if you happen to be working in that realm.

    I settled for a few layers of nested CTEs down to the expected "depth", plus a little more, for something that would've been less painful to do with recursive CTEs.  ( "find earliest  qualifying event, get records for the ensuing 12 month span from that 1st date. then, find next qualifying event date after that span, and... lather-rinse-repeat. the event dates are ascending but definitely not continuous. and, lucky for me, was just working with a 5-year moving window of data, so at most 5 layers (plus a couple extras...) of those CTEs were needed.

    I did not see a way to use LAG/LEAD for this, either.

    A Recursive CTE is nothing but a different method of doing a While loop.  You can use a While loop to replace a recursive CTE and, even if you do the While loop in a typical manner, it will be almost as fast as the Recursive CTE (rCTE from here on) and use almost 9 times fewer memory reads.  If you do an "optimized" While loop (Explicit Transaction around the WHILE, it will also beat the rCTE for both CPU and Duration not to mention the same amount of fewer memory reads (almost 9 times fewer) than the rCTE.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply