Tell me if I'm crazy - Recursive CTE + View (Materialized, indexed?)

  • First off, I'm a .NET developer, and not a SQL wonk. Please be gentle. And thanks for your help.

    So I have a lattice (a multiparent hierarchy) of objects representing sets of people sliced along some axis. Currently these sets exist as documents in RavenDB. Example:

    Set

    {

    Id: 'some-guid',

    Name: 'RegionName',

    Dimension: 'Location',

    ImmediateParents: ['containing-set-id', 'containing-set-id'],

    ImmediateChildren: ['contained-set-id', 'contained-set-id'],

    ContainedBy: ['denormalized', 'list', 'of', 'all', 'sets', 'that', 'contain', 'me']

    }

    Person

    {

    ...

    ImmediateParents: ['sets', 'i', 'am', 'assigned', 'to', 'directly'],

    ContainedBy: ['denormalized', 'list', 'of', 'all', 'sets', 'that', 'contain', 'me']

    }

    The bulk of our queries are either direct reads from one of those lists, or "give me all the people contained by this set" query. Obviously our current implementation makes those queries really fast, but keeping all this updated is a delicate dance.

    It occurred to me that having a normalized representation in SQL, and then building a view on top of it to denormalize the "contained by" might get me off the dancefloor. And that if I could materialize (usage?) that view and index it, I could get roughly the same performance profile I have now, with the workload heavily weighted toward write time.

    Then tonight I learned of Recursive CTEs and the whole thing gelled for me. But I'm betting on some pretty big assumptions about features of Views that I don't understand. Am I sniffing up a good tree? Thanks for any insight.

  • bump. Any thoughts at all? Is my question clear?

  • It sounds like you are thinking about having 3 columns that each have a delimited list of IDs or something like that? That is certainly not a normalized version. Or maybe I am misunderstanding what you are trying to do.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I'm thinking these tables:

    Set(

    Id,

    Name,

    Dimension,

    ...)

    Person (

    Id,

    Name,

    ...)

    SetRelationships (

    ParentSetId,

    ChildSetId

    )

    PersonSets (

    PersonId,

    SetId

    )

    And then these tables give me a lot of my queries directly. The ones I want to denormalize would be

    PersonContainedBy(

    PersonId

    SetId

    )

    SetContainedBy(

    SetId,

    AncestorId

    )

    Which would be a flattening of the hierarchy described in the tables.

    I.E. if A contains B, and B contains C per the tables, Then C would have records for A and B in the ContainedBy view.

    I guess my question is if I build a materialized (or indexed) view over a Recursive CTE, how performant will that be? If I change the underlying data, will it have to recalculate the whole view, or just the changed records?

  • jace.bennett (7/18/2013)


    I'm thinking these tables:

    Set(

    Id,

    Name,

    Dimension,

    ...)

    Person (

    Id,

    Name,

    ...)

    SetRelationships (

    ParentSetId,

    ChildSetId

    )

    PersonSets (

    PersonId,

    SetId

    )

    And then these tables give me a lot of my queries directly. The ones I want to denormalize would be

    PersonContainedBy(

    PersonId

    SetId

    )

    SetContainedBy(

    SetId,

    AncestorId

    )

    Which would be a flattening of the hierarchy described in the tables.

    I.E. if A contains B, and B contains C per the tables, Then C would have records for A and B in the ContainedBy view.

    I guess my question is if I build a materialized (or indexed) view over a Recursive CTE, how performant will that be? If I change the underlying data, will it have to recalculate the whole view, or just the changed records?

    I guess I am missing why you think you need a recursive cte for this. Your data is denormalized so it would seem to eliminate the need for recursion.

    Your definition of the problem is so vague that I am having a hard time envisioning what you are doing here.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Maybe this link will help.

    http://www.sqlservercentral.com/articles/Hierarchy/94040/[/url]

     

  • First, thanks for your time. Sorry If I'm communicating poorly.

    Imagine the sets describe locations: North America, USA, Georgia, Fulton County, Atlanta.

    I might describe this relationship in a relationship table (I'm excluding the actual person and set tables and using names instead of Ids for clarity):

    Parent Set | Child Set

    ============================

    North America | USA

    USA | Georgia

    Georgia | Fulton County

    Georgia | Dekalb County

    Fulton County | Atlanta

    Dekalb County | Decatur

    Person | Set

    ================

    Joe | Atlanta

    Jane | Decatur

    Now, I have an extremely frequent and latency-sensitive query that says "Give me all the people (or sets) under Georgia". My thought is to make this query really fast by projecting it into this shape:

    Person | Contained By

    ==============

    Joe | Atlanta

    Joe | Fulton County

    Joe | Georgia

    Joe | USA

    Joe | North America

    Jane | Decatur

    Jane | Dekalb County

    Jane | Georgia

    Jane | USA

    Jane | North America

    And then another one for SetsContainedBy.

    Sadly, Using actual snowflake schema has two distinct disadvantages in my case, the first is that sets (and dimensions) are really dynamic, and that I can have sets be child to multiple other sets (for example, Atlanta actually lies in two counties, Fulton and Dekalb).

    I hope that clarifies my goal, and why the recursion seems like an obvious win to a non-dba. Thanks again.

  • Are you looking for something like this?

    DECLARE @AdjacencyList TABLE

    (

    Parent VARCHAR(20)

    ,Child VARCHAR(20)

    );

    DECLARE @PersonsLocations TABLE

    (

    Person VARCHAR(20)

    ,Location VARCHAR(20)

    );

    INSERT INTO @AdjacencyList

    SELECT 'North America','USA'

    UNION ALL SELECT 'USA','Georgia'

    UNION ALL SELECT 'Georgia','Fulton County'

    UNION ALL SELECT 'Georgia','Dekalb County'

    UNION ALL SELECT 'Fulton County','Atlanta'

    UNION ALL SELECT 'Dekalb County','Decatur';

    INSERT INTO @PersonsLocations

    SELECT 'Joe','Atlanta'

    UNION ALL SELECT 'Jane','Decatur';

    WITH ExpandHierarchy AS (

    SELECT n=1, Person, Location, Parent, Child

    FROM @PersonsLocations a

    INNER JOIN @AdjacencyList b ON b.Child = a.Location

    UNION ALL

    SELECT n+1, a.Person, a.Location, b.Parent, b.Child

    FROM ExpandHierarchy a

    INNER JOIN @AdjacencyList b ON a.Parent = b.Child

    )

    SELECT Person, Location

    FROM (

    SELECT n, Person, Location=Parent

    FROM ExpandHierarchy

    UNION ALL

    SELECT 0, Person, Location

    FROM @PersonsLocations) a

    ORDER BY Person, n;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Yup that just about does it I think. Thanks so much for that.

    Particulars of the projection aside, my original question was about how this style query plays with views. I want the results of that query indexed, and kept largely in sync with the adjacency list (learned a new term, thanks, guys). What is my best option? Will a View over that query be performant in the face of the occasional update? Or will it have to rerun the entirety of the query and recreate the index? Should I just run the projection on a schedule or try to work out something with triggers?

  • Steven Willis (7/18/2013)


    Maybe this link will help.

    http://www.sqlservercentral.com/articles/Hierarchy/94040/[/url]

     

    Thanks! I found that link very helpful. The Nested Sets structure is not appropriate for me case, as I don't have a strict hierarchy, but a lattice. The solution presented in Part 2 seems promising, but I'm still trying to digest the primer materials.

  • jace.bennett (7/18/2013)


    Yup that just about does it I think. Thanks so much for that.

    Particulars of the projection aside, my original question was about how this style query plays with views. I want the results of that query indexed, and kept largely in sync with the adjacency list (learned a new term, thanks, guys). What is my best option? Will a View over that query be performant in the face of the occasional update? Or will it have to rerun the entirety of the query and recreate the index? Should I just run the projection on a schedule or try to work out something with triggers?

    The rCTE I provided against your adjacency list will start being less performant if you have many levels in the hierarchy. But if you say, flatten your hierarchy this way in an overnight job, putting those results into another table, that might not work out too bad for you.

    I guess it depends on how often you need to access the results of the rCTE.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply