Slow running "ranking" problem

  • Hi all

     

    I've got a table that contains 1.8million (yes, million) records in a parent/child relationship.

     

    Now, children can also be parents and have their own children (if that makes sense).

     

    What I need to do is take a top-level parent (call it level 1) and then list all the children under it (with the appropriate level number).

    Either that or start with the lowest child value and work up the "tree" to get the top level parent.

     

    In either case, I need to list all the intermediate levels and number them accordingly.

     

    I've done some googling and come up with the following code:-

    ;WITH cte AS
    (
    SELECT
    a.IsActive
    ,Child= a.SourceID
    ,Parent = a.DestinationID
    ,Level= 1
    FROM
    Lookups.dbo.tbl_SCT2_Relationships AS a
    WHERE
    a.TypeID= 116680003
    --AND a.SourceID IN
    --(83821000000107, 100000000, 100005005)

    UNION ALL

    SELECT
    a.IsActive
    ,Child= a.SourceID
    ,Parent = a.DestinationID
    ,Level= c.Level + 1
    FROM
    Lookups.dbo.tbl_SCT2_RelationshipsAS a
    JOIN cteAS c
    ON a.SourceID = c.Parent

    )

    SELECTDISTINCT
    c1.Parent
    ,c1.Child
    ,c1.IsActive
    ,c1.Level
    ,c2.Level
    FROM
    cteAS c1
    LEFT JOIN cte AS c2
    ON c1.Child = c2.Parent
    ORDER BY
    c1.Parent
    ,c1.Child
    OPTION (MAXRECURSION 0)

    If I run it with just a few records (see the numbers commented out in the first part of the CTE) it runs in around 40 seconds.  I know, that's fairly slow but gives me exactly what I want.

     

    If I run it against the full table (around 1.8million records), it runs forever.  It's currently been running for 4 hours with no end in sight.

     

    I was looking at putting separate indexes on the parent and child columns in the original table but I'm not sure how much extra that will give (especially as the execution plan isn't recommending any).

     

    Anyone any ideas on how I can speed this up?

     

    Thanks in advance

     

    Richard

  • what indexes do you have on Lookups.dbo.tbl_SCT2_Relationships?  what does the execution plan look like?

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Hi Mike

     

    Here's the table:-

    USE [Lookups]
    GO

    /****** Object: Table [dbo].[tbl_SCT2_Relationships] Script Date: 27/01/2020 14:53:52 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[tbl_SCT2_Relationships](
    [pkID] [BIGINT] NOT NULL,
    [EffectiveDate] [DATE] NOT NULL,
    [IsActive] [BIT] NOT NULL,
    [ModuleID] [BIGINT] NOT NULL,
    [SourceID] [BIGINT] NOT NULL,
    [DestinationID] [BIGINT] NOT NULL,
    [RelationshipGroup] [INT] NOT NULL,
    [TypeID] [BIGINT] NOT NULL,
    [CharacteristicTypeID] [BIGINT] NOT NULL,
    [ModifierID] [BIGINT] NOT NULL,
    [SYSDateLoaded] [DATETIME] NOT NULL,
    [SYSDateLastUpdated] [DATETIME] NOT NULL,
    [SYSIsDeleted] [BIT] NOT NULL,
    CONSTRAINT [PK__tbl_sct2__3213E83FA2182033] PRIMARY KEY CLUSTERED
    (
    [pkID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[tbl_SCT2_Relationships] ADD DEFAULT (GETDATE()) FOR [SYSDateLoaded]
    GO

    ALTER TABLE [dbo].[tbl_SCT2_Relationships] ADD DEFAULT (GETDATE()) FOR [SYSDateLastUpdated]
    GO

    ALTER TABLE [dbo].[tbl_SCT2_Relationships] ADD DEFAULT ((0)) FOR [SYSIsDeleted]
    GO


     

    And here the only index:-

    USE Lookups;
    GO

    /****** Object: Index [idx_tbl_SCT2_Relationships_TypeID_SYSIsDeleted_DestinationID] Script Date: 27/01/2020 14:54:10 ******/
    CREATE NONCLUSTERED INDEX idx_tbl_SCT2_Relationships_TypeID_SYSIsDeleted_DestinationID
    ON dbo.tbl_SCT2_Relationships(TypeID ASC, SYSIsDeleted ASC, DestinationID ASC, SourceID ASC)
    INCLUDE(IsActive)
    WITH(PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
    ,FILLFACTOR = 90
    )
    ON [PRIMARY];
    GO


     

    It won't let me add the execution plan (as an exeuction plan file) as it's a banned file-type.

    I've changed the extension to ".txt" so you'll need to change it back to ".sqlplan" to open it.

     

    Cheers

    Richard

    Attachments:
    You must be logged in to view attached files.
  • I think performance might improve if you add an index on SourceId including other columns in the query.

    Try this and see if there is an improvement:

     CREATE INDEX IX_tbl_SCT2_Relationships_1 
    ON dbo.tbl_SCT2_Relationships(SourceID)
    INCLUDE (IsActive, DestinationID, TypeID);
  • What you've created as part of the CTE JOIN CTE is joining 1.8million records to 1.8million records where the engine has to build both 1.8 million record sets twice.

    The expression is built at run time, so "FROM CTE" builds the 1st version of the result set "JOIN CTE" builds the next version of the same result set, then the "ON x = y" has to process the joins which is all done in memory.

    If you where to build the expression and dump it to a #table and joined that instead, you're only expressing the data once (not twice) which will speed up half of the problem.

    ;WITH cte AS
    (
    SELECT
    a.IsActive
    ,Child= a.SourceID
    ,Parent = a.DestinationID
    ,Level= 1
    FROM
    Lookups.dbo.tbl_SCT2_Relationships AS a
    WHERE
    a.TypeID= 116680003
    --AND a.SourceID IN
    --(83821000000107, 100000000, 100005005)

    UNION ALL

    SELECT
    a.IsActive
    ,Child= a.SourceID
    ,Parent = a.DestinationID
    ,Level= c.Level + 1
    FROM
    Lookups.dbo.tbl_SCT2_RelationshipsAS a
    JOIN cteAS c
    ON a.SourceID = c.Parent

    )
    SELECT * INTO #TEMP FROM cte

    SELECTDISTINCT
    c1.Parent
    ,c1.Child
    ,c1.IsActive
    ,c1.Level
    ,c2.Level
    FROM
    #tempAS c1
    LEFT JOIN #temp AS c2
    ON c1.Child = c2.Parent
    ORDER BY
    c1.Parent
    ,c1.Child
    OPTION (MAXRECURSION 0)

    DROP TABLE #TEMP
  • Thanks Jonathan

     

    My smaller version of the query (using just 3 IDs) has now gone from 40 seconds to almost instant (less than 2 seconds).

    I'm now doing a full against all the records and I'll let you know what happens.

     

    Thanks Anthony

     

    I'll give that a go as well.

  • It's a good idea from Anthony, you might get even better performance if you also add an index to #TEMP:

    IF OBJECT_ID('tempdb..#TEMP') IS NOT NULL
    DROP TABLE #TEMP;

    ;WITH cte AS
    (
    ...
    ...
    )
    SELECT *
    INTO #TEMP
    FROM cte;

    /* One of the two indexes below */
    CREATE INDEX IX_#TEMP_1
    ON #TEMP(Parent)
    INCLUDE (Child, IsActive, Level);

    CREATE INDEX IX_#TEMP_2
    ON #TEMP(Child)
    INCLUDE (Parent, IsActive, Level);

    SELECT DISTINCT
    ...
    ;

    • This reply was modified 4 years, 3 months ago by  Jonathan AC Roberts. Reason: maybe a different index
  • Thanks all

     

    I'll try out the ideas and let you know when I finally get this thing to run.

  • hey,

    hold on - wasn't this why hierarchyID types were invented ?

    you can even use persisted computed columns to make it nice and fast.

    you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.

    no need for a CTE at all

     

     

    MVDBA

  • MVDBA (Mike Vessey) wrote:

    hey,

    hold on - wasn't this why hierarchyID types were invented ?

    you can even use persisted computed columns to make it nice and fast.

    you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.

    no need for a CTE at all 

    I was thinking that but I've never used hierarchyid so can't give a solution.

    Maybe you could provide some code that would do the job for this question?

  • might take me a little while just to pull all of the examples and test data together - it's 2 pm and I've not had lunch yet... my laptop is on a go "slow" session as it's shrinking files and the SSDs can't cope - give me an hour

    MVDBA

  • I haven't heard of the HeirarchyID datatype, so can you give me some pointers please?

     

    I've also tried using a temp table (with indexing) against the full dataset but my tempdb went bananas.

    Just to add a possible curveball, I'll need all the parents for any given code all the way to the top of the "tree".

     

    Not sure if that will cause any issues, but thought I'd better mention it.

  • MVDBA (Mike Vessey) wrote:

    hey,

    hold on - wasn't this why hierarchyID types were invented ?

    you can even use persisted computed columns to make it nice and fast.

    you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.

    no need for a CTE at all

    I've found that HierarchyID relatively sucks especially if "repairs" to the structure need to be made. 😀  Nested Sets have worked much better for me.  Whatever you do, don't use the classic "push stack" to build them.  It will take several days (literally) to execute even just for a million rows.  Please see the following article which also has demonstrable code that creates a million node hierarchy to test against.  The second article shows a possible alternative that takes the same amount of time to execute and pre-aggregates most answers that you might ask of a Parent-Child table.

    https://www.sqlservercentral.com/articles/hierarchies-on-steroids-1-convert-an-adjacency-list-to-nested-sets

    https://www.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ok here is the most basic stuff you need - the rest you can google and find lots of cool code

    CREATE TABLE #htest (id INT IDENTITY, hierarchyvalue hierarchyid PRIMARY key, sometext VARCHAR(500) )
    DECLARE @h HIERARCHYID
    SET @h=hierarchyid::GetRoot()
    INSERT INTO #htest (hierarchyvalue,sometext) SELECT @h,'this is the root node - the cheif exec of the business'


    INSERT INTO #htest (hierarchyvalue,sometext) SELECT @h.GetDescendant(NULL,NULL),'line manager number 1'
    SELECT *,hierarchyvalue.GetLevel() AS lvl,hierarchyvalue.ToString() AS location FROM #htest
    DROP TABLE #htest

    basically I create a table (temp table just for the demo) - I add a "ROOT" node (@h) using ::GetRoot() - be carefull this is case sensitive - this creates my parent

    next I add a child (line manager number 1) to the data using "GetDescendant of the node value for the chief exec

    have a look at the query I use in the select statement - I have .ToString (shows me a pretty picture of where the row sits in the structure)

    / = root

    /1 = first child

    /2= sister or brother of the first child

    /1.1=first child of /1

    also notice .GetLevel() - it shows you the level in the tree

    To query the tree you can use functions like "isdescendantof" (is it anywhere  under the parent you are looking at ) - combine this with getlevel and you can do really cool stuff

    read the MS documentation around some of these functions - I don't want to re-type them all 🙂 but have fun learning

    https://docs.microsoft.com/en-us/sql/t-sql/data-types/isdescendantof-database-engine?view=sql-server-ver15

    if you want help on a specific function of HierarchyID stuff then PM me.. ( getdescendant can be tricky)

     

    MVDBA

  • Jeff Moden wrote:

    MVDBA (Mike Vessey) wrote:

    hey,

    hold on - wasn't this why hierarchyID types were invented ?

    you can even use persisted computed columns to make it nice and fast.

    you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.

    no need for a CTE at all

    I've found that HierarchyID relatively sucks especially if "repairs" to the structure need to be made. 😀  Nested Sets have worked much better for me.  Whatever you do, don't use the classic "push stack" to build them.  It will take several days (literally) to execute even just for a million rows.  Please see the following article which also has demonstrable code that creates a million node hierarchy to test against.  The second article shows a possible alternative that takes the same amount of time to execute and pre-aggregates most answers that you might ask of a Parent-Child table.

    https://www.sqlservercentral.com/articles/hierarchies-on-steroids-1-convert-an-adjacency-list-to-nested-sets

    https://www.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

    Hi jeff

    I found the HierarchyID stuff "waterproofed" me from developer idiots who allowed the data (id, managerid type scenario) to create an infinite loop where the janitor is the boss of the CTO and we loop forever - even maxrecursion on a CTO cant fix developers getting it wrong

    I found the cleanup of using Int values for managers was even harder to repair. you can put a unique constraint on the hierarchy to make sure it's all nice and clean... and it's not possible to make my grandson my father. (although there are parts of the UK we think that is possible)

    I wouldn't dismiss it out of hand - maybe not fast, but it does protect you.

    and if you ever need to reposition columns in a table on screen they are amazing - rather than shuffle a "display order" field  using a bubble sort - you simply just just move the hierarchy object to "somewhere" between the objects you want - no need to touch the other rows . Hierarchy stuff doesn't always have to be vertical (familly tree) - it can be horizontal (gant chart style)

     

    MVDBA

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply