Slow running "ranking" problem

Question

Post reply

Slow running "ranking" problem

richardmgreen1

SSChampion

Points: 10306
More actions
January 27, 2020 at 2:04 pm

#3720620
Hi all

I've got a table that contains 1.8million (yes, million) records in a parent/child relationship.

Now, children can also be parents and have their own children (if that makes sense).

What I need to do is take a top-level parent (call it level 1) and then list all the children under it (with the appropriate level number).
Either that or start with the lowest child value and work up the "tree" to get the top level parent.

In either case, I need to list all the intermediate levels and number them accordingly.

I've done some googling and come up with the following code:-
```
;WITH cte AS
(
SELECT
a.IsActive
,Child= a.SourceID
,Parent = a.DestinationID
,Level= 1
FROM
Lookups.dbo.tbl_SCT2_Relationships AS a
WHERE
a.TypeID= 116680003
--AND a.SourceID IN
--(83821000000107, 100000000, 100005005)

UNION ALL

SELECT
a.IsActive
,Child= a.SourceID
,Parent = a.DestinationID
,Level= c.Level + 1
FROM
Lookups.dbo.tbl_SCT2_RelationshipsAS a
JOIN cteAS c
ON a.SourceID = c.Parent

)

SELECTDISTINCT
c1.Parent
,c1.Child
,c1.IsActive
,c1.Level
,c2.Level
FROM
cteAS c1
LEFT JOIN cte AS c2
ON c1.Child = c2.Parent
ORDER BY
c1.Parent
,c1.Child
OPTION (MAXRECURSION 0)
```
If I run it with just a few records (see the numbers commented out in the first part of the CTE) it runs in around 40 seconds. I know, that's fairly slow but gives me exactly what I want.

If I run it against the full table (around 1.8million records), it runs forever. It's currently been running for 4 hours with no end in sight.

I was looking at putting separate indexes on the parent and child columns in the original table but I'm not sure how much extra that will give (especially as the execution plan isn't recommending any).

Anyone any ideas on how I can speed this up?

Thanks in advance

Richard

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Mike01 SSChampion Points: 11393 More actions · Answer 1

what indexes do you have on Lookups.dbo.tbl_SCT2_Relationships? what does the execution plan look like?

For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

richardmgreen1 SSChampion Points: 10306 More actions · Answer 2

Hi Mike

Here's the table:-

USE [Lookups]
GO

/****** Object:  Table [dbo].[tbl_SCT2_Relationships]    Script Date: 27/01/2020 14:53:52 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[tbl_SCT2_Relationships](
[pkID] [BIGINT] NOT NULL,
[EffectiveDate] [DATE] NOT NULL,
[IsActive] [BIT] NOT NULL,
[ModuleID] [BIGINT] NOT NULL,
[SourceID] [BIGINT] NOT NULL,
[DestinationID] [BIGINT] NOT NULL,
[RelationshipGroup] [INT] NOT NULL,
[TypeID] [BIGINT] NOT NULL,
[CharacteristicTypeID] [BIGINT] NOT NULL,
[ModifierID] [BIGINT] NOT NULL,
[SYSDateLoaded] [DATETIME] NOT NULL,
[SYSDateLastUpdated] [DATETIME] NOT NULL,
[SYSIsDeleted] [BIT] NOT NULL,
 CONSTRAINT [PK__tbl_sct2__3213E83FA2182033] PRIMARY KEY CLUSTERED 
(
[pkID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[tbl_SCT2_Relationships] ADD  DEFAULT (GETDATE()) FOR [SYSDateLoaded]
GO

ALTER TABLE [dbo].[tbl_SCT2_Relationships] ADD  DEFAULT (GETDATE()) FOR [SYSDateLastUpdated]
GO

ALTER TABLE [dbo].[tbl_SCT2_Relationships] ADD  DEFAULT ((0)) FOR [SYSIsDeleted]
GO

And here the only index:-

USE Lookups;
GO

/****** Object:  Index [idx_tbl_SCT2_Relationships_TypeID_SYSIsDeleted_DestinationID]    Script Date: 27/01/2020 14:54:10 ******/
CREATE NONCLUSTERED INDEX idx_tbl_SCT2_Relationships_TypeID_SYSIsDeleted_DestinationID
ON dbo.tbl_SCT2_Relationships(TypeID ASC, SYSIsDeleted ASC, DestinationID ASC, SourceID ASC)
INCLUDE(IsActive)
WITH(PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
,FILLFACTOR = 90
)
ON [PRIMARY];
GO

It won't let me add the execution plan (as an exeuction plan file) as it's a banned file-type.

I've changed the extension to ".txt" so you'll need to change it back to ".sqlplan" to open it.

Cheers

Richard

This reply was modified 4 years, 3 months ago by richardmgreen1.

Attachments:

You must be logged in to view attached files.

Jonathan AC Roberts SSCoach Points: 18271 More actions · Answer 3

I think performance might improve if you add an index on SourceId including other columns in the query.

Try this and see if there is an improvement:

 CREATE INDEX IX_tbl_SCT2_Relationships_1 
     ON dbo.tbl_SCT2_Relationships(SourceID)
INCLUDE (IsActive, DestinationID, TypeID);

Ant-Green SSC Guru Points: 113445 More actions · Answer 4

What you've created as part of the CTE JOIN CTE is joining 1.8million records to 1.8million records where the engine has to build both 1.8 million record sets twice.

The expression is built at run time, so "FROM CTE" builds the 1st version of the result set "JOIN CTE" builds the next version of the same result set, then the "ON x = y" has to process the joins which is all done in memory.

If you where to build the expression and dump it to a #table and joined that instead, you're only expressing the data once (not twice) which will speed up half of the problem.

;WITH cte AS
(
SELECT
a.IsActive
,Child= a.SourceID
,Parent = a.DestinationID
,Level= 1
FROM
Lookups.dbo.tbl_SCT2_Relationships AS a
WHERE
a.TypeID= 116680003
--AND a.SourceID IN
--(83821000000107, 100000000, 100005005)

UNION ALL

SELECT
a.IsActive
,Child= a.SourceID
,Parent = a.DestinationID
,Level= c.Level + 1
FROM
Lookups.dbo.tbl_SCT2_RelationshipsAS a
JOIN cteAS c
ON a.SourceID = c.Parent

)
SELECT * INTO #TEMP FROM cte

SELECTDISTINCT
c1.Parent
,c1.Child
,c1.IsActive
,c1.Level
,c2.Level
FROM
#tempAS c1
LEFT JOIN #temp AS c2
ON c1.Child = c2.Parent
ORDER BY
c1.Parent
,c1.Child
OPTION (MAXRECURSION 0)

DROP TABLE #TEMP

richardmgreen1 SSChampion Points: 10306 More actions · Answer 5

Thanks Jonathan

My smaller version of the query (using just 3 IDs) has now gone from 40 seconds to almost instant (less than 2 seconds).

I'm now doing a full against all the records and I'll let you know what happens.

Thanks Anthony

I'll give that a go as well.

This reply was modified 4 years, 3 months ago by richardmgreen1.

Jonathan AC Roberts SSCoach Points: 18271 More actions · Answer 6

It's a good idea from Anthony, you might get even better performance if you also add an index to #TEMP:

IF OBJECT_ID('tempdb..#TEMP') IS NOT NULL
    DROP TABLE #TEMP;

;WITH cte AS
(
    ...
    ...
)
SELECT *
  INTO #TEMP
  FROM cte;

/* One of the two indexes below */
 CREATE INDEX IX_#TEMP_1
     ON #TEMP(Parent)
INCLUDE (Child, IsActive, Level);

 CREATE INDEX IX_#TEMP_2
     ON #TEMP(Child)
INCLUDE (Parent, IsActive, Level);

SELECT DISTINCT
       ...
;

This reply was modified 4 years, 3 months ago by Jonathan AC Roberts. Reason: maybe a different index

richardmgreen1 SSChampion Points: 10306 More actions · Answer 7

Thanks all

I'll try out the ideas and let you know when I finally get this thing to run.

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 8

hey,

hold on - wasn't this why hierarchyID types were invented ?

you can even use persisted computed columns to make it nice and fast.

you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.

no need for a CTE at all

MVDBA

Jonathan AC Roberts SSCoach Points: 18271 More actions · Answer 9

MVDBA (Mike Vessey) wrote:

hey,
hold on - wasn't this why hierarchyID types were invented ?
you can even use persisted computed columns to make it nice and fast.
you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.
no need for a CTE at all

I was thinking that but I've never used hierarchyid so can't give a solution.

Maybe you could provide some code that would do the job for this question?

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 10

might take me a little while just to pull all of the examples and test data together - it's 2 pm and I've not had lunch yet... my laptop is on a go "slow" session as it's shrinking files and the SSDs can't cope - give me an hour

MVDBA

richardmgreen1 SSChampion Points: 10306 More actions · Answer 11

I haven't heard of the HeirarchyID datatype, so can you give me some pointers please?

I've also tried using a temp table (with indexing) against the full dataset but my tempdb went bananas.

Just to add a possible curveball, I'll need all the parents for any given code all the way to the top of the "tree".

Not sure if that will cause any issues, but thought I'd better mention it.

Jeff Moden SSC Guru Points: 1003895 More actions · Answer 12

MVDBA (Mike Vessey) wrote:

hey,
hold on - wasn't this why hierarchyID types were invented ?
you can even use persisted computed columns to make it nice and fast.
you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.
no need for a CTE at all

I've found that HierarchyID relatively sucks especially if "repairs" to the structure need to be made. 😀 Nested Sets have worked much better for me. Whatever you do, don't use the classic "push stack" to build them. It will take several days (literally) to execute even just for a million rows. Please see the following article which also has demonstrable code that creates a million node hierarchy to test against. The second article shows a possible alternative that takes the same amount of time to execute and pre-aggregates most answers that you might ask of a Parent-Child table.

https://www.sqlservercentral.com/articles/hierarchies-on-steroids-1-convert-an-adjacency-list-to-nested-sets

https://www.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 13

ok here is the most basic stuff you need - the rest you can google and find lots of cool code

CREATE TABLE #htest (id INT IDENTITY, hierarchyvalue hierarchyid PRIMARY key, sometext VARCHAR(500) )
DECLARE @h HIERARCHYID
SET @h=hierarchyid::GetRoot()
INSERT INTO #htest (hierarchyvalue,sometext) SELECT @h,'this is the root node - the cheif exec of the business'


INSERT INTO #htest (hierarchyvalue,sometext) SELECT @h.GetDescendant(NULL,NULL),'line manager number 1'
SELECT *,hierarchyvalue.GetLevel()  AS lvl,hierarchyvalue.ToString() AS location FROM #htest
DROP TABLE #htest

basically I create a table (temp table just for the demo) - I add a "ROOT" node (@h) using ::GetRoot() - be carefull this is case sensitive - this creates my parent

next I add a child (line manager number 1) to the data using "GetDescendant of the node value for the chief exec

have a look at the query I use in the select statement - I have .ToString (shows me a pretty picture of where the row sits in the structure)

/ = root

/1 = first child

/2= sister or brother of the first child

/1.1=first child of /1

also notice .GetLevel() - it shows you the level in the tree

To query the tree you can use functions like "isdescendantof" (is it anywhere under the parent you are looking at ) - combine this with getlevel and you can do really cool stuff

read the MS documentation around some of these functions - I don't want to re-type them all 🙂 but have fun learning

https://docs.microsoft.com/en-us/sql/t-sql/data-types/isdescendantof-database-engine?view=sql-server-ver15

if you want help on a specific function of HierarchyID stuff then PM me.. ( getdescendant can be tricky)

MVDBA

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 14

Jeff Moden wrote:

MVDBA (Mike Vessey) wrote:
hey,
hold on - wasn't this why hierarchyID types were invented ?
you can even use persisted computed columns to make it nice and fast.
you would need a 1 time hit on your database for a new column and calculate it's position in the hierarchy then you have access to Hierarchycolumn.getlevel(), .getancestor(), .getdescendants() etc etc - you can even reparent entire sections of the tree.
no need for a CTE at all
I've found that HierarchyID relatively sucks especially if "repairs" to the structure need to be made. 😀 Nested Sets have worked much better for me. Whatever you do, don't use the classic "push stack" to build them. It will take several days (literally) to execute even just for a million rows. Please see the following article which also has demonstrable code that creates a million node hierarchy to test against. The second article shows a possible alternative that takes the same amount of time to execute and pre-aggregates most answers that you might ask of a Parent-Child table.
https://www.sqlservercentral.com/articles/hierarchies-on-steroids-1-convert-an-adjacency-list-to-nested-sets
https://www.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

Hi jeff

I found the HierarchyID stuff "waterproofed" me from developer idiots who allowed the data (id, managerid type scenario) to create an infinite loop where the janitor is the boss of the CTO and we loop forever - even maxrecursion on a CTO cant fix developers getting it wrong

I found the cleanup of using Int values for managers was even harder to repair. you can put a unique constraint on the hierarchy to make sure it's all nice and clean... and it's not possible to make my grandson my father. (although there are parts of the UK we think that is possible)

I wouldn't dismiss it out of hand - maybe not fast, but it does protect you.

and if you ever need to reposition columns in a table on screen they are amazing - rather than shuffle a "display order" field using a bubble sort - you simply just just move the hierarchy object to "somewhere" between the objects you want - no need to touch the other rows . Hierarchy stuff doesn't always have to be vertical (familly tree) - it can be horizontal (gant chart style)

MVDBA