CTE performance

Question

CTE performance

kazim.raza

SSCrazy

Points: 2390
More actions
May 8, 2010 at 1:05 pm

#218666

I have the following CTE that has a temp table in its anchor portion. Temp table has about 1.6 hundred thousand rows. When I execute the final select from the CTE it gets very, very slow. Could any one please suggest how can I speed it up? I am listing the schema of temp table, CTE code and the data
I am expecting the amount of dataset to be in millions, at least for a few times (data migration, first deployment etc.)
-- Source table that holds the actual hierarichal data, this table has those 1.6 hundred thousand records
create table #Org
(OrgEntityID bigint, EntityName nvarchar(500), OrgID bigint, EntityType nvarchar(500), OrgLevel int, ParentID bigint)
-- I created this index as SQL server's execution plan suggested.
CREATE NONCLUSTERED INDEX [IX_#Org]
ON [#Org] ([ParentID])
INCLUDE ([OrgEntityID],[EntityName],[EntityType],[OrgLevel])
-- temp table to store the results of CTE
create table #Hie
(EntityName nvarchar(max), OrgEntityID bigint, OrgEntityType nvarchar(500), Level int, FullyQualifiedName nvarchar(max), FullyQualifiedLevel nvarchar(1000))
-- the actual CTE
;With Hie(EntityName, OrgEntityID, OrgEntityType, Level, FullyQualifiedName, FullyQualifiedLevel)
as
(
Select O.EntityName, O.OrgEntityID, O.EntityType, 0, Cast('.'+O.EntityName+'.' as nvarchar(MAX)), Cast('.'+O.EntityType+'.' as nvarchar(MAX))
From #Org o
Where O.ParentID is null
union all
Select o.EntityName, o.OrgEntityID, O.EntityType, O.OrgLevel+1, H.FullyQualifiedName+'.'+O.EntityName+'.', H.FullyQualifiedLevel+'.'+O.EntityType+'.'
from #Org o
inner join Hie H on H.OrgEntityID = O.ParentID
)
insert into #Hie
select * from Hie
CREATE NONCLUSTERED INDEX [IX_#Hie]
ON [#Hie] ([OrgEntityID])
include (EntityName, OrgEntityType, Level, FullyQualifiedName, FullyQualifiedLevel)
GO
I am going to attach a the data in an Excel file shortly.
Thank you.

Viewing 15 posts - 1 through 15 (of 64 total)

You must be logged in to reply to this topic. Login to reply

kazim.raza SSCrazy Points: 2390 More actions · Answer 1

Select * From #Hie

this select is the one that has problems. exact count of records after inserting in #Hie is 161343. My data export to Excel failed after exporting 65000 records hopefully it should give you gurus a fair idea of the dataset.

Sriram-288748 SSCrazy Points: 2874 More actions · Answer 2

Couple of things.

Get rid of the primary org temprary table if you can.

Remove the nvarchar max if you do not need it

For CTE, to handle millions of records is tough....check its execution plan, the number of read writes that happen while that query run, and you will see it by yourself.

and I suppose you are using, Excel 2003 so the transfer failed.

Sriram

kazim.raza SSCrazy Points: 2390 More actions · Answer 3

Thank you for your reply. The CTE itself gets executed within a minute, actually within 40 seconds; its the last select on #Hie that takes time actually a lot of time. it spits out like 4000 records in a minute out of 160K records.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 4

kazim.raza (5/8/2010)
Select * From #Hie
this select is the one that has problems. exact count of records after inserting in #Hie is 161343. My data export to Excel failed after exporting 65000 records hopefully it should give you gurus a fair idea of the dataset.

First of all, why the heck can't you use a plain ol' zip file. I won't even try to open a rar file.

Second, instead of putting the data in Excel, save it as a text file, zip it (don't rar it), and attach it.

Third, I agree with Sriram... VARCHAR(MAX) will instantly double the time this code takes to run compared to VARCHAR(8000).

Fourth... you're moving too much data. You don't need to copy the EntityName or the fully qualified name... get that information after you've done a lookup in the Hierarchy... it will be MUCH cheaper and much faster.

Fifth... do a SELECT/INTO to build the temp table instead of building it ahead of time. It will be much faster because it won't log each row.

Sixth... of course the final SELECT is going to be slow... you're returning 1.6 million rows to the screen. The question would be... WHY? Who the heck is going to read 1.6 million bloody rows?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

kazim.raza SSCrazy Points: 2390 More actions · Answer 5

This is why I am concerned about the last select, consider this

CTE gives me the data in this form

ID Name Level Parent FullyQualifiedName FullyQualifiedLevel

FullyQualifedName will have data like A.B.C.D.E (hierarchy bottom to top based on name of Entity)

FullyQualifiedLevel will have data like 1.2.3.4.5 (same as above but in terms of level)

I want the final result set as follows:

Columns >> 1 2 3 4 5

data in rows>> A B C D E

In other words I am going to pivot it on Level but select name of the entity.

Please suggest.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 6

kazim.raza (5/8/2010)
In other words I am going to pivot it on Level but select name of the entity.

Hmmmm.... "pivot it on Level but select name of entity". Is this, by any chance, for a Multi-Level Marketing Company and you're trying to figure out payouts for each node say... oh... seven or so levels deep?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

kazim.raza SSCrazy Points: 2390 More actions · Answer 7

Nope it is not for MLM. It is a part of a system that will allow organizations to manage different hierarchal structures within their organizations. Any suggestions how to break it into Level names and store entity name in it?

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 8

Heh... Pity. I've got a million node solution for the MLM folks that absolutely rocks. Same goes for the creation of a million node "Nested Set" for "ad hoc" queries.

What you've already done is on the correct path (literally). The problem is that I'm just not sure what you mean by "break it into Level names and store entity name in it?" especially since the "names" appear to be numbers. How does that relate to the columns in the data you provided? A small example with, say, 14 or so nodes in it would be very helpful for discussion purposes. A picture of those 14 nodes using MS Org Charts from MS Word would be even better.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

kazim.raza SSCrazy Points: 2390 More actions · Answer 9

Jeff, assume those numbers are name; you may replace them with some alphabets. I have used the numbers instead of the original data to avoid propriety issues; and probably your solution for MLM may do the trick for me so please elaborate 🙂

Here's what I want to do:

Consider this hierarchy:

5.4.3.2.1 (level) in FullyQualifiedLevel

1 reports to 2, 2 reports 3 and so forth; this would translate into

F.E.D.C.B.A (personnel name) in FullyQualifiedName

The destination table has one column for each level Level1, Level2 ... Level 5, therefore data from FullyQualifiedName would be split into those columns in this way:

Level1 Level2 Level3 Level4 Level5

A B C D E

The tricky part is that not always I'll get the values for all levels for instance, for a General Manager who reports to CEO, I may get just 5.4 and for the CEO just 5, which means it will go in columns Level5 and Level4 and rest of the columns will be marked null. I am lost in the midway as in how to execute a select on #HIE to get these results. I know I'll have to do a few CharIndex and substrings but for some odd reason; I just can't think of it, probably getting too old for these problems now 🙂

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 10

kazim.raza (5/9/2010)
Jeff, assume those numbers are name; you may replace them with some alphabets. I have used the numbers instead of the original data to avoid propriety issues; and probably your solution for MLM may do the trick for me so please elaborate 🙂
Here's what I want to do:
Consider this hierarchy:
5.4.3.2.1 (level) in FullyQualifiedLevel
1 reports to 2, 2 reports 3 and so forth; this would translate into
F.E.D.C.B.A (personnel name) in FullyQualifiedName
The destination table has one column for each level Level1, Level2 ... Level 5, therefore data from FullyQualifiedName would be split into those columns in this way:
Level1 Level2 Level3 Level4 Level5
A B C D E
The tricky part is that not always I'll get the values for all levels for instance, for a General Manager who reports to CEO, I may get just 5.4 and for the CEO just 5, which means it will go in columns Level5 and Level4 and rest of the columns will be marked null. I am lost in the midway as in how to execute a select on #HIE to get these results. I know I'll have to do a few CharIndex and substrings but for some odd reason; I just can't think of it, probably getting too old for these problems now 🙂

I get that. My problem is that I don't know why you want the stuff in separate horizontal columns. Seems like a bit of denormalization that you can certainly do without especially since the MAX fanout for 161k rows can be 17 or more depending on the positions. It's typically a whole lot easier for people to understand where hierarchies are listed vertically than horizontally.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

kazim.raza SSCrazy Points: 2390 More actions · Answer 11

This query is part of an ETL process where I am extracting the data out of the transactional database and inserting them into the data mart where our architect has kept them in separate columns.

Can you suggest me a way how I can split them across separate columns? And I didn't get you about max fanout 17?

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 12

kazim.raza (5/9/2010)
This query is part of an ETL process where I am extracting the data out of the transactional database and inserting them into the data mart where our architect has kept them in separate columns.
Can you suggest me a way how I can split them across separate columns? And I didn't get you about max fanout 17?

Heh... you're taking data from an "Adjacency List" and converting it into a "Hierarchical Path" and then want to split the Hierarchical Path into separate columns as part of an ETL process for a datamart??? I'm pretty sure that your "architect" and I just wouldn't get along because, right now and without more information, I think your "architect" is, ummmm... I'll just leave it at us not seeing eye-to-eye about things like unnecessary duplication of data.

What does the target table look like? It would explain a lot if you could post the CREATE statement for the target table. And, yeah... we can do this as crazy as it sounds to me.

"Fan Out" is what a Hierarchy does. Each node usually is only connected to a single superior but can have many direct subordinates. The number of subordinates for each node is oft times referred to as the "Fan Out".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

kazim.raza SSCrazy Points: 2390 More actions · Answer 13

🙂

Here's the schema of the target table

Key Bigint

EntityID Bigint

Level1 nvarchar(500)

Level2 nvarchar(500)

Level3 nvarchar(500)

Level4 nvarchar(500)

Level5 nvarchar(500)

this table is then used by SSAS cube to perform different aggregations etc. but thats beyond the scope of this trail. I have been hearing a lot about Many to many, different analysis etc. but will take on it after a few days.

I want to get over it right now as I have a client deployment coming up in a couple days and I want to solve this issue. Could you please suggest something? I am considering to loop through all the rows in #Hie and split them into separate columns, could you suggest something better, at least for now?

Thanks,

Kazim Raza

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 14

kazim.raza (5/9/2010)
I am considering to loop through all the rows...

Heh... now them's fightin' words, Kazim. 😛

Not to worry... this is easy now that I know what needs to be done. I'll be back soon. Any chance of you attaching all 161 K rows for a final test when I'm ready?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)