January 23, 2004 at 10:09 am
SCENARIO: I have a table which will potentially have millions of rows; at the moment it has 624,560 rows. The table format is:
CREATE TABLE NRGroups (ID_ int primary key, PID int, Grp int)
I am trying to process the rows in form of groups, e.g.
ID_ | PID | Grp
1   | -   | 0
4   | 9   | 0
3   | 6   | 0
7   | 5   | 0
9   | 7   | 0
6   | -   | 0
5   | 1   | 0
2   | -   | 0
1, 5, 7, 9, 4 belong to one group, 6, 3 belong to next group, 2 is the only member in its group.
i.e.
ID_ | PID | Grp
1   | -   | 1
4   | 9   | 1
3   | 6   | 2
7   | 5   | 1
9   | 7   | 1
6   | -   | 2
5   | 1   | 1
2   | -   | 3
P.S. Not every row has a parent or child.
I have written SQL Server code to organise the rows into groups using a nested loop: the inner loop finds all the members of the current group, and the outer loop fetches the next record that still needs a group.
PROBLEM: When I limit the loop to tens of thousands of rows it gets through them sooner or later, but when I run it on the whole lot, i.e. 624,560 rows, it takes over an hour to put them into groups!! Is there any way to speed this process up?
CODE:
DECLARE @iNextRowId int,
        @iCurrentRowId int,
        @iLoopControl int,
        @grp_num int,
        @current int,
        @lvl tinyint

-- Table variable used as a stack for the depth-first walk of each group
DECLARE @stack TABLE (item int, lvl tinyint)

SELECT @iNextRowId = MIN(ID_) FROM NRGroups
SELECT @iLoopControl = 1
SELECT @grp_num = 0

WHILE @iLoopControl = 1 -- OUTER WHILE: one iteration per group
BEGIN
    ------------------FIND all members of one group and UPDATE the Grp column----------------------------
    INSERT INTO @stack VALUES (@iNextRowId, 1)
    SELECT @lvl = 1
    SELECT @grp_num = @grp_num + 1
    WHILE @lvl > 0 -- INNER WHILE: depth-first traversal
    BEGIN
        IF EXISTS (SELECT * FROM @stack WHERE lvl = @lvl)
        BEGIN
            SELECT TOP 1 @current = item FROM @stack WHERE lvl = @lvl
            UPDATE NRGroups SET Grp = @grp_num WHERE ID_ = @current
            DELETE FROM @stack WHERE lvl = @lvl AND item = @current
            INSERT @stack SELECT ID_, @lvl + 1 FROM NRGroups WHERE PID = @current
            IF @@ROWCOUNT > 0
                SELECT @lvl = @lvl + 1
        END
        ELSE
            SELECT @lvl = @lvl - 1
    END -- INNER WHILE
    --------------------end of FIND-------------------------------------------------------------------
    -- Reset looping variable
    SELECT @iCurrentRowId = @iNextRowId
    SELECT @iNextRowId = NULL
    -- Get the next ungrouped Row_Id
    SELECT @iNextRowId = MIN(ID_) FROM NRGroups WHERE ID_ > @iCurrentRowId AND Grp = 0
    -- Did we get a valid next row id?
    IF ISNULL(@iNextRowId, 0) = 0
        BREAK
    --IF @iNextRowId > 1000 BREAK -- DEBUG CODE
END -- OUTER WHILE
January 23, 2004 at 1:32 pm
Now you know why I cringe every time one of my developers wants to do this! If you put a limit on how many levels deep you will support, then you can do this in a set-based way. For example: right now you go four levels deep in your data, so you should be able to do a set-based self-join four levels deep to get the lowest level's group, then do the same for three levels and two levels. Overall it will be much faster than the loop you have above. The only problem is that you are now limited to four levels. Another option might be to create a user-defined function that calls itself to get the group, but again I think you will find that performance suffers.
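A recursive function along those lines might be sketched as follows (`dbo.GetRootID` is a hypothetical name, and this version labels each group with its root row's ID_ rather than a 1..n sequence; note SQL Server caps function nesting at 32 levels, which also caps the supported depth):

```sql
CREATE FUNCTION dbo.GetRootID (@ID int)
RETURNS int
AS
BEGIN
    DECLARE @PID int
    SELECT @PID = PID FROM NRGroups WHERE ID_ = @ID
    IF @PID IS NULL
        RETURN @ID            -- root row: use its own ID_ as the group key
    RETURN dbo.GetRootID(@PID) -- otherwise recurse up to the root
END
```

It could then be applied in one statement, e.g. `UPDATE NRGroups SET Grp = dbo.GetRootID(ID_)`, though a scalar UDF invoked per row is itself a hidden loop, which is why I expect the performance to hurt.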
Gary Johnson
Microsoft Natural Language Group
DBA, Sr. DB Engineer
This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.
January 23, 2004 at 2:11 pm
Here is an example of what I mean.
SET NOCOUNT ON
IF EXISTS(SELECT * FROM sysobjects WHERE id = object_id('NRGroups'))
DROP TABLE NRGroups
CREATE TABLE NRGroups
(ID_ int primary key, PID int, Grp int)
INSERT INTO NRGroups(ID_,PID,Grp)
VALUES( 1,NULL,0)
INSERT INTO NRGroups(ID_,PID,Grp)
VALUES( 4,9,0)
INSERT INTO NRGroups(ID_,PID,Grp)
VALUES( 3,6,0)
INSERT INTO NRGroups(ID_,PID,Grp)
VALUES( 7,5,0)
INSERT INTO NRGroups(ID_,PID,Grp)
VALUES( 9,7,0)
INSERT INTO NRGroups(ID_,PID,Grp)
VALUES( 6,NULL,0)
INSERT INTO NRGroups(ID_,PID,Grp)
VALUES( 5,1,0)
INSERT INTO NRGroups(ID_,PID,Grp)
VALUES( 2,NULL,0)
-- Create a temp table to get GroupIDs
IF EXISTS(SELECT * FROM tempdb..sysobjects WHERE id = object_id('tempdb..#Grps'))
DROP TABLE #Grps
CREATE TABLE #Grps
(
ID_ int
,GRPID int identity(1,1)
)
INSERT INTO #Grps (ID_)
SELECT ID_
FROM NRGroups NRG
WHERE NRG.PID IS NULL
-- Update Grp for Top Levels to the grpid from #Grps.
UPDATE NRG
SET Grp = g.GRPID
FROM NRGroups NRG
JOIN #Grps g ON NRG.ID_ = g.ID_
WHERE NRG.PID IS NULL
-- SELECT * FROM NRGroups
-- Update Grp for 5 levels deep.
UPDATE N5
SET Grp = NRG.Grp
--SELECT *
FROM NRGroups NRG
JOIN NRGroups N2 ON NRG.ID_ = N2.PID AND NRG.PID IS NULL
JOIN NRGroups N3 ON N2.ID_ = N3.PID
JOIN NRGroups N4 ON N3.ID_ = N4.PID
JOIN NRGroups N5 ON N4.ID_ = N5.PID
WHERE N5.Grp = 0
-- SELECT * FROM NRGroups
-- Update Grp for 4 levels deep.
UPDATE N4
SET Grp = NRG.Grp
--SELECT *
FROM NRGroups NRG
JOIN NRGroups N2 ON NRG.ID_ = N2.PID AND NRG.PID IS NULL
JOIN NRGroups N3 ON N2.ID_ = N3.PID
JOIN NRGroups N4 ON N3.ID_ = N4.PID
WHERE N4.Grp = 0
-- SELECT * FROM NRGroups
-- Update Grp for 3 levels deep.
UPDATE N3
SET Grp = NRG.Grp
FROM NRGroups NRG
JOIN NRGroups N2 ON NRG.ID_ = N2.PID AND NRG.PID IS NULL
JOIN NRGroups N3 ON N2.ID_ = N3.PID
WHERE N3.Grp = 0
-- SELECT * FROM NRGroups
-- Update Grp for 2 levels deep.
UPDATE N2
SET Grp = NRG.Grp
FROM NRGroups NRG
JOIN NRGroups N2 ON NRG.ID_ = N2.PID AND NRG.PID IS NULL
WHERE N2.Grp = 0
SELECT * FROM NRGroups
Gary Johnson
January 23, 2004 at 2:15 pm
While Gary was posting....
-- Example (No "level" limit) possibility
-- May want to modify for performance & other reasons, but I hope the "gist" gets across
-- Build test data
If object_ID('NRGroupsTest') is not NULL Drop Table NRGroupsTest
CREATE TABLE NRGroupsTest (ID_ int primary key, PID int, Grp int)
insert Into NRGroupsTest ( ID_, PID, Grp)
Select 1, NULL, 0 UNION ALL
Select 4, 9, 0 UNION ALL
Select 3, 6, 0 UNION ALL
Select 7, 5, 0 UNION ALL
Select 9, 7, 0 UNION ALL
Select 6, NULL, 0 UNION ALL
Select 5, 1, 0 UNION ALL
Select 2, NULL, 0
-- Clear #Temp Table
If object_ID('Tempdb..#Temp') is not NULL Drop Table #Temp
-- looping for each level
Declare @DynSQL varchar(8000), @LVL Int, @Cnt Int
Select @Cnt = 1
While @Cnt > 0 Begin
if @LVL Is Null Begin -- 1st time, root grp
-- Make #Temp Table with "root" level records
Select NRG.ID_ as ID, @LVL as LVL, identity(int, 1,1) Grp
Into #Temp
From NRGroupsTest NRG
Where PID is NULL
End Else Begin
-- For Each Level add IDs to #Temp table
Set Identity_Insert #Temp On -- so that data can be put into Grp Col
Insert Into #Temp (ID, LVL, Grp)
Select NRG.ID_, @LVL, T.Grp
From NRGroupsTest NRG
Join #Temp T
On T.ID = NRG.PID and isNull(LVL, 0) = @LVL -1
End
-- If above did any work, @Cnt will be > 0, and loop continues
Select @LVL = IsNull(@LVL, 0) + 1, @Cnt = IsNull(@LVL * 0 + @@rowcount, 1)
Set Identity_Insert #Temp Off
End
-- for show purpose only
select * from #Temp order by grp, ID
/*
-- Then
-- Maybe
Create Clustered Index IX_Temp on #Temp (ID, Grp)
-- Update NRGroupsTest
UPDATE NGR Set NGR.Grp = T.Grp
From NRGroupsTest NGR
join #Temp T On NGR.ID_ = T.ID
*/
-- Cleanup
If object_ID('Tempdb..#Temp') is not NULL Drop Table #Temp
Once you understand the BITs, all the pieces come together
January 23, 2004 at 2:31 pm
Very nice, Thomas! I still think I would limit the depth level. Leaving it open will ultimately leave you in a bit of a performance and maintenance bind.
Gary Johnson
January 23, 2004 at 2:39 pm
Agree Gary, it can be limited simply with SELECT ... WHERE LVL < n in the ELSE portion.
I was also thinking: maybe don't use #Temp for all the records, but only to establish the "root" level Grp records. Update the base table with those values, then do an UPDATE with a "self join" in the ELSE portion. It may run a lot faster, since the indexes on the base table can be used/configured.
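That variation might be sketched roughly like this (assuming the same NRGroupsTest table, and using each root's own ID_ as its group number so that no #Temp identity column is needed; Grp = 0 still means "not yet grouped"):

```sql
-- Seed: each root row becomes its own group, keyed by its ID_
UPDATE NRGroupsTest SET Grp = ID_ WHERE PID IS NULL

-- Push group numbers down one level per pass, joining the base table to itself,
-- until a pass changes no rows
DECLARE @rows int
SELECT @rows = 1
WHILE @rows > 0
BEGIN
    UPDATE Child
    SET Grp = Parent.Grp
    FROM NRGroupsTest Child
    JOIN NRGroupsTest Parent ON Child.PID = Parent.ID_
    WHERE Child.Grp = 0 AND Parent.Grp <> 0

    SELECT @rows = @@ROWCOUNT
END
```

An index on (PID) and one on (Grp) would let each pass be a simple indexed set-based UPDATE, and a depth limit could still be imposed by counting passes.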
January 26, 2004 at 9:56 am
Thank you Gary and Thomas - I really appreciate you both taking the time and effort to assist.
I haven't tested Gary's solution, but I have tested Thomas's piece of code on hundreds of thousands of rows and it is mega fast - and that's without applying the changes Thomas mentioned in his last message.
Cheers