Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

Need help on performance and integrity Expand / Collapse
Author
Message
Posted Thursday, October 16, 2008 8:37 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, February 12, 2009 6:08 AM
Points: 32, Visits: 196
Hi Everyone,

I have sample code below Item#1 which will create the sample data for testing and Item #2 which is the code to get the interleaving rows and range of unique data.

Item #1: This will create a sample table and data
use tempdb;

SET NOCOUNT ON;
GO

IF OBJECT_ID ('#TempTable','U') IS NOT NULL
BEGIN
DROP TABLE #TempTable
END

CREATE TABLE #TempTable
(
RowID int PRIMARY KEY CLUSTERED,
RowKey varchar(1000),
Locale varchar(10),
Code char(1)
)
GO

INSERT #TempTable
SELECT 1,'1','en','C' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'5','zh','A' UNION ALL --/first(1) batch of A
SELECT 4,'4','en','A' UNION ALL --/first(1) batch of A
SELECT 5,'3','en','A' UNION ALL --/first(1) batch of A
SELECT 10,'2','zh','A' UNION ALL --/first(1) batch of A
SELECT 11,'2','zh','A' UNION ALL --/first(1) batch of A
SELECT 12,'1','zh','C' UNION ALL
SELECT 13,'1','de','C' UNION ALL
SELECT 14,'1','en','A' UNION ALL --/second(2) batch of A
SELECT 15,'3','en','A' UNION ALL --/second(2) batch of A
SELECT 16,'3','de','A' UNION ALL --/second(2) batch of A
SELECT 18,'1','en','A' UNION ALL --/second(2) batch of A
SELECT 19,'3','de','A' UNION ALL --/second(2) batch of A
SELECT 20,'1','en','C' UNION ALL
SELECT 21,'1','en','C'
GO

Item #2: This will get the interleaving rows of duplicates resulting to a unique range result.
SELECT * FROM #TempTable


DECLARE @RecordCount int;
DECLARE @MaxRowID int; -- Stores the overall maximum row id at the beginning.

-- Initialize the MaxROwID to the max row id number + 1
-- So we can set this value as the last value in the range
SET @MaxRowID = (SELECT MAX(RowID) FROM #TempTable) +1; -- Store this before we delete rows

SET @RecordCount = 0;

SELECT @RecordCount = COUNT(*) FROM #TempTable

IF @RecordCount > 0
BEGIN;
DECLARE @Results TABLE(RowID int); -- Here we assemble the range numbers (result)
DECLARE @RowID int; -- Stores the first rowid which is a duplicate
DECLARE @OptRows int; -- variable which records the optimum number of rows to scan
DECLARE @DeletedRowID int; -- Track the deleted rowids.

SET @DeletedRowID = -1
SET @OptRows = 100;

WHILE 1=1
BEGIN;
SET @RowID = (
SELECT MIN(RowID) FROM (
SELECT RowID,RowKey,
row_number() OVER (PARTITION BY RowKey,Locale ORDER BY RowID) AS DupRank
FROM (SELECT TOP (@OptRows) *
FROM #TempTable
WHERE RowID > @DeletedRowID
ORDER BY RowID ASC
) SubsetRows
) [Aggregate]
WHERE DupRank > 1
);

RAISERROR('RowID: %d, DeletedRowID: %d, OptRows: %d', 10, 1, @RowID, @DeletedRowID, @OptRows);

IF @RowID IS NULL BEGIN;
IF @OptRows >= 2*@MaxRowID BEGIN;
INSERT INTO @Results SELECT @MaxRowID;
BREAK;
END;
ELSE
BEGIN;
SET @OptRows = @OptRows + @OptRows;
CONTINUE;
END;
END;

ELSE BEGIN;
INSERT INTO @Results SELECT @RowID;

SET @DeletedRowID = @RowID - 1;

SET @OptRows = @OptRows - 1;
END;
END;
END;

SELECT * FROM @Results;
GO

My Objectives:
1.) Using Item #2, I need to process data for every batch of A (see Item #1 table data per batch), the problem with Item #2 is it will process the whole data without looking on every A batch.

2.) Aside from processing data for every batch of A, I need to make (Item #2) more faster on certain patterns of data for every batch of A. Since it is much slower when there are consecutive concentrated duplicates in the data like for example pattern (Item #3) below:

Item#3:
INSERT #TempTable
SELECT 1,'1','en','C' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 4,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 5,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 10,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 11,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 12,'1','zh','C' UNION ALL
SELECT 13,'1','de','C' UNION ALL
SELECT 14,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 15,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 16,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 18,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 19,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 20,'1','en','C' UNION ALL
SELECT 21,'1','en','C'

Low performance numbers will be seen if tested on 750,000 records or 1 Million records using the pattern above. It will actually take about 2 hours+ by actually processing only 200k using the pattern (Item#3) above but If there aren't that much consecutive duplicates like (Item #1), It would only take 28 minutes to process the 1 million data.


Thank you for your time and help.
Post #587032
Posted Thursday, October 16, 2008 4:01 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Thursday, October 10, 2013 4:59 PM
Points: 79, Visits: 183
Thanks for posting the sample data and DML.

I'm not clear on what you want the output to be. If you can post that maybe we can come up with a way to do this without having to do this in a loop and instead do it set based.


Gary Johnson
Sr Database Engineer
Post #587356
Posted Thursday, October 16, 2008 10:51 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, February 12, 2009 6:08 AM
Points: 32, Visits: 196
Gary Johnson (10/16/2008)
Thanks for posting the sample data and DML.

I'm not clear on what you want the output to be. If you can post that maybe we can come up with a way to do this without having to do this in a loop and instead do it set based.


Thank you for your prompt response Gary. Sorry I forgot to show you the output for the following data sample.

For this data sample, Item #1:
INSERT #TempTable
SELECT 1,'1','en','C' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'5','zh','A' UNION ALL --/first(1) batch of A
SELECT 4,'4','en','A' UNION ALL --/first(1) batch of A
SELECT 5,'3','en','A' UNION ALL --/first(1) batch of A
SELECT 10,'2','zh','A' UNION ALL --/first(1) batch of A
SELECT 11,'2','zh','A' UNION ALL --/first(1) batch of A <- dup on RowID 10 (checked)
SELECT 12,'1','zh','C' UNION ALL
SELECT 13,'1','de','C' UNION ALL
SELECT 14,'1','en','A' UNION ALL --/second(2) batch of A
SELECT 15,'3','en','A' UNION ALL --/second(2) batch of A
SELECT 16,'3','de','A' UNION ALL --/second(2) batch of A
SELECT 18,'1','en','A' UNION ALL --/second(2) batch of A <- dup on RowID 14 (checked)
SELECT 19,'3','de','A' UNION ALL --/second(2) batch of A <- I don't need to include this even if there is dup on RowID 16 since it is already covered in the range from RowID 14-18
SELECT 20,'1','en','C' UNION ALL
SELECT 21,'1','en','C'

Final Output on Item #1 should be:
SELECT 3 as RowID, 'A' as Code UNION ALL
SELECT 11 as RowID, 'A' as Code UNION ALL
SELECT 12 as RowID, 'C' as Code UNION ALL
SELECT 14 as RowID, 'A' as Code UNION ALL
SELECT 18 as RowID, 'A' as Code UNION ALL
SELECT 20 as RowID, 'C'

To explain on how did I come up with the final output on Item #1.
First, I need to get the interleaving rows per Code so it should give me the output of:
SELECT 3 as RowID, 'A' as Code UNION ALL
SELECT 12 as RowID, 'C' as Code UNION ALL
SELECT 14 as RowID, 'A' as Code UNION ALL
SELECT 20 as RowID, 'C'

Next is I need to process the interleaving rows of duplicates per batch of A's that would result on a unique range result. So for the first batch of A, it should give me the output.
SELECT 11 as RowID, 'A'

and the second batch of A will have an output:
SELECT 18 as RowID, 'A'

So that is how I came up with the final output. I put some comments on Item #1 hoping it would help understand the process.

Thank you for your time and help.
Post #587420
Posted Friday, October 17, 2008 12:43 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, February 12, 2009 6:08 AM
Points: 32, Visits: 196
Hi,

I also have an attachment here named demo to help understand on what I'm about to achieve or help better support my objectives.

-dans


  Post Attachments 
Demo.GIF (36 views, 25.46 KB)
Post #587910
Posted Friday, October 17, 2008 1:53 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Friday, April 04, 2014 9:22 PM
Points: 1,525, Visits: 4,070
Great image. That goes a lot farther towards explaining what you are after. This seems like it'll be a very interesting query to write. I haven't the slightest idea of *why* you would want something like that, but at least now I think I grasp what you're looking for.

Seth Phelabaum
Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation
Post #587968
Posted Friday, October 17, 2008 2:40 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Friday, April 04, 2014 9:22 PM
Points: 1,525, Visits: 4,070
OK. This will be attempt #1 at a set based solution. I'm not 100% sure at the speed of this, as the index does not fully cover the subquery in the second part. I'll need you to check that out and get back with us on it if possible, as it's an issue we're currently looking into. If it still runs fairly slow(in this case, > 5 minutes), I'm pretty sure we can speed it up.

The logic of this is simple. I scan through your table and break things into groups by finding changes of Code.
I then go back through, and analyze that group for any duplicates.
I take those duplicates and the first member of each group and make it the output.

If I'm wrong on any part of that logic, please correct me. (I may not answer right away, as I'll probably be out tonight, but someone else here can probably take this one from here, or I'll answer when I get back). This solution is based in part on the running total technique described here:
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/ (Yeah, I know, I kinda get fixated on certain methods and use them everywhere for a while, sorry Jeff ;) )

As part of this solution, I've added two fields to your table, GroupID and Dupe, both INT's.

------ Declare Variables ----------------------------------
DECLARE @GroupID    INT,
@PrevCode      CHAR(5),
@RowID                  INT

SET
@RowID = 0
SET @PrevCode = (SELECT Code FROM #TempTable WHERE RowID = 1)
SET @GroupID = 1
-----------------------------------------------------------
-- Breaks your data out into groups so it can be scanned --
UPDATE #TempTable
SET @GROUPID = CASE WHEN Code <> @PrevCode THEN @GroupID + 1 ELSE @GroupID END,
  
GroupID = @GroupID,
  
@PrevCode = Code
FROM #TempTable WITH (INDEX(0))
-----------------------------------------------------------
------------- Goes through your data and flags Dupes ------
UPDATE T1
SET Dupe = CASE WHEN (SELECT COUNT(*) FROM #TempTable T2
                      
WHERE   T2.GroupID = T1.GroupID AND
                              
T2.Locale = T1.Locale AND
                              
T2.RowKey = T1.RowKey AND
                              
T2.RowID < T1.RowID) > 0 THEN 1 ELSE 0 END,
  
@RowID = RowID                    
FROM #Temptable T1 WITH INDEX(0)
-----------------------------------------------------------
---------- Selects The first member of each group (which is what I took as process 1) and
---------- any rows marked as duplicates and displays them

SELECT T1.RowID, T1.RowKey, T1.Locale, T1.Code
FROM #TempTable T1
      
LEFT JOIN #TempTable T2 ON T1.RowID = T2.RowID AND T2.Dupe = 1
      
LEFT JOIN (SELECT GroupID, MIN(RowID) FirstGroupRow FROM #TempTable WHERE GROUPID > 1 GROUP BY GroupID) T3 ON T1.RowID = T3.FirstGroupRow
WHERE T2.Locale IS NOT NULL OR T3.FirstGroupRow IS NOT NULL




Seth Phelabaum
Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation
Post #588006
Posted Friday, October 17, 2008 5:06 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Monday, April 14, 2014 7:45 AM
Points: 3,949, Visits: 5,944
Here's a set based solution using CTEs. Please let us see the execution plans and performance statistics when you have tested them. Best of luck.

Bob

------------------------------------------------
-- CTE Solution
------------------------------------------------
;with cte1 as -- identify breaks
(select code,rowid
from #temptable t1
where t1.code <> (select top 1 code
from #temptable t2
where t2.rowID < t1.rowID
order by t2.rowID desc)
)


,cte2 as -- identify ranges for 'A' codes only
(select c1.code,c1.rowid as minRow,min(c1a.rowID) as maxRow
from cte1 c1
join cte1 c1a on c1a.code <> c1.code and c1a.rowID > c1.rowID
where c1.code = 'A'
group by c1.code,c1.rowID
)

,cte3 as -- identify duplicates in 'A' ranges
(select c.minrow,max(rowid) as dupID
from #temptable t
cross join cte2 c
where t.rowid >=minRow and t.rowID < maxRow
group by c.minRow,t.rowKey,t.locale,c.code
having count(*) > 1
)

,cte4 as -- just get the *first* duplicates
(select min(dupID) as rowID
from cte3
group by minRow
)

,cte5 as -- put the breaks and the first "A" dups together
(select rowID from cte1
union all
select rowID from cte4)


select t.*
from cte5 c
join #temptable t on t.rowID = c.rowID
order by c.rowID


__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? -- Stephen Stills
Post #588041
Posted Friday, October 17, 2008 9:17 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, February 12, 2009 6:08 AM
Points: 32, Visits: 196
Garadin (10/17/2008)
OK. This will be attempt #1 at a set based solution. I'm not 100% sure at the speed of this, as the index does not fully cover the subquery in the second part. I'll need you to check that out and get back with us on it if possible, as it's an issue we're currently looking into. If it still runs fairly slow(in this case, > 5 minutes), I'm pretty sure we can speed it up....


I can't seem to test your solution since it gave me this error...

Msg 1018, Level 15, State 1, Line 28
Incorrect syntax near 'INDEX'. If this is intended as a part of a table hint, A WITH keyword and parenthesis are now required. See SQL Server Books Online for proper syntax.

Any idea? I'm sure i am missing something here..
Post #588062
Posted Friday, October 17, 2008 9:25 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, February 12, 2009 6:08 AM
Points: 32, Visits: 196
bhovious (10/17/2008)
Here's a set based solution using CTEs. Please let us see the execution plans and performance statistics when you have tested them. Best of luck.

Bob


I tested your solution for data integrity but it seems it is giving me the wrong output on some other patterns. It is correct on Item #1 data that I supplied but for some other patterns it isn't, there is probably a problem with the logic. The image that I attached on my previous reply would explain how to get the result, I also tried the sample data on that image presentation (attached) and the output is lacking using your suggested solution .

Thank you so much for the help.

Here are some other patterns that might help on testing data integrity, logic in getting the output is still the same as stated in the attached image on my previous post.

Pattern #1
INSERT #TempTable
SELECT 1,'1','en','C' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'1','zh','A' UNION ALL
SELECT 4,'2','en','A' UNION ALL
SELECT 5,'2','en','A' UNION ALL
SELECT 10,'1','zh','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 15,'1','zh','C' UNION ALL
SELECT 16,'1','de','C' UNION ALL
SELECT 20,'1','en','A' UNION ALL
SELECT 21,'3','en','A' UNION ALL
SELECT 22,'3','de','A' UNION ALL
SELECT 23,'3','en','A' UNION ALL
SELECT 24,'3','de','A' UNION ALL
SELECT 30,'1','en','C' UNION ALL
SELECT 31,'1','en','C'

select * from #temptable

--//Output
SELECT 3 as RowID,'1' as RowKey,'zh' as Locale,'A' as [Action] UNION ALL
SELECT 5,'2','en','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 15,'1','zh','C' UNION ALL
SELECT 20,'1','en','A' UNION ALL
SELECT 23,'3','en','A' UNION ALL
SELECT 30,'1','en','C'

Pattern #2
INSERT #TempTable
SELECT 1,'1','en','D' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'1','zh','A' UNION ALL
SELECT 4,'1','zh','A' UNION ALL
SELECT 5,'1','zh','A' UNION ALL
SELECT 10,'1','zh','A' UNION ALL
SELECT 11,'1','zh','C' UNION ALL
SELECT 15,'1','zh','C' UNION ALL
SELECT 16,'1','de','C' UNION ALL
SELECT 20,'3','en','A' UNION ALL
SELECT 21,'3','de','A' UNION ALL
SELECT 22,'3','de','A' UNION ALL
SELECT 23,'3','en','A' UNION ALL
SELECT 24,'3','de','D' UNION ALL
SELECT 30,'1','en','C' UNION ALL
SELECT 31,'1','en','C'

select * from #temptable

--//Output
SELECT 2 as RowID,'1' as RowKey,'de' as Locale,'C' as [Action] UNION ALL
SELECT 3,'1','zh','A' UNION ALL
SELECT 4,'1','zh','A' UNION ALL
SELECT 5,'1','zh','A' UNION ALL
SELECT 10,'1','zh','A' UNION ALL
SELECT 11,'1','zh','C' UNION ALL
SELECT 20,'3','en','A' UNION ALL
SELECT 22,'3','de','A' UNION ALL
SELECT 24,'3','de','D' UNION ALL
SELECT 30,'1','en','C'

Pattern #3
INSERT #TempTable
SELECT 1,'5','en','D' UNION ALL
SELECT 2,'5','de','C' UNION ALL
SELECT 3,'4','zh','C' UNION ALL
SELECT 5,'5','zh','A' UNION ALL
SELECT 6,'4','zh','A' UNION ALL
SELECT 7,'3','zh','A' UNION ALL
SELECT 11,'2','zh','A' UNION ALL
SELECT 15,'1','zh','A' UNION ALL
SELECT 16,'1','zh','A' UNION ALL
SELECT 20,'2','en','C' UNION ALL
SELECT 21,'3','de','C' UNION ALL
SELECT 22,'4','de','A' UNION ALL
SELECT 23,'2','en','A' UNION ALL
SELECT 24,'4','de','A' UNION ALL
SELECT 30,'1','en','C' UNION ALL
SELECT 31,'1','en','C'

select * from #temptable

--//Output
SELECT 2 as RowID,'5' as RowKey,'de' as Locale,'C' as [Action] UNION ALL
SELECT 5,'5','zh','A' UNION ALL
SELECT 16,'1','zh','A' UNION ALL
SELECT 20,'2','en','C' UNION ALL
SELECT 22,'4','de','A' UNION ALL
SELECT 24,'4','de','A' UNION ALL
SELECT 30,'1','en','C'

Post #588063
Posted Friday, October 17, 2008 10:39 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Monday, April 14, 2014 7:45 AM
Points: 3,949, Visits: 5,944
Using the data at the bottom of this message, the CTE based code produces the following:
------------------------------------------------------------------------------------------------
3 5 zh A
11 2 zh A
12 1 zh C
14 1 en A
18 1 en A
20 1 en C
------------------------------------------------------------------------------------------------
This corresponds to your expected output as specified in your second post of this thread. All you have to do is remove rowKey and locale from the final select statement.

SELECT 3 as RowID, 'A' as Code UNION ALL
SELECT 11 as RowID, 'A' as Code UNION ALL
SELECT 12 as RowID, 'C' as Code UNION ALL
SELECT 14 as RowID, 'A' as Code UNION ALL
SELECT 18 as RowID, 'A' as Code UNION ALL
SELECT 20 as RowID, 'C'


I have tested it against one of the patterns you just sent and found a discrepancy there which can be fixed, but first let me ask: Will you ever have THREE matching rows in a series of A's like this:

11 2 zh A
12 2 zh A
13 2 zh A

If so, which rowID should be returned, 12 or 13 ??



-----------------------------------------------------------------------
rowID rowKey locale code
1 1 en C
2 1 de C
3 5 zh A
4 4 en A
5 3 en A
10 2 zh A
11 2 zh A
12 1 zh C
13 1 de C
14 1 en A
15 3 en A
16 3 de A
18 1 en A
19 3 de A
20 1 en C
21 1 en C


__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? -- Stephen Stills
Post #588072
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse