RE: Need help on performance and integrity

One Orange Chip

Points: 29613

October 17, 2008 at 2:40 pm

OK. This will be attempt #1 at a set based solution. I'm not 100% sure at the speed of this, as the index does not fully cover the subquery in the second part. I'll need you to check that out and get back with us on it if possible, as it's an issue we're currently looking into. If it still runs fairly slow(in this case, > 5 minutes), I'm pretty sure we can speed it up.

The logic of this is simple. I scan through your table and break things into groups by finding changes of Code.

I then go back through, and analyze that group for any duplicates.

I take those duplicates and the first member of each group and make it the output.

If I'm wrong on any part of that logic, please correct me. (I may not answer right away, as I'll probably be out tonight, but someone else here can probably take this one from here, or I'll answer when I get back). This solution is based in part on the running total technique described here:

http://www.sqlservercentral.com/articles/Advanced+Querying/61716/ (Yeah, I know, I kinda get fixated on certain methods and use them everywhere for a while, sorry Jeff 😉 )

As part of this solution, I've added two fields to your table, GroupID and Dupe, both INT's.

[font="Courier New"]------ Declare Variables ----------------------------------

DECLARE @GroupID INT,

@PrevCode CHAR(5),

@RowID INT

SET @RowID = 0

SET @PrevCode = (SELECT Code FROM #TempTable WHERE RowID = 1)

SET @GroupID = 1

-----------------------------------------------------------

-- Breaks your data out into groups so it can be scanned --

UPDATE #TempTable

SET @GROUPID = CASE WHEN Code <> @PrevCode THEN @GroupID + 1 ELSE @GroupID END,

GroupID = @GroupID,

@PrevCode = Code

FROM #TempTable WITH (INDEX(0))

-----------------------------------------------------------

------------- Goes through your data and flags Dupes ------

UPDATE T1

SET Dupe = CASE WHEN (SELECT COUNT(*) FROM #TempTable T2

WHERE T2.GroupID = T1.GroupID AND

T2.Locale = T1.Locale AND

T2.RowKey = T1.RowKey AND

T2.RowID < T1.RowID) > 0 THEN 1 ELSE 0 END,

@RowID = RowID

FROM #Temptable T1 WITH INDEX(0)

-----------------------------------------------------------

---------- Selects The first member of each group (which is what I took as process 1) and

---------- any rows marked as duplicates and displays them

SELECT T1.RowID, T1.RowKey, T1.Locale, T1.Code

FROM #TempTable T1

LEFT JOIN #TempTable T2 ON T1.RowID = T2.RowID AND T2.Dupe = 1

LEFT JOIN (SELECT GroupID, MIN(RowID) FirstGroupRow FROM #TempTable WHERE GROUPID > 1 GROUP BY GroupID) T3 ON T1.RowID = T3.FirstGroupRow

WHERE T2.Locale IS NOT NULL OR T3.FirstGroupRow IS NOT NULL

[/font]

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]