Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Finding duplicate values in very large table. Select Row_number() over too slow Expand / Collapse
Author
Message
Posted Wednesday, February 23, 2011 12:30 PM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Thursday, May 16, 2013 11:47 AM
Points: 564, Visits: 1,454
This query finds duplicate values and inserts the results into another table using the basic structure shown below. This table will grow from 35 million to probably 65 million records over the next year. It's causing a spike in tempdb up to 14,000 milliseconds wait per IO for about an hour each night. We need a better approach to this.

insert into .......

SELECT ROW_NUMBER
() OVER (
ORDER BY ....( orders by two varchar data type fields )
......
FROM SUMMARY_TABLE AS Summary02
CROSS JOIN
SUMMARY_TABLE AS Summary13
WHERE





Post #1068472
Posted Wednesday, February 23, 2011 12:42 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176, Visits: 237
Couldn't you just do a COUNT/HAVING approach?

Select Field1, Field2, COUNT(*) AS Records
FROM SUMMARY_TABLE
GROUP BY Field1, Field2
HAVING COUNT(*) > 1

That will identify the multiple occurrences and from there you can resolve as needed.


--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
Post #1068484
Posted Wednesday, February 23, 2011 1:11 PM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Thursday, May 16, 2013 11:47 AM
Points: 564, Visits: 1,454
I tested yours with our info and it ran in 1.5 minutes instead of 17 on a test box. But I'll have to see what the developers are trying to accomplish. Since most sql is object-generated I usually just pass on the culprits as a bug via our ticketing system.


Post #1068506
Posted Wednesday, February 23, 2011 1:13 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176, Visits: 237
Well let's hope it helps. Just based on the numbers you replied back with they should at least look at other "options".

--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
Post #1068507
Posted Wednesday, February 23, 2011 4:29 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:30 PM
Points: 32,893, Visits: 26,770
I guess I'd have to ask... why do they allow the "dupes" to continue?

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Post #1068599
Posted Wednesday, February 23, 2011 4:32 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176, Visits: 237
I'm with you on that one. I didn't want to clutter up the quick answer, but that is a great question. I think i'm afraid to hear the answer though.

--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
Post #1068600
Posted Wednesday, February 23, 2011 5:44 PM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Thursday, May 16, 2013 11:47 AM
Points: 564, Visits: 1,454
I'm not sure. These are car loans so it might be where a vin number exists twice, but one record is a paid off loan still in the database and the other is the same vin ( vehicle identification number ) showing on a newer loan.
I'm more familiar with our legacy system where a vin can exist twice but anything beyond the primary active record has a special status indicating it is in some sort of an error condition. In other words, there is a unique index on vin excepting that error status flag. Files from clients ( banks ) and state motor vehicle offices can arrive with a vin that already exists. Simply not allowing these files to load where the vin was a duplicate would create it's own problems.






Post #1068610
Posted Wednesday, February 23, 2011 5:48 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176, Visits: 237
Sounds like historical records without dates or status flags :)

Anyway, I hope the new version works well for you. A non-clustered index on those two fields should dramatically speed up the query as well (however I don't know how adversely it would impact your INSERT/UPDATE/DELETES). Definitely test it if you deside to create an index.


--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
Post #1068613
Posted Wednesday, February 23, 2011 5:53 PM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Thursday, May 16, 2013 11:47 AM
Points: 564, Visits: 1,454
It will be up to the developers to decide if they're going to revise anything. Unless Systems decides to "insist" I've done a lot of googling on row_count and have yet to see anyone else using it with a cross join on a huge table with a copy of itself.




Post #1068614
Posted Wednesday, February 23, 2011 5:55 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176, Visits: 237
Let us know how it goes!

--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
Post #1068615
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse