|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Thursday, May 16, 2013 11:47 AM
Points: 564,
Visits: 1,454
|
|
This query finds duplicate values and inserts the results into another table using the basic structure shown below. This table will grow from 35 million to probably 65 million records over the next year. It's causing a spike in tempdb up to 14,000 milliseconds wait per IO for about an hour each night. We need a better approach to this.
insert into .......
SELECT ROW_NUMBER () OVER ( ORDER BY ....( orders by two varchar data type fields ) ...... FROM SUMMARY_TABLE AS Summary02 CROSS JOIN SUMMARY_TABLE AS Summary13 WHERE
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176,
Visits: 237
|
|
Couldn't you just do a COUNT/HAVING approach?
Select Field1, Field2, COUNT(*) AS Records FROM SUMMARY_TABLE GROUP BY Field1, Field2 HAVING COUNT(*) > 1
That will identify the multiple occurrences and from there you can resolve as needed.
-------------------------------------------------------------------------- When you realize you've dug yourself into a hole....Step 1...stop digging.
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Thursday, May 16, 2013 11:47 AM
Points: 564,
Visits: 1,454
|
|
I tested yours with our info and it ran in 1.5 minutes instead of 17 on a test box. But I'll have to see what the developers are trying to accomplish. Since most sql is object-generated I usually just pass on the culprits as a bug via our ticketing system.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176,
Visits: 237
|
|
Well let's hope it helps. Just based on the numbers you replied back with they should at least look at other "options".
-------------------------------------------------------------------------- When you realize you've dug yourself into a hole....Step 1...stop digging.
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 12:30 PM
Points: 32,893,
Visits: 26,770
|
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176,
Visits: 237
|
|
I'm with you on that one. I didn't want to clutter up the quick answer, but that is a great question. I think i'm afraid to hear the answer though.
-------------------------------------------------------------------------- When you realize you've dug yourself into a hole....Step 1...stop digging.
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Thursday, May 16, 2013 11:47 AM
Points: 564,
Visits: 1,454
|
|
I'm not sure. These are car loans so it might be where a vin number exists twice, but one record is a paid off loan still in the database and the other is the same vin ( vehicle identification number ) showing on a newer loan. I'm more familiar with our legacy system where a vin can exist twice but anything beyond the primary active record has a special status indicating it is in some sort of an error condition. In other words, there is a unique index on vin excepting that error status flag. Files from clients ( banks ) and state motor vehicle offices can arrive with a vin that already exists. Simply not allowing these files to load where the vin was a duplicate would create it's own problems.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176,
Visits: 237
|
|
Sounds like historical records without dates or status flags :)
Anyway, I hope the new version works well for you. A non-clustered index on those two fields should dramatically speed up the query as well (however I don't know how adversely it would impact your INSERT/UPDATE/DELETES). Definitely test it if you deside to create an index.
-------------------------------------------------------------------------- When you realize you've dug yourself into a hole....Step 1...stop digging.
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Thursday, May 16, 2013 11:47 AM
Points: 564,
Visits: 1,454
|
|
It will be up to the developers to decide if they're going to revise anything. Unless Systems decides to "insist" I've done a lot of googling on row_count and have yet to see anyone else using it with a cross join on a huge table with a copy of itself.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 07, 2013 12:40 PM
Points: 176,
Visits: 237
|
|
Let us know how it goes!
-------------------------------------------------------------------------- When you realize you've dug yourself into a hole....Step 1...stop digging.
|
|
|
|