Identifying duplicate rows.. query taking too long

Question

Post reply

Identifying duplicate rows.. query taking too long

taffy

Ten Centuries

Points: 1067
More actions
July 3, 2003 at 3:22 am

#81935

Hi
I am having some problems with a query I am running, and I know it is probably due my poor query writing skills!! It's essentially a deduplication problem
I have a table of credit card transactions and each entry has a credit card number, card holder name etc recorded against it. This database is inherited and a lot of the fields have not been filled in. As a lot of the credit card numbers are repeated, I want to split them out, and hold the 'best' overall details once, to link to all the transactions. Some credit card numbers have names against them and some don't, and I would like to see only the credit card numbers entries which are repeated with a different 'name' value and not the others. I don't just want to delete them from the database I want to be able to compare all the fields to decide which combination I want.
This is the query I used:
select f.credcardno, f.ccname from V2.dbo.findets f
INNER JOIN V2.dbo.findets fo ON fo.credcardno=f.credcardno
where len(f.credcardno)>0 and fo.ccname<>f.ccname
order by f.credcardno
I have used a self-join to get the details out and it runs fine, but I am not getting distinct values. When I try to do this using distinct or 'group by' the query is pulling the right data but taking a really long time to run. (over 40 minutes on 354,000 records). The field credcardno is char(19) and not indexed - could this be why it is taking so long? Or have I written the query wrong? Would it be better to create a temporary table for the duplicate values?

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

G.R.Prithiviraj Kulasingham SSCarpal Tunnel Points: 4105 More actions · Answer 1

Why don't you try to get the distinct entries first

Select credcardno, ccname, Count(*) from findets Group by credcardno, ccname

Please check whether you have proper indexes. I suggest you to on credcardno

G.R. Preethiviraj Kulasingham

Cheers,
Prithiviraj Kulasingham

http://preethiviraj.blogspot.com/

guarddata SSCertifiable Points: 6583 More actions · Answer 2

If you have the luxury - I would create the index on credcardno. Even if you drop it later, your performance should be enough better to be worth it. 354,000 is not so many that you should avoid buiding the index right away.

Guarddata-

G.R.Prithiviraj Kulasingham SSCarpal Tunnel Points: 4105 More actions · Answer 3

quote:
Select credcardno, ccname, Count(*) from findets Group by credcardno, ccname
Please check whether you have proper indexes. I suggest you to on credcardno

recently I had a big problem like this when I moved a dbase table into SQL sERVER.

There were duplicate entries, and I wanted to find because of not having proper indexes,

I couldn't run the query effectively. Actually I couldn't have a primary key on that table.

Primary candidate key is duplicated (May be data curruption!).

Even after creating an index the situation was same.

Later I created a clustere index, then everything went well.

G.R. Preethiviraj Kulasingham

Cheers,
Prithiviraj Kulasingham

http://preethiviraj.blogspot.com/