July 3, 2003 at 3:22 am
Hi
I am having some problems with a query I am running, and I know it is probably due my poor query writing skills!! It's essentially a deduplication problem
I have a table of credit card transactions and each entry has a credit card number, card holder name etc recorded against it. This database is inherited and a lot of the fields have not been filled in. As a lot of the credit card numbers are repeated, I want to split them out, and hold the 'best' overall details once, to link to all the transactions. Some credit card numbers have names against them and some don't, and I would like to see only the credit card numbers entries which are repeated with a different 'name' value and not the others. I don't just want to delete them from the database I want to be able to compare all the fields to decide which combination I want.
This is the query I used:
select f.credcardno, f.ccname from V2.dbo.findets f
INNER JOIN V2.dbo.findets fo ON fo.credcardno=f.credcardno
where len(f.credcardno)>0 and fo.ccname<>f.ccname
order by f.credcardno
I have used a self-join to get the details out and it runs fine, but I am not getting distinct values. When I try to do this using distinct or 'group by' the query is pulling the right data but taking a really long time to run. (over 40 minutes on 354,000 records). The field credcardno is char(19) and not indexed - could this be why it is taking so long? Or have I written the query wrong? Would it be better to create a temporary table for the duplicate values?
July 3, 2003 at 7:33 am
Why don't you try to get the distinct entries first
Select credcardno, ccname, Count(*) from findets Group by credcardno, ccname
Please check whether you have proper indexes. I suggest you to on credcardno
G.R. Preethiviraj Kulasingham
Cheers,
Prithiviraj Kulasingham
http://preethiviraj.blogspot.com/
July 3, 2003 at 12:01 pm
If you have the luxury - I would create the index on credcardno. Even if you drop it later, your performance should be enough better to be worth it. 354,000 is not so many that you should avoid buiding the index right away.
Guarddata-
July 3, 2003 at 9:54 pm
quote:
Select credcardno, ccname, Count(*) from findets Group by credcardno, ccname
Please check whether you have proper indexes. I suggest you to on credcardno
recently I had a big problem like this when I moved a dbase table into SQL sERVER.
There were duplicate entries, and I wanted to find because of not having proper indexes,
I couldn't run the query effectively. Actually I couldn't have a primary key on that table.
Primary candidate key is duplicated (May be data curruption!).
Even after creating an index the situation was same.
Later I created a clustere index, then everything went well.
G.R. Preethiviraj Kulasingham
Cheers,
Prithiviraj Kulasingham
http://preethiviraj.blogspot.com/
Viewing 4 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply