Select into a Text file from a Table and removing dupilcates

Question

Post reply

Select into a Text file from a Table and removing dupilcates

Neel Singh

Old Hand

Points: 307
More actions
July 8, 2008 at 12:10 am

#209657

Hi
I have a table of about 21 000 000 rows, I need to export this data to a text file and before exporting I need to remove all duplicates. What would be the fastest and most efficient way to do this task.
Thanks

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Peter Lavelle Mr or Mrs. 500 Points: 526 More actions · Answer 1

Hi Neel,

Could you provide some more details please?

1. What version SQL Server?

2. Table definition (table name, column names, data types, keys, indexes)?

3. Some sample data rows

4. What exactly do you mean by duplicate rows? If three rows are duplicates do you want to delete all of them, the first two or the last two? There are many postings on this site about eliminating duplicate rows.

5. Is this a once-off request or do you want to run this regularly?

PeterL

amar_mazhar SSC Journeyman Points: 78 More actions · Answer 2

Hi Neel,

I think you can use DTS package to transfer data from table to text file. In Dts packages you can use store procedures or TSQL quesries to control duplicate values.

Neel Singh Old Hand Points: 307 More actions · Answer 3

Hi

For inserting the data into the text file, I did a striaght select into the text file using sql query.

For removing Duplicates I used the below query but I'm sure there is a better option.

--select all dupicates into a temp table

select pho_phone_number,count(*) cc

into #temp

from XD_HM_Extract with (nolock)

where data_type = 'XD'

group by pho_phone_number

having count(*) > 1

--select accounts with no duplicate into a temp table

select acc_account_code,

ped_initials,

ped_name_1,

ped_surname,

adr_line_1,

adr_line_2,

adr_line_3,

adr_line_4,

adr_post_code,

que_code,

que_description,

bck_id,

pho_phone_number,

data_type,

status

into #main

from XD_HM_Extract xd with (nolock)

where not exists (select *

from #temp t

where xd.pho_phone_number = t.pho_phone_number)

and data_type = 'XD'

--select all duplicate accounts

select acc_account_code,

ped_initials,

ped_name_1,

ped_surname,

adr_line_1,

adr_line_2,

adr_line_3,

adr_line_4,

adr_post_code,

que_code,

que_description,

bck_id,

xd.pho_phone_number,

data_type,

status

into #dup

from XD_HM_Extract xd with (nolock)

join #temp t with (nolock)

on (xd.pho_phone_number = t.pho_phone_number)

where data_type = 'XD'

--select distinct accounts where phone number is duplicate

select *

into #main1

from #dup xd

where xd.acc_account_code = (select top 1 t.acc_account_code

from #dup t

where xd.pho_phone_number = t.pho_phone_number

order by acc_account_code desc)

--merging my two temp tables

insert into #main

select *

from #main1

/*

drop table #main

drop table #main1

drop table #temp

drop table #dup*/

select *

from #main

Dallas Martin SSC Veteran Points: 231 More actions · Answer 4

First, you need to understand your data.

Exactly what makes anyone record the "unique" or desired record?

Does the table have an IDENTITY column or a datetime column

that could be used to identity the most recent record for a common

column or set of columns.

For example, your given table has repeating address information

for a pediatrician (ped_?). since the pediatrician's name could

be the same for each of this pediatrician's records then you would

identify the most recent record for the entire set of that's pediatrician's

records using a query like this:

Assume we have an IDENTITY column named: record_id

SELECT * FROM dbo.SomeTable A,

(SELECT ped_Name, MAX(record_id) as MAX_ID

FROM dbo.SomeTable

GROUP BY ped_name) as B

WERE a.record_id=b.MAX_ID

Neel Singh Old Hand Points: 307 More actions · Answer 5

Thanks, I will apply that logic, seems much simpler.