Error during data import/cleansing using SSIS packages (urgent!!)

  • Thanks Journeyman

    That made sense! Just wondering though.....

If, for table A (the one with the duplicates), I just run a DISTINCT query that returns only one record per duplicate group (based on name, id, email, etc.),

    for table B, I take the remaining IDs that are unique (i.e. not in the DISTINCT result),

    and join tables A and B into a new table C,

    I get a new table free of duplicates.
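    In T-SQL, that whole A/B/C dance can usually be collapsed into a single SELECT DISTINCT ... INTO. A minimal sketch, where the table names A and C and the columns are just placeholders for the actual schema:

    ```sql
    -- A is the table with duplicates; C is the new, deduplicated table.
    -- One row per distinct (name, id, email) combination lands in C.
    SELECT DISTINCT name, id, email
    INTO C
    FROM A;
    ```

    If you also need table B (the leftover rows), a NOT EXISTS check against C would give you those.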

I'm currently trying out these queries, though.

    Thanks for your suggestions ...

• I've deduped data before, but only manually (I was pressed for time...).

    I'd like to use SSIS for what it's really meant for...

• Hello,

    I did what you recommended, but I got a warning message that said:

    "The buffer manager detected that the system was low on virtual memory, but was unable to swap out any buffers. 16 buffers were considered and 16 were locked. Either not enough memory is available to the pipeline because not enough is installed, other processes are using it, or too many buffers are locked."

    My server runs SQL Server 2003:

    MEMORY: 1 GB

    PENTIUM III 512 MHz

    I know it's an old version, but my company uses it for testing; unfortunately, I can't change it.

    I tried limiting the memory in my package (to 256 MB), but I got the same warning message.

    Do you have any idea how I can solve my problem?

    Thank you 😉

• Could you give a little more detail about the issue you're facing, i.e. how many tables, how much data, etc.? What does your SSIS package do?

• Hello,

    I created a table, TempClient, which contains duplicate records. To clean it, I built a process composed of:

    - a query with aggregation on LastName, FirstName, Adress, ZipCode and City

    - a Fuzzy Lookup whose reference table is "aggClient", which contains only 3 deduplicated fields: LastName, FirstName, ZipCode

    - then a Conditional Split on Similarity >= 0.8 to retrieve the unique clients

    - then a Fuzzy Grouping followed by a Conditional Split whose condition is key_in == key_out

    - finally an aggregation to retrieve the unique clients

    - an insert into my Client table
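    The first, aggregation step above can be sketched in T-SQL (TempClient and the column names are taken from the post; the target table name is just a placeholder):

    ```sql
    -- Collapse exact duplicates before the fuzzy steps:
    -- one row per (LastName, FirstName, Adress, ZipCode, City) group.
    SELECT LastName, FirstName, Adress, ZipCode, City
    INTO aggTempClient   -- placeholder name for the staging table
    FROM TempClient
    GROUP BY LastName, FirstName, Adress, ZipCode, City;
    ```

    Collapsing exact duplicates up front shrinks the input that the Fuzzy Lookup and Fuzzy Grouping steps have to buffer.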

When I ran that process on 1,000 rows, or on 4 months of data, it worked, but when I ran it on all the data it reported a memory allocation problem. My TempClient table contains more than 1,300,000 rows.

    Do you have any idea about my problem?

    Thank you

    hani 😉

Let me see if I've understood correctly: basically, you're having a problem when you run a Fuzzy Lookup on a larger dataset.

    One option would be to keep the buffer size as small as possible and make sure the buffers are spooled to a drive with plenty of free space.

    Secondly, this is what worked for me: I used T-SQL procedures to do the initial cleaning of the dataset (when I first began, the data was about 4 years old and there were 45 million records!). After cleaning the data and automating the routines, I now operate on a much smaller dataset (only about 60K records or so) that I pull every week.
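    For example, one common T-SQL pattern for that initial cleaning (a sketch only: it needs SQL Server 2005 or later for ROW_NUMBER(), and ClientId is a placeholder for whatever key your table has) keeps just one row per duplicate group:

    ```sql
    -- Delete all but one row per (LastName, FirstName, ZipCode) group,
    -- keeping the row with the lowest ClientId in each group.
    WITH Ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY LastName, FirstName, ZipCode
                   ORDER BY ClientId
               ) AS rn
        FROM TempClient
    )
    DELETE FROM Ranked
    WHERE rn > 1;
    ```

    Deleting through the CTE like this removes the duplicates in place, so the fuzzy steps then run on a much smaller input.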

I've started to use SSIS, but it's still slow.

    Hope that helps...
