Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Fuzzy Grouping Slow


Fuzzy Grouping Slow

Author
Message
Lookup_BI
Lookup_BI
SSC-Enthusiastic
SSC-Enthusiastic (150 reputation)SSC-Enthusiastic (150 reputation)SSC-Enthusiastic (150 reputation)SSC-Enthusiastic (150 reputation)SSC-Enthusiastic (150 reputation)SSC-Enthusiastic (150 reputation)SSC-Enthusiastic (150 reputation)SSC-Enthusiastic (150 reputation)

Group: General Forum Members
Points: 150 Visits: 642
Hi All,
I have ran out of ideas on the following problemCrying.Maybe someone would be able to help me out.I have a package to remove duplicate records from 1 table and move the unique records to a new table .Below is the flow of my SSIS package

Oledb source = 1 table [2 million records]
Transformation =Fuzzy grouping on [based on 6 columns] [score is 70%]
The min similarity of each column is 0.7%
Conditional Split = Unique & Duplicate
Oledb destination =2 tables [Unique & Duplicate table ]

Source table ---> Fuzzy Grouping---->Conditional Split ---->Unique
---- >Duplicate

My Server information is as below
Microsoft SQL Server 2005 - 9.00.3042.00 SP2
Windows Server 2003 SP2
Memory 4.00GB

My Problem is , My package execution goes on for 2 continuous days and still is at [Fuzzy Grouping Inner Data Flow] Progress: Finding similar records - 78 percent complete.
1)Is it suppose to take so long to complete ?
2)How can I improve the performance ?
3)Upon checking, the memory used by the process is 400,000 K


There is no other application or process running on my server except for this package.It take up to 3 hours to complete 5%.Do help me out as I am not able to go on like this for 2 days and still not completed.This package is to run on every month end.
Thank you in advance.

“I haven't failed, I've found 10,000 ways that don't work”........Thomas Alva Edison
WillyWonka
WillyWonka
Valued Member
Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)

Group: General Forum Members
Points: 53 Visits: 118
Hey there,

Fuzzy Group by design is very resource intensive. However, there are a few general things you can do to ease up on the problem of performance.

1.) Make sure SQL Server has access to enough memory. Often you will find it's limited to only a chunk of memory. If SQL is the only thing running on the server can open up most of the system RAM to SQL and leave say 512 to the Operating System. Also make SQL Server has access to all available processors. This can be adjusted through Management Studio under Server Properties.

2.) Limit the size of the data. This can make a tremendous difference. I am doing a Fuzzy grouping where staff submit lists of prospects that I run against our custom CRM system to identify which prospects are already clients. We have a huge client list, but when I filter that list down to only active clients the list becomes much more workable. Went from taking +20 minutes down to 2-3 minutes.

3.) Avoid fuzzy grouping on large strings if possible. The more text it has to comb thru the longer it will take. Also consider whether your need to match against so many columns. By sampling you can see if the results from individual columns are really providing useful matches. If not then the column could possibly be eliminated.

4.) Sample your results. Fuzzy grouping isn't an exact sience. I often found that when it found matches either they were very very close like 80% or better, or no where near. So setting your matching threshold higher can increase performance as well. This option will very greatly depending on your current situation and data requirements.

5.) Then there are the common sense items. Make sure your hardware is up to the task, and that you don't have other things running that could result in resource contention. Always remember that Fuzzy Datasets are stored in memory while they are being processed, if the server is fighting for memory it will end up paging and will take a tremendous performance hit.

Hope this was helpful...Happy Holidays,

Eric
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search