September 10, 2010 at 9:42 am
Greetings good folks at SQLSeverCentral
I am having some performance issues with a job which identifies duplicate account registrations as the account's sign up on our website.
We have 2 databases on the same physical server: DB1 and DB2
TableA in DB1 has CustID(PK),Fname,Lname,Address,State,City,Zip,Email,Phone,JoinDate....(Another 30 fields)
TableB in DB2 has CustID(PK),BirthDate,MotherMaidenName.....(Another 15 fields)
So, as the customers signup, we have a job which fetches new accounts every 30 mins and puts them in a temporary table and then does a join on DB1.TableA and DB2.TableB on CustID and compares different combinations and flags those new accounts depending on similarity dispositions which are predefined. As I understand this is a RBAR operation.
e.g. if the new account's Fname,Lname,Password are similar to any account in the JOIN table, then flag record as 'A'
OR
if the new account's State and Email (chars before '@') are the same then flag record as 'B'
and so on....
At this moment we have around 800K + records in database and the new signups in the 30 min period can vary from 50-60 to 300-400 depending on the time of the day. The job execution takes from 5 to 15 mins and although there are (nolocks) it still causes considerable CPU usage which I would like to avoid.
Any help in changing the logic or suggestions to improve the performance & efficiency would be deeply appreciated.
TIA
Anish
September 10, 2010 at 2:27 pm
The basic concept doesn't sound like RBAR (if done without any kind of a loop). However, it might be better to check for duplicates during signup process and not batch based.
Regarding performance: please post the code (maybe reduced to a few checks to see the concept) together with table def for the tables involved including all related indexes as well as some sample data so we have something "to play with".
To perform those checks shold be a matter of seconds, not minutes...
September 10, 2010 at 4:05 pm
Thanks LutzM for your response. Here's the code with the table structure:
[font="Courier New"]/* This the JOIN which I use to grab the latest sign ups */
SELECT TOP 200
c.CustID,
c.Fname,
c.Lname,
p.BirthDate,
p.MotherMaidenName,
c.Address,
c.City,
c.State,
c.Zip,
c.Email,
c.Password,
c.OpenDateTime,
c.HomePhone
FROM
DB1.dbo.CustomerTable c WITH (NOLOCK)
LEFT JOIN
DB2.dbo.crmProfiles p WITH (NOLOCK)
ON
c.CustID = p.CustID
WHERE
OpenDateTime > @StartVerificationDate AND OpenDateTime <= @EndVerificationDate
AND
c.AgentId NOT IN ('XYX', 'ABC')
ORDER BY
OPenDateTime
/*Please note that the PK is CustID which is a char(10) field and there is also a index which includes all the above fields. Following is the temp table which gets the latest account signups every 30 minutes. To reduce the time for processing we cap this table to 200 records */
CREATE TABLE #JoinedUsers
(
JoinedUserID INT IDENTITY,
CustID CHAR(10),
Fname VARCHAR(30),
Lname VARCHAR(30),
BirthDate SMALLDATETIME,
MotherMaidenName VARCHAR(50),
Address VARCHAR(180),
City VARCHAR(100),
State CHAR(2),
Zip VARCHAR(15),
Email VARCHAR(40),
Password VARCHAR(10),
JoinDate DATETIME,
Phone VARCHAR(25)
)
/* For each record in #JoinedUsers we iterate through the following code:
(This is the block which takes the most time) */
SELECT
C.CustID,
c.Fname,
c.Lname,
p.BirthDate,
p.MotherMaidenName,
c.Address,
c.City,
c.State,
c.Zip,
c.Email,
c.Password,
c.HomePhone
FROM
DB1.dbo.CustomerTable c WITH (NOLOCK)
LEFT JOIN DB2.dbo.crmProfiles p WITH (NOLOCK)
ON
c.CustID = p.CustID,
DB1.dbo.CustomerTable c2 WITH (NOLOCK)
WHERE
c.CustID <> @CustID
AND c2.CustID = @CustID
--EXCLUDE ATS CUSTOMERS
AND c.AgentId <> 'XYZ'
--search for all rules with OR condition
--1. FirstLastState
AND
(
(
(@Fname IS NOT NULL
AND @Fname <> ''
AND c.Fname = @Fname)
AND (@Lname IS NOT NULL
AND @Lname <> ''
AND c.Lname = @Lname)
AND (@State IS NOT NULL
AND @State <> ''
AND c.State = @State)
)
--2. LastAdrLine1State
OR(
(@Lname IS NOT NULL
AND @Lname <> ''
AND c.Lname = @Lname)
AND (
CHARINDEX(' ', @Address) > 0
AND SUBSTRING(c.Address, 0, CHARINDEX(' ', c.Address)) =
SUBSTRING(@Address, 0, CHARINDEX(' ', @Address)))
AND (@State IS NOT NULL
AND @State <> ''
AND c.State = @State)
)
--3. MMNZip
OR(
(@MotherMaidenName IS NOT NULL
AND @MotherMaidenName <> ''
AND p.MotherMaidenName = @MotherMaidenName)
AND (@Zip IS NOT NULL
AND c.Zip = @Zip)
)
--4. EmailNoDomain
OR(
CHARINDEX('@', @Email) > 0
AND SUBSTRING(c.Email, 0, CHARINDEX('@', c.Email)) =
SUBSTRING(@Email, 0, CHARINDEX('@', @Email))
)
--5. LastDOBState
OR(
(@Lname IS NOT NULL
AND @Lname <> ''
AND c.Lname = @Lname)
AND (@BirthDate IS NOT NULL
AND @BirthDate <> ''
AND p.BirthDate = @BirthDate)
AND (@State IS NOT NULL
AND @State <> ''
AND c.State = @State)
)
--6. PassState
OR(
(@Password IS NOT NULL
AND @Password <> ''
AND c.Password = @Password)
AND (@State IS NOT NULL
AND @State <> ''
AND c.State = @State)
)
--7. Phone
OR(
@Phone IS NOT NULL
AND @Phone <> ''
AND c.HomePhone = @Phone
)
) --end big AND
--IF NO MATCHES FOUND, DELETE THIS CUSTOMER FROM #JoinedUsers
IF @@ROWCOUNT = 0
DELETE FROM #JoinedUsers WHERE CustID = @CustID
/* If matches are found then we use it to flag another table which marks the customer as suspicious */[/font]
LutzM (9/10/2010)
The basic concept doesn't sound like RBAR (if done without any kind of a loop). However, it might be better to check for duplicates during signup process and not batch based.Regarding performance: please post the code (maybe reduced to a few checks to see the concept) together with table def for the tables involved including all related indexes as well as some sample data so we have something "to play with".
To perform those checks shold be a matter of seconds, not minutes...
September 11, 2010 at 7:12 am
This is the major cause for the poor performance:
/* For each record in #JoinedUsers we iterate through the following code:
(This is the block which takes the most time) */
That's the reason why I asked you to post the code. Unfortunately, instead of removing some of the checks you obviously removed the cursor in your sample. But the cursor is most probably the major part of the problem. So, please post the complete procedure including the DELETE and "mark suspicious" section.
In order to test our solutions we would nee table def for DB1.dbo.CustomerTable as well as DB2.dbo.crmProfiles including some sample data.
Also, please post some sample rows for #JoinedUsers as well as your expected result.
It would also help if you'd post the actual execution plan for one iteration.
September 12, 2010 at 10:23 am
Hi! Lutz
Thanks for helping out on the weekend.
As I had mentioned earlier we are not using a cursors but, temporary tables with WHILE loop constructs.
I am attaching the stored procs and the table definitions and the sqlplan for the relevant section in a zipped file. I hope its useful.
Thanks
Anish
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply