how to verify duplicate email used by more people

  • Hello Good Morning,

    I need to validate email address in this table depend on duplicate email address by empid.,

    CREATE TABLE #temp1 (Emp_id varchar(6), email1 varchar(20), email2 varchar(20), ValidEmail1 bit,ValidEmail2 bit )

    Insert into #temp1 values (12345,'123@abc.com','132@abc.com',null,null)

    Insert into #temp1 values (23456,'234@abc.com','243@abc.com',null,null)

    Insert into #temp1 values (34567,'123@abc.com','234@abc.com',null,null)

    Insert into #temp1 values (45678,'456@abc.com','456@abc.com',null,null)

    Insert into #temp1 values (56789,'123@abc.com','777@abc.com',null,null)

    Insert into #temp1 values (67891,'777@abc.com','666@abc.com',null,null)

    Insert into #temp1 values (78912,'aaa@abc.com','555@abc.com',null,null)

    Insert into #temp1 values (89123,'bbb@abc.com','123@abc.com',null,null)

    Insert into #temp1 values (91234,'ccc@abc.com','333@abc.com',null,null)

    Insert into #temp1 values (01234,'123@abc.com','222@abc.com',null,null)

    select * from #temp1

    Employee id is uniquee in this table,

    it has two email address columns

    two columns to validate the email address

    if A same email is used by more than one employee id then we need to mark valid column to 0 else 1

    for example. first row email address 1 is 123@abc.com that is used by more than one employee id, so first row validemail1 will be 0, thirdrow validemail1 = 0,8th row validemail2 = 0 last row validemail1 = 0

    Can you please help me here

    Thank you in Advance

    Milan

  • If you are only trying to compare email1, then this will return all email addresses being used by more than one employee...

    SELECT email1

    FROM #temp1

    GROUP BY email1

    HAVING COUNT(*)>1;

    If you need to find duplicates in both columns (email1, email2), then you would have to union those together, and then get the count.

  • Below are 3 different methods for achieving the required results.

    I'll leave it to you to test and find the best performance ...

    Option 1 - Will have to be duplicated for ValidEmail2

    UPDATE #temp1

    SET ValidEmail1 = 1;

    WITH cteBaseData AS (

    SELECT email1

    FROM #temp1

    GROUP BY email1

    HAVING COUNT(*) > 1

    )

    UPDATE #temp1

    SET ValidEmail1 = 0

    FROM #temp1 AS t

    WHERE EXISTS (SELECT 1 FROM cteBaseData

    WHERE email1 = t.email1);

    Option 2 - Will have to be duplicated for ValidEmail1

    WITH cteBaseData AS (

    SELECT email2

    FROM #temp1

    GROUP BY email2

    HAVING COUNT(*) > 1

    )

    UPDATE #temp1

    SET ValidEmail2 = CASE WHEN cte.email2 IS NULL THEN 1 ELSE 0 END

    FROM #temp1 AS t

    LEFT JOIN cteBaseData AS cte

    ON t.email2 = cte.email2;

    Option 3 - Sets ValidEmail1 and ValidEmail2

    WITH cteBaseData AS (

    SELECT Emp_id

    , c1 = COUNT(*) OVER (PARTITION BY email1)

    , c2 = COUNT(*) OVER (PARTITION BY email2)

    FROM #temp1

    )

    UPDATE #temp1

    SET ValidEmail1 = CASE WHEN cte.c1 = 1 THEN 1 ELSE 0 END

    , ValidEmail2 = CASE WHEN cte.c2 = 1 THEN 1 ELSE 0 END

    FROM #temp1 AS t1

    INNER JOIN cteBaseData AS cte

    ON t1.Emp_id = cte.Emp_id;

  • One thing you weren't really specific on is if a user has a email in column 1, and then appears in their (or someelse's) email 2. I've gone with if the email address is found twice it is invalid:

    Update #temp1

    Set ValidEmail1 = (Select case when COUNT(Emp_id) > 1 then 0 else 1 end from #temp1 sq where #temp1.email1 IN (sq.email1, sq.email2)),

    ValidEmail2 = (Select case when COUNT(Emp_id) > 1 then 0 else 1 end from #temp1 sq where #temp1.email2 IN (sq.email1, sq.email2))

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • What I did first is create a list of invalid emails

    With myCTE as

    (

    Select emp_id, emailType, email

    From #temp1

    Unpivot (email for emailType in ([email1],[email2])) as upvt

    )

    Select email, count(email) as numTotal

    into #invalidEmails

    from myCte

    group by email

    Having count(email) > 1

    ;

    GO

    Next I unpivot the data , do a line by line comparison (in the inner view) and then use the MAX functions to do a pivot of the data back to original form... where I can apply conditional logic to the emails.

    With myCTE as

    (

    Select emp_id, emailType, email

    From #temp1

    Unpivot (email for emailType in ([email1],[email2])) as upvt

    )

    Select emp_id,

    Max(case when emailType ='email1' then email else '' End) as Email1,

    Max(case when emailType ='email2' then email else '' End) as Email2,

    Max(case when emailType ='email1' then isValidEmail else NULL End) as isValidEmail1,

    Max(case when emailType ='email2' then isValidEmail else NULL End) as isValidEmail2

    FROM(

    Select c.*, case when inv.email Is Not Null then 0 else 1 end as IsValidEmail

    from myCte as c

    Left Join #invalidEmails as inv On c.email = inv.email

    ) as v

    Group By emp_id

    It showed correct results on my end, as far as my understanding goes. Is this what you need?

    EDIT: corrected field names for the isValid email columns (2).

    ----------------------------------------------------

  • CREATE TABLE #temp1_dup_emails (email varchar(20) PRIMARY KEY);

    Insert into #temp1_dup_emails

    SELECT CASE WHEN which_email = 1 THEN email1 ELSE email2 END AS email

    FROM #temp1

    CROSS JOIN (

    VALUES(1),(2)

    ) AS which_email(which_email)

    GROUP BY CASE WHEN which_email = 1 THEN email1 ELSE email2 END

    HAVING COUNT(*) > 1

    UPDATE t

    SET

    ValidEmail1 = CASE WHEN td1.email IS NULL THEN 1 ELSE 0 END,

    ValidEmail2 = CASE WHEN td2.email IS NULL THEN 1 ELSE 0 END

    FROM #temp1 t

    LEFT OUTER JOIN #temp1_dup_emails td1 ON td1.email = t.email1

    LEFT OUTER JOIN #temp1_dup_emails td2 ON td2.email = t.email2

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • The only issue I see here is with Scott and My results is the emp_id = 45678. The employee has the same email for both fields, and that email is not repeated anywhere else by another employee. That may be a valid email.

    ----------------------------------------------------

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply