Filtering values from a Group By Statement

  • Hi all,

    I have an Audit table that contains customer records.

    These customer records are being changed completely when the should not be.

    I want to get all records from the audit table where the customer surname AND Forename have changed (the others are probably customers that got married)

    All I have so far is where the surname is changed (and I am not positive it is right)

    SELECT customerid

    FROM(

    SELECT customerid,

    count (Distinct surname) AS count1

    FROM tblcustomeraudit nolock

    GROUP BY customerid) a

    where

    count1 > 1

    I tried adding count (Distinct forname) AS count2 .... and count2 > 1 but got back only one record.

    3 other things if possible

    1) If the forname and surname only swapped places, leave those records out.

    2) If the spellings are only slightly different, leave them out too (fuzzy match?)

    3) only those records that have ever had a null value

    The last 3 are nice to have, but if anyone has just the "surname AND Forename have changed" bit, that would be great too!

    Thanks in advance,

    Barry

  • Is there a primary key or unique row identifier for the table?

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • No matter. Something like this...

    declare @tblcustomeraudit table (customerid int, forename varchar(20), surname varchar(20))

    insert @tblcustomeraudit

    select 1, 'John', 'Smith'

    union all select 1, 'John', 'Smith'

    union all select 1, 'John', 'Smithe'

    union all select 2, 'Fred', 'Jones'

    union all select 2, 'Freddy', 'Jonesy'

    union all select 3, 'Bob', 'Williams'

    union all select 3, 'Williams', 'Bob'

    select * from @tblcustomeraudit a inner join @tblcustomeraudit b on

    a.customerid = b.customerid and

    ((a.forename <> b.forename and a.surname <> b.surname)

    and not (a.forename = b.surname and a.surname = b.forename) --Bonus 1.

    )

    Bonus 2: For fuzzy matching, take a look at these...

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=66781

    http://www.sqlteam.com/article/using-fuzzy-lookup-transformations-in-sql-server-integration-services

    Bonus 3: I'm not sure what you mean

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Hi Ryan,

    Yes there is a Field AuditID as the Primary key.

  • Great stuff, thanks Ryan.

    I got the bit about Null wrong. I wanted the Audituser to be null so I can just add

    and a.audituser is null to your query.

    Thanks for your help!

    Brian

  • Etiher one of these should work:

    SELECT customerid, count (Distinct surname) AS count1

    FROM tblcustomeraudit nolock

    GROUP BY customerid

    Having count (Distinct surname) > 1

    And count (Distinct forname) > 1

    SELECT customerid, count (Distinct surname) AS count1

    FROM tblcustomeraudit nolock

    GROUP BY customerid

    Having MIN(surname) != MAX(surname)

    And MIN(forname) != MAX(forname)

    This is just for the basic requirements, not the bonus stuff.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Oops, I just noticed... my first one may not work on SQL 2000. Second one should be fine though.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Here is a version that gets all of the bonuses:

    SELECT customerid, count (Distinct surname) AS count1

    FROM tblcustomeraudit nolock

    Where audituser is NULL-- bonus 3 (?)

    GROUP BY customerid

    Having MIN(surname) != MAX(surname)-- basic

    And MIN(forname) != MAX(forname)-- basic

    And Min(surname) != Max(forname)-- bonus 1

    And DIFFERENCE(MIN(surname), Max(surname)) > 1-- bonus 2

    And DIFFERENCE(MIN(forname), Max(forname)) > 1-- bonus 2

    And Count(*) != Count(audituser)-- bonus 3

    Bonus 3 (audituser is Null) I am uncertain so I wrote it two ways (but you only want to keep one of them):

    If you only want to check tblcustomeraudit records that have audituser = Null, then keep the where clause and remove the last Having clause.

    However, if you want to check only customers, who have any record with audituser = Null, then keep the Having clause and remove the where clause.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung,

    the way you used NOLOCK makes it's a table alias.

    To make it a table hint you must use (NOLOCK), better (WITH NOLOCK)

    _____________
    Code for TallyGenerator

  • Heh, I don't know how that happened. Thanks, Sergiy.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply