Exclude rows where value used in another row

  • I need to exclude data from a select query where the value of column A is used in other rows and the values of column B are different.

    Column A    | ColumnB
    PhoneNumber | FirstName
    ------------|----------
    02038245161 | Jon
    02038245161 | Henry
    02038881121 | Jamie
    02038881121 | Jamie

    From the above table I want to select all records apart from the first two as the phone number has been used for more than one person. I should only get the last two rows back as the number is used twice but the person is the same.

    I am currently trying the following:

    SELECT cust.PhoneNumber, cust.FirstName, (SELECT COUNT(PhoneNumber) FROM Customers WHERE PhoneNumber = cust.PhoneNumber AND FirstName <> cust.FirstName)
    FROM Customers cust
    Group By cust.PhoneNumber, cust.FirstName

    This for some reason is giving me an error stating that the sub query is returning more than one results however i'm not sure why as it should be returning a single count.

  • If you would like a coded solution, please provide your sample data in the form of CREATE TABLE/INSERT statements.

    Please also confirm the logic for which row should be returned in the case of a duplicate phone number, remembering that there is no concept of 'first' in SQL Server without an ORDER BY.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • what do you want return if you had these rows

    02038881121 | Jamie
    02038881121 | Jamie
    02038881121 | John
    02038881121 | Jack

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Hi Phil,

    Thanks for the reply.

    The create/ insert would be as follows:

    CREATE TABLE CustomerPolicies
    (
    [Mobile] NVARCHAR(20),
    [WorkTelephone] NVARCHAR(20),
    [HomeTelephone] NVARCHAR(20),
    [FirstName] NVARCHAR(30),
    [LastName] NVARCHAR(30),
    [PolicyNumber] INT
    )

    INSERT INTO CustomerPolicies VALUES ('07972234564', 'N/A', 'N/A', 'Jon', 'Evans', 32323232)
    INSERT INTO CustomerPolicies VALUES ('07972234564', 'N/A', 'N/A', 'Tom', 'Davies', 34343434)
    INSERT INTO CustomerPolicies VALUES ('07234322322', 'N/A', 'N/A', 'Will', 'Jones', 12121212)
    INSERT INTO CustomerPolicies VALUES ('07234322322', 'N/A', 'N/A', 'Will', 'Jones', 13131313)

    The SELECT query would need to return the second two rows only and not the first two as the first two have a phone number that is used for two different customers.

    The second two rows have the same phone number but also the same customer so I would want them both returned as they are for two different product policies,

  • J Livingston SQL wrote:

    what do you want return if you had these rows

    02038881121 | Jamie
    02038881121 | Jamie
    02038881121 | John
    02038881121 | Jack

    I would want the first two rows returned as they are the same number AND same customer.

    The second two rows are the same number but different customers so I would want to exclude them.

    The underlying data that I am working with has some flaws in that product resellers sometimes put their phone number against the customer data and as such I am currently getting multiple matches to a phone number for different customers.

    If I can filter those records out then I don;t have to worry as much about matching the wrong customer to a phone number.

  • MattNorman88 wrote:

    J Livingston SQL wrote:

    what do you want return if you had these rows

    02038881121 | Jamie
    02038881121 | Jamie
    02038881121 | John
    02038881121 | Jack

    I would want the first two rows returned as they are the same number AND same customer.

    The second two rows are the same number but different customers so I would want to exclude them.

    The underlying data that I am working with has some flaws in that product resellers sometimes put their phone number against the customer data and as such I am currently getting multiple matches to a phone number for different customers.

    If I can filter those records out then I don;t have to worry as much about matching the wrong customer to a phone number.

    Once again, there is no such thing as FIRST here, unless there is some column we can use to ORDER BY. Just because they appear first in a SELECT * does not imply any sort of implicit ordering.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • If I am understanding your request properly, something like this should work, right:

    WITH [cte]
    AS
    (
    SELECT
    [Mobile]
    , [WorkTelephone]
    , [HomeTelephone]
    , [FirstName]
    , [LastName]
    , [PolicyNumber]
    , COUNT(1) OVER (PARTITION BY
    [Mobile]
    , [FirstName]
    , [LastName]
    ) AS [DistinctCount]
    FROM[CustomerPolicies]
    )
    SELECT
    [cte].[Mobile]
    , [cte].[WorkTelephone]
    , [cte].[HomeTelephone]
    , [cte].[FirstName]
    , [cte].[LastName]
    , [cte].[PolicyNumber]
    , [cte].[DistinctCount]
    FROM[cte]
    WHERE [cte].[DistinctCount] > 1;

    MAY need to tweak that windowing function to cover all of the phone numbers; I was just writing it to work with the data.  The other problem you run into is when you have data like:

    Column A    | ColumnB
    PhoneNumber | FirstName
    ------------|----------
    02038245161 | Jon
    02038245162 | Henry
    02038881121 | Jamie
    02038881121 | Jamie

    Here Jon and Henry have different phone numbers, but my above query would exclude them and ONLY give you Jamie.  But that is easy to correct with a second windowing function that is a count on JUST the mobile number and the final where gets that added to it when its count is 1.

    I'd recommend renaming the cte to something that makes more sense to you though.  I am not a big fan of CTE's named "cte" except for demo purposes.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!

  • Mr. Brian Gale wrote:

    If I am understanding your request properly, something like this should work, right:

    WITH [cte]
    AS
    (
    SELECT
    [Mobile]
    , [WorkTelephone]
    , [HomeTelephone]
    , [FirstName]
    , [LastName]
    , [PolicyNumber]
    , COUNT(1) OVER (PARTITION BY
    [Mobile]
    , [FirstName]
    , [LastName]
    ) AS [DistinctCount]
    FROM[CustomerPolicies]
    )
    SELECT
    [cte].[Mobile]
    , [cte].[WorkTelephone]
    , [cte].[HomeTelephone]
    , [cte].[FirstName]
    , [cte].[LastName]
    , [cte].[PolicyNumber]
    , [cte].[DistinctCount]
    FROM[cte]
    WHERE [cte].[DistinctCount] > 1;

    MAY need to tweak that windowing function to cover all of the phone numbers; I was just writing it to work with the data.  The other problem you run into is when you have data like:

    Column A    | ColumnB
    PhoneNumber | FirstName
    ------------|----------
    02038245161 | Jon
    02038245162 | Henry
    02038881121 | Jamie
    02038881121 | Jamie

    Here Jon and Henry have different phone numbers, but my above query would exclude them and ONLY give you Jamie.  But that is easy to correct with a second windowing function that is a count on JUST the mobile number and the final where gets that added to it when its count is 1.

    I'd recommend renaming the cte to something that makes more sense to you though.  I am not a big fan of CTE's named "cte" except for demo purposes.

    Hi Brian,

    Thanks for you for the example.

    This works for the most part however it excludes records where there is a single unique match for mobile number, first name and last name.

    I am only trying to exclude records where the same number appears against multiple different people.

    Regards

    Matt

  • Something like this? (I may be oversimplifying something...)

    -- exclude when phone # refers to multiple people.

    -- exclude when phone # refers to multiple people.
    SELECT * FROM CustomerPolicies cp
    WHERE cp.Mobile NOT IN
    (SELECT Mobile
    FROM CustomerPolicies
    GROUP BY Mobile
    HAVING COUNT(DISTINCT(FirstName))>1);
  • pietlinden wrote:

    Something like this? (I may be oversimplifying something...)

    -- exclude when phone # refers to multiple people.

    -- exclude when phone # refers to multiple people.
    SELECT * FROM CustomerPolicies cp
    WHERE cp.Mobile NOT IN
    (SELECT Mobile
    FROM CustomerPolicies
    GROUP BY Mobile
    HAVING COUNT(DISTINCT(FirstName))>1);

    Thanks pietlinden,

    Nice and simple and does exactly what I need.

  • MattNorman88 wrote:

    Hi Phil,

    Thanks for the reply.

    The create/ insert would be as follows:

    CREATE TABLE CustomerPolicies
    (
    [Mobile] NVARCHAR(20),
    [WorkTelephone] NVARCHAR(20),
    [HomeTelephone] NVARCHAR(20),
    [FirstName] NVARCHAR(30),
    [LastName] NVARCHAR(30),
    [PolicyNumber] INT
    )

    INSERT INTO CustomerPolicies VALUES ('07972234564', 'N/A', 'N/A', 'Jon', 'Evans', 32323232)
    INSERT INTO CustomerPolicies VALUES ('07972234564', 'N/A', 'N/A', 'Tom', 'Davies', 34343434)
    INSERT INTO CustomerPolicies VALUES ('07234322322', 'N/A', 'N/A', 'Will', 'Jones', 12121212)
    INSERT INTO CustomerPolicies VALUES ('07234322322', 'N/A', 'N/A', 'Will', 'Jones', 13131313)

    The SELECT query would need to return the second two rows only and not the first two as the first two have a phone number that is used for two different customers.

    The second two rows have the same phone number but also the same customer so I would want them both returned as they are for two different product policies,

    MattNorman88 wrote:

    pietlinden wrote:

    Something like this? (I may be oversimplifying something...)

    -- exclude when phone # refers to multiple people.

    -- exclude when phone # refers to multiple people.
    SELECT * FROM CustomerPolicies cp
    WHERE cp.Mobile NOT IN
    (SELECT Mobile
    FROM CustomerPolicies
    GROUP BY Mobile
    HAVING COUNT(DISTINCT(FirstName))>1);

    Thanks pietlinden,

    Nice and simple and does exactly what I need.

    Here's another possibility.  The difference is that it only does a single table scan and some "Lazy Spools".  The Actual Execution Plan indicates that should be about twice as fast but I never trust the execution plan to be correct there.  I just don't have enough data for a performance test so, if you have the time to do so, please try it out and let us know how they both worked for you.

       WITH cteEnumerate AS
    (
    SELECT *,NameCnt = COUNT(*) OVER (PARTITION BY Mobile,FirstName,LastName)
    FROM dbo.CustomerPolicies
    )
    SELECT * FROM cteEnumerate WHERE NameCnt > 1
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Looking a bit closer at the code, there' may be a fly in the ointment of pietlinden's  code.  Give the following a run and see.

    [EDIT] I DISCOVERED THE CODE THAT I POSTED BELOW DOES NOT WORK FOR ALL SITUATIONS BUT THE MOD TO PIETLINDEN'S CODE DOES HANDLE ALL SITUATIONS.  PLEASE SEE THE POST FOLLOWING THIS ONE!

    I left my code here as an example of a mistake to avoid.

    --===== Create the modified test data
    DROP TABLE IF EXISTS dbo.CustomerPolicies;
    CREATE TABLE dbo.CustomerPolicies
    (
    Mobile NVARCHAR(20)
    ,WorkTelephone NVARCHAR(20)
    ,HomeTelephone NVARCHAR(20)
    ,FirstName NVARCHAR(30)
    ,LastName NVARCHAR(30)
    ,PolicyNumber INT
    )
    ;
    INSERT INTO CustomerPolicies
    (Mobile,WorkTelephone,HomeTelephone,FirstName,LastName,PolicyNumber)
    VALUES ('07972234564','N/A','N/A','Jon' ,'Evans' ,32323232)
    ,('07972234564','N/A','N/A','Jon' ,'Jones' ,98755432) --Added this row
    --,('07972234564','N/A','N/A','Tom' ,'Davies',34343434) --Removed this
    ,('07234322322','N/A','N/A','Will','Jones' ,12121212)
    ,('07234322322','N/A','N/A','Will','Jones' ,13131313)
    ;
    --===== This code only looks at the first name and so produces
    -- the wrong answer according to your request.
    SELECT * FROM CustomerPolicies cp
    WHERE cp.Mobile NOT IN
    (SELECT Mobile
    FROM CustomerPolicies
    GROUP BY Mobile
    HAVING COUNT(DISTINCT(FirstName))>1)
    ;
    PRINT 'EDIT. UPON FURTHER TESTING, THIS CODE DOES NOT WORK FOR ALL SITUATIONS';
    PRINT ' SEE THE POST BELOW FOR PROOF OF THAT AND THAT THE MOD TO PIETLINDEN"S CODE DOES!';
    WITH cteEnumerate AS
    (
    SELECT *,NameCnt = COUNT(*) OVER (PARTITION BY Mobile,FirstName,LastName)
    FROM dbo.CustomerPolicies
    )
    SELECT * FROM cteEnumerate WHERE NameCnt > 1
    ;

    If it turns out that pietlinden's is faster, then you need to change his "Having" clause to include a concatenation of FirstName and LastName and it'll work just fine.  Like this.  Just remember to test it for performance again because concatenation can be a little expensive.

    SELECT * FROM CustomerPolicies cp
    WHERE cp.Mobile NOT IN
    (SELECT Mobile
    FROM CustomerPolicies
    GROUP BY Mobile
    HAVING COUNT(DISTINCT(FirstName+LastName))>1)
    ;

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • AND, looking a bit closer, my code doesn't work correctly using the following data but the mod to pietlinden's code does, so use his code with the mod.

    Here's the code I used to discover my error and that the mod to pietlinden's code works.  Use the mod to HIS code and not mine.

    --===== Create the modified test data
    DROP TABLE IF EXISTS dbo.CustomerPolicies;
    CREATE TABLE dbo.CustomerPolicies
    (
    Mobile NVARCHAR(20)
    ,WorkTelephone NVARCHAR(20)
    ,HomeTelephone NVARCHAR(20)
    ,FirstName NVARCHAR(30)
    ,LastName NVARCHAR(30)
    ,PolicyNumber INT
    )
    ;
    INSERT INTO CustomerPolicies
    (Mobile,WorkTelephone,HomeTelephone,FirstName,LastName,PolicyNumber)
    VALUES ('07972234564','N/A','N/A','Jon' ,'Evans' ,32323232)
    ,('07972234564','N/A','N/A','Jon' ,'Jones' ,98755432) --Added this row
    ,('07972234564','N/A','N/A','Tom' ,'Davies',34343434)
    ,('07972234564','N/A','N/A','Tom' ,'Davies',34343434) --Added this row
    ,('07234322322','N/A','N/A','Will','Jones' ,12121212)
    ,('07234322322','N/A','N/A','Will','Jones' ,13131313)

    ;
    --===== This code does NOT work correctly for this example so don't use it.
    -- And my apologies for thinking that it worked before.
    WITH cteEnumerate AS
    (
    SELECT *,NameCnt = COUNT(*) OVER (PARTITION BY Mobile,FirstName,LastName)
    FROM dbo.CustomerPolicies
    )
    SELECT * FROM cteEnumerate WHERE NameCnt > 1
    ;
    --===== The mod to pietlindens' code DOES work so use it instead.
    SELECT * FROM CustomerPolicies cp
    WHERE cp.Mobile NOT IN
    (SELECT Mobile
    FROM CustomerPolicies
    GROUP BY Mobile
    HAVING COUNT(DISTINCT(FirstName+LastName))>1)
    ;

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden wrote:

    AND, looking a bit closer, my code doesn't work correctly using the following data but the mod to pietlinden's code does, so use his code with the mod.

    Here's the code I used to discover my error and that the mod to pietlinden's code works.  Use the mod to HIS code and not mine.

    --===== Create the modified test data
    DROP TABLE IF EXISTS dbo.CustomerPolicies;
    CREATE TABLE dbo.CustomerPolicies
    (
    Mobile NVARCHAR(20)
    ,WorkTelephone NVARCHAR(20)
    ,HomeTelephone NVARCHAR(20)
    ,FirstName NVARCHAR(30)
    ,LastName NVARCHAR(30)
    ,PolicyNumber INT
    )
    ;
    INSERT INTO CustomerPolicies
    (Mobile,WorkTelephone,HomeTelephone,FirstName,LastName,PolicyNumber)
    VALUES ('07972234564','N/A','N/A','Jon' ,'Evans' ,32323232)
    ,('07972234564','N/A','N/A','Jon' ,'Jones' ,98755432) --Added this row
    ,('07972234564','N/A','N/A','Tom' ,'Davies',34343434)
    ,('07972234564','N/A','N/A','Tom' ,'Davies',34343434) --Added this row
    ,('07234322322','N/A','N/A','Will','Jones' ,12121212)
    ,('07234322322','N/A','N/A','Will','Jones' ,13131313)

    ;
    --===== This code does NOT work correctly for this example so don't use it.
    -- And my apologies for thinking that it worked before.
    WITH cteEnumerate AS
    (
    SELECT *,NameCnt = COUNT(*) OVER (PARTITION BY Mobile,FirstName,LastName)
    FROM dbo.CustomerPolicies
    )
    SELECT * FROM cteEnumerate WHERE NameCnt > 1
    ;
    --===== The mod to pietlindens' code DOES work so use it instead.
    SELECT * FROM CustomerPolicies cp
    WHERE cp.Mobile NOT IN
    (SELECT Mobile
    FROM CustomerPolicies
    GROUP BY Mobile
    HAVING COUNT(DISTINCT(FirstName+LastName))>1)
    ;

    That code is not safe either.  That's because two (or more) different FirstName and LastName separately could be the same once they are concatenated.

    Some examples:

    Ric Edelman / Rice Delman

    Hank Nudson / Han Knudson

    etc.

    I suggest adding a vertical bar, say, between the names:

    ...DISTINCT(FirstName+'|'+LastName)...

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply