Exclude rows where value used in another row

Question

Post reply

Exclude rows where value used in another row

MattNorman88

SSC Veteran

Points: 211
More actions
September 8, 2021 at 8:01 am
Go to Answer
#3927711
I need to exclude data from a select query where the value of column A is used in other rows and the values of column B are different.
```
Column A    | ColumnB
PhoneNumber | FirstName
------------|----------
02038245161 | Jon
02038245161 | Henry
02038881121 | Jamie
02038881121 | Jamie
```
From the above table I want to select all records apart from the first two as the phone number has been used for more than one person. I should only get the last two rows back as the number is used twice but the person is the same.
I am currently trying the following:
```
SELECT cust.PhoneNumber, cust.FirstName, (SELECT COUNT(PhoneNumber) FROM Customers WHERE PhoneNumber = cust.PhoneNumber AND FirstName <> cust.FirstName)
FROM Customers cust
Group By cust.PhoneNumber, cust.FirstName
```
This for some reason is giving me an error stating that the sub query is returning more than one results however i'm not sure why as it should be returning a single count.

pietlinden

SSC Guru

Points: 63686
More actions
September 12, 2021 at 2:13 am
Answer
#3928855
Something like this? (I may be oversimplifying something...)
-- exclude when phone # refers to multiple people.
```
-- exclude when phone # refers to multiple people.
SELECT * FROM CustomerPolicies cp
WHERE cp.Mobile NOT IN
(SELECT Mobile 
FROM CustomerPolicies
GROUP BY Mobile
HAVING COUNT(DISTINCT(FirstName))>1);
```

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247225 More actions · Answer 1

If you would like a coded solution, please provide your sample data in the form of CREATE TABLE/INSERT statements.

Please also confirm the logic for which row should be returned in the case of a duplicate phone number, remembering that there is no concept of 'first' in SQL Server without an ORDER BY.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 2

what do you want return if you had these rows

02038881121 | Jamie
02038881121 | Jamie
02038881121 | John
02038881121 | Jack

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

MattNorman88 SSC Veteran Points: 211 More actions · Answer 3

Hi Phil,

Thanks for the reply.

The create/ insert would be as follows:

CREATE TABLE CustomerPolicies
(
[Mobile] NVARCHAR(20),
[WorkTelephone] NVARCHAR(20),
[HomeTelephone] NVARCHAR(20),
[FirstName] NVARCHAR(30),
[LastName] NVARCHAR(30),
[PolicyNumber] INT
)

INSERT INTO CustomerPolicies VALUES ('07972234564', 'N/A', 'N/A', 'Jon', 'Evans', 32323232)
INSERT INTO CustomerPolicies VALUES ('07972234564', 'N/A', 'N/A', 'Tom', 'Davies', 34343434)
INSERT INTO CustomerPolicies VALUES ('07234322322', 'N/A', 'N/A', 'Will', 'Jones', 12121212)
INSERT INTO CustomerPolicies VALUES ('07234322322', 'N/A', 'N/A', 'Will', 'Jones', 13131313)

The SELECT query would need to return the second two rows only and not the first two as the first two have a phone number that is used for two different customers.

The second two rows have the same phone number but also the same customer so I would want them both returned as they are for two different product policies,

MattNorman88 SSC Veteran Points: 211 More actions · Answer 4

J Livingston SQL wrote:

what do you want return if you had these rows

02038881121 | Jamie
02038881121 | Jamie
02038881121 | John
02038881121 | Jack

I would want the first two rows returned as they are the same number AND same customer.

The second two rows are the same number but different customers so I would want to exclude them.

The underlying data that I am working with has some flaws in that product resellers sometimes put their phone number against the customer data and as such I am currently getting multiple matches to a phone number for different customers.

If I can filter those records out then I don;t have to worry as much about matching the wrong customer to a phone number.

Phil Parkin SSC Guru Points: 247225 More actions · Answer 5

MattNorman88 wrote:

J Livingston SQL wrote:
what do you want return if you had these rows

02038881121 | Jamie
02038881121 | Jamie
02038881121 | John
02038881121 | Jack
I would want the first two rows returned as they are the same number AND same customer.
The second two rows are the same number but different customers so I would want to exclude them.
The underlying data that I am working with has some flaws in that product resellers sometimes put their phone number against the customer data and as such I am currently getting multiple matches to a phone number for different customers.
If I can filter those records out then I don;t have to worry as much about matching the wrong customer to a phone number.

Once again, there is no such thing as FIRST here, unless there is some column we can use to ORDER BY. Just because they appear first in a SELECT * does not imply any sort of implicit ordering.

Mr. Brian Gale One Orange Chip Points: 25026 More actions · Answer 6

If I am understanding your request properly, something like this should work, right:

WITH [cte]
AS
(
SELECT
[Mobile]
, [WorkTelephone]
, [HomeTelephone]
, [FirstName]
, [LastName]
, [PolicyNumber]
, COUNT(1) OVER (PARTITION BY
 [Mobile]
 , [FirstName]
 , [LastName]
) AS [DistinctCount]
 FROM[CustomerPolicies]
)
 SELECT
[cte].[Mobile]
, [cte].[WorkTelephone]
, [cte].[HomeTelephone]
, [cte].[FirstName]
, [cte].[LastName]
, [cte].[PolicyNumber]
, [cte].[DistinctCount]
  FROM[cte]
  WHERE [cte].[DistinctCount] > 1;

MAY need to tweak that windowing function to cover all of the phone numbers; I was just writing it to work with the data. The other problem you run into is when you have data like:

Column A    | ColumnB
PhoneNumber | FirstName
------------|----------
02038245161 | Jon
02038245162 | Henry
02038881121 | Jamie
02038881121 | Jamie

Here Jon and Henry have different phone numbers, but my above query would exclude them and ONLY give you Jamie. But that is easy to correct with a second windowing function that is a count on JUST the mobile number and the final where gets that added to it when its count is 1.

I'd recommend renaming the cte to something that makes more sense to you though. I am not a big fan of CTE's named "cte" except for demo purposes.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

MattNorman88 SSC Veteran Points: 211 More actions · Answer 7

Mr. Brian Gale wrote:

If I am understanding your request properly, something like this should work, right:
WITH [cte]
AS
(
SELECT
[Mobile]
, [WorkTelephone]
, [HomeTelephone]
, [FirstName]
, [LastName]
, [PolicyNumber]
, COUNT(1) OVER (PARTITION BY
 [Mobile]
 , [FirstName]
 , [LastName]
) AS [DistinctCount]
 FROM[CustomerPolicies]
)
 SELECT
[cte].[Mobile]
, [cte].[WorkTelephone]
, [cte].[HomeTelephone]
, [cte].[FirstName]
, [cte].[LastName]
, [cte].[PolicyNumber]
, [cte].[DistinctCount]
  FROM[cte]
  WHERE [cte].[DistinctCount] > 1;
MAY need to tweak that windowing function to cover all of the phone numbers; I was just writing it to work with the data. The other problem you run into is when you have data like:
Column A    | ColumnB
PhoneNumber | FirstName
------------|----------
02038245161 | Jon
02038245162 | Henry
02038881121 | Jamie
02038881121 | Jamie
Here Jon and Henry have different phone numbers, but my above query would exclude them and ONLY give you Jamie. But that is easy to correct with a second windowing function that is a count on JUST the mobile number and the final where gets that added to it when its count is 1.
I'd recommend renaming the cte to something that makes more sense to you though. I am not a big fan of CTE's named "cte" except for demo purposes.

Hi Brian,

Thanks for you for the example.

This works for the most part however it excludes records where there is a single unique match for mobile number, first name and last name.

I am only trying to exclude records where the same number appears against multiple different people.

Regards

Matt

MattNorman88 SSC Veteran Points: 211 More actions · Answer 8

pietlinden wrote:

Something like this? (I may be oversimplifying something...)
-- exclude when phone # refers to multiple people.
-- exclude when phone # refers to multiple people.
SELECT * FROM CustomerPolicies cp
WHERE cp.Mobile NOT IN
(SELECT Mobile 
FROM CustomerPolicies
GROUP BY Mobile
HAVING COUNT(DISTINCT(FirstName))>1);

Thanks pietlinden,

Nice and simple and does exactly what I need.

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 9

MattNorman88 wrote:

Hi Phil,
Thanks for the reply.
The create/ insert would be as follows:
CREATE TABLE CustomerPolicies
(
[Mobile] NVARCHAR(20),
[WorkTelephone] NVARCHAR(20),
[HomeTelephone] NVARCHAR(20),
[FirstName] NVARCHAR(30),
[LastName] NVARCHAR(30),
[PolicyNumber] INT
)

INSERT INTO CustomerPolicies VALUES ('07972234564', 'N/A', 'N/A', 'Jon', 'Evans', 32323232)
INSERT INTO CustomerPolicies VALUES ('07972234564', 'N/A', 'N/A', 'Tom', 'Davies', 34343434)
INSERT INTO CustomerPolicies VALUES ('07234322322', 'N/A', 'N/A', 'Will', 'Jones', 12121212)
INSERT INTO CustomerPolicies VALUES ('07234322322', 'N/A', 'N/A', 'Will', 'Jones', 13131313)
The SELECT query would need to return the second two rows only and not the first two as the first two have a phone number that is used for two different customers.
The second two rows have the same phone number but also the same customer so I would want them both returned as they are for two different product policies,

MattNorman88 wrote:

pietlinden wrote:
Something like this? (I may be oversimplifying something...)
-- exclude when phone # refers to multiple people.
-- exclude when phone # refers to multiple people.
SELECT * FROM CustomerPolicies cp
WHERE cp.Mobile NOT IN
(SELECT Mobile 
FROM CustomerPolicies
GROUP BY Mobile
HAVING COUNT(DISTINCT(FirstName))>1);
Thanks pietlinden,
Nice and simple and does exactly what I need.

Here's another possibility. The difference is that it only does a single table scan and some "Lazy Spools". The Actual Execution Plan indicates that should be about twice as fast but I never trust the execution plan to be correct there. I just don't have enough data for a performance test so, if you have the time to do so, please try it out and let us know how they both worked for you.

   WITH cteEnumerate AS
(
    SELECT *,NameCnt = COUNT(*) OVER (PARTITION BY Mobile,FirstName,LastName)
    FROM dbo.CustomerPolicies
)
 SELECT * FROM cteEnumerate WHERE NameCnt > 1
;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 10

Looking a bit closer at the code, there' may be a fly in the ointment of pietlinden's code. Give the following a run and see.

[EDIT] I DISCOVERED THE CODE THAT I POSTED BELOW DOES NOT WORK FOR ALL SITUATIONS BUT THE MOD TO PIETLINDEN'S CODE DOES HANDLE ALL SITUATIONS. PLEASE SEE THE POST FOLLOWING THIS ONE!

I left my code here as an example of a mistake to avoid.

--===== Create the modified test data
   DROP TABLE IF EXISTS dbo.CustomerPolicies;
 CREATE TABLE dbo.CustomerPolicies
        (
         Mobile         NVARCHAR(20)
        ,WorkTelephone  NVARCHAR(20)
        ,HomeTelephone  NVARCHAR(20)
        ,FirstName      NVARCHAR(30)
        ,LastName       NVARCHAR(30)
        ,PolicyNumber   INT
        )
;
 INSERT INTO CustomerPolicies
        (Mobile,WorkTelephone,HomeTelephone,FirstName,LastName,PolicyNumber)
 VALUES  ('07972234564','N/A','N/A','Jon' ,'Evans' ,32323232)
        ,('07972234564','N/A','N/A','Jon' ,'Jones' ,98755432) --Added this row
      --,('07972234564','N/A','N/A','Tom' ,'Davies',34343434) --Removed this
        ,('07234322322','N/A','N/A','Will','Jones' ,12121212)
        ,('07234322322','N/A','N/A','Will','Jones' ,13131313)
;
--===== This code only looks at the first name and so produces
     -- the wrong answer according to your request.
SELECT * FROM CustomerPolicies cp
WHERE cp.Mobile NOT IN
(SELECT Mobile 
FROM CustomerPolicies
GROUP BY Mobile
HAVING COUNT(DISTINCT(FirstName))>1)
;
PRINT 'EDIT.  UPON FURTHER TESTING, THIS CODE DOES NOT WORK FOR ALL SITUATIONS';
PRINT '       SEE THE POST BELOW FOR PROOF OF THAT AND THAT THE MOD TO PIETLINDEN"S CODE DOES!';
   WITH cteEnumerate AS
(
    SELECT *,NameCnt = COUNT(*) OVER (PARTITION BY Mobile,FirstName,LastName)
    FROM dbo.CustomerPolicies
)
 SELECT * FROM cteEnumerate WHERE NameCnt > 1
;

If it turns out that pietlinden's is faster, then you need to change his "Having" clause to include a concatenation of FirstName and LastName and it'll work just fine. Like this. Just remember to test it for performance again because concatenation can be a little expensive.

SELECT * FROM CustomerPolicies cp
WHERE cp.Mobile NOT IN
(SELECT Mobile 
FROM CustomerPolicies
GROUP BY Mobile
HAVING COUNT(DISTINCT(FirstName+LastName))>1)
;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 11

AND, looking a bit closer, my code doesn't work correctly using the following data but the mod to pietlinden's code does, so use his code with the mod.

Here's the code I used to discover my error and that the mod to pietlinden's code works. Use the mod to HIS code and not mine.

--===== Create the modified test data
   DROP TABLE IF EXISTS dbo.CustomerPolicies;
 CREATE TABLE dbo.CustomerPolicies
        (
         Mobile         NVARCHAR(20)
        ,WorkTelephone  NVARCHAR(20)
        ,HomeTelephone  NVARCHAR(20)
        ,FirstName      NVARCHAR(30)
        ,LastName       NVARCHAR(30)
        ,PolicyNumber   INT
        )
;
 INSERT INTO CustomerPolicies
        (Mobile,WorkTelephone,HomeTelephone,FirstName,LastName,PolicyNumber)
 VALUES  ('07972234564','N/A','N/A','Jon' ,'Evans' ,32323232)
        ,('07972234564','N/A','N/A','Jon' ,'Jones' ,98755432) --Added this row
        ,('07972234564','N/A','N/A','Tom' ,'Davies',34343434) 
        ,('07972234564','N/A','N/A','Tom' ,'Davies',34343434) --Added this row
        ,('07234322322','N/A','N/A','Will','Jones' ,12121212)
        ,('07234322322','N/A','N/A','Will','Jones' ,13131313)

;
--===== This code does NOT work correctly for this example so don't use it.
     -- And my apologies for thinking that it worked before.
   WITH cteEnumerate AS
(
    SELECT *,NameCnt = COUNT(*) OVER (PARTITION BY Mobile,FirstName,LastName)
    FROM dbo.CustomerPolicies
)
 SELECT * FROM cteEnumerate WHERE NameCnt > 1
;
--===== The mod to pietlindens' code DOES work so use it instead.
SELECT * FROM CustomerPolicies cp
WHERE cp.Mobile NOT IN
(SELECT Mobile 
FROM CustomerPolicies
GROUP BY Mobile
HAVING COUNT(DISTINCT(FirstName+LastName))>1)
;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101244 More actions · Answer 12

Jeff Moden wrote:

AND, looking a bit closer, my code doesn't work correctly using the following data but the mod to pietlinden's code does, so use his code with the mod.

Here's the code I used to discover my error and that the mod to pietlinden's code works. Use the mod to HIS code and not mine.

--===== Create the modified test data
   DROP TABLE IF EXISTS dbo.CustomerPolicies;
 CREATE TABLE dbo.CustomerPolicies
        (
         Mobile         NVARCHAR(20)
        ,WorkTelephone  NVARCHAR(20)
        ,HomeTelephone  NVARCHAR(20)
        ,FirstName      NVARCHAR(30)
        ,LastName       NVARCHAR(30)
        ,PolicyNumber   INT
        )
;
 INSERT INTO CustomerPolicies
        (Mobile,WorkTelephone,HomeTelephone,FirstName,LastName,PolicyNumber)
 VALUES  ('07972234564','N/A','N/A','Jon' ,'Evans' ,32323232)
        ,('07972234564','N/A','N/A','Jon' ,'Jones' ,98755432) --Added this row
        ,('07972234564','N/A','N/A','Tom' ,'Davies',34343434) 
        ,('07972234564','N/A','N/A','Tom' ,'Davies',34343434) --Added this row
        ,('07234322322','N/A','N/A','Will','Jones' ,12121212)
        ,('07234322322','N/A','N/A','Will','Jones' ,13131313)

;
--===== This code does NOT work correctly for this example so don't use it.
     -- And my apologies for thinking that it worked before.
   WITH cteEnumerate AS
(
    SELECT *,NameCnt = COUNT(*) OVER (PARTITION BY Mobile,FirstName,LastName)
    FROM dbo.CustomerPolicies
)
 SELECT * FROM cteEnumerate WHERE NameCnt > 1
;
--===== The mod to pietlindens' code DOES work so use it instead.
SELECT * FROM CustomerPolicies cp
WHERE cp.Mobile NOT IN
(SELECT Mobile 
FROM CustomerPolicies
GROUP BY Mobile
HAVING COUNT(DISTINCT(FirstName+LastName))>1)
;

That code is not safe either. That's because two (or more) different FirstName and LastName separately could be the same once they are concatenated.

Some examples:

Ric Edelman / Rice Delman

Hank Nudson / Han Knudson

etc.

I suggest adding a vertical bar, say, between the names:

...DISTINCT(FirstName+'|'+LastName)...

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".