SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Distinct of two columns?


Distinct of two columns?

Author
Message
Tacy Highland
Tacy Highland
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1328 Visits: 518
Hello all,

Hoping someone out there can help with this because I'm just not seeing the answer for some reason, and it seems like it should be a fairly easy one. If I can just get some ideas on the approach, I can work through it....

I have one table with trip information, i.e. tripID, pickup address, dropoff address for each trip for a day. For a typical rider, they will have 2 trips, one outbound and one returning, and for the return trip generally the dropoff address and pickup address will just be swapped from the outbound trip. Most, but not all. I need to find the distinct trips between pickup address and dropoff address --even if they're reversed as dropoff to pickup. Trying to identify just the unique trips between two addresses (regardless of whether they are pickup or dropoff address).

In other words, John Doe shows up in this table/dataset with two records, one trip with pickup address of 1234 5th ave s and dropoff address of 789 10th ave se, and one record with pickup address of 789 10th ave se and dropoff address of 1234 5th ave s. I am only looking for the distinct combination of these two addresses once.

Tried concatenating the pickup and dropoff address into another field in one temp table, swapping the dropoff and pickup to concatenate again for another temp table, and then get distinct but that didn't quite work as expected.
I'm not sure why I'm not seeing the answer here.

Anyone have any ideas?
aaron.reese
aaron.reese
SSCertifiable
SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)

Group: General Forum Members
Points: 6385 Visits: 1028
Tracey,

I think you will need to post DDL and data for us to help you.Otherwise the short answer is "Because they don't match so are not distinct"

Could you build a CTE that includes a RANK() OVER(Partition by Rider, Date ORDER BY TripID )

You can then join the CTE to itself


SELECT
CTE1.*
,CTE2.*
FROM
CTE CTE1
LEFT JOIN
CTE CTE2 ON CTE2.Rider = CTE1.Rider
AND CTE2.Date = CTE1.Date
AND CTE2.Rank > CTE1.Rank
AND CTE2.Pickup = CTE1.Dropoff
AND CTE2.Dropoff = CTE1.Pickup
ORDER BY
CTE1.Rider
,CTE1.Date
,CTE1.Rank


This will give you all of the trips and the return trip if it exists

You will get multiple joins if the rider can do the same round trip more than once in a day but this should at least indicate where the round trips are and why they are not displaying correctly in your distinct
Grant Fritchey
Grant Fritchey
SSC Guru
SSC Guru (221K reputation)SSC Guru (221K reputation)SSC Guru (221K reputation)SSC Guru (221K reputation)SSC Guru (221K reputation)SSC Guru (221K reputation)SSC Guru (221K reputation)SSC Guru (221K reputation)

Group: General Forum Members
Points: 221515 Visits: 33556
How do you identify a return trip of the second address is different? You didn't mention it, but do you also have a RiderID or something along those lines? Otherwise, let's say two riders come along. RiderA goes from LocationA to LocationB. RiderB goes from LocationC to LocationB. Now, RiderA goes home, LocationB to LocationA. RiderB follows RiderA home, LocationB to LocationA. How would we differentiate those two? It would look like a single outbound with two inbound trips and the other outbound would have no inbound.

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
Theodore Roosevelt

The Scary DBA
Author of: SQL Server Query Performance Tuning and SQL Server Execution Plans
Product Evangelist for Red Gate Software
drew.allen
drew.allen
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36659 Visits: 13648
Isn't it as simple as a CASE expression?

SELECT CASE WHEN pickup_address < dropoff_address THEN pickup_address ELSE dropoff_address END AS address1,
CASE WHEN pickup_address < dropoff_address THEN dropoff_address ELSE pickup_address END AS address2
FROM YourTable


Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
Tacy Highland
Tacy Highland
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1328 Visits: 518

Thanks for the info. Maybe this will clarify:

My ultimate goal is to only find the distinct combination of two addresses in this table. (Regardless of the passenger or TripID or whether the addresses show up as outbound trip or return trip.)

I tried concatenating ((PAddress + ' ' + DAddress) [Trip]) to make a new column ("Trip") which should have distinct values but that doesn't help me to narrow down things as it will of course still show two records for each "trip" between two addresses. I only want to see one.
Here's some sample data: (sorry for the weird format spacing, not sure what's going on here)

CREATE TABLE #TRIPS


(TRIPID INT


,PICKUP_ADDRESS VARCHAR(50)


,DROPOFF_ADDRESS VARCHAR(50)


)


INSERT INTO #TRIPS





VALUES




(23084416


,'100 MELROSE AVE E'


,'915 NW 45TH ST')


INSERT INTO #TRIPS





VALUES




(23079056


,'1001 5TH AVE'


,'18100 NE 95TH ST')


INSERT INTO #TRIPS





VALUES




(23084870


,'3252 64TH AVE SW'


,'1524 S 328TH ST')


INSERT INTO #TRIPS





VALUES




(23059443


,'915 NW 45TH ST'


,'100 MELROSE AVE E')


INSERT INTO #TRIPS





VALUES




(23081731


,'7100 CALIFORNIA AVE SW'


,'1001 5TH AVE')


INSERT INTO #TRIPS





VALUES




(23086885


,'18100 NE 95TH ST'


,'1001 5TH AVE')


INSERT INTO #TRIPS





VALUES




(23064853


,'1524 S 328TH ST'


,'3252 64TH AVE SW')


INSERT INTO #TRIPS





VALUES




(23084523


,'1001 5TH AVE'


,'7100 CALIFORNIA AVE SW')


SELECT *


FROM #TRIPS


DROP TABLE #TRIPS

What I'm hoping to get as the resultset is something like this:

CREATE TABLE #EXPECTEDRESULTS


(PICKUP_ADDRESS VARCHAR(50)


,DROPOFF_ADDRESS VARCHAR(50)


)


INSERT INTO #EXPECTEDRESULTS





VALUES




('1001 5TH AVE'


,'7100 CALIFORNIA AVE SW')


INSERT INTO #EXPECTEDRESULTS





VALUES




('100 MELROSE AVE E'


,'915 NW 45TH ST')


INSERT INTO #EXPECTEDRESULTS





VALUES




('1001 5TH AVE'


,'18100 NE 95TH ST')


INSERT INTO #EXPECTEDRESULTS





VALUES




('3252 64TH AVE SW'


,'1524 S 328TH ST')



Notice that doesn't return every combination of the addresses, since most of these have two (outbound and return).
Does that make more sense?


Tacy Highland
Tacy Highland
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1328 Visits: 518
Drew.allen - I think that might be it!

I dumped the CASE data into a temp table and added a SELECT DISTINCT ADDRESS1, ADDRESS2 to get the distinct list and I think this works. I used this query against the sample data I posted and it returns the same address sets as my ExpectedResults table.

I never would have thought to use < to compare.

Here's another challenge on this, should you all choose to accept it.....

What if there were additional columns in the table, such as City, Lat, and Lon for each record? So expand the original table from TripID, Pickup_Address, Dropoff_Address to TripID, Pickup_Address, PickupCity, Pickup_State, Pickup_Lat, Pickup_Lon, Dropoff_Address, Dropoff_City, Dropoff_State, Dropoff_Lat, Dropoff_Lon.

I can't imaging that the < compare would still work here, would it?
drew.allen
drew.allen
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36659 Visits: 13648
tacy.highland - Tuesday, December 19, 2017 9:43 AM
Drew.allen - I think that might be it!

I dumped the CASE data into a temp table and added a SELECT DISTINCT ADDRESS1, ADDRESS2 to get the distinct list and I think this works. I used this query against the sample data I posted and it returns the same address sets as my ExpectedResults table.

I never would have thought to use < to compare.

Here's another challenge on this, should you all choose to accept it.....

What if there were additional columns in the table, such as City, Lat, and Lon for each record? So expand the original table from TripID, Pickup_Address, Dropoff_Address to TripID, Pickup_Address, PickupCity, Pickup_State, Pickup_Lat, Pickup_Lon, Dropoff_Address, Dropoff_City, Dropoff_State, Dropoff_Lat, Dropoff_Lon.

I can't imaging that the < compare would still work here, would it?

Yes, it does. You just have to make sure that you are using the same field(s) for all of the comparisons. Since you have latitude and longitude, I would be inclined to use those, since they would be less subject to variability. That is, the same address can be given several different ways, and it might be hard to determine that they are actually the same, but a latitude and longitude will always be the same within a certain degree of accuracy.

Drew


J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
Tacy Highland
Tacy Highland
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1328 Visits: 518
How would the CASE statement look? Would it be a case for each column, something like this?:

SELECT CASE WHEN pickup_address < dropoff_address THEN pickup_address ELSE dropoff_address END AS address1,
CASE WHEN pickup_address < dropoff_address THEN dropoff_address ELSE pickup_address END AS address2,
CASE WHEN pickup_city < dropoff_city THEN pickup_city ELSE dropoff_city END as city1,
CASE WHEN pickup_city < dropoff_city THEN dropoff_city ELSE pickup_city END as city2,
CASE WHEN pickup_Lat < dropoff_Lat THEN pickup_Lat ELSE dropoff_Lat END as Lat1,
........

FROM YourTable

Lynn Pettis
Lynn Pettis
SSC Guru
SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)

Group: General Forum Members
Points: 223565 Visits: 40412
tacy.highland - Tuesday, December 19, 2017 9:22 AM

Thanks for the info. Maybe this will clarify:

My ultimate goal is to only find the distinct combination of two addresses in this table. (Regardless of the passenger or TripID or whether the addresses show up as outbound trip or return trip.)

I tried concatenating ((PAddress + ' ' + DAddress) [Trip]) to make a new column ("Trip") which should have distinct values but that doesn't help me to narrow down things as it will of course still show two records for each "trip" between two addresses. I only want to see one.
Here's some sample data: (sorry for the weird format spacing, not sure what's going on here)

CREATE TABLE #TRIPS


(TRIPID INT


,PICKUP_ADDRESS VARCHAR(50)


,DROPOFF_ADDRESS VARCHAR(50)


)


INSERT INTO #TRIPS





VALUES




(23084416


,'100 MELROSE AVE E'


,'915 NW 45TH ST')


INSERT INTO #TRIPS





VALUES




(23079056


,'1001 5TH AVE'


,'18100 NE 95TH ST')


INSERT INTO #TRIPS





VALUES




(23084870


,'3252 64TH AVE SW'


,'1524 S 328TH ST')


INSERT INTO #TRIPS





VALUES




(23059443


,'915 NW 45TH ST'


,'100 MELROSE AVE E')


INSERT INTO #TRIPS





VALUES




(23081731


,'7100 CALIFORNIA AVE SW'


,'1001 5TH AVE')


INSERT INTO #TRIPS





VALUES




(23086885


,'18100 NE 95TH ST'


,'1001 5TH AVE')


INSERT INTO #TRIPS





VALUES




(23064853


,'1524 S 328TH ST'


,'3252 64TH AVE SW')


INSERT INTO #TRIPS





VALUES




(23084523


,'1001 5TH AVE'


,'7100 CALIFORNIA AVE SW')


SELECT *


FROM #TRIPS


DROP TABLE #TRIPS

What I'm hoping to get as the resultset is something like this:

CREATE TABLE #EXPECTEDRESULTS


(PICKUP_ADDRESS VARCHAR(50)


,DROPOFF_ADDRESS VARCHAR(50)


)


INSERT INTO #EXPECTEDRESULTS





VALUES




('1001 5TH AVE'


,'7100 CALIFORNIA AVE SW')


INSERT INTO #EXPECTEDRESULTS





VALUES




('100 MELROSE AVE E'


,'915 NW 45TH ST')


INSERT INTO #EXPECTEDRESULTS





VALUES




('1001 5TH AVE'


,'18100 NE 95TH ST')


INSERT INTO #EXPECTEDRESULTS





VALUES




('3252 64TH AVE SW'


,'1524 S 328TH ST')



Notice that doesn't return every combination of the addresses, since most of these have two (outbound and return).
Does that make more sense?



So, something like this:


CREATE TABLE #TRIPS
(TRIPID INT
,PICKUP_ADDRESS VARCHAR(50)
,DROPOFF_ADDRESS VARCHAR(50)
);

INSERT INTO #TRIPS ([TRIPID],[PICKUP_ADDRESS],[DROPOFF_ADDRESS])
VALUES
(23084416,'100 MELROSE AVE E','915 NW 45TH ST')
, (23079056,'1001 5TH AVE','18100 NE 95TH ST')
, (23084870,'3252 64TH AVE SW','1524 S 328TH ST')
, (23059443,'915 NW 45TH ST','100 MELROSE AVE E')
, (23081731,'7100 CALIFORNIA AVE SW','1001 5TH AVE')
, (23086885,'18100 NE 95TH ST','1001 5TH AVE')
, (23064853,'1524 S 328TH ST','3252 64TH AVE SW')
, (23084523,'1001 5TH AVE','7100 CALIFORNIA AVE SW');


SELECT DISTINCT
[ca1].[Address1]
,[ca1].[Address2]
FROM
#TRIPS t
CROSS APPLY (SELECT MIN(dt.[Address]), MAX(dt.[Address]) FROM (VALUES (t.[PICKUP_ADDRESS]),(t.[DROPOFF_ADDRESS]))dt([Address]))ca1([Address1],[Address2]);

DROP TABLE #TRIPS;




Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Lynn Pettis
Lynn Pettis
SSC Guru
SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)SSC Guru (223K reputation)

Group: General Forum Members
Points: 223565 Visits: 40412
Hello? Any feedback?

Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search