Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Grouping by pairs, order agnostic Expand / Collapse
Author
Message
Posted Monday, February 17, 2014 11:36 AM


SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Yesterday @ 7:41 AM
Points: 41, Visits: 377
I'm guessing there's probably a simple and elegant way to accomplish this in T-SQL, just not sure how.

I have a large quantity of information regarding travel routes - lots of pairs involving origin and destination IDs. My end goal is to be able to group by route, regardless of direction. This would mean that a trip from location 1 to location 2, and a return trip from 2 to 1, would both fall into the same group.

My best idea for accomplishing this so far is to create a table containing origins and destinations in both orders, and then assign a route ID for each:

Origin | Destination | RouteID
-------+-------------+---------
1 | 2 | 1
2 | 1 | 1
2 | 3 | 2
3 | 2 | 2
1 | 3 | 3
3 | 1 | 3

Populating the origin and destination columns via a cartesian join is simple, but calculating the RouteID column efficiently is eluding me. Is there a better way to do this other than using loops?


_______________________________________________________________________________________________
Bob Pusateri
Microsoft Certified Master: SQL Server 2008

Blog: The Outer Join
Twitter: @SQLBob
Post #1542237
Posted Monday, February 17, 2014 12:56 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 2:24 PM
Points: 11,990, Visits: 11,007
Can you provide some ddl and sample data? It sounds like your origin and destination columns are derived? If your pattern of data is consistent maybe you can just a case expression?

case when Origin < Destination then Origin else Destination end as RouteID

I suspect it isn't really quite that straightforward but I bet there is something in the base table to indicate if it is return trip or not.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1542252
Posted Monday, February 17, 2014 5:27 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 6:03 PM
Points: 3,590, Visits: 5,098
Sean Lange (2/17/2014)
Can you provide some ddl and sample data? It sounds like your origin and destination columns are derived? If your pattern of data is consistent maybe you can just a case expression?

case when Origin < Destination then Origin else Destination end as RouteID

I suspect it isn't really quite that straightforward but I bet there is something in the base table to indicate if it is return trip or not.


Other than the fact that "as RouteID" wouldn't be valid in a GROUP BY clause, I think Sean's proposed solution hits the mark.

You might also want to take a look at this article: Departures from Origins and Arrivals at Destinations . While it doesn't address this specific problem, you might find in it some interesting solutions to other problems you'll likely encounter.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1542315
Posted Monday, February 17, 2014 5:41 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 6:03 PM
Points: 3,590, Visits: 5,098
Ooops! Silly me, I just realized why that won't work. Try this (assuming this is what Sean meant):

WITH SampleData (Origin, Destination) AS
(
SELECT 1, 2
UNION ALL SELECT 2, 1
UNION ALL SELECT 2, 3
UNION ALL SELECT 3, 2
UNION ALL SELECT 1, 3
UNION ALL SELECT 3, 1
UNION ALL SELECT 1, 4
UNION ALL SELECT 4, 1
)
SELECT d1=MIN(Origin), d2=MAX(Destination)
FROM SampleData
GROUP BY CASE WHEN Origin < Destination THEN Origin ELSE Destination END;


But I think this will work (combine origin and destination in whatever fashion makes sense for you):

WITH SampleData (Origin, Destination) AS
(
SELECT 1, 2
UNION ALL SELECT 2, 1
UNION ALL SELECT 2, 3
UNION ALL SELECT 3, 2
UNION ALL SELECT 1, 3
UNION ALL SELECT 3, 1
UNION ALL SELECT 1, 4
UNION ALL SELECT 4, 1
)
SELECT d1=MIN(Origin), d2=MAX(Destination)
FROM SampleData a
CROSS APPLY
(
SELECT d1=RIGHT(10000+Origin, 4)+RIGHT(10000+Destination, 4)
,d2=RIGHT(10000+Destination, 4)+RIGHT(10000+Origin, 4)
) b
GROUP BY CASE WHEN Origin < Destination THEN d1 ELSE d2 END;


I'd assume that your origins/destinations are either the 3 or 4 character airport codes, so a straight concatenate should do the trick.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1542317
Posted Monday, February 17, 2014 7:12 PM


SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Yesterday @ 7:41 AM
Points: 41, Visits: 377
Thanks for all the tips! I came up with a multi-step solution that so far seems to get me what I need, and I'm happy to share. I'll do a blog post with some better examples and sample data shortly.

So for a table "TripData" with columns "Origin" and "Destination", we can see all trips (directionally) with the following query:

select distinct Origin, Destination
from TripData;

To express these trips uniformly regardless of direction, some case statements are necessary to do some swapping of their order. Origin and Destination are both integers, so some simple numeric comparisons do the trick. Distinct ensures we see each route only once.

select distinct
case when Origin<=Destination then Origin
when Origin>Destination then Destination
end as PtA,
case when Origin<=Destination then Destination
when Origin>Destination then Origin
end as PtB
from TripData;

Now that we have distinct routes, we can assign each a RouteID.
SELECT a.PtA, a.PtB, ROW_NUMBER() over (order by a.PtA, a.PtB) AS RouteID
into dbo.RouteIDSTG
from (
select distinct
case when Origin<=Destination then Origin
when Origin>Destination then Destination
end as PtA,
case when Origin<=Destination then Destination
when Origin>Destination then Origin
end as PtB
from TripData
)a;

With that done, we can create a lookup table that contains each trip (directionally) along with the route ID for easy joining.
WITH AllPairs as (
SELECT Origin, Destination,
Case when Origin<=Destination then Origin
when Origin>Destination then Destination
end as PtA,
case when Origin<=Destination then Destination
when Origin>Destination then Origin
end as PtB
from TripData
)
SELECT distinct ap.Origin, ap.Destination, r.RouteID
into RouteIDMaster
from AllPairs ap
inner join dbo.RouteIDSTG r on r.PtA = ap.PtA and r.PtB = ap.PtB;

Now the TripData table can be joined to RouteIDMaster on Origin and Destination columns and grouped by RouteID.



_______________________________________________________________________________________________
Bob Pusateri
Microsoft Certified Master: SQL Server 2008

Blog: The Outer Join
Twitter: @SQLBob
Post #1542331
Posted Monday, February 17, 2014 7:20 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 6:03 PM
Points: 3,590, Visits: 5,098
Looks like an awful lot of work and an awful lot of DISTINCTs. Wouldn't this be simpler?

WITH SampleData (Origin, Destination) AS
(
SELECT 1, 2
UNION ALL SELECT 2, 1
UNION ALL SELECT 2, 3
UNION ALL SELECT 3, 2
UNION ALL SELECT 1, 3
UNION ALL SELECT 3, 1
UNION ALL SELECT 1, 4
UNION ALL SELECT 4, 1
)
SELECT Origin, Destination, RouteID
FROM
(
SELECT d1=MIN(Origin), d2=MAX(Destination), RouteID=ROW_Number() OVER (ORDER BY (SELECT NULL))
FROM SampleData a
CROSS APPLY
(
SELECT d1=RIGHT(10000+Origin, 4)+RIGHT(10000+Destination, 4)
,d2=RIGHT(10000+Destination, 4)+RIGHT(10000+Origin, 4)
) b
GROUP BY CASE WHEN Origin < Destination THEN d1 ELSE d2 END
) a
CROSS APPLY
(
SELECT Origin, Destination
FROM SampleData
WHERE d1 IN (Origin, Destination) AND d2 IN (Origin, Destination)
) b;





My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1542333
Posted Monday, February 17, 2014 8:27 PM


SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Yesterday @ 7:41 AM
Points: 41, Visits: 377
Your code is definitely simpler. But the performance (at least on my machine) is lacking. I just stopped it after letting it run for 10 minutes. My table has about 1.2M rows in it. I'll try tweaking it and see what develops!

_______________________________________________________________________________________________
Bob Pusateri
Microsoft Certified Master: SQL Server 2008

Blog: The Outer Join
Twitter: @SQLBob
Post #1542339
Posted Monday, February 17, 2014 8:32 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 6:03 PM
Points: 3,590, Visits: 5,098
With some DDL including indexing on your table it might offer a clue.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1542343
Posted Monday, February 17, 2014 9:29 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 6:03 PM
Points: 3,590, Visits: 5,098
You're right! That was kind of doggy wasn't it? Try this instead:

CREATE TABLE dbo.OriginsDestinations
(
Origin INT
,Destination INT
,PRIMARY KEY (Origin, Destination)
);

WITH Tally (n) AS
(
SELECT TOP 500000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO dbo.OriginsDestinations
SELECT n, n+1
FROM Tally
UNION ALL
SELECT n+1, n
FROM Tally;

SET STATISTICS TIME ON;
WITH OD AS
(
SELECT d1=MIN(Origin), d2=MAX(Destination), RouteID=ROW_Number() OVER (ORDER BY (SELECT NULL))
FROM dbo.OriginsDestinations a
CROSS APPLY
(
SELECT d1=RIGHT(1000000+Origin, 6)+RIGHT(1000000+Destination, 6)
,d2=RIGHT(1000000+Destination, 6)+RIGHT(1000000+Origin, 6)
) b
GROUP BY CASE WHEN Origin < Destination THEN d1 ELSE d2 END
)
SELECT d1, d2, RouteID
INTO #Temp
FROM OD
UNION ALL
SELECT d2, d1, RouteID
FROM OD
SET STATISTICS TIME OFF;

GO
DROP TABLE #Temp;


I get these timing results:

 SQL Server Execution Times:
CPU time = 8454 ms, elapsed time = 12722 ms.

(1000000 row(s) affected)






My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1542352
Posted Tuesday, February 18, 2014 9:31 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 2:24 PM
Points: 11,990, Visits: 11,007
Thanks Dwain. You were right in what you thought I was going for. Seems that this issue is in good hands now.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1542652
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse