Aggregating query help

Question

Post reply

Aggregating query help

elogen

SSC Eights!

Points: 839
More actions
September 26, 2012 at 6:16 am

#393255

I have a table of possible trip routes like the following:
ID RouteID RouteName Origin Destination
1 1 ReturnTrip A B
2 1 ReturnTrip B C
3 1 ReturnTrip C B
4 1 ReturnTrip B A
5 2 ReturnTrip A B
6 2 ReturnTrip B C
7 2 ReturnTrip C B
8 2 ReturnTrip B A
9 3 ReturnTrip A D
10 3 ReturnTrip D C
11 3 ReturnTrip C D
12 3 ReturnTrip D A
13 4 ReturnTrip A D
14 4 ReturnTrip D E
15 4 ReturnTrip E C
16 4 ReturnTrip C E
17 4 ReturnTrip E D
18 4 ReturnTrip D A
I'm basically starting off at point A and ending up at point C but along the way I have the option of stopping off at X number of in between points.
So RouteID 1, I'm going from A to C via B and returning from C to A via B
Notice how RouteID 1 and RouteID 2 are the same routes, so I want to condense them into a single row and call it "ReturnTripViaB".
RouteID 3 is the same trip however I'm using a different route (going via point D). I want to treat this one as a separate row and call it "ReturnTripViaD".
RouteID 4 is the same trip however I'm stopping off at multiple points (D and E) along the way between A and C
So the resultant table should look like:
ID RouteName
1 ReturnTripViaB
2 ReturnTripViaD
3 ReturnTripViaDE
Also, since I want to keep the relationship between my aggregated table and my original table, I need another table
like the following:
ID RouteName RouteNameConcatRouteID
1 ReturnTripViaB ReturnTrip-1
2 ReturnTripViaB ReturnTrip-2
3 ReturnTripViaD ReturnTrip-3
4 ReturnTripViaDE ReturnTrip-4
I'm having trouble coming up with a query that will do this aggregation for me. Is anyone able to help?
Thanks very much.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 1

May I ask you to provide a bit more details?

Here you can read tips, which should help to attract relevant answers to your questions:

http://www.sqlservercentral.com/articles/Best+Practices/61537/

Please help us to help you!

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

aaron.reese SSChampion Points: 13498 More actions · Answer 2

create table test.elogen

(

ID int identity(1,1),

RouteID int,

Routename nvarchar(20),

Origin nvarchar(1),

Destination nvarchar(1)

)

insert into test.elogen values (1,'ReturnTrip','A','B')

insert into test.elogen values (1,'ReturnTrip','B','C')

insert into test.elogen values (1,'ReturnTrip','C','B')

insert into test.elogen values (1,'ReturnTrip','B','A')

insert into test.elogen values (2,'ReturnTrip','A','B')

insert into test.elogen values (2,'ReturnTrip','B','C')

insert into test.elogen values (2,'ReturnTrip','C','B')

insert into test.elogen values (2,'ReturnTrip','B','A')

insert into test.elogen values (3,'ReturnTrip','A','D')

insert into test.elogen values (3,'ReturnTrip','D','C')

insert into test.elogen values (3,'ReturnTrip','C','D')

insert into test.elogen values (3,'ReturnTrip','D','A')

insert into test.elogen values (4,'ReturnTrip','A','D')

insert into test.elogen values (4,'ReturnTrip','D','E')

insert into test.elogen values (4,'ReturnTrip','E','C')

insert into test.elogen values (4,'ReturnTrip','C','E')

insert into test.elogen values (4,'ReturnTrip','E','D')

insert into test.elogen values (4,'ReturnTrip','D','A')

I've been at this for an hour now and it is really hard with the data you have provided.

This would indicate that there is an issue with the structure of the data.:-P

The challenge is trying to identify the journey sequence. For Example in route 1 you go from B to C and B to A. SQL is not very good at identifying that you should go from B-C before B-A.

select

X.ID,

Y.ID

FROM

test.elogen X

join

test.elogen Y on x.routeid = y.routeid

and x.destination = y.origin

--and x.origin <> y.destination

including the inequality above causes the first of the return steps to be dropped, excluding the inequality means that you join the outward step to the inward step

You really need to have a step sequence indicator (index based on RouteID and stepnumber). I wouldn't want to rely in the identity column (ID) to enforce this sequence.

In the example data given, you always visit the outward points on the return journey, is this always the case or could you go

A-B-C-A (i.e. skip B on the way back)

If not you could indicate whether a leg is outward or inward, this would get over the sequence confusion above.

elogen SSC Eights! Points: 839 More actions · Answer 3

Thanks aaron,

Yes I fogot to include a vital column Direction which would indicate whether the journey was an Inbound or an Outbound.

So the code should look like this:

create table test.elogen

(

ID int identity(1,1),

RouteID int,

Routename nvarchar(20),

Origin nvarchar(1),

Destination nvarchar(1),

Direction nvarchar(2)

)

insert into test.elogen values (1,'ReturnTrip','A','B','OB')

insert into test.elogen values (1,'ReturnTrip','B','C','OB')

insert into test.elogen values (1,'ReturnTrip','C','B','IB')

insert into test.elogen values (1,'ReturnTrip','B','A','IB')

insert into test.elogen values (2,'ReturnTrip','A','B','OB')

insert into test.elogen values (2,'ReturnTrip','B','C','OB')

insert into test.elogen values (2,'ReturnTrip','C','B','IB')

insert into test.elogen values (2,'ReturnTrip','B','A','IB')

insert into test.elogen values (3,'ReturnTrip','A','D','OB')

insert into test.elogen values (3,'ReturnTrip','D','C','OB')

insert into test.elogen values (3,'ReturnTrip','C','D','IB')

insert into test.elogen values (3,'ReturnTrip','D','A','IB')

insert into test.elogen values (4,'ReturnTrip','A','D','OB')

insert into test.elogen values (4,'ReturnTrip','D','E','OB')

insert into test.elogen values (4,'ReturnTrip','E','C','OB')

insert into test.elogen values (4,'ReturnTrip','C','E','IB')

insert into test.elogen values (4,'ReturnTrip','E','D','IB')

insert into test.elogen values (4,'ReturnTrip','D','A','IB')

In the example data given, you always visit the outward points on the return journey, is this always the case or could you go
A-B-C-A (i.e. skip B on the way back)
If not you could indicate whether a leg is outward or inward, this would get over the sequence confusion above.

You would always have to visit the outward points on the return journey so you cannot skip B in the example.

So would the sequence query now look like:

select

X.ID,

Y.ID

FROM

test.elogen X

join

test.elogen Y on x.routeid = y.routeid

and x.destination = y.origin

AND X.direction = Y.direction

and x.origin <> y.destination

I guess I'm still not sure how to formulate the aggregation query.

Thanks

Dwain Camps SSC Guru Points: 86908 More actions · Answer 4

Try this. Same set up data as Aaron but different temp table name.

create table #Trips

(

ID int identity(1,1),

RouteID int,

Routename nvarchar(20),

Origin nvarchar(1),

Destination nvarchar(1)

)

insert into #Trips values (1,'ReturnTrip','A','B')

insert into #Trips values (1,'ReturnTrip','B','C')

insert into #Trips values (1,'ReturnTrip','C','B')

insert into #Trips values (1,'ReturnTrip','B','A')

insert into #Trips values (2,'ReturnTrip','A','B')

insert into #Trips values (2,'ReturnTrip','B','C')

insert into #Trips values (2,'ReturnTrip','C','B')

insert into #Trips values (2,'ReturnTrip','B','A')

insert into #Trips values (3,'ReturnTrip','A','D')

insert into #Trips values (3,'ReturnTrip','D','C')

insert into #Trips values (3,'ReturnTrip','C','D')

insert into #Trips values (3,'ReturnTrip','D','A')

insert into #Trips values (4,'ReturnTrip','A','D')

insert into #Trips values (4,'ReturnTrip','D','E')

insert into #Trips values (4,'ReturnTrip','E','C')

insert into #Trips values (4,'ReturnTrip','C','E')

insert into #Trips values (4,'ReturnTrip','E','D')

insert into #Trips values (4,'ReturnTrip','D','A')

The two queries that you want take advantage of the fact you can use the origins to find the intermediate stops, but using only the first 1/2 of them (less one):

;WITH MyTrips AS (

SELECT DISTINCT RouteName=Routename + 'Via' +

SUBSTRING((

SELECT Origin + ''

FROM #Trips b

WHERE a.RouteID = b.RouteID

ORDER BY ID

FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

FROM #Trips a

GROUP BY RouteID, Routename)

SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName

FROM MyTrips

;WITH MyTrips AS (

SELECT RouteName=Routename + 'Via' +

SUBSTRING((

SELECT Origin + ''

FROM #Trips b

WHERE a.RouteID = b.RouteID

ORDER BY ID

FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

,RouteConcat=Routename + '-' + CAST(RouteID AS VARCHAR)

FROM #Trips a

GROUP BY RouteID, Routename)

SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName, RouteConcat

FROM MyTrips

DROP TABLE #Trips

Please let me know if this works for you.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 5

Note that one other thought occurred to me. You could have another route that's identical to other routes except that the origin/final destination are different, such as this:

ID RouteID RouteName Origin Destination

1 5 ReturnTrip E B

2 5 ReturnTrip B C

3 5 ReturnTrip C B

4 5 ReturnTrip B E

This can be handled in the code provided above by including another subquery to include the origin/final destination as part of the new route name. Like this:

create table #Trips

(

ID int identity(1,1),

RouteID int,

Routename nvarchar(20),

Origin nvarchar(1),

Destination nvarchar(1)

)

insert into #Trips values (1,'ReturnTrip','A','B')

insert into #Trips values (1,'ReturnTrip','B','C')

insert into #Trips values (1,'ReturnTrip','C','B')

insert into #Trips values (1,'ReturnTrip','B','A')

insert into #Trips values (2,'ReturnTrip','A','B')

insert into #Trips values (2,'ReturnTrip','B','C')

insert into #Trips values (2,'ReturnTrip','C','B')

insert into #Trips values (2,'ReturnTrip','B','A')

insert into #Trips values (3,'ReturnTrip','A','D')

insert into #Trips values (3,'ReturnTrip','D','C')

insert into #Trips values (3,'ReturnTrip','C','D')

insert into #Trips values (3,'ReturnTrip','D','A')

insert into #Trips values (4,'ReturnTrip','A','D')

insert into #Trips values (4,'ReturnTrip','D','E')

insert into #Trips values (4,'ReturnTrip','E','C')

insert into #Trips values (4,'ReturnTrip','C','E')

insert into #Trips values (4,'ReturnTrip','E','D')

insert into #Trips values (4,'ReturnTrip','D','A')

insert into #Trips values (5,'ReturnTrip','E','B')

insert into #Trips values (5,'ReturnTrip','B','C')

insert into #Trips values (5,'ReturnTrip','C','B')

insert into #Trips values (5,'ReturnTrip','B','E')

;WITH MyTrips AS (

SELECT DISTINCT RouteName=Routename +

(SELECT TOP 1 Origin FROM #Trips b WHERE a.RouteID = b.RouteID ORDER BY ID) +

'Via' +

SUBSTRING((

SELECT Origin + ''

FROM #Trips b

WHERE a.RouteID = b.RouteID

ORDER BY ID

FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

FROM #Trips a

GROUP BY RouteID, Routename)

SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName

FROM MyTrips

;WITH MyTrips AS (

SELECT RouteName=Routename +

(SELECT TOP 1 Origin FROM #Trips b WHERE a.RouteID = b.RouteID ORDER BY ID) +

'Via' +

SUBSTRING((

SELECT Origin + ''

FROM #Trips b

WHERE a.RouteID = b.RouteID

ORDER BY ID

FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

,RouteConcat=Routename + '-' + CAST(RouteID AS VARCHAR)

FROM #Trips a

GROUP BY RouteID, Routename)

SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName, RouteConcat

FROM MyTrips

DROP TABLE #Trips

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 6

aaron.reese (9/26/2012)
The challenge is trying to identify the journey sequence. For Example in route 1 you go from B to C and B to A. SQL is not very good at identifying that you should go from B-C before B-A.

I used the ID to tell me this.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 7

Since the two CTEs are virtually identical, here's another option that may be a little cleaner (put the second one into a temp table):

SELECT RouteName=Routename +

(SELECT TOP 1 Origin FROM #Trips b WHERE a.RouteID = b.RouteID ORDER BY ID) +

'Via' +

SUBSTRING((

SELECT Origin + ''

FROM #Trips b

WHERE a.RouteID = b.RouteID

ORDER BY ID

FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

,RouteConcat=Routename + '-' + CAST(RouteID AS VARCHAR)

INTO #Temp

FROM #Trips a

GROUP BY RouteID, Routename

SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName

FROM (SELECT DISTINCT RouteName FROM #Temp) a

SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName, RouteConcat

FROM #Temp

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

elogen SSC Eights! Points: 839 More actions · Answer 8

Wow thanks for the query dwain,

it seems to work beautifully!

I'm just going to play around with some other scenarios that I've found with the data I'm given.

It seems that a route can only be considered identical if it contains the same number of legs as well.

So lets say we added:

insert into #Trips VALUES (2,'ReturnTrip','K','C')

Now Route 2 cannot be considered the same as Route 1.

Your query seems to handle this case as well, but I'm not sure if the above information would simply the query any further?

Thanks for your help.

Dwain Camps SSC Guru Points: 86908 More actions · Answer 9

elogen (9/26/2012)
Wow thanks for the query dwain,
it seems to work beautifully!
I'm just going to play around with some other scenarios that I've found with the data I'm given.
It seems that a route can only be considered identical if it contains the same number of legs as well.
So lets say we added:
insert into #Trips VALUES (2,'ReturnTrip','K','C')
Now Route 2 cannot be considered the same as Route 1.
Your query seems to handle this case as well, but I'm not sure if the above information would simply the query any further?
Thanks for your help.

I'd be happy if the new case is handled already! Doubtful it can be simplified further.

There are possibilities I think that it may fail if a return route doesn't exactly retrace the same steps, especially if it is through different way points. That may end up being quite complext to solve.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

elogen SSC Eights! Points: 839 More actions · Answer 10

There are possibilities I think that it may fail if a return route doesn't exactly retrace the same steps, especially if it is through different way points. That may end up being quite complext to solve.

Hmmm, I think I may need to handle the case as well. :w00t:

Say I modified Route 2 to this:

insert into #Trips values (2,'ReturnTrip','B','Q')

so that the return location is different to the start location.

The query currently treats Route2 the same as Route1. Is there some way this can be handled?

Note that I don't need the "Via" bit in the RouteName to reflect exactly what it should be, as long as a different route name is generated then it should be fine.

Thanks.

Dwain Camps SSC Guru Points: 86908 More actions · Answer 11

elogen (9/26/2012)
There are possibilities I think that it may fail if a return route doesn't exactly retrace the same steps, especially if it is through different way points. That may end up being quite complext to solve.
Hmmm, I think I may need to handle the case as well. :w00t:
Say I modified Route 2 to this:
insert into #Trips values (2,'ReturnTrip','B','Q')
so that the return location is different to the start location.
The query currently treats Route2 the same as Route1. Is there some way this can be handled?
Note that I don't need the "Via" bit in the RouteName to reflect exactly what it should be, as long as a different route name is generated then it should be fine.
Thanks.

Since the query is dependent on the sequence (order) that you insert additional stops into a route, can you resend all the INSERTs for the new route 2?

With that I can analyze your new requirement correctly.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

elogen SSC Eights! Points: 839 More actions · Answer 12

There are possibilities I think that it may fail if a return route doesn't exactly retrace the same steps, especially if it is through different way points. That may end up being quite complext to solve.

Damn, I think I may need to handle the case where the return point is different to the starting point.

Eg: if I changed Route 2's end point to:

insert into #Trips values (2,'ReturnTrip','B','Q')

Then Route 2 should be considered different to Route 1.

Also the query doesn't seem to work for 'One Way' trips,

insert into #Trips values (6,'OneWayTrip','A','B')

insert into #Trips values (7,'OneWayTrip','A','B')

It returns: "Invalid length parameter passed to the SUBSTRING function."

Note that the RouteName doesn't absolutely have to have the "Via...." portion, I just need it to be sufficiently different to distinguish one route from another.

Is there a way that this can be done?

Thanks.

elogen SSC Eights! Points: 839 More actions · Answer 13

Sorry posted twice, thought I'd lost the 1st reply...

elogen SSC Eights! Points: 839 More actions · Answer 14

Ok,

To test the scenario that route 1 and route 2 start at the same origin but finish at different desitionations:

insert into #Trips values (1,'ReturnTrip','A','B')

insert into #Trips values (1,'ReturnTrip','B','C')

insert into #Trips values (1,'ReturnTrip','C','B')

insert into #Trips values (1,'ReturnTrip','B','A')

insert into #Trips values (2,'ReturnTrip','A','B')

insert into #Trips values (2,'ReturnTrip','B','C')

insert into #Trips values (2,'ReturnTrip','C','B')

insert into #Trips values (2,'ReturnTrip','B','Q')

To test the scenario that OneWay routes work:

insert into #Trips values (6,'OneWayTrip','A','B')

insert into #Trips values (7,'OneWayTrip','A','B')

This should be treated as the same route

insert into #Trips values (6,'OneWayTrip','A','B')

insert into #Trips values (7,'OneWayTrip','A','C')

These should be treated as 2 different routes