Aggregating query help

  • I have a table of possible trip routes like the following:

    ID RouteID RouteName Origin Destination

    1 1 ReturnTrip A B

    2 1 ReturnTrip B C

    3 1 ReturnTrip C B

    4 1 ReturnTrip B A

    5 2 ReturnTrip A B

    6 2 ReturnTrip B C

    7 2 ReturnTrip C B

    8 2 ReturnTrip B A

    9 3 ReturnTrip A D

    10 3 ReturnTrip D C

    11 3 ReturnTrip C D

    12 3 ReturnTrip D A

    13 4 ReturnTrip A D

    14 4 ReturnTrip D E

    15 4 ReturnTrip E C

    16 4 ReturnTrip C E

    17 4 ReturnTrip E D

    18 4 ReturnTrip D A

    I'm basically starting off at point A and ending up at point C but along the way I have the option of stopping off at X number of in between points.

    So RouteID 1, I'm going from A to C via B and returning from C to A via B

    Notice how RouteID 1 and RouteID 2 are the same routes, so I want to condense them into a single row and call it "ReturnTripViaB".

    RouteID 3 is the same trip however I'm using a different route (going via point D). I want to treat this one as a separate row and call it "ReturnTripViaD".

    RouteID 4 is the same trip however I'm stopping off at multiple points (D and E) along the way between A and C

    So the resultant table should look like:

    ID RouteName

    1 ReturnTripViaB

    2 ReturnTripViaD

    3 ReturnTripViaDE

    Also, since I want to keep the relationship between my aggregated table and my original table, I need another table

    like the following:

    ID RouteName RouteNameConcatRouteID

    1 ReturnTripViaB ReturnTrip-1

    2 ReturnTripViaB ReturnTrip-2

    3 ReturnTripViaD ReturnTrip-3

    4 ReturnTripViaDE ReturnTrip-4

    I'm having trouble coming up with a query that will do this aggregation for me. Is anyone able to help?

    Thanks very much.

  • May I ask you to provide a bit more details?

    Here you can read tips, which should help to attract relevant answers to your questions:

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    Please help us to help you!

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • create table test.elogen

    (

    ID int identity(1,1),

    RouteID int,

    Routename nvarchar(20),

    Origin nvarchar(1),

    Destination nvarchar(1)

    )

    insert into test.elogen values (1,'ReturnTrip','A','B')

    insert into test.elogen values (1,'ReturnTrip','B','C')

    insert into test.elogen values (1,'ReturnTrip','C','B')

    insert into test.elogen values (1,'ReturnTrip','B','A')

    insert into test.elogen values (2,'ReturnTrip','A','B')

    insert into test.elogen values (2,'ReturnTrip','B','C')

    insert into test.elogen values (2,'ReturnTrip','C','B')

    insert into test.elogen values (2,'ReturnTrip','B','A')

    insert into test.elogen values (3,'ReturnTrip','A','D')

    insert into test.elogen values (3,'ReturnTrip','D','C')

    insert into test.elogen values (3,'ReturnTrip','C','D')

    insert into test.elogen values (3,'ReturnTrip','D','A')

    insert into test.elogen values (4,'ReturnTrip','A','D')

    insert into test.elogen values (4,'ReturnTrip','D','E')

    insert into test.elogen values (4,'ReturnTrip','E','C')

    insert into test.elogen values (4,'ReturnTrip','C','E')

    insert into test.elogen values (4,'ReturnTrip','E','D')

    insert into test.elogen values (4,'ReturnTrip','D','A')

    I've been at this for an hour now and it is really hard with the data you have provided.

    This would indicate that there is an issue with the structure of the data.:-P

    The challenge is trying to identify the journey sequence. For Example in route 1 you go from B to C and B to A. SQL is not very good at identifying that you should go from B-C before B-A.

    select

    X.ID,

    Y.ID

    FROM

    test.elogen X

    join

    test.elogen Y on x.routeid = y.routeid

    and x.destination = y.origin

    --and x.origin <> y.destination

    including the inequality above causes the first of the return steps to be dropped, excluding the inequality means that you join the outward step to the inward step

    You really need to have a step sequence indicator (index based on RouteID and stepnumber). I wouldn't want to rely in the identity column (ID) to enforce this sequence.

    In the example data given, you always visit the outward points on the return journey, is this always the case or could you go

    A-B-C-A (i.e. skip B on the way back)

    If not you could indicate whether a leg is outward or inward, this would get over the sequence confusion above.

  • Thanks aaron,

    Yes I fogot to include a vital column Direction which would indicate whether the journey was an Inbound or an Outbound.

    So the code should look like this:

    create table test.elogen

    (

    ID int identity(1,1),

    RouteID int,

    Routename nvarchar(20),

    Origin nvarchar(1),

    Destination nvarchar(1),

    Direction nvarchar(2)

    )

    insert into test.elogen values (1,'ReturnTrip','A','B','OB')

    insert into test.elogen values (1,'ReturnTrip','B','C','OB')

    insert into test.elogen values (1,'ReturnTrip','C','B','IB')

    insert into test.elogen values (1,'ReturnTrip','B','A','IB')

    insert into test.elogen values (2,'ReturnTrip','A','B','OB')

    insert into test.elogen values (2,'ReturnTrip','B','C','OB')

    insert into test.elogen values (2,'ReturnTrip','C','B','IB')

    insert into test.elogen values (2,'ReturnTrip','B','A','IB')

    insert into test.elogen values (3,'ReturnTrip','A','D','OB')

    insert into test.elogen values (3,'ReturnTrip','D','C','OB')

    insert into test.elogen values (3,'ReturnTrip','C','D','IB')

    insert into test.elogen values (3,'ReturnTrip','D','A','IB')

    insert into test.elogen values (4,'ReturnTrip','A','D','OB')

    insert into test.elogen values (4,'ReturnTrip','D','E','OB')

    insert into test.elogen values (4,'ReturnTrip','E','C','OB')

    insert into test.elogen values (4,'ReturnTrip','C','E','IB')

    insert into test.elogen values (4,'ReturnTrip','E','D','IB')

    insert into test.elogen values (4,'ReturnTrip','D','A','IB')

    In the example data given, you always visit the outward points on the return journey, is this always the case or could you go

    A-B-C-A (i.e. skip B on the way back)

    If not you could indicate whether a leg is outward or inward, this would get over the sequence confusion above.

    You would always have to visit the outward points on the return journey so you cannot skip B in the example.

    So would the sequence query now look like:

    select

    X.ID,

    Y.ID

    FROM

    test.elogen X

    join

    test.elogen Y on x.routeid = y.routeid

    and x.destination = y.origin

    AND X.direction = Y.direction

    and x.origin <> y.destination

    I guess I'm still not sure how to formulate the aggregation query.

    Thanks

  • Try this. Same set up data as Aaron but different temp table name.

    create table #Trips

    (

    ID int identity(1,1),

    RouteID int,

    Routename nvarchar(20),

    Origin nvarchar(1),

    Destination nvarchar(1)

    )

    insert into #Trips values (1,'ReturnTrip','A','B')

    insert into #Trips values (1,'ReturnTrip','B','C')

    insert into #Trips values (1,'ReturnTrip','C','B')

    insert into #Trips values (1,'ReturnTrip','B','A')

    insert into #Trips values (2,'ReturnTrip','A','B')

    insert into #Trips values (2,'ReturnTrip','B','C')

    insert into #Trips values (2,'ReturnTrip','C','B')

    insert into #Trips values (2,'ReturnTrip','B','A')

    insert into #Trips values (3,'ReturnTrip','A','D')

    insert into #Trips values (3,'ReturnTrip','D','C')

    insert into #Trips values (3,'ReturnTrip','C','D')

    insert into #Trips values (3,'ReturnTrip','D','A')

    insert into #Trips values (4,'ReturnTrip','A','D')

    insert into #Trips values (4,'ReturnTrip','D','E')

    insert into #Trips values (4,'ReturnTrip','E','C')

    insert into #Trips values (4,'ReturnTrip','C','E')

    insert into #Trips values (4,'ReturnTrip','E','D')

    insert into #Trips values (4,'ReturnTrip','D','A')

    The two queries that you want take advantage of the fact you can use the origins to find the intermediate stops, but using only the first 1/2 of them (less one):

    ;WITH MyTrips AS (

    SELECT DISTINCT RouteName=Routename + 'Via' +

    SUBSTRING((

    SELECT Origin + ''

    FROM #Trips b

    WHERE a.RouteID = b.RouteID

    ORDER BY ID

    FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

    ,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

    FROM #Trips a

    GROUP BY RouteID, Routename)

    SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName

    FROM MyTrips

    ;WITH MyTrips AS (

    SELECT RouteName=Routename + 'Via' +

    SUBSTRING((

    SELECT Origin + ''

    FROM #Trips b

    WHERE a.RouteID = b.RouteID

    ORDER BY ID

    FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

    ,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

    ,RouteConcat=Routename + '-' + CAST(RouteID AS VARCHAR)

    FROM #Trips a

    GROUP BY RouteID, Routename)

    SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName, RouteConcat

    FROM MyTrips

    DROP TABLE #Trips

    Please let me know if this works for you.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Note that one other thought occurred to me. You could have another route that's identical to other routes except that the origin/final destination are different, such as this:

    ID RouteID RouteName Origin Destination

    1 5 ReturnTrip E B

    2 5 ReturnTrip B C

    3 5 ReturnTrip C B

    4 5 ReturnTrip B E

    This can be handled in the code provided above by including another subquery to include the origin/final destination as part of the new route name. Like this:

    create table #Trips

    (

    ID int identity(1,1),

    RouteID int,

    Routename nvarchar(20),

    Origin nvarchar(1),

    Destination nvarchar(1)

    )

    insert into #Trips values (1,'ReturnTrip','A','B')

    insert into #Trips values (1,'ReturnTrip','B','C')

    insert into #Trips values (1,'ReturnTrip','C','B')

    insert into #Trips values (1,'ReturnTrip','B','A')

    insert into #Trips values (2,'ReturnTrip','A','B')

    insert into #Trips values (2,'ReturnTrip','B','C')

    insert into #Trips values (2,'ReturnTrip','C','B')

    insert into #Trips values (2,'ReturnTrip','B','A')

    insert into #Trips values (3,'ReturnTrip','A','D')

    insert into #Trips values (3,'ReturnTrip','D','C')

    insert into #Trips values (3,'ReturnTrip','C','D')

    insert into #Trips values (3,'ReturnTrip','D','A')

    insert into #Trips values (4,'ReturnTrip','A','D')

    insert into #Trips values (4,'ReturnTrip','D','E')

    insert into #Trips values (4,'ReturnTrip','E','C')

    insert into #Trips values (4,'ReturnTrip','C','E')

    insert into #Trips values (4,'ReturnTrip','E','D')

    insert into #Trips values (4,'ReturnTrip','D','A')

    insert into #Trips values (5,'ReturnTrip','E','B')

    insert into #Trips values (5,'ReturnTrip','B','C')

    insert into #Trips values (5,'ReturnTrip','C','B')

    insert into #Trips values (5,'ReturnTrip','B','E')

    ;WITH MyTrips AS (

    SELECT DISTINCT RouteName=Routename +

    (SELECT TOP 1 Origin FROM #Trips b WHERE a.RouteID = b.RouteID ORDER BY ID) +

    'Via' +

    SUBSTRING((

    SELECT Origin + ''

    FROM #Trips b

    WHERE a.RouteID = b.RouteID

    ORDER BY ID

    FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

    ,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

    FROM #Trips a

    GROUP BY RouteID, Routename)

    SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName

    FROM MyTrips

    ;WITH MyTrips AS (

    SELECT RouteName=Routename +

    (SELECT TOP 1 Origin FROM #Trips b WHERE a.RouteID = b.RouteID ORDER BY ID) +

    'Via' +

    SUBSTRING((

    SELECT Origin + ''

    FROM #Trips b

    WHERE a.RouteID = b.RouteID

    ORDER BY ID

    FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

    ,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

    ,RouteConcat=Routename + '-' + CAST(RouteID AS VARCHAR)

    FROM #Trips a

    GROUP BY RouteID, Routename)

    SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName, RouteConcat

    FROM MyTrips

    DROP TABLE #Trips


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • aaron.reese (9/26/2012)


    The challenge is trying to identify the journey sequence. For Example in route 1 you go from B to C and B to A. SQL is not very good at identifying that you should go from B-C before B-A.

    I used the ID to tell me this.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Since the two CTEs are virtually identical, here's another option that may be a little cleaner (put the second one into a temp table):

    SELECT RouteName=Routename +

    (SELECT TOP 1 Origin FROM #Trips b WHERE a.RouteID = b.RouteID ORDER BY ID) +

    'Via' +

    SUBSTRING((

    SELECT Origin + ''

    FROM #Trips b

    WHERE a.RouteID = b.RouteID

    ORDER BY ID

    FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

    ,2, ((SELECT COUNT(*) FROM #Trips b WHERE a.RouteID = b.RouteID)/2)-1)

    ,RouteConcat=Routename + '-' + CAST(RouteID AS VARCHAR)

    INTO #Temp

    FROM #Trips a

    GROUP BY RouteID, Routename

    SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName

    FROM (SELECT DISTINCT RouteName FROM #Temp) a

    SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), RouteName, RouteConcat

    FROM #Temp


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Wow thanks for the query dwain,

    it seems to work beautifully!

    I'm just going to play around with some other scenarios that I've found with the data I'm given.

    It seems that a route can only be considered identical if it contains the same number of legs as well.

    So lets say we added:

    insert into #Trips VALUES (2,'ReturnTrip','K','C')

    Now Route 2 cannot be considered the same as Route 1.

    Your query seems to handle this case as well, but I'm not sure if the above information would simply the query any further?

    Thanks for your help.

  • elogen (9/26/2012)


    Wow thanks for the query dwain,

    it seems to work beautifully!

    I'm just going to play around with some other scenarios that I've found with the data I'm given.

    It seems that a route can only be considered identical if it contains the same number of legs as well.

    So lets say we added:

    insert into #Trips VALUES (2,'ReturnTrip','K','C')

    Now Route 2 cannot be considered the same as Route 1.

    Your query seems to handle this case as well, but I'm not sure if the above information would simply the query any further?

    Thanks for your help.

    I'd be happy if the new case is handled already! Doubtful it can be simplified further.

    There are possibilities I think that it may fail if a return route doesn't exactly retrace the same steps, especially if it is through different way points. That may end up being quite complext to solve.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • There are possibilities I think that it may fail if a return route doesn't exactly retrace the same steps, especially if it is through different way points. That may end up being quite complext to solve.

    Hmmm, I think I may need to handle the case as well. :w00t:

    Say I modified Route 2 to this:

    insert into #Trips values (2,'ReturnTrip','B','Q')

    so that the return location is different to the start location.

    The query currently treats Route2 the same as Route1. Is there some way this can be handled?

    Note that I don't need the "Via" bit in the RouteName to reflect exactly what it should be, as long as a different route name is generated then it should be fine.

    Thanks.

  • elogen (9/26/2012)


    There are possibilities I think that it may fail if a return route doesn't exactly retrace the same steps, especially if it is through different way points. That may end up being quite complext to solve.

    Hmmm, I think I may need to handle the case as well. :w00t:

    Say I modified Route 2 to this:

    insert into #Trips values (2,'ReturnTrip','B','Q')

    so that the return location is different to the start location.

    The query currently treats Route2 the same as Route1. Is there some way this can be handled?

    Note that I don't need the "Via" bit in the RouteName to reflect exactly what it should be, as long as a different route name is generated then it should be fine.

    Thanks.

    Since the query is dependent on the sequence (order) that you insert additional stops into a route, can you resend all the INSERTs for the new route 2?

    With that I can analyze your new requirement correctly.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • There are possibilities I think that it may fail if a return route doesn't exactly retrace the same steps, especially if it is through different way points. That may end up being quite complext to solve.

    Damn, I think I may need to handle the case where the return point is different to the starting point.

    Eg: if I changed Route 2's end point to:

    insert into #Trips values (2,'ReturnTrip','B','Q')

    Then Route 2 should be considered different to Route 1.

    Also the query doesn't seem to work for 'One Way' trips,

    insert into #Trips values (6,'OneWayTrip','A','B')

    insert into #Trips values (7,'OneWayTrip','A','B')

    It returns: "Invalid length parameter passed to the SUBSTRING function."

    Note that the RouteName doesn't absolutely have to have the "Via...." portion, I just need it to be sufficiently different to distinguish one route from another.

    Is there a way that this can be done?

    Thanks.

  • Sorry posted twice, thought I'd lost the 1st reply...

  • Ok,

    To test the scenario that route 1 and route 2 start at the same origin but finish at different desitionations:

    insert into #Trips values (1,'ReturnTrip','A','B')

    insert into #Trips values (1,'ReturnTrip','B','C')

    insert into #Trips values (1,'ReturnTrip','C','B')

    insert into #Trips values (1,'ReturnTrip','B','A')

    insert into #Trips values (2,'ReturnTrip','A','B')

    insert into #Trips values (2,'ReturnTrip','B','C')

    insert into #Trips values (2,'ReturnTrip','C','B')

    insert into #Trips values (2,'ReturnTrip','B','Q')

    To test the scenario that OneWay routes work:

    insert into #Trips values (6,'OneWayTrip','A','B')

    insert into #Trips values (7,'OneWayTrip','A','B')

    This should be treated as the same route

    insert into #Trips values (6,'OneWayTrip','A','B')

    insert into #Trips values (7,'OneWayTrip','A','C')

    These should be treated as 2 different routes

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply