T-SQL puzzler - given a cell in an N-dimensional space, return the cells that are inline with the given cell along each axis

  • dwain.c (11/18/2013)


    Laurence Neville (11/18/2013)


    dwain.c - nope, I have nothing against dynamic SQL!

    I have to award the prize to mickeyT's solution - it is the fastest of all the contributions.

    Thanks everyone ... I was really banging my head with this one, and when I posted on StackOverflow most responses were "it can't be done in SQL". I guess I was asking in the wrong place!

    Even though you chose MickyT's solution over mine (congratulations Micky!), I am still gratified whenever I hear someone say "it can't be done in SQL" and yet here are multiple solutions showing how it can.

    At least you now know where to come next time.

    Ditto, there are very few things that I seen that someone here hasn't managed to find a workable solution to.

    On the performance front (I haven't compared them yet), I thought Dwain's was coming out in front. Have you scaled up the dimensions?

  • No, I was testing using my example data. mickey's query has a relatively simple plan with mostly index seeks. Running both mickey's and dwain's resulted in 19% for the former and 81% for the latter. I enclose a shot of the 2 actual execution plans - query #1 is mickey's with 2 & 3 belonging to dwain's.

  • Laurence Neville (11/19/2013)


    No, I was testing using my example data. mickey's query has a relatively simple plan with mostly index seeks. Running both mickey's and dwain's resulted in 19% for the former and 81% for the latter. I enclose a shot of the 2 actual execution plans - query #1 is mickey's with 2 & 3 belonging to dwain's.

    If you're talking plan costs here, as I said earlier, you can't always rely on them to be indicators of performance (specifically speed).

    IO counts are better. A 1,000,000 row test harness where you actually time them (elapsed time) is generally the best. Sometimes though, SQL will parallelize a query causing CPU time to exceed elapsed time and in those cases you may need to consider which is the most important resource to conserve.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I thought I'd do a performance test of mine and Dwain's on a larger test set (1,000,000 cells).

    Basically the results show that while mine is quicker, Dwain's (once reusable Dynamic SQL is created) has less IO cost.

    Here's the test data setup

    truncate table #Cells;

    truncate table #AxisPositions;

    truncate table #cellAxes;

    with e100 as (

    select row_number() OVER (ORDER BY (SELECT NULL)) N from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e1(N), (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e2(N)

    )

    ,cells as (

    select row_number() OVER (ORDER BY (SELECT NULL)) CellID, CAST(RAND(CAST(NEWID() AS VARBINARY(100))) * 1000 AS INT) CellValue

    from e100 X, e100 Y, e100 Z

    )

    insert into #Cells (CellID, CellValue)

    select * from Cells;

    with e100 as (

    select row_number() OVER (ORDER BY (SELECT NULL)) N from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e1(N), (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e2(N)

    )

    insert into #AxisPositions (AxisCode, PositionOnAxis)

    select 'X', N from e100

    union all

    select 'Y', N from e100

    union all

    select 'Z', N from e100

    ;

    with e100 as (

    select row_number() OVER (ORDER BY (SELECT NULL)) N from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e1(N), (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e2(N)

    )

    ,cellAxes as (

    select row_number() OVER (ORDER BY (SELECT NULL)) CellID, X.N XN, Y.N YN, Z.N ZN

    from e100 X, e100 Y, e100 Z

    )

    select * into #tempCA from cellAxes;

    insert into #cellAxes (CellID, AxisCode, PositionOnAxis)

    select cellid, 'X' AxisCode, XN PositionOnAxis from #tempCA

    union all

    select cellid, 'Y' AxisCode, YN PositionOnAxis from #tempCA

    union all

    select cellid, 'Z' AxisCode, ZN PositionOnAxis from #tempCA;

    drop table #tempCA;

    And the results on my machine

    Timings:

    MickyT : 733

    Dwain Dynamic (build 1 off cost): 2333

    Dwain Dynamic (run): 2236

    Dwain 3D : 2190ms

    IO:

    MickyT :

    Table '#CellAxes_000000000136'. Scan count 19, logical reads 14293, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Dwain Dynamic :

    Build Dynamic (1 off cost)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#CellAxes_000000000136'. Scan count 81, logical reads 60255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Run Dynamic

    Table '#CellAxes_000000000136'. Scan count 2, logical reads 6699, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Ramping it up to 4 dimensions (1500625 cells)

    truncate table #Cells;

    truncate table #AxisPositions;

    truncate table #cellAxes;

    with e35 as (

    select row_number() OVER (ORDER BY (SELECT NULL)) N from (values (0),(0),(0),(0),(0)) e1(N), (values (0),(0),(0),(0),(0),(0),(0)) e2(N)

    )

    ,cells as (

    select row_number() OVER (ORDER BY (SELECT NULL)) CellID, CAST(RAND(CAST(NEWID() AS VARBINARY(100))) * 1000 AS INT) CellValue

    from e35 X, e35 Y, e35 Z, e35 T

    )

    insert into #Cells (CellID, CellValue)

    select * from Cells;

    with e35 as (

    select row_number() OVER (ORDER BY (SELECT NULL)) N from (values (0),(0),(0),(0),(0)) e1(N), (values (0),(0),(0),(0),(0),(0),(0)) e2(N)

    )

    insert into #AxisPositions (AxisCode, PositionOnAxis)

    select 'X', N from e35

    union all

    select 'Y', N from e35

    union all

    select 'Z', N from e35

    union all

    select 'T', N from e35

    ;

    with e35 as (

    select row_number() OVER (ORDER BY (SELECT NULL)) N from (values (0),(0),(0),(0),(0)) e1(N), (values (0),(0),(0),(0),(0),(0),(0)) e2(N)

    )

    ,cellAxes as (

    select row_number() OVER (ORDER BY (SELECT NULL)) CellID, X.N XN, Y.N YN, Z.N ZN, T.N TN

    from e35 X, e35 Y, e35 Z, e35 T

    )

    select * into #tempCA from cellAxes;

    insert into #cellAxes (CellID, AxisCode, PositionOnAxis)

    select cellid, 'X' AxisCode, XN PositionOnAxis from #tempCA

    union all

    select cellid, 'Y' AxisCode, YN PositionOnAxis from #tempCA

    union all

    select cellid, 'Z' AxisCode, ZN PositionOnAxis from #tempCA

    union all

    select cellid, 'T' AxisCode, TN PositionOnAxis from #tempCA;

    drop table #tempCA;

    Timings :

    MickyT : 1443ms

    Dwain Dynamic (build 1 off cost): 4650ms

    Dwain Dynamic (run): 5030ms

    IO:

    MickyT :

    Table '#CellAxes_000000000136'. Scan count 19, logical reads 27211, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Dwain Dynamic :

    Build Dynamic (1 off cost)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#CellAxes_000000000136'. Scan count 81, logical reads 120546, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Run Dynamic

    Table '#CellAxes_000000000136'. Scan count 2, logical reads 13398, physical reads 0, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

  • Nice follow up Micky!

    I have one question though, for the 4D case. Are both queries returning the same row counts?

    I was a little unsure about what I did with the IN filter at the end of my dynamic query for that case, and whether the results are then correct or not.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (11/20/2013)


    Nice follow up Micky!

    I have one question though, for the 4D case. Are both queries returning the same row counts?

    I was a little unsure about what I did with the IN filter at the end of my dynamic query for that case, and whether the results are then correct or not.

    The results came out identical, so no problem with the IN filter:-) The dynamic query came out looking like this

    WITH RowsToAxes AS

    (

    SELECT CellID

    ,T=MAX(CASE WHEN AxisCode = 'T' THEN PositionOnAxis END)

    ,X=MAX(CASE WHEN AxisCode = 'X' THEN PositionOnAxis END)

    ,Y=MAX(CASE WHEN AxisCode = 'Y' THEN PositionOnAxis END)

    ,Z=MAX(CASE WHEN AxisCode = 'Z' THEN PositionOnAxis END)

    FROM #CellAxes

    GROUP BY CellID

    )

    SELECT b.AxisCode, b.CellID

    FROM

    (

    SELECT CellID

    ,a.T,a.X,a.Y,a.Z

    ,T1,X1,Y1,Z1

    ,C=T1+X1+Y1+Z1

    FROM RowsToAxes a

    CROSS APPLY

    (

    SELECT

    T1=CASE T WHEN a.T THEN 1 ELSE 0 END

    ,X1=CASE X WHEN a.X THEN 1 ELSE 0 END

    ,Y1=CASE Y WHEN a.Y THEN 1 ELSE 0 END

    ,Z1=CASE Z WHEN a.Z THEN 1 ELSE 0 END

    FROM RowsToAxes

    WHERE CellID = @CellID

    ) b

    ) a

    CROSS APPLY

    (

    VALUES

    (CASE C WHEN 4 THEN CellID ELSE (1-T1)*Cellid END, 'T')

    ,(CASE C WHEN 4 THEN CellID ELSE (1-X1)*Cellid END, 'X')

    ,(CASE C WHEN 4 THEN CellID ELSE (1-Y1)*Cellid END, 'Y')

    ,(CASE C WHEN 4 THEN CellID ELSE (1-Z1)*Cellid END, 'Z')

    ) b (CellID, AxisCode)

    WHERE C IN (3,4) AND b.CellID <> 0;

  • mickyT (11/20/2013)


    dwain.c (11/20/2013)


    Nice follow up Micky!

    I have one question though, for the 4D case. Are both queries returning the same row counts?

    I was a little unsure about what I did with the IN filter at the end of my dynamic query for that case, and whether the results are then correct or not.

    The results came out identical, so no problem with the IN filter:-) The dynamic query came out looking like this

    WITH RowsToAxes AS

    (

    SELECT CellID

    ,T=MAX(CASE WHEN AxisCode = 'T' THEN PositionOnAxis END)

    ,X=MAX(CASE WHEN AxisCode = 'X' THEN PositionOnAxis END)

    ,Y=MAX(CASE WHEN AxisCode = 'Y' THEN PositionOnAxis END)

    ,Z=MAX(CASE WHEN AxisCode = 'Z' THEN PositionOnAxis END)

    FROM #CellAxes

    GROUP BY CellID

    )

    SELECT b.AxisCode, b.CellID

    FROM

    (

    SELECT CellID

    ,a.T,a.X,a.Y,a.Z

    ,T1,X1,Y1,Z1

    ,C=T1+X1+Y1+Z1

    FROM RowsToAxes a

    CROSS APPLY

    (

    SELECT

    T1=CASE T WHEN a.T THEN 1 ELSE 0 END

    ,X1=CASE X WHEN a.X THEN 1 ELSE 0 END

    ,Y1=CASE Y WHEN a.Y THEN 1 ELSE 0 END

    ,Z1=CASE Z WHEN a.Z THEN 1 ELSE 0 END

    FROM RowsToAxes

    WHERE CellID = @CellID

    ) b

    ) a

    CROSS APPLY

    (

    VALUES

    (CASE C WHEN 4 THEN CellID ELSE (1-T1)*Cellid END, 'T')

    ,(CASE C WHEN 4 THEN CellID ELSE (1-X1)*Cellid END, 'X')

    ,(CASE C WHEN 4 THEN CellID ELSE (1-Y1)*Cellid END, 'Y')

    ,(CASE C WHEN 4 THEN CellID ELSE (1-Z1)*Cellid END, 'Z')

    ) b (CellID, AxisCode)

    WHERE C IN (3,4) AND b.CellID <> 0;

    That's still pretty cool even though mine wasn't the swiftest. 🙂

    What I was a bit unsure about was this:

    WHERE C IN (3,4) AND b.CellID <> 0;

    And whether it should instead be:

    WHERE C IN (2,3,4) AND b.CellID <> 0;

    If the latter was correct, the CASE statements above would also require some adjustments.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (11/20/2013)

    That's still pretty cool even though mine wasn't the swiftest. 🙂

    What I was a bit unsure about was this:

    WHERE C IN (3,4) AND b.CellID <> 0;

    And whether it should instead be:

    WHERE C IN (2,3,4) AND b.CellID <> 0;

    If the latter was correct, the CASE statements above would also require some adjustments.

    Very cool indeed and a very interesting problem to work on.

    If I am following your logic correctly, to get the correct cells all the coordinate values have to be the same except maybe one, therefore 3,4 is right for four dimensions and 4,5 for 5 dimensions and so on for other dimensions. Funny that I counted differences in mine and you counted matches in yours:-D

  • You are correct about the dimension logic - to be "inline", cells must have all coordinate values except 1 the same as the given cell. In other words, there is only 1 dimension that can vary - the dimension along which you can "walk" to visit "inline" cells. This is true regardless of the number of dimensions.

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply