Home Forums SQL Server 2008 SQL Server 2008 - General Optimize query for finding nearest airport based on latitude and longitude using STDistance RE: Optimize query for finding nearest airport based on latitude and longitude using STDistance

  • Anytime you start asking questions about performance, it helps to have some test data.

    So I created a table of Airports from the attachment. It's not a very good table because it only has about 2700 airports with geospatial coordinates but it will do. I got it here: http://www.partow.net/miscellaneous/airportdatabase/

    Next I had to create data for a flight. I did this by assuming the plane flies in a straight line (from Manila to Pune, India) and captures 1000 data points.

    DECLARE @ORILat FLOAT

    ,@ORILongFLOAT

    ,@DESLatFLOAT

    ,@DESLongFLOAT

    ,@segmentsINT

    SELECT @ORILat= 14.50861111-- MNL (Manila)

    ,@ORILong= 121.0194444

    ,@DESLat= 18.58194444-- PNQ (Pune, India)

    ,@DESLong= 73.91944444

    ,@segments= 1000

    DECLARE @Log_Details TABLE

    (Log_Detail_ID INT,Log_Detail_Latitude FLOAT,Log_Detail_Longitude FLOAT,Log_Detail_Altitude INT)

    ;WITH Nbrs_3(n) AS (SELECT 1 UNION SELECT 0)

    ,Nbrs_2 (n) AS (SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2)

    ,Nbrs_1 (n) AS (SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2)

    ,Nbrs_0 (n) AS (SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2)

    ,Nbrs (n) AS (SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2)

    ,Tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n) As n FROM Nbrs)

    INSERT @Log_Details

    SELECT TOP (@segments) 1

    ,@ORILat + (@DESLat - @OriLat) * n / @segments

    ,@ORILong - (@ORILong - @DESLong) * n / @segments

    ,1000 + (1000*(n % 15))

    FROM Tally

    The next step, of which I will not bore you with the gory details was to simulate the haversine formula for calcuating distance from spatial coordinates because I'm running in SQL 2005 so don't have access to that neat STDistance function. You'll see that on the ORDER BY clause below (it is close but not perfect).

    The code below runs the 1000 points against all airports and then vs. airports only within a narrow range of latitude and longitude.

    -- Convert degrees to radians

    DECLARE @b-2 FLOAT

    SELECT @b-2 = ACOS(-1.)/180.

    SET STATISTICS TIME ON

    SELECT *

    FROM @Log_Details

    CROSS APPLY (

    SELECT TOP(1) ICAO, [Name],LatDec,LongDec

    FROM Airports

    ORDER BY 6731*2*ATN2(SQRT(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.))), SQRT(1.-(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.)))))) x

    SET STATISTICS TIME OFF

    DECLARE @LatDiff FLOAT, @LongDiff FLOAT

    SELECT @LatDiff = 0.5*ABS(@OriLat - @DesLat), @LongDiff = 0.5*ABS(@OriLong - @DesLong)

    SET STATISTICS TIME ON

    SELECT *

    FROM @Log_Details

    CROSS APPLY (

    SELECT TOP(1) ICAO, [Name], LatDec, LongDec

    FROM Airports

    WHERE LatDec BETWEEN Log_Detail_Latitude - @LatDiff AND Log_Detail_Latitude + @LatDiff and

    LongDec BETWEEN Log_Detail_Longitude - @LongDiff AND Log_Detail_Longitude + @LongDiff

    ORDER BY 6731*2*ATN2(SQRT(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.))), SQRT(1.-(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.)))))) x

    SET STATISTICS TIME OFF

    -- Now create an index and time it again

    CREATE INDEX latlong ON Airports (LatDec, LongDec)

    SET STATISTICS TIME ON

    SELECT *

    FROM @Log_Details

    CROSS APPLY (

    SELECT TOP(1) ICAO, [Name], LatDec, LongDec

    FROM Airports

    WHERE LatDec BETWEEN Log_Detail_Latitude - @LatDiff AND Log_Detail_Latitude + @LatDiff and

    LongDec BETWEEN Log_Detail_Longitude - @LongDiff AND Log_Detail_Longitude + @LongDiff

    ORDER BY 6731*2*ATN2(SQRT(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.))), SQRT(1.-(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.)))))) x

    SET STATISTICS TIME OFF

    The last step creates an index on Airports latitude and longitude and runs the second query again.

    The timing results are:

    (1000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 7613 ms, elapsed time = 7676 ms.

    (1000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 686 ms, elapsed time = 687 ms.

    (1000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 265 ms, elapsed time = 609 ms.

    As you can see, limiting the airports in the range of the search helped significantly. I think I did it more or less the way Jeff was suggesting. The index helped even more.

    No doubt you'll want to check that the airports along the path are correct but they start with MNL and end at PNQ.

    This version appears to operate just slightly faster and you could use it if you don't need to return the rest of the stuff about the closest airport (just the name).

    SELECT l.*

    ,(SELECT TOP(1) ICAO

    FROM Airports

    WHERE LatDec BETWEEN Log_Detail_Latitude - @LatDiff AND Log_Detail_Latitude + @LatDiff and

    LongDec BETWEEN Log_Detail_Longitude - @LongDiff AND Log_Detail_Longitude + @LongDiff

    ORDER BY 6731*2*ATN2(SQRT(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.))), SQRT(1.-(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.)))))

    ) as Airport

    FROM @Log_Details l

    Query plan cost is just slightly better than the others.

    Edit: Note that the longitude difference is not corrected properly for trans-Pacific crossings. That is the added complexity that Jeff mentioned. Just trying to show the performance comparison here.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St