Selecting n Closest (Geographic) Rows

  • Hello there!

    This is a high-level explanation.

    I have two sets of data. One set , let's say, is employer locations. The other set, let's say, is employees' home locations. With a single select statement, I want to select the employer location details with the two (2) closest empoyees and their home locations to listed employer location.

    It's (2) now but could just as easily be closest (5) or closest (10) later as requirements change. I have the lat/long on each row in each data set and I'm using the "Great Circle" formula to determine the approx. geographic distance between rows.

    I do not want to use a CURSOR (blech!) over top of the employer location data set to process one row at a time, but this self-imposed single statement restriction has me tied in knots.

    Thanks in advance for any direction.

  • You could use the function on the link below to calculate the distance from work to employee home in a select statement, and just select the top 2 based on distance.

    Great Circle Distance Function - Haversine Formula

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=81360

    select

    top 2

    Distance = dbo.F_GREAT_CIRCLE_DISTANCE

    (a.Work_Latitude, a.Work_Longitude1, a.Home_Latitude, a.Home_Longitude2 ),

    a.*

    from

    MyTable a

    order by

    1 asc

  • Hi Robert

    If you are using geographies you could use try the following. Unfortunately in 2008 it doesn't appear to make use of spatial indexes, meaning it doesn't scale very well. However in 2012 the spatial index will be used if you follow the rules specified in technet

    Data setup:

    create table #employer (

    id int identity primary key,

    placeofbusiness Geography

    );

    create table #employee (

    id int identity primary key,

    homelocation geography

    );

    with randomLocations as (

    SELECT lat = RAND(CAST(NEWID() AS VARBINARY)) * 2.0,

    long = RAND(CAST(NEWID() AS VARBINARY)) * 2.0

    FROM (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) e1 (N)

    --,(VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) e2 (N) --scale up

    )

    INSERT INTO #employer (placeofbusiness)

    SELECT Geography::Point(lat, long, 4326)

    FROM randomLocations;

    CREATE SPATIAL INDEX employer_sdx ON #employer(placeofbusiness) USING GEOGRAPHY_GRID;

    with randomLocations as (

    SELECT lat = RAND(CAST(NEWID() AS VARBINARY)) * 2.0,

    long = RAND(CAST(NEWID() AS VARBINARY)) * 2.0

    FROM (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) e1 (N)

    ,(VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) e2 (N)

    --,(VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) e3 (N) --scale up

    --,(VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) e4 (N) --scale up

    )

    INSERT INTO #employee (homelocation)

    SELECT Geography::Point(lat, long, 4326)

    FROM randomLocations;

    CREATE SPATIAL INDEX employee_sdx ON #employee(homelocation) USING GEOGRAPHY_GRID;

    Nearest Neighbour Query

    select e1.id employer_id, e2.id employee_id, e2.homelocation.STDistance(e1.placeofbusiness)

    from #employer e1

    cross apply (

    select top(10) e3.id, e3.homelocation

    from #employee e3 where e3.homelocation.STDistance(e1.placeofbusiness) is not null

    order by e3.homelocation.STDistance(e1.placeofbusiness)

    ) e2

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply