Need help with basic spatial query

  • I have a table with events that took place at specific x,y coordinates and another table with a listing of our locations and their coordinates. I need to match each event found in the first table with the nearest location found in the second table. I'm having trouble wrapping my brain around the spatial concepts, so I would like some help writing the query. Basically, each table looks like this:

    Events table

    id INT,

    event_name VARCHAR(50),

    coordinates GEOMETRY

    Locations table

    id INT,

    location_name VARCHAR(50),

    coordinates GEOMETRY

    My first stab at the query looked like this:

    SELECT e.id, e.event_name,

    (SELECT TOP 1 l.location_name FROM Locations l WHERE coordinates.STDistance(geometry::Point(e.coordinates.STY, e.coordinates.STX, 0) IS NOT NULL

    ORDER BY coordinates.STDistance(geometry::Point(e.coordinates.STY, e.coordinates.STX, 0) ASC)

    FROM Events e,

    GROUP BY e.id, e.event_name

    Am I missing something simple, or is my understanding of these concepts really off-base?

  • Mark Harley (7/17/2014)


    I have a table with events that took place at specific x,y coordinates and another table with a listing of our locations and their coordinates. I need to match each event found in the first table with the nearest location found in the second table. I'm having trouble wrapping my brain around the spatial concepts, so I would like some help writing the query. Basically, each table looks like this:

    Events table

    id INT,

    event_name VARCHAR(50),

    coordinates GEOMETRY

    Locations table

    id INT,

    location_name VARCHAR(50),

    coordinates GEOMETRY

    My first stab at the query looked like this:

    SELECT e.id, e.event_name,

    (SELECT TOP 1 l.location_name FROM Locations l WHERE coordinates.STDistance(geometry::Point(e.coordinates.STY, e.coordinates.STX, 0) IS NOT NULL

    ORDER BY coordinates.STDistance(geometry::Point(e.coordinates.STY, e.coordinates.STX, 0) ASC)

    FROM Events e,

    GROUP BY e.id, e.event_name

    Am I missing something simple, or is my understanding of these concepts really off-base?

    You are close and a couple of small changes should get you there. There is no need to deconstruct and reconstruct your geometries in the query.

    This query will not perform the best in 2008 as it will not use spatial indexes, however in 2012 it should use them. You will need to make sure the SRIDs in your geometries are the same.

    SELECT e.id,

    e.event_name,

    l.location_name,

    l.Distance

    FROM Events e

    CROSS APPLY (

    SELECT TOP 1 loc.location_name, loc.coordinates.STDistance(e.coordinates) Distance

    FROM Locations loc

    WHERE loc.coordinates IS NOT NULL AND

    loc.coordinates.STDistance(e.coordinates) < 1000

    ORDER BY loc.coordinates.STDistance(e.coordinates) ASC

    ) l

  • I've confirmed that this does indeed work, but it can take an ungodly amount of time. I have 125,000+ events and 175,000+ locations to work through and the query takes hours to run, which is unusual for such a small number of rows. Is there something I can do to improve this?

    Cross Apply will run through every location row for each event row, correct? If so, I imagine that's the bottleneck.

  • What do the indexes on your tables look like?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • BTW...

    mickyT (7/17/2014)


    This query will not perform the best in 2008 as it will not use spatial indexes, however in 2012 it should use them. You will need to make sure the SRIDs in your geometries are the same.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Ya, I know things won't work as well in 2008 as they would in 2012. I'm hoping to find money in the budget to upgrade, but it won't happen this year.

    My event data actually comes from oracle and is stored in a temp table, which I believe means that we can't add an index to it. And since we're using 2008, we can't add an index to the geometry column in our locations table. Am I wrong about either of those?

  • Mark Harley (7/23/2014)


    My event data actually comes from oracle and is stored in a temp table, which I believe means that we can't add an index to it. And since we're using 2008, we can't add an index to the geometry column in our locations table. Am I wrong about either of those?

    A temp table as in one that uses # or ## in the name? I've added indexes to those plenty of times before. The only issue I've run into is when I'm running the same query in multiple places that the indexes won't create because one already exists with that name. The solution to that is to drop the table after each run, or not to run it in multiple places.

    As far as the geometry column goes, I'm not sure. But I'm more concerned with the other columns used in the query that might not be spatial related. What kind of indexes do you have on those? Or are both tables non-indexed heaps?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Quick thought, you are using GEOMETRY ( Cartesian / planar ) for geographic operations, must be careful here!

    😎

  • Sorry, we're using a table variable and not a temp table. You can't add an index to a table variable, can you? We are pulling additional data from the locations table aside from the geometry column, but I don't believe there any indexes defined.

  • Eirikur Eiriksson (7/23/2014)


    Quick thought, you are using GEOMETRY ( Cartesian / planar ) for geographic operations, must be careful here!

    😎

    I think this is a matter of necessity. In our event data, the event positions are recorded as a LAT and LON, while our locations table contains X and Y coordinates. We're converting the lat/lon to X/Y, which is why we use the geometry type instead of geography. Is that what you were referring to?

  • Mark Harley (7/23/2014)


    Eirikur Eiriksson (7/23/2014)


    Quick thought, you are using GEOMETRY ( Cartesian / planar ) for geographic operations, must be careful here!

    😎

    I think this is a matter of necessity. In our event data, the event positions are recorded as a LAT and LON, while our locations table contains X and Y coordinates. We're converting the lat/lon to X/Y, which is why we use the geometry type instead of geography. Is that what you were referring to?

    You should be doing this the other way around, taking the X/Y to lat/lon, otherwise the distance calculation is somewhat meaningless. The distance unit in GEOGRAPHY is meter, it is a relative coordinate (grid) value in the GEOMETRY.

    😎

  • Mark Harley (7/23/2014)


    Sorry, we're using a table variable and not a temp table. You can't add an index to a table variable, can you? We are pulling additional data from the locations table aside from the geometry column, but I don't believe there any indexes defined.

    I believe Gail Shaw proved that you could. But let me pull a few links.

    Here's Gail's blog on the issue [/url](half-way down).

    And then there's this.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Mark Harley (7/23/2014)


    I've confirmed that this does indeed work, but it can take an ungodly amount of time. I have 125,000+ events and 175,000+ locations to work through and the query takes hours to run, which is unusual for such a small number of rows. Is there something I can do to improve this?

    Cross Apply will run through every location row for each event row, correct? If so, I imagine that's the bottleneck.

    Unfortunately it's not that unusual for this to take hours. In effect the query ends up being a cross join, which means that you produce around 21,875,000,000 rows to do comparisons on. There is unlikely to be any difference between using a join and a cross apply for this query.

    Sorry, we're using a table variable and not a temp table.

    It would be worthwhile either making this a temporary or an actual table.

    As for other methods, there may be a couple of ways to improve the performance of this in 2008. Here's one possibility.

    Using the following as a test set to make sure index was used

    CREATE TABLE Location (

    id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,

    location_name VARCHAR(50),

    coordinates Geometry

    );

    CREATE TABLE #Events (

    id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,

    event_name VARCHAR(50),

    coordinates Geometry

    );

    WITH RandomCoord AS (

    SELECT RAND(CAST(NEWID() AS VARBINARY)) * 100000.0 X,

    RAND(CAST(NEWID() AS VARBINARY)) * 100000.0 Y

    ),

    cTally AS ( -- Count to 100,000

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N

    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e1(N),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e2(N),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e3(N),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e4(N),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e5(N)

    )

    INSERT INTO Location (location_name, coordinates)

    SELECT 'Location ' + CAST(N AS VARCHAR(10)), Geometry::Point(X, Y, 0)

    FROM cTally CROSS APPLY (SELECT X, Y FROM RandomCoord) r;

    WITH RandomCoord AS (

    SELECT RAND(CAST(NEWID() AS VARBINARY)) * 100000.0 X,

    RAND(CAST(NEWID() AS VARBINARY)) * 100000.0 Y

    ),

    cTally AS ( -- Count to 100,000

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N

    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e1(N),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e2(N),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e3(N),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e4(N),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e5(N)

    )

    INSERT INTO #Events (event_name, coordinates)

    SELECT 'Event ' + CAST(N AS VARCHAR(10)), Geometry::Point(X, Y, 0)

    FROM cTally CROSS APPLY (SELECT X, Y FROM RandomCoord) r;

    CREATE SPATIAL INDEX loc_sidx ON Location (coordinates)

    USING GEOMETRY_GRID WITH (

    BOUNDING_BOX =(950000, 4500000, 3500000, 7000000),

    GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = MEDIUM,LEVEL_3 = MEDIUM,LEVEL_4 = MEDIUM),

    CELLS_PER_OBJECT = 4

    );

    Here's the query. It assumes that you only want to consider events that are within a certain distance. In this case I picked 200

    SELECT location_id, event_id, location_name, event_name, Distance

    FROM (

    SELECT l.id location_id, e.id event_id,

    l.location_name, e.event_name,

    l.coordinates.STDistance(e.coordinates) Distance,

    ROW_NUMBER() OVER (PARTITION BY e.id ORDER BY l.coordinates.STDistance(e.coordinates)) R

    FROM Location l

    INNER JOIN (

    SELECT id, event_name, coordinates.STBuffer(200) buffered, coordinates FROM #events

    ) e ON e.buffered.STContains(l.coordinates) = 1

    ) d

    WHERE R = 1;

    On my desktop still takes quite some time (I'll put it up when it finishes :-D), but importantly it produced a plan using the spatial index.

    Edit: This query doesn't appear as if it will perform any better. It's been running for almost 3 hours without a result. Reducing the buffer area may improve the performance, but difficult to say at the moment.

    Second Edit: Forgot to partition the row_number to event

  • Quick suggestion using mickyT's data, mind you, as I posted before, that this is not accurate when using geometry.

    😎

    SELECT

    *

    FROM dbo.Location L

    OUTER APPLY #Events E

    WHERE L.coordinates.STBuffer(10000).STIntersects(E.coordinates) = 1

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply