Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Questions regarding Spatial Data performance Expand / Collapse
Author
Message
Posted Sunday, November 3, 2013 12:20 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 1:18 PM
Points: 6,056, Visits: 8,342
Hi Razzgor,

Thanks for posting the plan. Like I said, I'm not a spatial expert - but I do know a fair bit about plans. So let's start with some generic stuff.

1. Don't trust the percentages in the plan. These are based on the estimated cost of the operator, not on the actual cost (which is not measured per operator). If slow processing is caused by bad estimates, these percentages are based on those bad estimates. They will not show where the real pain is.

2. The table scan is not a problem. The query has no filter on the #fact_table, so it will need to read all rows. Scanning is the fastest and most efficient way to fetch all rows. The fact that it's a table scan, not a (clustered) index scan, proves that you have no clustered index on the table - for a permanent table almost always a bad thing, for a temp table not so much, though it can be. But in this case not the problem.

3. Outer joins give the optimizer much less freedom than inner joins. Do you really need this to be an outer join? Even if that is the case, I would still like to know if you get much better performance from an inner join.

4. The way you built the query makes it hard for the optimizer to optimize. I don't know the datatype and contents of the IP_Converted column, but I do know that you convert it to varchar(20), then convert that (using STGeomFromText) to a point, which is then fed into the STWithin function. This means that whatever statistics the optimizer may have on IP_Converted is useless for this query. Do you really need this complex double converston?
Can you try adapting the procedure to add an extra column to the temp table that holds the result of the conversion (so the point, as geometry data type)? It can be a normal column, or a persisted computed column. Then build a spatial index on that column too. Does this help?

5. You wrote in the original post that your benchmark worked fine, but it doesn't work well from within the report. Is the code exactly the same, or are there differences? And how about the execution plans? (I don't know how your report works; you may have to use profiler or extended events to get the exact query).

6. The plan you posted will fetch each row from the temporary fact table (the top-right table scan), then do some arithmetic in the Compute Scalar (that's where the Geomerty value is calculated from the IP_Converted column). Then, for each row, the GEO2IP table is scanned (the clustered index scan bottom-right - this IS a big part of the problem); those rows are passed through a filter (I'm willing to bet that this includes the georange.STwithin calculation), and then the join is performed. For a query like this, I would much rather have seen a merge or hash join, but I think those are not possible for geometry/geography functions. A nested loops join like this, but using the spatial index in the bottom part (instead of the clustered index scan + filter), would probably be the best option. Which, of course, brings us back to your original question. ;)


Can you try to create a repro? I understand that the data, and maybe even the schema, is confidential. That doesn't hinder a repro. You can create new tables, with fake column names, put in some fake data (but similar enough to your real problem to demonstrate the behaviour). Then post that schema (as CREATE TABLE statements) and data (as INSERT statements) here. Don't go overboard with the sample data - if you can repro the issue with 10 rows, don't post 1,000.

Some other questions:
* How many rows are in each of the two tables involved?
* Are the values in IP_Converted all different? If not, how many distinct values are there? If it's low, you could try to rewrite the query to nudge the optimizer to a plan that will first order in IP_Convnerted and perform the join only once per distinct value.
* In an actual execution plan, what are the rowcounts (both estimated and actual) for all of the arrows? And what are the execution counts (both estimated and actual) for the operators on the bottom part of the nested loops join?
* If I understand the query, you have a bunch of points in the temp table (in IP_Converted), a bunch of areas in the geo_range, and now want to report eachh point with all areas that enclose the point. An idea I have might or might not help at all - depending on the actual data distribution. What if you first create a new geometry object that includes all the points in the temp table (no idea how to do that - like I said, I have no experience with Spatial), then use STOverlaps between that area and the data in GEO2IP to quickly weed out all rows that can never overlap with anything. (This would be useful if, for instance, GEO2IP covers the whole world and the temp table is always restricted to a small area, like a single state; it would be useful if the temp table can cover all of the area).

EDIT: Sorry for the overlaps with some previous questions; I somehow overlooked those message before posting mine.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #1510921
Posted Sunday, November 3, 2013 1:41 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 2:04 PM
Points: 1,066, Visits: 3,138
Hi

Having another look at your query, the points you are creating for the fact table are in a straight line on a zero y coordinate, unless you are creating a point with zero height.
FROM #fact_table ip
LEFT JOIN GEO2IP tr ON geo_range.STWithin(geometry::STGeomFromText(
'POINT('+CONVERT(varchar(20),ip.IP_Converted)+' 0)',0)) = 1

So when you build a point it will look something like POINT(2155.21 0), this may be part of your problem.

Can you give us an example of a point that you are querying with and a geo_range from your geo2ip table
SELECT TOP 1 geometry::STGeomFromText('POINT('+CONVERT(varchar(20),ip.IP_Converted)+' 0)',0).ToString() FROM #fact_table;
SELECT TOP 1 geo_range.ToString() FROM geo2ip;



Post #1510929
Posted Sunday, November 3, 2013 2:40 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, October 22, 2014 12:10 PM
Points: 7, Visits: 88
mickyT (11/3/2013)
Hi

Having another look at your query, the points you are creating for the fact table are in a straight line on a zero y coordinate, unless you are creating a point with zero height.
FROM #fact_table ip
LEFT JOIN GEO2IP tr ON geo_range.STWithin(geometry::STGeomFromText(
'POINT('+CONVERT(varchar(20),ip.IP_Converted)+' 0)',0)) = 1

So when you build a point it will look something like POINT(2155.21 0), this may be part of your problem.

Can you give us an example of a point that you are querying with and a geo_range from your geo2ip table
SELECT TOP 1 geometry::STGeomFromText('POINT('+CONVERT(varchar(20),ip.IP_Converted)+' 0)',0).ToString() FROM #fact_table;
SELECT TOP 1 geo_range.ToString() FROM geo2ip;





Yes, that's intentional. The geometry built from the IP ranges are LINESTRINGs so there is no height
Here's kind of a quick rundown on how the tables are structured:





@Hugo, thanks for the huge post full of information. I am reading through and will post a response once I organize my thoughts.
Post #1510933
Posted Sunday, November 3, 2013 5:01 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 2:04 PM
Points: 1,066, Visits: 3,138
Hi

Sorry for the delay getting back ... I see what you are doing now. While you can use geometry to do this, it probably isn't the quickest way to resolve the query. You would probably be better with
left outer join geo2ip on ip_convert between ip2startinteger and ip2endinteger



Also STIntersects is probably the best method to use for this (point on line), if you decide to carry on with geometries.

I did a bit of testing on this using some made up data. The one thing that did kill the query was when I used the following spatial index definition across my range.

create spatial index g2i_sdx on #geo2ip (geo_range) 
using geometry_grid
with (bounding_box = (0, -2, 2001000, 0), grids=(HIGH,MEDIUM,LOW,LOW));

So avoid using 0 in either the miny or maxy bounding box parameters.

This is the test script that I ran. I was suprised to see that the geometry query perform as well if not better than the int between query. So I've had to eat my words above.

-- fact table with 10000 points some overlapping some not
create table #fact_table (
id int identity primary key,
ip_converted int
);

with tally as (
select row_number() OVER (order by (select null)) N
from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E1 (N),
(values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E2 (N)
),
createFact as (
select CAST(RAND(CAST(NEWID() AS VARBINARY)) * 2000000.0 AS INT) ip_converted
from tally x, tally y
)
insert into #fact_table ( ip_converted)
select ip_converted
from createFact;

-- Create a grid table with 1,000,000 cells
create table #geo2ip (
id int identity primary key
,geo_range geometry
,ipstartint int
,ipendint int
);

with tallyPlus as (
select row_number() OVER (order by (select null)) N,
CAST(RAND(CAST(NEWID() AS VARBINARY)) * 1000000.0 AS INT) R
from (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E1 (N),
(values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E2 (N),
(values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E3 (N)
),
createLine as (
select Geometry::STGeomFromText('LINESTRING ('
+ CAST(cast(x.R + y.R as int) AS VARCHAR(10)) + ' 0, ' +
+ CAST(cast(x.R + y.R + 2 as int) AS VARCHAR(10)) + ' 0)',0) geo_range,
cast(x.R + y.R as int) ipstartint,
cast(x.R + y.R + 2 as int) ipendint
from tallyPlus x, tallyPlus y
)
insert into #geo2ip (geo_range, ipstartint, ipendint)
select geo_range, ipstartint, ipendint
from createLine;

create spatial index g2i_sdx on #geo2ip (geo_range)
using geometry_grid
with (bounding_box = (0, -2, 2001000, 1), grids=(HIGH,MEDIUM,LOW,LOW));

create index g2i_ip_idx on #geo2ip (ipstartint, ipendint);
select f.id,
g.id,
ip_converted
from #fact_table f
left outer join #geo2ip g on geo_range.STIntersects(
geometry::STGeomFromText('POINT('+CONVERT(varchar(20),f.IP_Converted)+' 0)',0)
) = 1;

select f.id,
g.id,
ip_converted
from #fact_table f
left outer join #geo2ip g on ip_converted between ipstartint and ipendint;

drop table #fact_table;
drop table #geo2ip;


Post #1510944
Posted Sunday, November 3, 2013 5:10 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, October 22, 2014 12:10 PM
Points: 7, Visits: 88
Thanks, I will ad STIntersects to my list of things to test out. And yeah, BETWEEN on integer comparison absolutely kills performance, it was the reason I started doing this experimental stuff with spatial data in the first place. I'm not sure, but I am pretty sure that I set the miny and maxy bounding box to -1 and 1 respectively.
Post #1510947
Posted Monday, November 4, 2013 3:55 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 2:04 PM
Points: 1,066, Visits: 3,138
Hi

I been concerned that there has to be a better way of getting the results that you want. You may want to try out the following, it performs extremely well compared to the geometry option.
with ctetally as (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e1 (N), --10
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e2 (N), --100
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e3 (N), --1000
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e4 (N), --10000
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e5 (N) --100000
),
expandGeo as (
select ipstartint + N ipint, g.id
from #geo2ip g
cross apply (SELECT TOP ((ipendint - ipstartint) + 1) N - 1 N FROM ctetally) c
)
select f.id,
g.id,
ip_converted
from #fact_table f
left outer join expandGeo g on f.ip_converted = g.ipint;

Post #1511321
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse