September 24, 2014 at 1:34 pm
Thinking about this a bit more. If you replace the distance calculation in the current version with the haversine formula, you should get a reasonably accurate median.
So replace
denomsum = SUM(1 / SQRT(POWER(@X - point.STX,2) + POWER(@Y - point.STY,2))),
X = SUM((point.STX * (1 / SQRT(POWER(@X - point.STX,2) + POWER(@Y - point.STY,2)))) / @denomsum),
Y = SUM((point.STY * (1 / SQRT(POWER(@X - point.STX,2) + POWER(@Y - point.STY,2)))) / @denomsum)
with
denomsum = SUM(1 / (ACOS( SIN(@Y)*SIN(point.STY) + COS(@Y)*COS(point.STY)*COS(@Y-@X) ) * 6371)),
X = SUM((point.STX * (1 / (ACOS( SIN(@Y)*SIN(point.STY) + COS(@Y)*COS(point.STY)*COS(@Y-@X) ) * 6371))) / @denomsum),
Y = SUM((point.STY * (1 / (ACOS( SIN(@Y)*SIN(point.STY) + COS(@Y)*COS(point.STY)*COS(@Y-@X) ) * 6371))) / @denomsum)
I haven't tested this yet, so I hope I got it right.
Edit: Here's a handy site http://www.movable-type.co.uk/scripts/latlong.html
September 24, 2014 at 2:31 pm
There is a big difference in the results when doing the calculation on geometry or geography, look at this examples and results
Geometry
USE tempdb;
GO
DECLARE @GEOM TABLE
(
GNAME VARCHAR(10) NOT NULL
,GPoint GEOMETRY NOT NULL
);
INSERT INTO @GEOM(GNAME,GPoint)
VALUES
('3,1' ,GEOMETRY::Point (3,1,0))
,('1,25' ,GEOMETRY::Point (1,25,0))
,('18,30' ,GEOMETRY::Point (18,30,0))
,('27,16' ,GEOMETRY::Point (27,6,0))
,('32,6' ,GEOMETRY::Point (32,16,0))
,('15,0.5',GEOMETRY::Point (15,0.5,0))
,('5,15' ,GEOMETRY::Point (5,15,0));
;WITH BASE_AGGREGATION AS
(
SELECT
geometry::ConvexHullAggregate(GPoint) AS AGPoints
FROM @GEOM G
)
SELECT
'Centre Point' AS P_LABEL
,BA.AGPoints.STCentroid ().STBuffer(0.5)
FROM BASE_AGGREGATION BA
UNION ALL
SELECT
'Convex'
,BA.AGPoints
FROM BASE_AGGREGATION BA
UNION ALL
SELECT
G.GNAME
,G.GPoint.STBuffer(0.3)
FROM @GEOM G;
Geography
USE tempdb;
GO
DECLARE @GEOM TABLE
(
GNAME VARCHAR(10) NOT NULL
,GPoint GEOGRAPHY NOT NULL
);
INSERT INTO @GEOM(GNAME,GPoint)
VALUES
('3,1' ,GEOGRAPHY::Point (3,1,4326))
,('1,25' ,GEOGRAPHY::Point (1,25,4326))
,('18,30' ,GEOGRAPHY::Point (18,30,4326))
,('27,16' ,GEOGRAPHY::Point (27,6,4326))
,('32,6' ,GEOGRAPHY::Point (32,16,4326))
,('15,0.5',GEOGRAPHY::Point (15,0.5,4326))
,('5,15' ,GEOGRAPHY::Point (5,15,4326));
;WITH BASE_AGGREGATION AS
(
SELECT
GEOGRAPHY::ConvexHullAggregate(GPoint) AS AGPoints
FROM @GEOM G
)
SELECT
'Centre Point' AS P_LABEL
,BA.AGPoints.EnvelopeCenter ().STBuffer(50000)
FROM BASE_AGGREGATION BA
UNION ALL
SELECT
'Convex'
,BA.AGPoints
FROM BASE_AGGREGATION BA
UNION ALL
SELECT
G.GNAME
,G.GPoint.STBuffer(50000)
FROM @GEOM G;
Geometry results
Geography (Mercator projection) results
Edit: Added (Mercator projection)
September 24, 2014 at 2:54 pm
I would expect that given the large variation of lats/longs. All of my data points will be within 1 degree latitude and longitude so I think i can get away with it.
September 24, 2014 at 3:18 pm
busraker (9/24/2014)
I would expect that given the large variation of lats/longs. All of my data points will be within 1 degree latitude and longitude so I think i can get away with it.
Guess that would be fine, the example is quite exaggerated
September 24, 2014 at 6:02 pm
The function doesn't work anyway, even for straight up geometric medians of non-geographic multi-points. I calculated the geometric median for a few thousand sample test multi-points from my database, and summed the distance versus a random point to make sure the random point was never a better solution. Many failed the test. For example with this test data set:
DECLARE @sampleGeom AS udttPDXGeom
INSERT INTO @sampleGeom
VALUES
(GEOMETRY::Point(122.688,45.515,0))
,(GEOMETRY::Point(122.685,45.517,0))
,(GEOMETRY::Point(122.48,45.527,0))
,(GEOMETRY::Point(122.558,45.549,0))
,(GEOMETRY::Point(122.683,45.522,0));
Against a Random Point:
WITH pts AS (
SELECT
geompt AS geomPoint,
dbo.udfPDXGeomMedian(@sampleGeom, 100) AS geomMedian,
GEOMETRY::Point(122.6719651,45.5218728,0) AS geomTest
FROM
@sampleGeom)
,dist AS (
SELECT
geomPoint.STDistance(geomMedian) DistanceToMedian,
geomPoint.STDistance(geomTest) DistanceToTest,
pts.*
FROM
pts)
SELECT
COUNT(*) iPoints,
SUM(DistanceToMedian) MedianTotal,
SUM(DistanceToTest) TestTotal
FROM
dist
The results favored the test point significantly despite how many iterations I chose:
iPointsMedianTotalTestTotal
50.4182644767160040.351580012335012
For reference this is my UDTT and Function:
CREATE TYPE [dbo].[udttPDXGeom] AS TABLE(
[geompt] [geometry] NULL
)
GO
And final Function:
CREATE FUNCTION [dbo].[udfPDXGeomMedian] (@geomPts udttPDXGeom READONLY, @iIterations NUMERIC)
RETURNS GEOMETRY AS
BEGIN
DECLARE @tempXFLOAT = 0.0;
DECLARE @tempYFLOAT = 0.0;
DECLARE @denomsumFLOAT;
SELECT@tempx = EXP(SUM(LOG(geompt.STX))/COUNT(*)),
@tempy = EXP(SUM(LOG(geompt.STY))/COUNT(*)) FROM @geomPts
WHILE @iIterations > 0
BEGIN
SET @iIterations -= 1;
SELECT @denomsum = SUM(1 / (ACOS( SIN(@tempY)*SIN(geompt.STY) + COS(@tempY)*COS(geompt.STY)*COS(@tempY-@tempX) ) * 6371))
FROM @geomPts
SELECT @tempX = SUM((geompt.STX * (1 / (ACOS( SIN(@tempY)*SIN(geompt.STY) + COS(@tempY)*COS(geompt.STY)*COS(@tempY-@tempX) ) * 6371)))
/ @denomsum),
@tempY = SUM((geompt.STY * (1 / (ACOS( SIN(@tempY)*SIN(geompt.STY) + COS(@tempY)*COS(geompt.STY)*COS(@tempY-@tempX) ) * 6371))) / @denomsum)
FROM @geomPts
END
RETURN GEOMETRY::Point(@tempx, @tempy,0)
END
September 24, 2014 at 7:40 pm
**Insert bad word here**, I've either translated the python wrong or it doesn't work.
I've just tried it with some of my real data rather than the mocked up points I did before and it ends up getting worse.
I'm investigating at the moment and will get back.
September 24, 2014 at 8:38 pm
I've removed the rcte version as that appeared to be behaving badly. The looped version appears to work OK.
To avoid issues where distances in the procedure get calculated badly I replaced them with the built in function. This does make it go slower.
I tested the following by creating a circle of points (132) 100 km around a point.
Origin 2000000, 5000000
For Geometry the X was bang on. The Y was bang on.
For Geography the Long was 99m out and the Lat 780m out.
Geometry Version:
DECLARE @tempX float = 0.0;
DECLARE @tempY float = 0.0;
DECLARE @iterations int = 50;
DECLARE @denomsum float;
SELECT @tempX = EXP(SUM(LOG(geom.STX))/COUNT(*)),
@tempY = EXP(SUM(LOG(geom.STY))/COUNT(*))
FROM testmedian;
WHILE @iterations > 0
BEGIN
SET @iterations -= 1;
SELECT @denomsum = SUM(1 / geom.STDistance(Geometry::Point(@tempx, @tempy,geom.STSrid)))
FROM testmedian;
SELECT @tempX = SUM((geom.STX * (1 / geom.STDistance(Geometry::Point(@tempx, @tempy,geom.STSrid)))) / @denomsum),
@tempy = SUM((geom.STY * (1 / geom.STDistance(Geometry::Point(@tempx, @tempy,geom.STSrid)))) / @denomsum)
FROM testmedian;
END;
select @tempx, @tempy
Geography Version:DECLARE @tempX float = 0.0;
DECLARE @tempY float = 0.0;
DECLARE @iterations int = 50;
DECLARE @denomsum float;
SELECT @tempX = 0,
@tempY = 0
WHILE @iterations > 0
BEGIN
SET @iterations -= 1;
SELECT @denomsum = SUM(1 / Geog.STDistance(Geography::Point(@tempy,@tempx,geog.STSrid)))
FROM testmedian;
SELECT @tempX = SUM((geog.Long * (1 / Geog.STDistance(Geography::Point(@tempy,@tempx,geog.STSrid)))) / @denomsum),
@tempy = SUM((geog.Lat * (1 / Geog.STDistance(Geography::Point(@tempy,@tempx,geog.STSrid)))) / @denomsum)
FROM testmedian;
END;
select @tempx, @tempy
September 25, 2014 at 5:03 pm
I got the geometry version working beautifully after getting rid of the division by zeroes. I also had to ABS the longitude and then flip it back after:
CREATE TYPE [dbo].[udttPDXGeom] AS TABLE(
[geompt] [geometry] NULL
)
CREATE FUNCTION [dbo].[udfPDXGeomMedian] (@geomPts udttPDXGeom READONLY, @iIterations NUMERIC)
RETURNS GEOMETRY AS
BEGIN
DECLARE @tempXFLOAT = 0.0;
DECLARE @tempYFLOAT = 0.0;
DECLARE @denomsumFLOAT;
DECLARE @g2 udttPDXGeom
INSERT INTO @g2 SELECT * FROM @geomPts
SELECT@tempx = EXP(SUM(LOG(geompt.STX))/COUNT(*)),
@tempy = EXP(SUM(LOG(geompt.STY))/COUNT(*)) FROM @geomPts
WHILE @iIterations > 0
BEGIN
SET @iIterations -= 1;
SELECT @denomsum = SUM(1 / geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)))
FROM @g2 WHERE geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)) <> 0;
SELECT @tempX = SUM((geompt.STX * (1 / geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)))) / @denomsum),
@tempy = SUM((geompt.STY * (1 / geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)))) / @denomsum)
FROM @g2 WHERE geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)) <> 0;
END
RETURN GEOMETRY::Point(@tempx, @tempy,0)
END
I was able to process 1489 sets with 8641 total locations in 4:19 which is totally fine. I'll see if I can get the geography version going as well.
September 25, 2014 at 5:11 pm
That's excellent
September 25, 2014 at 5:50 pm
Add the Geography Version:
CREATE TYPE [dbo].[udttPDXGeog] AS TABLE(
[geogpt] [geography] NULL
ALTER FUNCTION [dbo].[udfPDXGeogMedian] (@geogPts udttPDXGeog READONLY, @iIterations NUMERIC)
RETURNS GEOGRAPHY AS
BEGIN
DECLARE @tempXFLOAT = 0.0;
DECLARE @tempYFLOAT = 0.0;
DECLARE @denomsumFLOAT;
DECLARE @sridNUMERIC;
DECLARE @g2 udttPDXGeog;
INSERT INTO @g2 SELECT * FROM @geogPts
SELECT@tempx = EXP(SUM(LOG(ABS(geogpt.Long)))/COUNT(*)) * SIGN(SUM(geogpt.Long)) ,
@tempy = EXP(SUM(LOG(ABS(geogpt.Lat)))/COUNT(*)) * SIGN(SUM(geogpt.Lat)) ,
@srid = geogpt.STSrid FROM @g2 GROUP BY geogpt.STSrid
WHILE @iIterations > 0
BEGIN
SET @iIterations -= 1;
SELECT @denomsum = SUM(1 / geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx, geogpt.STSrid)))
FROM @g2 WHERE geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx, geogpt.STSrid)) <> 0;
SELECT @tempX = SUM((geogpt.Long * (1 / geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx,geogpt.STSrid)))) / @denomsum),
@tempy = SUM((geogpt.Lat * (1 / geogpt.STDistance(GEOGRAPHY::Point( @tempy, @tempx,geogpt.STSrid)))) / @denomsum)
FROM @g2 WHERE geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx, geogpt.STSrid)) <> 0;
END
RETURN GEOGRAPHY::Point(@tempy, @tempx, @srid)
END
Thanks for all of your help!
Viewing 10 posts - 16 through 25 (of 25 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy