Find the difference between 2 values

  • Experts, I am wondering if I can get some help with what I am trying to accomplish here. 

    I have 2 tables. 1st table contains zipcode information and a latitude and longitude range (data is from US post office). For example,  for this given zip code 15926 the range of latitude is between 40.04557300  and 40.04555000 and the range of longitude is between -78.84349600  and  -78.84388300. We also have another table "Contact" which we use to keep customers' data with their complete address. We also have a match process where we generate latitude and longitude by putting in the address, city, st, county and zip and it generates lat and long data for us. However, recently we have been seeing that the values the system is generating is off. Off by 10 miles and sometimes more. What I want to do is I want to create a process so for any given zipcode, I match the value which I have with a US postal code lat and long and generate a value with how much difference there is between 2 values. Please let me know if my question confuses anyone?

  • http://www.sqlservercentral.com/scripts/T-SQL/65739/

    BUT, you should be aware of several things that affect the accuracy of the data you're using. 1 - zip codes are not rectangular, (examples: http://www.aboutzipcode.com/statemaps/Ohio-zip-code-map.png) so if you're just looking for min/max of lat and long, that doesn't tell you anything about the shape. If you want to find out if a point is within the zipcode, you'd have to do some sort of shape map analysis. 2 - I am totally ignorant of how you get the data from the USPS into your table, and you should know that some software will try to plot things but if unable will default to the center point of the nearest zip/county/state/country. There is almost always data in our system that plots itself either in the geographical center of the US or in the ocean about 100 miles west of the Congo, because it defaults all the way back to 0,0. So, your data may be the issue, but the software loading it might be the cause. 3 - sometimes there are city/state associated with zip, and more than one, so some software also chooses the first one, which is not correct, but not horribly wrong if you're just trying to geomap things. We do get irritated clients who call us and tell us that their address isn't in the right city/state, and we can't really fix it because it defaults to the zip-associated data.

    All that said, maybe RBarryYoung's work in the first link above will allow you to identify some suspect data to review and see just how sticky this wicket is. Good luck!

    -------------------------------------------------------------------------------------------------------------------------------------
    Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses

  • jonathan.crawford - Friday, June 22, 2018 3:52 PM

    http://www.sqlservercentral.com/scripts/T-SQL/65739/

    BUT, you should be aware of several things that affect the accuracy of the data you're using. 1 - zip codes are not rectangular, (examples: http://www.aboutzipcode.com/statemaps/Ohio-zip-code-map.png) so if you're just looking for min/max of lat and long, that doesn't tell you anything about the shape. If you want to find out if a point is within the zipcode, you'd have to do some sort of shape map analysis. 2 - I am totally ignorant of how you get the data from the USPS into your table, and you should know that some software will try to plot things but if unable will default to the center point of the nearest zip/county/state/country. There is almost always data in our system that plots itself either in the geographical center of the US or in the ocean about 100 miles west of the Congo, because it defaults all the way back to 0,0. So, your data may be the issue, but the software loading it might be the cause. 3 - sometimes there are city/state associated with zip, and more than one, so some software also chooses the first one, which is not correct, but not horribly wrong if you're just trying to geomap things. We do get irritated clients who call us and tell us that their address isn't in the right city/state, and we can't really fix it because it defaults to the zip-associated data.

    All that said, maybe RBarryYoung's work in the first link above will allow you to identify some suspect data to review and see just how sticky this wicket is. Good luck!

    I just glanced at the code referenced by Jonathon and it looks like it is possible to enhance that code by converting it to an itvf and using it in a cross apply instead of a select column list.  If I have some time I will see if I can do that.

  • NewBornDBA2017 - Friday, June 22, 2018 1:27 PM

    Experts, I am wondering if I can get some help with what I am trying to accomplish here. 

    I have 2 tables. 1st table contains zipcode information and a latitude and longitude range (data is from US post office). For example,  for this given zip code 15926 the range of latitude is between 40.04557300  and 40.04555000 and the range of longitude is between -78.84349600  and  -78.84388300. We also have another table "Contact" which we use to keep customers' data with their complete address. We also have a match process where we generate latitude and longitude by putting in the address, city, st, county and zip and it generates lat and long data for us. However, recently we have been seeing that the values the system is generating is off. Off by 10 miles and sometimes more. What I want to do is I want to create a process so for any given zipcode, I match the value which I have with a US postal code lat and long and generate a value with how much difference there is between 2 values. Please let me know if my question confuses anyone?

    This is a classic case for using the geography spatial data type and the spatial functions.
    😎

  • NewBornDBA2017 - Friday, June 22, 2018 1:27 PM

    Experts, I am wondering if I can get some help with what I am trying to accomplish here. 

    I have 2 tables. 1st table contains zipcode information and a latitude and longitude range (data is from US post office). For example,  for this given zip code 15926 the range of latitude is between 40.04557300  and 40.04555000 and the range of longitude is between -78.84349600  and  -78.84388300. We also have another table "Contact" which we use to keep customers' data with their complete address. We also have a match process where we generate latitude and longitude by putting in the address, city, st, county and zip and it generates lat and long data for us. However, recently we have been seeing that the values the system is generating is off. Off by 10 miles and sometimes more. What I want to do is I want to create a process so for any given zipcode, I match the value which I have with a US postal code lat and long and generate a value with how much difference there is between 2 values. Please let me know if my question confuses anyone?

    Instead of reinventing the wheel, have you thought about buying this one commercial service? I happen to like Melissa Data, but there are other sources. The advantage of course of not writing your own code is somebody else's legally responsible, will devote an entire staff to maintain the data, and actually send you updates as the data changes (do you really really want to monitor the changes in ZIP Codes and addresses in North America on your own?).

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Lynn Pettis - Friday, June 22, 2018 4:14 PM

    jonathan.crawford - Friday, June 22, 2018 3:52 PM

    http://www.sqlservercentral.com/scripts/T-SQL/65739/

    BUT, you should be aware of several things that affect the accuracy of the data you're using. 1 - zip codes are not rectangular, (examples: http://www.aboutzipcode.com/statemaps/Ohio-zip-code-map.png) so if you're just looking for min/max of lat and long, that doesn't tell you anything about the shape. If you want to find out if a point is within the zipcode, you'd have to do some sort of shape map analysis. 2 - I am totally ignorant of how you get the data from the USPS into your table, and you should know that some software will try to plot things but if unable will default to the center point of the nearest zip/county/state/country. There is almost always data in our system that plots itself either in the geographical center of the US or in the ocean about 100 miles west of the Congo, because it defaults all the way back to 0,0. So, your data may be the issue, but the software loading it might be the cause. 3 - sometimes there are city/state associated with zip, and more than one, so some software also chooses the first one, which is not correct, but not horribly wrong if you're just trying to geomap things. We do get irritated clients who call us and tell us that their address isn't in the right city/state, and we can't really fix it because it defaults to the zip-associated data.

    All that said, maybe RBarryYoung's work in the first link above will allow you to identify some suspect data to review and see just how sticky this wicket is. Good luck!

    I just glanced at the code referenced by Jonathon and it looks like it is possible to enhance that code by converting it to an itvf and using it in a cross apply instead of a select column list.  If I have some time I will see if I can do that.

    According to the comments, here's an improved version of the function.

    EDIT. I would encourage the use of the spatial functions added to T-SQL on 2008.

    Create Function calculateDistance(
       @Lat1 Float,
       @Lon1 Float,
       @Lat2 Float,
       @Lon2 Float,
       @Unit varchar(5)
    ) Returns TABLE WITH SCHEMABINDING
    /*
       Faster way to calculate distance in miles using Latitude & Longitude.
    This is accurate in miles to about 4.5 decimal places, except for very small distances.

    Ref: The formula is derived from the Spherical Law of Cosines,

    --RBarryYoung, 31-Jan-2009
    ,      KM: 17-Feb-2009
    --Luis Cazares, 25-Jun-2018

    tests:
    select Distance FROM dbo.calculateDistance(31.0, -93.0, 31.1, -93.0, 'Miles')   --should be 6.9169 miles
    select Distance FROM dbo.calculateDistance(31.0, -93.0, 31.0, -93.1, 'Miles')   --should be 5.9290 miles
    select Distance FROM dbo.calculateDistance(20.0, -93.0, 20.0, -93.1, 'Miles')   --should be 6.4998 miles
    select Distance FROM dbo.calculateDistance(40.0, -93.0, 40.0, -93.1, 'Miles')   --should be 5.2987 miles
    */
    AS RETURN
       WITH CTE AS(
         SELECT sin(@Lat1/RadiansFactor) * sin(@Lat2/RadiansFactor)
          + cos(@Lat1/RadiansFactor) * cos(@Lat2/RadiansFactor) * cos((@Lon2-@Lon1)/RadiansFactor)
           AS Pre
        FROM (SELECT 180./PI() AS RadiansFactor)rf --(converts Lat/Lon degrees to spherical radians)
       )
       SELECT mu.Factor * acos( CASE WHEN ABS(CTE.Pre) > 1 THEN SIGN(CTE.Pre) ELSE CTE.Pre END) AS Distance
       FROM CTE
       CROSS JOIN (VALUES(6357.0, 'Kms'), (3963.0,'Miles'))mu(Factor, Unit)
       WHERE mu.Unit = @Unit
    GO

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares - Monday, June 25, 2018 11:26 AM

    Lynn Pettis - Friday, June 22, 2018 4:14 PM

    jonathan.crawford - Friday, June 22, 2018 3:52 PM

    http://www.sqlservercentral.com/scripts/T-SQL/65739/

    BUT, you should be aware of several things that affect the accuracy of the data you're using. 1 - zip codes are not rectangular, (examples: http://www.aboutzipcode.com/statemaps/Ohio-zip-code-map.png) so if you're just looking for min/max of lat and long, that doesn't tell you anything about the shape. If you want to find out if a point is within the zipcode, you'd have to do some sort of shape map analysis. 2 - I am totally ignorant of how you get the data from the USPS into your table, and you should know that some software will try to plot things but if unable will default to the center point of the nearest zip/county/state/country. There is almost always data in our system that plots itself either in the geographical center of the US or in the ocean about 100 miles west of the Congo, because it defaults all the way back to 0,0. So, your data may be the issue, but the software loading it might be the cause. 3 - sometimes there are city/state associated with zip, and more than one, so some software also chooses the first one, which is not correct, but not horribly wrong if you're just trying to geomap things. We do get irritated clients who call us and tell us that their address isn't in the right city/state, and we can't really fix it because it defaults to the zip-associated data.

    All that said, maybe RBarryYoung's work in the first link above will allow you to identify some suspect data to review and see just how sticky this wicket is. Good luck!

    I just glanced at the code referenced by Jonathon and it looks like it is possible to enhance that code by converting it to an itvf and using it in a cross apply instead of a select column list.  If I have some time I will see if I can do that.

    According to the comments, here's an improved version of the function.

    EDIT. I would encourage the use of the spatial functions added to T-SQL on 2008.

    Create Function calculateDistance(
       @Lat1 Float,
       @Lon1 Float,
       @Lat2 Float,
       @Lon2 Float,
       @Unit varchar(5)
    ) Returns TABLE WITH SCHEMABINDING
    /*
       Faster way to calculate distance in miles using Latitude & Longitude.
    This is accurate in miles to about 4.5 decimal places, except for very small distances.

    Ref: The formula is derived from the Spherical Law of Cosines,

    --RBarryYoung, 31-Jan-2009
    ,      KM: 17-Feb-2009
    --Luis Cazares, 25-Jun-2018

    tests:
    select Distance FROM dbo.calculateDistance(31.0, -93.0, 31.1, -93.0, 'Miles')   --should be 6.9169 miles
    select Distance FROM dbo.calculateDistance(31.0, -93.0, 31.0, -93.1, 'Miles')   --should be 5.9290 miles
    select Distance FROM dbo.calculateDistance(20.0, -93.0, 20.0, -93.1, 'Miles')   --should be 6.4998 miles
    select Distance FROM dbo.calculateDistance(40.0, -93.0, 40.0, -93.1, 'Miles')   --should be 5.2987 miles
    */
    AS RETURN
       WITH CTE AS(
         SELECT sin(@Lat1/RadiansFactor) * sin(@Lat2/RadiansFactor)
          + cos(@Lat1/RadiansFactor) * cos(@Lat2/RadiansFactor) * cos((@Lon2-@Lon1)/RadiansFactor)
           AS Pre
        FROM (SELECT 180./PI() AS RadiansFactor)rf --(converts Lat/Lon degrees to spherical radians)
       )
       SELECT mu.Factor * acos( CASE WHEN ABS(CTE.Pre) > 1 THEN SIGN(CTE.Pre) ELSE CTE.Pre END) AS Distance
       FROM CTE
       CROSS JOIN (VALUES(6357.0, 'Kms'), (3963.0,'Miles'))mu(Factor, Unit)
       WHERE mu.Unit = @Unit
    GO

    Thank, Luis.  I just wasn't able to find the time this weekend.  Fairly busy weekend out of the house.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply