Zip code search without using #temptable

  • This is code for doing a zip code radius search that seems to work well. I wonder if it is possible to eliminate using the #Temp table in it though to make it quicker? My table sizes are growing and I want to avoid having performance issues in the future. Thanks.

    --===== Declare some obviously-named local variables

    DECLARE @StartLat FLOAT,

    @StartLon FLOAT,

    @MaxNorthLat FLOAT,

    @MaxSouthLat FLOAT,

    @DegreesPerMile FLOAT

    ;

    --===== Precalculate the Lat/Long for the ZipCode we're looking for and then

    -- figure out the Latitude optimization from that using the search radius

    SELECT @DegreesPerMile = 360/24859.82, --24859.82 = Circumference of the Earth

    @StartLat = Latitude,

    @StartLon = Longitude,

    @MaxNorthLat = Latitude + (@radius*@DegreesPerMile),

    @MaxSouthLat = Latitude - (@radius*@DegreesPerMile)

    FROM dbo.ZipCodes

    WHERE ZipCode = @zip;

    --========== done precalculating

    -- Create CTE

    WITH

    cteFindDistances AS

    ( --=== Find the optimized band of ZipCodes and calculate their distance using the hi-speed iTVF. - This code thanks to Jeff Moden

    SELECT d.Distance, z.ZipCode, z.State

    FROM dbo.ZipCodes z

    CROSS APPLY dbo.fnDistanceITVF(@StartLat, @StartLon, z.Latitude, z.Longitude) d

    WHERE z.Latitude >= @MaxSouthLat

    AND z.Latitude <= @MaxNorthLat

    ) --=== Return only those within the desired radius

    -- Use the CTE

    SELECT Distance, ZipCode

    INTO #TempZip

    FROM cteFindDistances

    WHERE Distance <= @radius

    SELECT * FROM

    (

    SELECT

    TotalRows = COUNT(*) OVER(),

    RowNum = ROW_NUMBER()OVER (ORDER BY T.Distance),

    P.ProviderID, P.Title, P.Company, P.City, P.State, DistanceInMiles = T.Distance

    FROM dbo.Providers P WITH (NOLOCK)

    INNER JOIN #TempZip T ON T.ZipCode = P.Zip

    )

    AS XYZ

    WHERE RowNum BETWEEN @startRow AND (@startRow + @rowsPerPage) - 1

    ORDER BY RowNum ASC

    DROP TABLE #TempZip

  • Would have to see dbo.fnDistanceITVF to be able to determine if it would wrap in well.

    In some ways, the bigger your underlying data sets the better the #tmp performs, not worse. However, using an index of some form on the intermediary table may be effective for you. Would want to see the .sqlplan before advising that though.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • This is the ITVF:

    [dbo].[fnDistanceITVF]

    --===== Returns distance when passed two points, latitude and longitude

    -- Converted to a high performance iTVF by Jeff Moden - 24 Jun 2011

    (

    @Lat1 FLOAT,

    @Lon1 FLOAT,

    @Lat2 FLOAT,

    @Lon2 FLOAT

    )

    RETURNS TABLE

    AS

    RETURN WITH

    cteFirstPass AS

    ( --=== Calculate the GREAT CIRCLE distance in Radians using the Haversine Formula

    SELECT fp1Temp = (SIN(RADIANS(@Lat1)) * SIN(RADIANS(@Lat2)))

    + (COS(RADIANS(@Lat1)) * COS(RADIANS(@Lat2)))

    * COS(RADIANS(@Lon2) - RADIANS(@Lon1))

    ),

    cteSecondPass AS

    ( --=== This makes a correction not accounted for in the Haversine Formula above

    SELECT fp2Temp = CASE WHEN fp1Temp > 1 THEN 1 WHEN fp1Temp < -1 THEN -1 ELSE fp1Temp END

    FROM cteFirstPass

    )

    --=== This does the final calculation for distance

    -- where 3958.75586574 is the Radius of the Earth in miles (as some see it)

    SELECT Distance = 3958.75586574 * ACOS(fp2Temp)

    FROM cteSecondPass

  • So the basic idea of this is to find the # of applicable businesses within a radius from a zip code. Got it.

    ... <deletes a bunch of stuff>

    Can we see the schema for dbo.Providers and dbo.ZipCodes?

    A lot of my recommendations will depend on your index setups on the Latitude information and the provider's table's ability to connect to the indexes on ZipCodes.

    General Thought: You can with some reasonability pre-calc the Latitude boundaries and use that as a filter after a join to providers. That will thin down the possible entries significanly, but not overly significantly. You've just picked up every provider in a (ex: 25 mile) stripe through the country.

    From there you're filtering again based on distance to remove a bunch of stuff after a distance calcualtion to deal with the longitude. Depending on the multipliers here (let's say 10 providers/zip code) you're reducing from 10,000 zipcodes vs. 100,000 providers in a mid-step.

    Really... I'd have to tinker with it. See some .sqlplans. Check the data heuristics.

    However, if Providers is the one growing (I can't see Zip Codes changing much)... I'd avoid removing that temp table, actually. It's saving you from a lot of headaches. Pre-Build #TempZip to have a Unique Clustered Index on ZipCode and make sure your other tables have a covering index on ZipCode as the leading column to assist in seeks.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks Craig, that's helpful to know that the temp table may be saving me headaches.

    Here are the structures/indexes:

    [dbo].[Providers](

    [ProviderID] [int] IDENTITY(1,1) NOT NULL, -- PK, Unique, indexed, non-clustered

    [FirstName] [nvarchar](100) NULL,

    [LastName] [nvarchar](100) NULL,

    [Address1] [nvarchar](100) NULL,

    [Address2] [nvarchar](100) NULL,

    [City] [nvarchar](100) NULL,

    [State] [nvarchar](100) NULL,

    [Zip] [nvarchar](10) NULL -- Indexed, Non-unique, clustered

    [dbo].[ZipCodes](

    [ZipCode] [nvarchar](10) NOT NULL,

    [Latitude] [decimal](18, 8) NULL,

    [Longitude] [decimal](18, 8) NULL,

    [City] [nvarchar](200) NULL,

    [State] [nvarchar](200) NULL,

    [County] [nvarchar](200) NULL,

    Index: Latitude (ASC), Longitude (ASC) -- Non-unique, Clustered

    Index: ZipCode -- PK, unique, non-clustered

  • You clustered ZipCodes on Lat and Lon? You don't use a zipcode shortcut for city/state addressing I take it. Most of your access to that table is by Latitude search, then, I suppose?

    More surprising to me is Providers is already clustered on ZipCode. That's a very unique data pattern to expect, where your searches by zip code hit the tables more often than other read. However, if it's your pattern, excellent setup. 🙂

    Yeah, in this case, I'd personally just look at indexing #TempZip. If you need to optimize before the Providers search even further, I'd review ways to speed up the pre-#temp reduction by distance, such as squaring it off with @maxLon/@minLon too so you dealt with a square instead of a stripe of ZipCodes, and then used distance to remove the corners. My guess is if you wrap everything in time controls, though, that's a minimal concern.

    Under normal circumstances there's a problem with a query when we look to optimize it. In this case you're pre-optimizing. There's such a thing as over-engineering, and we may be approaching that. What's the runtime on this query? Are you seeing specific wait types that have you concerned? Have you tried going into dev and duplicating your providers table x10, or x100, and seen if the result return time becomes unacceptable? Is there something in the Execution Plan that's triggering a concern?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks very much Craig. Your knowledge is very impressive. - Matt

    - The providers table has 60k records. Performance now is good, but I may increase the table size to 500k records.

    - Most searches on the Providers table are done by ZipCode (radius search) or by City/State. If searching by City/State, I lookup the corresponding zipcode and then search on that. Occasional searches on Providers are by LastName.

    A) Not sure where the clustered index should be on ZipCodes. Maybe not on Lat/Long?

    B) >> "My guess is if you wrap everything in time controls, though, that's a minimal concern." Can you explain what you mean by "time controls?"

    C) Pre-indexing #TempZip .... not sure how to do that? Create an index on it once the table is created?

  • matt6749 (11/24/2014)


    Thanks very much Craig. Your knowledge is very impressive. - Matt

    Only until one of the big kids around here steps in and reminds me I'm a noob. 😉

    - The providers table has 60k records. Performance now is good, but I may increase the table size to 500k records.

    Then start there. In dev generate up a 500k table, even if you just replicate the exact same records 10x. Then side by side create equivalent queries and test them. See if you get a significant performance degradation. If you do, snag the .sqlplan (see the link in my signature for index/tuning help for a walkthrough how if you're unfamiliar) and we can work against that.

    - Most searches on the Providers table are done by ZipCode (radius search) or by City/State. If searching by City/State, I lookup the corresponding zipcode and then search on that. Occasional searches on Providers are by LastName.

    That makes sense, then. Always go for your most common approach. You're working around the concern with a zip code list so your city/state searches shouldn't be too bad. You'll still probably end up with scans but that's a weight of upkeep vs. usage value that you'll need to determine downstream, once your data load levels off and you can get usage method details from the logs.

    A) Not sure where the clustered index should be on ZipCodes. Maybe not on Lat/Long?

    This is a question for after reviewing your most common reasons for accessing the ZipCodes table. If you usually attach ZipCodes after searching for stuff as a join to bring along extra data (for example, you only store zip codes in your provider's address tables) then it being the clustered would make sense. If you only usually come to the ZipCodes table to search for ZipCodes by distances, then leave it right where it is.

    B) >> "My guess is if you wrap everything in time controls, though, that's a minimal concern." Can you explain what you mean by "time controls?"

    C) Pre-indexing #TempZip .... not sure how to do that? Create an index on it once the table is created?

    Sorry, example for both. Notice the Print statements and the adjustment in the middle for CREATE TABLE #TempZip:

    --===== Declare some obviously-named local variables

    DECLARE @StartLat FLOAT,

    @StartLon FLOAT,

    @MaxNorthLat FLOAT,

    @MaxSouthLat FLOAT,

    @DegreesPerMile FLOAT

    ;

    -- Timer Tests

    DECLARE @StartTime DATETIME2

    SET @StartTime = GETDATE()

    --===== Precalculate the Lat/Long for the ZipCode we're looking for and then

    -- figure out the Latitude optimization from that using the search radius

    SELECT @DegreesPerMile = 360/24859.82, --24859.82 = Circumference of the Earth

    @StartLat = Latitude,

    @StartLon = Longitude,

    @MaxNorthLat = Latitude + (@radius*@DegreesPerMile),

    @MaxSouthLat = Latitude - (@radius*@DegreesPerMile)

    FROM dbo.ZipCodes

    WHERE ZipCode = @zip;

    PRINT 'Done Determining MaxLatitudes time in ms:' + CONVERT( VARCHAR(30), DATEDIFF( ms, @StartTime, GETDATE()))

    SET @StartTime = GETDATE()

    IF OBJECT_ID('tempdb..#TempZip') IS NOT NULL

    DROP TABLE #TempZip

    CREATE TABLE #TempZip

    (ZipCode INT NOT NULL UNIQUE CLUSTERED,

    Distance NUMERIC (38,20) NOT NULL -- Or whatever the datatype should be

    )

    /*Alternate Create Table:

    CREATE TABLE #TempZip

    (ZipCode INT NOT NULL,

    Distance NUMERIC (38,20) NOT NULL -- Or whatever the datatype should be

    )

    CREATE UNIQUE CLUSTERED INDEX idx_TempZip ON #TempZip

    (ZipCode)

    */

    --========== done precalculating

    PRINT 'Create table #TempZip time in ms:' + CONVERT( VARCHAR(30), DATEDIFF( ms, @StartTime, GETDATE()))

    SET @StartTime = GETDATE()

    -- Create CTE

    WITH

    cteFindDistances AS

    ( --=== Find the optimized band of ZipCodes and calculate their distance using the hi-speed iTVF. - This code thanks to Jeff Moden

    SELECT d.Distance, z.ZipCode, z.State

    FROM dbo.ZipCodes z

    CROSS APPLY dbo.fnDistanceITVF(@StartLat, @StartLon, z.Latitude, z.Longitude) d

    WHERE z.Latitude >= @MaxSouthLat

    AND z.Latitude <= @MaxNorthLat

    ) --=== Return only those within the desired radius

    -- Use the CTE

    INSERT INTO #TempZip

    SELECT ZipCode, Distance

    FROM cteFindDistances

    WHERE Distance <= @radius

    PRINT 'DataLoad into #TempZip time in ms:' + CONVERT( VARCHAR(30), DATEDIFF( ms, @StartTime, GETDATE()))

    SET @StartTime = GETDATE()

    SELECT * FROM

    (

    SELECT

    TotalRows = COUNT(*) OVER(),

    RowNum = ROW_NUMBER()OVER (ORDER BY T.Distance),

    P.ProviderID, P.Title, P.Company, P.City, P.State, DistanceInMiles = T.Distance

    FROM dbo.Providers P WITH (NOLOCK)

    INNER JOIN #TempZip T ON T.ZipCode = P.Zip

    )

    AS XYZ

    WHERE RowNum BETWEEN @startRow AND (@startRow + @rowsPerPage) - 1

    ORDER BY RowNum ASC

    PRINT 'DataReturn time in ms:' + CONVERT( VARCHAR(30), DATEDIFF( ms, @StartTime, GETDATE()))

    SET @StartTime = GETDATE()

    DROP TABLE #TempZip

    EDIT: Forgot the VARCHAR(30) converts


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • This is immensely helpful. Thanks a lot Craig. - Matt

  • No problem. If you get stuck after doing some experimentation just come on back and we'll help you with whatever specific point is causing you a headache.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply