Correct index strategy for BETWEEN

  • Two tables:

    GeoIpLocation (Id, City, Country, Latitude, Longitude; PK: Id; 2m rows)

    GeoIpBlock (IpFrom, IpTo, GeoIpLocation_Id; PK: IdFrom, IdTo; 100k rows)

    Simple concept: get location for given IP. So:

    SELECTTOP 1

    GeoIpLocation.Country,

    GeoIpLocation.City

    FROMGeoIPLocation

    INNER JOIN GeoIPBlock

    ON GeoIPBlock.GeoIpLocation_Id = GeoIPLocation.Id

    WHERE@nIpNumber BETWEEN GeoIPBlock.IpFrom AND GeoIPBlock.IpTo

    Result: disaster.

    The between operator uses the index on PK on GeoIpBlock, which results on half the table for first part of between and half the table for the second part of between. A bit better query is with FORCESCAN on GeoIPBlock, but it still runs slow, as it scans a lot of records.

    Question: Is there a better way to index this kind of data?

  • It would be good to post the table and current index creation scripts.

    This may not be the final solution but consider working on the query in two parts.

    SELECT TOP 1 @ID = GeoIpLocation_Id FROM GeoIPBlock WHERE @nIpNumber BETWEEN GeoIPBlock.IpFrom AND GeoIPBlock.IpTo

    SELECTTOP 1

    GeoIpLocation.Country,

    GeoIpLocation.City

    FROMGeoIPLocation

    WHERE GeoIPLocation.Id = @ID

    The first part looks like it will result in a table scan but if you alter it and there is an index on IPfrom thenyou may be able to get closer

    WHERE GeoIPBlock.IpFrom >= @nIpNumber AND GeoIPBlock.IpTo <= @nIpNumber

    Try create a non clustered index on IPFrom and IpTo and enure the ID column is in that index. That way it should see to the IPFrom, scan until it hits the first row meeting the IPTo and grab the ID, all in the one index structure.

    CREATE NONCLUSTERED INDEX IPFromTo ON GeoIPBlock(IpFrom, IPTo, ID)

    The second part of the query should be quick as it's looking for one row based on @ID.

  • Have you tried index on IpTo ?

  • We really need to see the DDL for the table and its indexes.

    Also, can you explain what you mean by this:

    ..which results on half the table for first part of between and half the table for the second part of between

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Please post at least the execution plan so we can see how the optimizer is resolving the query based on your structures.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ok, I am sorry; here are the execution plans. The first one is without any hints, the second is with FORCESCAN on GeoIpBlock. FORCE ORDER hint produces the same plan, btw.

    The first query takes on average about 400ms, the second one about 200 ms.

  • Actual rows and Est rows are very different. Are your statistics up to date?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (3/24/2015)


    Actual rows and Est rows are very different. Are your statistics up to date?

    Agreed. We don't have index definitions yet either:

    SELECT TOP 1

    l.fkTblCountry,

    l.sCity

    FROM tblGeoIPLocation l

    INNER JOIN tblGeoIPBlock b

    ON b.fkTblGeoIPLocation = l.pkTblGeoIPLocation

    WHERE @nIpNumber BETWEEN b.nIpFrom AND b.nIpTo

    Try these indexes -

    tblGeoIPLocation: index on pkTblGeoIPLocation include fkTblCountry, sCity

    tblGeoIPBlock: index on nIpFrom, nIpTo, fkTblGeoIPLocation

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Phil Parkin (3/24/2015)


    Actual rows and Est rows are very different. Are your statistics up to date?

    Yeah, estimated 1, actual 87000, that's a slight disparity. No wonder it assumed a scan would work well. It only thought it was returning a single row. I suspect you have auto_updatestats disabled. You might want to turn that on and get some statistics maintenance in place.

    Without the statistics, the optimizer will make very bad choices. Also, the fact that the plan shows a runtime value, but not a compile time value for @nlpNumber, I suspect that's a local variable, not a parameter. Once you have statistics, you may be better off with a parameter so that the optimizer can use specific values to retrieve the data.

    As to your index, I'd make it a compound key on nIpFrom and nIpTo with an INCLUDE of [Alzasoft].[dbo].[tblGeoIPLocation].pkTblGeoIPLocation, [Alzasoft].[dbo].[tblGeoIPLocation].fkTblCountry, [Alzasoft].[dbo].[tblGeoIPLocation].sCity. At least as an experiment to test things out. You may get index intersection on this, so two indexes, one for nIpFrom and one for nIpTo might work better. Testing is your friend.

    I didn't look at the plans with hints. You're going after hints when the statistics and the structure is off. That's not how hints should be used. They should be the last thing tried when we've exhausted all the standard approaches.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hey Grant, in the Plan1 execution plan there's a NLIJ with "Expr1004" in the Outer References list. Any idea what the "Expr1004" is for and where it was constructed?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Auto update for statistics is not disabled; they are even manually updated in a nightly maintenance job, however, the query does not want to use the PK index. I guess it has something to do with a fact that it's not a <column> BETWEEN <value1> AND <value2>, but it's <value> BETWEEN <column1> AND <column2>.

  • ChrisM@Work (3/24/2015)


    Hey Grant, in the Plan1 execution plan there's a NLIJ with "Expr1004" in the Outer References list. Any idea what the "Expr1004" is for and where it was constructed?

    I'm not sure. It doesn't appear elsewhere in the plan. I searched the XML.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Jakub.Janda (3/24/2015)


    Auto update for statistics is not disabled; they are even manually updated in a nightly maintenance job, however, the query does not want to use the PK index. I guess it has something to do with a fact that it's not a <column> BETWEEN <value1> AND <value2>, but it's <value> BETWEEN <column1> AND <column2>.

    Looking at the plan again, you'll note that the BETWEEN operation is being satisfied by the [Alzasoft].[dbo].[tblGeoIPBlock].[IX_tblGeoIPBlock_Location] index. The problem is the scan of the other table, tblGeoIPLocation. And that's because the statistics are radically off there. The estimated number of rows is 1. It's scanning 87000. That is the problem. Not the BETWEEN operation or the other index. That part of your structure is working fine. Try UPDATE STATISTICS WITH FULL SCAN on that primary key. See if that changes things. Your statistics are WAY off in there.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Here is an alternative to the between approach but it depends on what the data looks like in GepIPBlock.

    If GepIPBlock.IpFrom and IPTo represent range of ip's or subnets like 123.123.123.1 to 123.123.123.255 then you may be able to take a @nIPNumber like 123.123.123.54, strip the .54 off it and search GEOIpBlock where GepIPBlock.IpFrom LIKE '123.123.123.*' which should use a index seek.

    If GeopIPBlock has multiple rows fro the same subnet 123.123.123.??? then you could strip one character off the @nIPNumber at a time until you hit the row. This will be a small number of seeks but may be quicker than a scan.

  • As per previous post, This test reduces reads to 3 Logical reads.....

    DROP TABLE TEST

    GO

    CREATE TABLE TEST (ID INT IDENTITY(1,1), IPFROM VARCHAR(120), IPTO VARCHAR(120))

    GO

    INSERT INTO TEST(IPFROM) VALUES (1)

    GO -- Add one row to test table

    INSERT INTO TEST(IPFROM) SELECT IPFROM FROM TEST

    GO 15 -- Add loads more rows

    SELECT COUNT(1) FROM TEST

    -- CONVERT Identity Column to something that looks like an IP Range

    UPDATE TEST SET IPFROM = (ID*100)+100000000000, IPTO = (ID*100)+100000000099

    UPDATE TEST SET IPFROM = SUBSTRING(Ipfrom,1,3)+'.'+SUBSTRING(ipfrom,4,3)+'.'+SUBSTRING(ipfrom,7,3)+'.'+SUBSTRING(ipfrom,10,3)

    UPDATE TEST SET IPTO = SUBSTRING(Ipto,1,3)+'.'+SUBSTRING(ipto,4,3)+'.'+SUBSTRING(ipto,7,3)+'.'+SUBSTRING(ipto,10,3)

    SELECT TOP 1000 * FROM TEST ORDER BY ID DESC

    CREATE NONCLUSTERED INDEX TEST_IPFROM ON TEST(ipfrom, ipto, id)

    set statistics io on

    select * from test where '100.003.276.001' BETWEEN ipfrom and ipto

    select * from test where ipfrom LIKE '100.003.276.%' AND '100.003.276.001' BETWEEN ipfrom and ipto

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply