Query really slow with eager spool

  • Hi,
    I have a query which becomes slower as the number of items in the temp 
    table increases.
    Here's my code:
    CREATE table #regions (
     ipFloat float null,
     countryId int null
    )
    insert into #regions (ipFloat) select distinct top 100 ipFloat from tbl_logs
    -- This is where the code gets really slow!
    update #regions set countryId = (select countryId from tbl_ip2countries 
    where ipFloat between ipFrom and ipTo)
    select * from #regions
    drop table #regions
    What it does is take IP addresses converted to float numbers and look up 
    the associated countryId in a table. The tbl_ip2countries table has more 
    than 2,000,000 records. There's a clustered index on the primary key, 
    which is ipTo and ipFrom (both float).
    If I take only 10 IPs (select distinct top 10 ipFloat from tbl_logs), the 
    query executes in less than 1 second.
    If I take up to 100 IPs, it takes more than 45 seconds. And in this case, I 
    have an index spool (eager spool) that consumes 83% of the cost. By the 
    way, in the execution plan, the arrow from the clustered index shows an 
    actual row count of about 2,000,000, but the arrow after the eager spool 
    shows an actual row count of more than 80,000,000! Is this normal? There's 
    a filter after that which reduces the number of rows to only 100...
    If I take 1000 IPs, it takes several minutes.
    Is there a way to get rid of the eager spool or to reduce those 80,000,000 
    actual rows? I don't understand what's going on here and why this is so 
    slow for a few hundred rows.
    Any idea?
    Thanks
    Stephane
  • Can you edit your post so that it's more readable... I'm not sure you'll get much attention from anyone with that question.

  • Thanks! I didn't notice... I copied and pasted from Word!

  • That's what I thought. Thanks for the quick fix.

  • Can you show us some sample data and the required output from the update query... I'm sure there's a way to avoid the spool altogether.

  • I just did a new test using a cursor this time.

    CREATE table #regions (
     ipFloat float null,
     countryId int null)
    insert into #regions (ipFloat) select distinct top 500 ipFloat from tbl_logs
    declare @i float, @id int
    DECLARE cSE CURSOR READ_ONLY
    FOR select ipFloat from #regions
    OPEN cSE
    FETCH NEXT FROM cSE INTO @i
    WHILE @@FETCH_STATUS = 0
    BEGIN
     set @id = null -- reset, so an IP with no match doesn't keep the previous row's countryId
     select @id = countryId from tbl_ip2countries where @i between ipFrom and ipTo
     update #regions set countryId = @id where ipFloat = @i
     FETCH NEXT FROM cSE INTO @i
    END
    CLOSE cSE
    DEALLOCATE cSE
    select * from #regions
    drop table #regions

    The results are quite surprising! For 100 IPs, it took only 6 seconds instead of minutes.

    But I'm sure there's a way to get this done quickly without using a cursor. I just can't figure it out.

    Thanks for any idea,

    Stephane

  • Stephane

    Your syntax doesn't look quite right.  I don't see how your update statement can work.  How about this (not tested, obviously):

    update r
    set countryId = c.countryId
    from #regions r join tbl_ip2countries c
    on r.ipFloat = c.ipFloat
    where r.ipFloat between c.ipFrom and c.ipTo

    John

  • The problem with that update is that I don't have the exact IP converted to a float in tbl_ip2countries. I mean, I only have ranges, so Canada is assigned to 123.123.123.0 through 123.123.123.256. So I can't join on r.ipFloat = c.ipFloat.

    Thanks!

  • I have IPs converted to float like this:

    1362268423
    3476946705
    1179659265
    1179726082
    3234155789
    3254497794
    3489515534
    3567080704
    1390476300
    1390490892
    1390832915
    1415938309
    1456668932
    3258127372
    3569568532
    3573210393

    I have to check against a lookup table (containing more than 2M rows) set up like this:

    ipFrom                 ipTo                   countryId
    ---------------------- ---------------------- -----------
    0                      33996343               95
    33996344               33996351               219
    33996352               50331647               95
    50331648               50331903               220
    50331904               50332159               220
    50332160               50332671               220
    50332672               50332927               220
    50332928               50333695               220
    50333696               50333951               220
    50333952               50334719               220

    So #regions would look like this afterwards:

    ipFloat                countryId
    ---------------------- -----------
    1362268423             70
    3476946705             38
    1179659265             38
    1179726082             38
    3234155789             38
    3254497794             70
    3489515534             38
    3567080704             199
    1390476300             70
    1390490892             70

    thanks

    Stephane

  • Stephane

    I still don't understand. The sample data you gave shows ipFrom and ipTo having 8 digits, and ipFloat having 10. So how can those ipFloat values ever be between ipFrom and ipTo? Also, how do you convert an IP address into this format? For instance, to use the Canada example, what do 123.123.123.0 and 123.123.123.256 become? (By the way, the latter isn't a valid IP address. I assume this is a typo?)

    John

  • Hi,

    Thanks for your fast answer!

    The IP is converted this way: for example, if the IP address is 161.132.13.1, then the IP number is 2709785857.

    IP Number, X = 161 x (256*256*256) + 132 x (256*256) + 13 x (256) + 1 = 2709785857.
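    As a quick check, the same arithmetic in T-SQL (a minimal sketch; the decimal point keeps the expression from overflowing int arithmetic):

    ```sql
    -- IP number for address a.b.c.d = a*256^3 + b*256^2 + c*256 + d
    -- Verifying the example above, 161.132.13.1:
    select 161 * 16777216.0 + 132 * 65536 + 13 * 256 + 1 as ipNumber
    -- returns 2709785857
    ```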

    You can have more info here: http://www.ip2location.com/docs/IP2Location_IP_Country_Region_City_Specification.pdf

    So it can have up to 10 digits.

    Sorry for the 256, it's obviously 255!

    Thanks

    Stephane

  • Stephane

    OK, that clears up the IP conversion algorithm.  I still don't think that your sample data from tbl_ip2countries and from #regions tally up with each other, but let's not worry about that for now.

    Try this:

    update r
    set countryId = c.countryId
    from #regions r join tbl_ip2countries c
    on r.ipFloat between c.ipFrom and c.ipTo

    Not sure how it would perform, but much better than a cursor, I should think. You could experiment with using the >= and <= operators instead of the BETWEEN operator to see if that goes any faster.
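    For reference, the >= / <= variant would look like this (untested, same table and column names as above):

    ```sql
    update r
    set countryId = c.countryId
    from #regions r join tbl_ip2countries c
     on r.ipFloat >= c.ipFrom and r.ipFloat <= c.ipTo
    ```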

    John

  • hum...

    That's really weird: for 10 IPs, it takes 7-8 seconds. For 100, it takes about a minute, which is slower than my original query.

    BUT!

    If I try with 40 or 50 IPs, it executes in a flash: 1 second!

    The difference is that it uses a table spool (lazy spool) in the first case but only a clustered index seek in the second case.

    What the ...???

    By the way, in the prod environment, it's going to be up to 1000 IPs.

    Stephane

  • I really don't get it...

    10-20-30 IPs use the lazy spool, so it takes several seconds.

    40-50-60 IPs use the clustered index seek, so it executes in a flash.

    70 and more also use the lazy spool, so it takes more than a minute.

    Is there a way to force the use of the index instead of the costly table spool?

    By the way, it's a clustered index SCAN with the table spool instead of a clustered index seek. What's the difference?

    Thanks

    Stephane

  • SCAN: reads all the rows in the index.

    SEEK: navigates the index B-tree to find only the required rows.

    Could you reindex and/or update the statistics on the table?

    If that doesn't help, you could always use index and join hints (as a last resort).
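    A sketch of those two suggestions combined (not tested; OPTION (LOOP JOIN) forces a nested-loops plan so each #regions row can seek the clustered index, and as noted it's a last resort):

    ```sql
    -- refresh the optimizer's view of the 2M-row lookup table
    update statistics tbl_ip2countries with fullscan

    -- force a nested-loops join so each outer row does a clustered index seek
    update r
    set countryId = c.countryId
    from #regions r join tbl_ip2countries c
     on r.ipFloat between c.ipFrom and c.ipTo
    option (loop join)
    ```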

Viewing 15 posts - 1 through 15 (of 19 total)
