Query performance (Avoid Cluster index scan and Lazy spool)

  • bas_vdl (5/15/2014)


    So how can i measure the actual query time to get a good comparison?

    Pipe the results to a temporary table:

    IF OBJECT_ID('tempdb..#discard') IS NOT NULL

    DROP TABLE #discard

    GO

    SELECT (

    SELECT TOP(1) COUNTRY_CODE_ALPHA2

    FROM dbo.IP2COUNTRY_LOOKUP C

    WHERE START = (

    SELECT TOP(1) START

    FROM DBO.IP2COUNTRY_LOOKUP

    WHERE START <= E.SOURCE_ADDRESS_INT

    ORDER BY START DESC

    )

    AND [END] >= E.SOURCE_ADDRESS_INT

    ) AS SRC_COUNTRY_CODE_ALPHA2,

    E.*

    INTO #discard

    FROM SECURITY_EVENTS_STG_TEST E

    -- Gianluca Sartori

  • bas_vdl (5/15/2014)


    Hi HowardW,

    Thanks for your input. You're right about the fact that the ip ranges are not overlapping. The query you suggest is very slow. In my first post I refered tot this tyope of query and the performance are really poor (+4 min. on the current data set)

    If you change your clustered index on IP2COUNTRY_LOOKUP to START ASC,END ASC, does this improve?

    bas_vdl (5/15/2014)


    You second point is a good one. We are running SAP Business Objects. To get the records from the source system to our data warehouse I've build a work flow within SAP Data Services. I started by caching the IP2COUNTRY_LOOKUP table in memory and for every records I received from the source I tried to lookup the SRC_COUNTRY_CODE_ALPHA2. Poor SAP Data Services couldn't handle the lookup with a BETWEEN operator. Performance where so poor that I went back to an old school query.

    Ultimately, you've ended up with the two tables in your Data Warehouse, from which you can derive the data that you need.

    Could you do an old fashioned update in SQL rather than using the ETL tool or add a trigger to the SECURITY_EVENTS_STG_TEST table that populates this in T-SQL on insert?

  • HowardW (5/15/2014)


    bas_vdl (5/15/2014)


    Hi HowardW,

    Thanks for your input. You're right about the fact that the ip ranges are not overlapping. The query you suggest is very slow. In my first post I refered tot this tyope of query and the performance are really poor (+4 min. on the current data set)

    If you change your clustered index on IP2COUNTRY_LOOKUP to START ASC,END ASC, does this improve?

    I had a Clustered index on START and END both where ASC.

    Using spaghettidba's trick to measure the time, it still took +4 min. Compare this to the 9sec using the TOP(1) query...

  • bas_vdl (5/15/2014)


    HowardW (5/15/2014)


    bas_vdl (5/15/2014)


    Hi HowardW,

    Thanks for your input. You're right about the fact that the ip ranges are not overlapping. The query you suggest is very slow. In my first post I refered tot this tyope of query and the performance are really poor (+4 min. on the current data set)

    If you change your clustered index on IP2COUNTRY_LOOKUP to START ASC,END ASC, does this improve?

    I had a Clustered index on START and END both where ASC.

    Using spaghettidba's trick to measure the time, it still took +4 min. Compare this to the 9sec using the TOP(1) query...

    The DDL you posted has a clustered index on just START. It has a primary key on Start, End, which is not the same thing. Just to clarify, are you saying you've tried it with different DDL to what you've posted and it's still 4 minutes?

  • HowardW (5/15/2014)


    bas_vdl (5/15/2014)


    HowardW (5/15/2014)


    bas_vdl (5/15/2014)


    Hi HowardW,

    Thanks for your input. You're right about the fact that the ip ranges are not overlapping. The query you suggest is very slow. In my first post I refered tot this tyope of query and the performance are really poor (+4 min. on the current data set)

    If you change your clustered index on IP2COUNTRY_LOOKUP to START ASC,END ASC, does this improve?

    I had a Clustered index on START and END both where ASC.

    Using spaghettidba's trick to measure the time, it still took +4 min. Compare this to the 9sec using the TOP(1) query...

    The DDL you posted has a clustered index on just START. It has a primary key on Start, End, which is not the same thing. Just to clarify, are you saying you've tried it with different DDL to what you've posted and it's still 4 minutes?

    I changed the indexes as I mentioned in post http://www.sqlservercentral.com/Forums/FindPost1571260.aspx. It sounds strange to me too that a JOIN with a BETWEEN operator is so slow...

    Index

    ALTER TABLE [dbo].[IP2COUNTRY_LOOKUP] ADD CONSTRAINT [PK_IP2COUNTRY_LOOKUP] PRIMARY KEY CLUSTERED

    (

    [START] ASC,

    [END] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

  • set statistics io, time on

    PRINT 'Original query'

    SELECT

    (SELECT COUNTRY_CODE_ALPHA2

    FROM dbo.IP2COUNTRY_LOOKUP C

    WHERE START = (SELECT max(START)

    FROM DBO.IP2COUNTRY_LOOKUP

    WHERE START <= E.SOURCE_ADDRESS_INT)

    AND "END" >= E.SOURCE_ADDRESS_INT) AS SRC_COUNTRY_CODE_ALPHA2,

    E.*

    INTO #Temp1

    FROM SECURITY_EVENTS_STG_TEST E

    print '====================================================================================='

    PRINT 'Possible replacement query'

    SELECT

    SRC_COUNTRY_CODE_ALPHA2 = x.COUNTRY_CODE_ALPHA2,

    e.*

    INTO #Temp2

    FROM SECURITY_EVENTS_STG_TEST e

    CROSS APPLY (

    SELECT TOP 1 c.COUNTRY_CODE_ALPHA2

    FROM dbo.IP2COUNTRY_LOOKUP c

    WHERE e.SOURCE_ADDRESS_INT BETWEEN c.[START]

    AND c.[END]

    ORDER BY c.[START] DESC

    ) x

    set statistics io, time off

    /*

    Original query

    Table 'IP2COUNTRY_LOOKUP'. Scan count 490782, logical reads 1474199, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'SECURITY_EVENTS_STG_TEST'. Scan count 5, logical reads 583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (245391 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 5693 ms, elapsed time = 1652 ms.

    =====================================================================================

    Possible replacement query

    Table 'IP2COUNTRY_LOOKUP'. Scan count 245391, logical reads 782283, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'SECURITY_EVENTS_STG_TEST'. Scan count 5, logical reads 583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (245391 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 1904 ms, elapsed time = 621 ms.

    */

    This is with a unique clustered index on start and end.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • This query

    SELECT COUNTRY_CODE_ALPHA2,

    e.*

    into #temp3

    FROM SECURITY_EVENTS_STG_TEST e

    INNER JOIN dbo.IP2COUNTRY_LOOKUP c

    ON e.SOURCE_ADDRESS_INT BETWEEN c.[START] AND c.[END]

    With a covering index on SECURITY_EVENTS_STG_TEST.SOURCE_ADDRESS_INT

    Results in seeks with no (relatively expensive) residual predicate:

    Seek Keys[1]:

    Start: [dbo].[SECURITY_EVENTS_STG_TEST].SOURCE_ADDRESS_INT >= Scalar Operator([dbo].[IP2COUNTRY_LOOKUP].[START] as [c].[START]),

    End: [dbo].[SECURITY_EVENTS_STG_TEST].SOURCE_ADDRESS_INT <= Scalar Operator([dbo].[IP2COUNTRY_LOOKUP].[END] as [c].[END])

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 7 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply