May 15, 2014 at 2:21 am
I'm trying to lookup the country of an ip address. Therefor I'm using the table IP2COUNTRY_LOOKUP.
The execution plan shows a Clustered Index Scan (Cost 24%) and a Lazy Spool (Cost 74%). The ecution plan and query are based on 245391 records in the SECURITY_EVENTS_STG_TEST and 107136 records in the IG_ODS.dbo.IP2COUNTRY_LOOKUP table. Total execution time for the query is 30sec.
Is there anything I can change to get better performance?
My first version of the query was using an INNER JOIN from SECURITY_EVENTS_STG_TEST to IG_ODS.dbo.IP2COUNTRY_LOOKUP using a BETWEEN operator. This was even slower!
Thank you in advance!
Query to lookup countries by a security event based on the source and destination ip address
SELECT
(SELECT COUNTRY_CODE_ALPHA2
FROM IG_ODS.dbo.IP2COUNTRY_LOOKUP C
WHERE START = (SELECT max(START)
FROM IG_ODS.DBO.IP2COUNTRY_LOOKUP
WHERE START <= E.SOURCE_ADDRESS_INT)
AND "END" >= E.SOURCE_ADDRESS_INT) AS SRC_COUNTRY_CODE_ALPHA2,
E.*
FROM SECURITY_EVENTS_STG_TEST E
Index configuration
SECURITY_EVENTS_STG_TEST (event_id is primary key)
event_id (clustered)
source_address_int (Non-Unique, Non-Clustered)
IP2COUNTRY_LOOKUP (start and end are primary key)
start, end (clustered)
source_address_int (Non-Unique, Non-Clustered)
System specs:
2 Cores
16GB memory
MS SQL Server 2012
MS Windows Server 2008 R2
May 15, 2014 at 2:45 am
Please post:
1) Table creation scripts
2) Index creation scripts
3) Actual execution plan attached as .sqlplan file (not jpg image)
4) Some sample data (possibly same amount of data you have in production, obfuscated if needed)
-- Gianluca Sartori
May 15, 2014 at 2:57 am
CREATE TABLE [dbo].[IP2COUNTRY_LOOKUP](
[START] [bigint] NOT NULL,
[END] [bigint] NOT NULL,
[COUNTRY_CODE_ALPHA2] [varchar](2) NULL,
CONSTRAINT [PK_IP2COUNTRY_LOOKUP] PRIMARY KEY NONCLUSTERED
(
[START] ASC,
[END] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [U1_IP2COUNTRY_LOOKUP] ON [dbo].[IP2COUNTRY_LOOKUP]
(
[START] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX [U2_IP2COUNTRY_LOOKUP] ON [dbo].[IP2COUNTRY_LOOKUP]
(
[END] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE TABLE [SECURITY_EVENTS_STG_TEST](
[EVENT_ID] [nvarchar](32) NOT NULL,
[DATE] [datetime2](7) NULL,
[SOURCE_ADDRESS] [nvarchar](15) NULL,
[SOURCE_ADDRESS_INT] [bigint] NULL,
[SOURCE_PORT] [int] NULL,
[DESTINATION_ADDRESS] [nvarchar](15) NULL,
[DESTINATION_ADDRESS_INT] [bigint] NULL,
[DESTINATION_PORT] [int] NULL,
[ASSET_SOURCE] [int] NULL,
[ASSET_DESTINATION] [int] NULL,
[PRIORITY] [int] NULL,
[RELIABILITY] [int] NULL,
[RISK_ATTACK] [int] NULL,
[RISK_COMPROMISE] [int] NULL,
[AIO_NAME] [nvarchar](64) NULL,
[SENSOR_ID] [nvarchar](32) NULL,
[SENSOR_NAME] [nvarchar](64) NULL,
[SENSOR_DESCRIPTION] [nvarchar](255) NULL,
[SENSOR_IP] [nvarchar](15) NULL,
[PLUGIN_ID] [int] NULL,
[PLUGIN_NAME] [nvarchar](100) NULL,
[SIGNATURE_ID] [int] NULL,
[SIGNATURE_NAME] [nvarchar](512) NULL,
[SIGNATURE_RELIABILITY] [int] NULL,
[SIGNATURE_PRIORITY] [int] NULL,
[SIGNATURE_CATEGORY] [nvarchar](100) NULL,
[SIGNATURE_SUBCATEGORY] [nvarchar](100) NULL,
[SIGNATURE_CLASSIFICATION] [nvarchar](100) NULL,
[SIGNATURE_PRODUCT_TYPE] [nvarchar](100) NULL,
[PROTOCOL_NAME] [nvarchar](24) NULL,
[PROTOCOL_DESCRIPTION] [nvarchar](255) NULL,
[USERDATA1] [nvarchar](1024) NULL,
[USERDATA2] [nvarchar](1024) NULL,
[USERDATA3] [nvarchar](1024) NULL,
[USERDATA4] [nvarchar](1024) NULL,
[USERDATA5] [nvarchar](1024) NULL,
[USERDATA6] [nvarchar](1024) NULL,
[USERDATA7] [nvarchar](1024) NULL,
[USERDATA8] [nvarchar](1024) NULL,
[USERDATA9] [nvarchar](1024) NULL,
[REPUTATION_ADDRESS_SOURCE] [nvarchar](15) NULL,
[REPUTATION_PRIORITY_SOURCE] [int] NULL,
[REPUTATION_RELIABILITY_SOURCE] [int] NULL,
[REPUTATION_ACTIVITY_SOURCE] [nvarchar](64) NULL,
[REPUTATATION_ADDRESS_DESTINATION] [nvarchar](15) NULL,
[REPUTATION_PRIORITY_DESTINATION] [int] NULL,
[REPUTATION_RELIABILITY_DESTINATION] [int] NULL,
[REPUTATION_ACTIVITY_DESTINATION] [nvarchar](64) NULL,
CONSTRAINT [PK_SECURITY_EVENTS_STG_TEST] PRIMARY KEY CLUSTERED
(
[EVENT_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [U1_SECURITY_EVENTS_STG_TEST] ON [SECURITY_EVENTS_STG_TEST]
(
[SOURCE_ADDRESS_INT] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
May 15, 2014 at 3:45 am
You cannot avoid the scan: you're asking for the whole SECURITY_EVENTS_STG_TEST table (no filter predicates).
As far as the index spool is concerned, I have no idea what it's trying to do and why it appears like that.
In my laptop that operator is substituted by an "Assert" operator.
However, you can get rid of that and the stream aggregate using this syntax:
SELECT (
SELECT TOP(1) COUNTRY_CODE_ALPHA2
FROM dbo.IP2COUNTRY_LOOKUP C
WHERE START = (
SELECT TOP(1) START
FROM DBO.IP2COUNTRY_LOOKUP
WHERE START <= E.SOURCE_ADDRESS_INT
ORDER BY START DESC
)
AND [END] >= E.SOURCE_ADDRESS_INT
) AS SRC_COUNTRY_CODE_ALPHA2,
E.*
FROM SECURITY_EVENTS_STG_TEST E
Executes in 2.8 seconds on my laptop (discarding results).
Hope this helps
Gianluca
-- Gianluca Sartori
May 15, 2014 at 4:13 am
spaghettidba (5/15/2014)
You cannot avoid the scan: you're asking for the whole SECURITY_EVENTS_STG_TEST table (no filter predicates).As far as the index spool is concerned, I have no idea what it's trying to do and why it appears like that.
In my laptop that operator is substituted by an "Assert" operator.
However, you can get rid of that and the stream aggregate using this syntax:
SELECT (
SELECT TOP(1) COUNTRY_CODE_ALPHA2
FROM dbo.IP2COUNTRY_LOOKUP C
WHERE START = (
SELECT TOP(1) START
FROM DBO.IP2COUNTRY_LOOKUP
WHERE START <= E.SOURCE_ADDRESS_INT
ORDER BY START DESC
)
AND [END] >= E.SOURCE_ADDRESS_INT
) AS SRC_COUNTRY_CODE_ALPHA2,
E.*
FROM SECURITY_EVENTS_STG_TEST E
Executes in 2.8 seconds on my laptop (discarding results).
Hope this helps
Gianluca
I wonder if that is functionally equivalent to this:
SELECT
SRC_COUNTRY_CODE_ALPHA2 = x.COUNTRY_CODE_ALPHA2,
e.*
FROM ONS_AIO_1.SECURITY_EVENTS_STG_TEST e
CROSS APPLY (
SELECT
c.COUNTRY_CODE_ALPHA2,
rn = ROW_NUMBER() OVER(ORDER BY c.[START] DESC)
FROM IG_ODS.dbo.IP2COUNTRY_LOOKUP c
WHERE e.SOURCE_ADDRESS_INT BETWEEN c.[START] AND c.[END]
) x
WHERE x.rn = 1
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
May 15, 2014 at 4:15 am
Thanks for the quick reply. Your query is slightly quicker, it takes between 15 and 20 seconds on my dataset...
Are there any other query construction to lookup the country for an ip address?
May 15, 2014 at 4:25 am
ChrisM@Work (5/15/2014)
I wonder if that is functionally equivalent to this:
SELECT
SRC_COUNTRY_CODE_ALPHA2 = x.COUNTRY_CODE_ALPHA2,
e.*
FROM ONS_AIO_1.SECURITY_EVENTS_STG_TEST e
CROSS APPLY (
SELECT
c.COUNTRY_CODE_ALPHA2,
rn = ROW_NUMBER() OVER(ORDER BY c.[START] DESC)
FROM IG_ODS.dbo.IP2COUNTRY_LOOKUP c
WHERE e.SOURCE_ADDRESS_INT BETWEEN c.[START] AND c.[END]
) x
WHERE x.rn = 1
Should be equivalent.
This should also be semantically identical. Sometimes TOP(1) works faster than WHERE ROW_NUMBER() = 1
SELECT
SRC_COUNTRY_CODE_ALPHA2 = x.COUNTRY_CODE_ALPHA2,
e.*
FROM ONS_AIO_1.SECURITY_EVENTS_STG_TEST e
CROSS APPLY (
SELECT TOP(1)
c.COUNTRY_CODE_ALPHA2,
FROM IG_ODS.dbo.IP2COUNTRY_LOOKUP c
WHERE e.SOURCE_ADDRESS_INT BETWEEN c.[START] AND c.[END]
ORDER BY c.[START] DESC
) x
-- Gianluca Sartori
May 15, 2014 at 4:27 am
bas_vdl (5/15/2014)
Thanks for the quick reply. Your query is slightly quicker, it takes between 15 and 20 seconds on my dataset...Are there any other query construction to lookup the country for an ip address?
SELECT
SRC_COUNTRY_CODE_ALPHA2 = x.COUNTRY_CODE_ALPHA2,
e.*
FROM ONS_AIO_1.SECURITY_EVENTS_STG_TEST e
CROSS APPLY (
SELECT TOP 1 c.COUNTRY_CODE_ALPHA2
FROM IG_ODS.dbo.IP2COUNTRY_LOOKUP c
WHERE e.SOURCE_ADDRESS_INT BETWEEN c.[START] AND c.[END]
ORDER BY c.[START] DESC
) x
Which Gianluca has already posted.
Change the clustered index: add column END.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
May 15, 2014 at 4:27 am
bas_vdl (5/15/2014)
it takes between 15 and 20 seconds on my dataset...
Discard your results: transferring and formatting results is part of the execution time and, since you're returning a big data set, it can hide performance improvements.
That part of the execution time is not shrinkable.
-- Gianluca Sartori
May 15, 2014 at 5:13 am
spaghettidba (5/15/2014)
You cannot avoid the scan: you're asking for the whole SECURITY_EVENTS_STG_TEST table (no filter predicates).As far as the index spool is concerned, I have no idea what it's trying to do and why it appears like that.
In my laptop that operator is substituted by an "Assert" operator.
However, you can get rid of that and the stream aggregate using this syntax:
SELECT (
SELECT TOP(1) COUNTRY_CODE_ALPHA2
FROM dbo.IP2COUNTRY_LOOKUP C
WHERE START = (
SELECT TOP(1) START
FROM DBO.IP2COUNTRY_LOOKUP
WHERE START <= E.SOURCE_ADDRESS_INT
ORDER BY START DESC
)
AND [END] >= E.SOURCE_ADDRESS_INT
) AS SRC_COUNTRY_CODE_ALPHA2,
E.*
FROM SECURITY_EVENTS_STG_TEST E
Executes in 2.8 seconds on my laptop (discarding results).
Hope this helps
Gianluca
This one is still the fastest. The other suggestions using CROSS APPLY takes around 3 minutes on the same data set.
May 15, 2014 at 5:16 am
bas_vdl (5/15/2014)
spaghettidba (5/15/2014)
You cannot avoid the scan: you're asking for the whole SECURITY_EVENTS_STG_TEST table (no filter predicates).As far as the index spool is concerned, I have no idea what it's trying to do and why it appears like that.
In my laptop that operator is substituted by an "Assert" operator.
However, you can get rid of that and the stream aggregate using this syntax:
SELECT (
SELECT TOP(1) COUNTRY_CODE_ALPHA2
FROM dbo.IP2COUNTRY_LOOKUP C
WHERE START = (
SELECT TOP(1) START
FROM DBO.IP2COUNTRY_LOOKUP
WHERE START <= E.SOURCE_ADDRESS_INT
ORDER BY START DESC
)
AND [END] >= E.SOURCE_ADDRESS_INT
) AS SRC_COUNTRY_CODE_ALPHA2,
E.*
FROM SECURITY_EVENTS_STG_TEST E
Executes in 2.8 seconds on my laptop (discarding results).
Hope this helps
Gianluca
This one is still the fastest. The other suggestions using CROSS APPLY takes around 3 minutes on the same data set.
Change the clustered index: add column END.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
May 15, 2014 at 5:37 am
ChrisM@Work (5/15/2014)
bas_vdl (5/15/2014)
Thanks for the quick reply. Your query is slightly quicker, it takes between 15 and 20 seconds on my dataset...Are there any other query construction to lookup the country for an ip address?
SELECT
SRC_COUNTRY_CODE_ALPHA2 = x.COUNTRY_CODE_ALPHA2,
e.*
FROM ONS_AIO_1.SECURITY_EVENTS_STG_TEST e
CROSS APPLY (
SELECT TOP 1 c.COUNTRY_CODE_ALPHA2
FROM IG_ODS.dbo.IP2COUNTRY_LOOKUP c
WHERE e.SOURCE_ADDRESS_INT BETWEEN c.[START] AND c.[END]
ORDER BY c.[START] DESC
) x
Which Gianluca has already posted.
Change the clustered index: add column END.
Hi ChrisM@Work,
Thanks for pointing this out, but I did edit the indexes in the first place when testing Gianluca query. And still it took around 3 minutes.
EDIT: the execution plan looks simpler:
SELECT (Cost 0%) <--- Nested Loops (Cost 78%) <--- Clustered Index Scan (Cost 0%)
<--- Clustered Index Seek (Cost 22%)
Index configuration while running Gianluca query
ALTER TABLE [dbo].[IP2COUNTRY_LOOKUP] ADD CONSTRAINT [PK_IP2COUNTRY_LOOKUP] PRIMARY KEY CLUSTERED
(
[START] ASC,
[END] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX [U1_IP2COUNTRY_LOOKUP] ON [dbo].[IP2COUNTRY_LOOKUP]
(
[START] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX [U2_IP2COUNTRY_LOOKUP] ON [dbo].[IP2COUNTRY_LOOKUP]
(
[END] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
May 15, 2014 at 5:40 am
Btw, the time indications I refered to is the time MS SQL Management Studio shows me (bottom right). I'm aware of the fact that displaying all those results in the window takes time as well. So how can i measure the actual query time to get a good comparison?
May 15, 2014 at 5:42 am
As others have said, you can't avoid a scan if you're returning all rows, however, the naming of your tables does suggest there may be a better way.
1) IP to location lookup ranges are usually designed to be non-overlapping, so the top 1 logic shouldn't be required. You can enforce this as either an indexed view or a trigger on insert to the IP2COUNTRY_LOOKUP table if you need, then you can have a more simple lookup:
SELECT C.COUNTRY_CODE_ALPHA2 SRC_COUNTRY_CODE_ALPHA2, E.* FROM SECURITY_EVENTS_STG_TEST E
LEFT JOIN IG_ODS.dbo.IP2COUNTRY_LOOKUP C ON E.SOURCE_ADDRESS_INT BETWEEN C.START AND C.END
2) Events tend to be transactional and you wouldn't expect either the IP or the country of an event to change after the fact. E.g. inserted once, never changed. So if this is the case, why not just persist a column in the events table, populate rows with a country code as they come in and then lookup directly against the table, rather than running a lookup across the whole data set on SELECT?
May 15, 2014 at 5:57 am
HowardW (5/15/2014)
As others have said, you can't avoid a scan if you're returning all rows, however, the naming of your tables does suggest there may be a better way.1) IP to location lookup ranges are usually designed to be non-overlapping, so the top 1 logic shouldn't be required. You can enforce this as either an indexed view or a trigger on insert to the IP2COUNTRY_LOOKUP table if you need, then you can have a more simple lookup:
SELECT C.COUNTRY_CODE_ALPHA2 SRC_COUNTRY_CODE_ALPHA2, E.* FROM SECURITY_EVENTS_STG_TEST E
LEFT JOIN IG_ODS.dbo.IP2COUNTRY_LOOKUP C ON E.SOURCE_ADDRESS_INT BETWEEN C.START AND C.END
2) Events tend to be transactional and you wouldn't expect either the IP or the country of an event to change after the fact. E.g. inserted once, never changed. So if this is the case, why not just persist a column in the events table, populate rows with a country code as they come in and then lookup directly against the table, rather than running a lookup across the whole data set on SELECT?
Hi HowardW,
Thanks for your input. You're right about the fact that the ip ranges are not overlapping. The query you suggest is very slow. In my first post I refered tot this tyope of query and the performance are really poor (+4 min. on the current data set)
You second point is a good one. We are running SAP Business Objects. To get the records from the source system to our data warehouse I've build a work flow within SAP Data Services. I started by caching the IP2COUNTRY_LOOKUP table in memory and for every records I received from the source I tried to lookup the SRC_COUNTRY_CODE_ALPHA2. Poor SAP Data Services couldn't handle the lookup with a BETWEEN operator. Performance where so poor that I went back to an old school query.
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply