May 15, 2014 at 6:47 am
bas_vdl (5/15/2014)
So how can i measure the actual query time to get a good comparison?
Pipe the results to a temporary table:
IF OBJECT_ID('tempdb..#discard') IS NOT NULL
DROP TABLE #discard
GO
SELECT (
SELECT TOP(1) COUNTRY_CODE_ALPHA2
FROM dbo.IP2COUNTRY_LOOKUP C
WHERE START = (
SELECT TOP(1) START
FROM DBO.IP2COUNTRY_LOOKUP
WHERE START <= E.SOURCE_ADDRESS_INT
ORDER BY START DESC
)
AND [END] >= E.SOURCE_ADDRESS_INT
) AS SRC_COUNTRY_CODE_ALPHA2,
E.*
INTO #discard
FROM SECURITY_EVENTS_STG_TEST E
-- Gianluca Sartori
May 15, 2014 at 6:52 am
bas_vdl (5/15/2014)
Hi HowardW,Thanks for your input. You're right about the fact that the ip ranges are not overlapping. The query you suggest is very slow. In my first post I refered tot this tyope of query and the performance are really poor (+4 min. on the current data set)
If you change your clustered index on IP2COUNTRY_LOOKUP to START ASC,END ASC, does this improve?
bas_vdl (5/15/2014)
You second point is a good one. We are running SAP Business Objects. To get the records from the source system to our data warehouse I've build a work flow within SAP Data Services. I started by caching the IP2COUNTRY_LOOKUP table in memory and for every records I received from the source I tried to lookup the SRC_COUNTRY_CODE_ALPHA2. Poor SAP Data Services couldn't handle the lookup with a BETWEEN operator. Performance where so poor that I went back to an old school query.
Ultimately, you've ended up with the two tables in your Data Warehouse, from which you can derive the data that you need.
Could you do an old fashioned update in SQL rather than using the ETL tool or add a trigger to the SECURITY_EVENTS_STG_TEST table that populates this in T-SQL on insert?
May 15, 2014 at 7:03 am
HowardW (5/15/2014)
bas_vdl (5/15/2014)
Hi HowardW,Thanks for your input. You're right about the fact that the ip ranges are not overlapping. The query you suggest is very slow. In my first post I refered tot this tyope of query and the performance are really poor (+4 min. on the current data set)
If you change your clustered index on IP2COUNTRY_LOOKUP to START ASC,END ASC, does this improve?
I had a Clustered index on START and END both where ASC.
Using spaghettidba's trick to measure the time, it still took +4 min. Compare this to the 9sec using the TOP(1) query...
May 15, 2014 at 7:05 am
bas_vdl (5/15/2014)
HowardW (5/15/2014)
bas_vdl (5/15/2014)
Hi HowardW,Thanks for your input. You're right about the fact that the ip ranges are not overlapping. The query you suggest is very slow. In my first post I refered tot this tyope of query and the performance are really poor (+4 min. on the current data set)
If you change your clustered index on IP2COUNTRY_LOOKUP to START ASC,END ASC, does this improve?
I had a Clustered index on START and END both where ASC.
Using spaghettidba's trick to measure the time, it still took +4 min. Compare this to the 9sec using the TOP(1) query...
The DDL you posted has a clustered index on just START. It has a primary key on Start, End, which is not the same thing. Just to clarify, are you saying you've tried it with different DDL to what you've posted and it's still 4 minutes?
May 15, 2014 at 7:49 am
HowardW (5/15/2014)
bas_vdl (5/15/2014)
HowardW (5/15/2014)
bas_vdl (5/15/2014)
Hi HowardW,Thanks for your input. You're right about the fact that the ip ranges are not overlapping. The query you suggest is very slow. In my first post I refered tot this tyope of query and the performance are really poor (+4 min. on the current data set)
If you change your clustered index on IP2COUNTRY_LOOKUP to START ASC,END ASC, does this improve?
I had a Clustered index on START and END both where ASC.
Using spaghettidba's trick to measure the time, it still took +4 min. Compare this to the 9sec using the TOP(1) query...
The DDL you posted has a clustered index on just START. It has a primary key on Start, End, which is not the same thing. Just to clarify, are you saying you've tried it with different DDL to what you've posted and it's still 4 minutes?
I changed the indexes as I mentioned in post http://www.sqlservercentral.com/Forums/FindPost1571260.aspx. It sounds strange to me too that a JOIN with a BETWEEN operator is so slow...
Index
ALTER TABLE [dbo].[IP2COUNTRY_LOOKUP] ADD CONSTRAINT [PK_IP2COUNTRY_LOOKUP] PRIMARY KEY CLUSTERED
(
[START] ASC,
[END] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
May 15, 2014 at 8:01 am
set statistics io, time on
PRINT 'Original query'
SELECT
(SELECT COUNTRY_CODE_ALPHA2
FROM dbo.IP2COUNTRY_LOOKUP C
WHERE START = (SELECT max(START)
FROM DBO.IP2COUNTRY_LOOKUP
WHERE START <= E.SOURCE_ADDRESS_INT)
AND "END" >= E.SOURCE_ADDRESS_INT) AS SRC_COUNTRY_CODE_ALPHA2,
E.*
INTO #Temp1
FROM SECURITY_EVENTS_STG_TEST E
print '====================================================================================='
PRINT 'Possible replacement query'
SELECT
SRC_COUNTRY_CODE_ALPHA2 = x.COUNTRY_CODE_ALPHA2,
e.*
INTO #Temp2
FROM SECURITY_EVENTS_STG_TEST e
CROSS APPLY (
SELECT TOP 1 c.COUNTRY_CODE_ALPHA2
FROM dbo.IP2COUNTRY_LOOKUP c
WHERE e.SOURCE_ADDRESS_INT BETWEEN c.[START]
AND c.[END]
ORDER BY c.[START] DESC
) x
set statistics io, time off
/*
Original query
Table 'IP2COUNTRY_LOOKUP'. Scan count 490782, logical reads 1474199, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SECURITY_EVENTS_STG_TEST'. Scan count 5, logical reads 583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(245391 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 5693 ms, elapsed time = 1652 ms.
=====================================================================================
Possible replacement query
Table 'IP2COUNTRY_LOOKUP'. Scan count 245391, logical reads 782283, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SECURITY_EVENTS_STG_TEST'. Scan count 5, logical reads 583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(245391 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 1904 ms, elapsed time = 621 ms.
*/
This is with a unique clustered index on start and end.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
May 15, 2014 at 8:17 am
This query
SELECT COUNTRY_CODE_ALPHA2,
e.*
into #temp3
FROM SECURITY_EVENTS_STG_TEST e
INNER JOIN dbo.IP2COUNTRY_LOOKUP c
ON e.SOURCE_ADDRESS_INT BETWEEN c.[START] AND c.[END]
With a covering index on SECURITY_EVENTS_STG_TEST.SOURCE_ADDRESS_INT
Results in seeks with no (relatively expensive) residual predicate:
Seek Keys[1]:
Start: [dbo].[SECURITY_EVENTS_STG_TEST].SOURCE_ADDRESS_INT >= Scalar Operator([dbo].[IP2COUNTRY_LOOKUP].[START] as [c].[START]),
End: [dbo].[SECURITY_EVENTS_STG_TEST].SOURCE_ADDRESS_INT <= Scalar Operator([dbo].[IP2COUNTRY_LOOKUP].[END] as [c].[END])
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 7 posts - 16 through 22 (of 22 total)
You must be logged in to reply to this topic. Login to reply