Storing IPv4 Addresses for Performance

  • Jean-Sebastien Carle (8/19/2009)


    TheSQLGuru (8/19/2009)


    could you please redo your testing using a char(17) datatype instead of a varchar(17)?

    I'm not sure what would be the point since IP addresses (when used as strings) are not fixed length.

    The point would be exactly what your article was about - what offers the best performance. There is a chance that fixed storage will perform better (however slightly just like parsename as opposed to substrings). It should be very easy to modify your existing test suite to try this option out. It wasn't like I was asking you to completely reinvent the wheel. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • rja.carnegie (8/19/2009)


    lhowe (8/19/2009)


    Darn it HansVE, you beat me to the punch! I was going to say exactly the same thing. I don't see why someone would go to all of the trouble to convert IPv4 values this way, and back again, when it's far easier just to store them as four tinyint values.

    There could be a performance benefit. My own tests a while back indicated that one int comparison was equally efficient as smallint or tinyint, probably because you're using a computer with 32-bit registers. The bit datatype stunk - but this was on SQL Server 2000 or maybe even 7. Still, I prefer to put bit in tinyint.

    I agree however that four octets is useful in many ways. There's nothing to stop you storing an int or binary(4) version as well.

    I spent a little while devising an encoding of 6 character alphanumeric strings (09AZ) into int, for an indexed key field. I got slightly poorer performance with int than with char(6) even not counting the cost of conversion, which wasn't what I expected. I have increased my knowledge of indexing since, too (apparently non-declared-unique indexes have 8 bytes arbitrarily added to the key??) Perhaps I got it wrong.

    Anyway, maybe we should be thinking about IPv6, where the address is, what, 16 bytes? Plus refinements that the Wikipedia page hasn't yet communicated to me. 😉 Binary(16) for that, I think. Although IPv4 is not going away quickly yet.

    Yes, we now use binary(16) because we may need to support IPV6 in the future. The binary datatype has the advantage of working with SUBSTRING() and many other string functions. SQL Server 2008 also now has the style parameter of 2 to support conversion to and from strings without using undocumented and/or poorly performing functions.

  • I'm with HansVE and the others on the TinyInt split of the address. You can index those columns on any table that includes them and then join each of the four pieces together very efficiently. You can optionally store the address as a varchar as well if necessary. The cool thing about this is that you can easily search the addresses within just 1 of the four parts (or any combo thereof). If you are worried about space over functionality and low TCO these days, pursue an archive/purging option.

    BTW, very cool example of bit shifting.

    J Pratt

  • TheSQLGuru (8/19/2009)


    The point would be exactly what your article was about - what offers the best performance. There is a chance that fixed storage will perform better (however slightly just like parsename as opposed to substrings). It should be very easy to modify your existing test suite to try this option out. It wasn't like I was asking you to completely reinvent the wheel. 🙂

    Right. However, char(15) introduces a new element, trailing whitespace. Wouldn't you consider that as a reason to negate performance benefits char(15) could bring over varchar(15)?

  • Jean-Sebastien Carle (8/19/2009)


    TheSQLGuru (8/19/2009)


    The point would be exactly what your article was about - what offers the best performance. There is a chance that fixed storage will perform better (however slightly just like parsename as opposed to substrings). It should be very easy to modify your existing test suite to try this option out. It wasn't like I was asking you to completely reinvent the wheel. 🙂

    Right. However, char(15) introduces a new element, trailing whitespace. Wouldn't you consider that as a reason to negate performance benefits char(15) could bring over varchar(15)?

    only 1 rtrim is required:

    declare @ip char(15)

    set @ip = '1.1.1.1'

    select '/' + PARSENAME(@IP,4) + '/'

    select '/' + PARSENAME(@IP,3) + '/'

    select '/' + PARSENAME(@IP,2) + '/'

    select '/' + PARSENAME(@IP,1) + '/'

    select '/' + RTRIM(PARSENAME(@IP,1)) + '/'

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • assuming completely random distribution of IP numbers for all octets (no idea if this is valid or not) then the average length of an octet is 2.57 if my math is correct:

    select avg(cast(len(number) as decimal(10,3)))

    from

    (select number

    from numbers --my numbers table starts at 1

    union all

    select 0 as number) as t

    where number < 256

    2.570312

    thus with the 3 periods the average length of a full ip address is:

    select (4*2.57) + 3 --13.28

    with the overhead of a variable character it is thus more efficient space-wise, by 0.28 bytes per value if memory serves, to store an IP address as a char(15). doing so you also avoid the dereference for the length calculation as well

    Again, this doesn't mean that it will be more efficient to store/process IP addresses as a char(15) using your methodology, but it is clearly an option worth pursuing IMHO

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • A quick comment on the "4 tinyints" idea. It does depend on your situation, but having it predivided into octets can actually make subnet handling more difficult, or at least no better, thanks to CIDR. No statistics handy, but I feel confident that the number of organizations that have non-multiple-of-8 bit subnet masks is not insignificant.

  • The cool thing about this is that you can easily search the addresses within just 1 of the four parts (or any combo thereof).

    Is this something you do a lot? Or was it an example of something you could do? And coincidently you could do the same thing with the binary method with BETWEEN operations, not as easy I agree but possible.

    Also I agree, very cool example of bit shifting..

    I wonder about the index size of 4 tinyint columns as opposed to a single varbinary(4) column.. Let the index gods speak to that..

    CEWII

  • DC (8/19/2009)


    A quick comment on the "4 tinyints" idea. It does depend on your situation, but having it predivided into octets can actually make subnet handling more difficult, or at least no better, thanks to CIDR. No statistics handy, but I feel confident that the number of organizations that have non-multiple-of-8 bit subnet masks is not insignificant.

    I agree that it is "no better" in such cases. But we prefer to work with values and masks that are recognizable, e.g.

    WHERE ip1 & 128 = 128 AND ip2 & 32 = 32

  • How would something like this compare performance-wise? It gives you readability, but also great flexibility in filtering and sorting. You could take the persisted column and apply it to a number of the scenarios offered here to prevent using udf's in select statements.

    Note: For testing I only did 16 numbers for each part (65536 IP's).

    if object_id('IPv4') is not null

    begin drop table dbo.IPv4 end

    create table dbo.IPv4

    (ID integer not null identity(1,1) unique

    ,P1 tinyint not null

    ,P2 tinyint not null

    ,P3 tinyint not null

    ,P4 tinyint not null

    ,IPAddress as

    convert(varchar(3), P1) + '.' +

    convert(varchar(3), P2) + '.' +

    convert(varchar(3), P3) + '.' +

    convert(varchar(3), P4)

    persisted

    );

    create clustered index ix4Part on IPv4

    (P1

    ,P2

    ,P3

    ,P4

    )

    create index ixIPAddress on IPv4

    (IPAddress

    );

    WITH Digits(i)

    AS (SELECT i

    FROM (select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0

    union select 10 union select 11 union select 12 union select 13 union select 14 union select 15) AS X(i))

    INSERT INTO IPv4(P1, P2, P3, P4)

    SELECTD3.i

    ,D2.i

    ,D1.i

    ,D0.i

    FROM Digits AS D0, Digits AS D1, Digits AS D2, Digits AS D3

    selectIPAddress

    fromIPv4

    whereP1 = 12

    order by

    P1

    ,P2

    ,P3

    ,P4

    selectIPAddress

    fromIPv4

    whereP1 in (11, 12)

    and P2 = 0

    and P3 between 0 and 255

    and P4 between 0 and 255

    order by

    P1

    ,P2

    ,P3

    ,P4

    selectIPAddress

    fromIPv4

    whereIPAddress = '12.5.2.12'

  • TheSQLGuru (8/19/2009)


    Again, this doesn't mean that it will be more efficient to store/process IP addresses as a char(15) using your methodology, but it is clearly an option worth pursuing IMHO

    I took your suggestion and decided to do some testing using varchar(15) vs char(15).

    When going from a varchar(15) to integer, I used the following : CREATE FUNCTION [dbo].[VarCharIPAddressToInteger]

    (

    @IPvarchar(15)

    )

    RETURNS int

    AS

    BEGIN

    RETURN

    CAST(PARSENAME(@IP,1) AS INT) +

    CAST(PARSENAME(@IP,2) AS INT) * 256 +

    CAST(PARSENAME(@IP,3) AS INT) * 65536 +

    (CAST(PARSENAME(@IP,4) AS INT) - 128) * 16777216

    ENDThe logical equivalent to that using char(15) would be : CREATE FUNCTION [dbo].[CharIPAddressToInteger]

    (

    @IPchar(15)

    )

    RETURNS int

    AS

    BEGIN

    RETURN

    CAST(PARSENAME(@IP,1) AS INT) +

    CAST(PARSENAME(@IP,2) AS INT) * 256 +

    CAST(PARSENAME(@IP,3) AS INT) * 65536 +

    (CAST(PARSENAME(@IP,4) AS INT) - 128) * 16777216

    ENDNot much to change.

    In other direction, things also look similar : CREATE FUNCTION [dbo].[IntegerIPAddressToVarChar]

    (

    @IPint

    )

    RETURNS varchar(15)

    AS

    BEGIN

    RETURN

    CAST(((@IP & 0xFF000000) / 16777216) + 128 AS varchar(3)) + '.' +

    CAST((@IP & 0x00FF0000) / 65536 AS varchar(3)) + '.' +

    CAST((@IP & 0x0000FF00) / 256 AS varchar(3)) + '.' +

    CAST((@IP & 0x000000FF) AS varchar(3))

    ENDAnd the char(15) equivalent : CREATE FUNCTION [dbo].[IntegerIPAddressToChar]

    (

    @IPint

    )

    RETURNS char(15)

    AS

    BEGIN

    RETURN

    CAST(((@IP & 0xFF000000) / 16777216) + 128 AS varchar(3)) + '.' +

    CAST((@IP & 0x00FF0000) / 65536 AS varchar(3)) + '.' +

    CAST((@IP & 0x0000FF00) / 256 AS varchar(3)) + '.' +

    CAST((@IP & 0x000000FF) AS varchar(3))

    END

    I created the following table which I then filled with 100,000 rows of random IP addresses (same IP per row in all 3 fields) :CREATE TABLE [dbo].[IPAddresses](

    [CharIPAddress] [char](15) NOT NULL,

    [VarCharIPAddress] [varchar](15) NOT NULL,

    [NumericIPAddress] [int] NOT NULL

    ) ON [PRIMARY]

    And the results of the testing was as follows (run ten times and averaged for accuracy) :SELECT [dbo].[VarCharIPAddressToInteger]([VarCharIPAddress])

    FROM [dbo].[IPAddresses]788ms


    SELECT [dbo].[CharIPAddressToInteger]([CharIPAddress])

    FROM [dbo].[IPAddresses]812ms


    SELECT [dbo].[IntegerIPAddressToVarChar]([NumericIPAddress])

    FROM [dbo].[IPAddresses]2058ms


    SELECT [dbo].[IntegerIPAddressToChar]([NumericIPAddress])

    FROM [dbo].[IPAddresses]2308ms


    SELECT RTRIM([dbo].[IntegerIPAddressToChar]([NumericIPAddress]))

    FROM [dbo].[IPAddresses]2078ms


    Although char(15) may be a more efficient way of storing the human readable equivalent of an IP address in the database, performance appears to be worse using char(15) when doing conversions to and from integer based IP address values. SQL Server's internal string handling mechanisms are most likely optimized for variable length strings.

  • Thanks for taking the time JS. Looks like a lot of the timing is in returning the results to the front-end (RTRIM much faster). But looks like overall the var is a winner.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jean-Sebastien Carle (8/19/2009)


    HansVE (8/19/2009)


    1. You do not know of subnetting?

    Subnetting can easily be achieved using bitmasking on integer IPs. The following will yield 192.168.1.0 :

    SELECT [dbo].[IPAddressToString]([dbo].[IPAddressToInteger]('192.168.1.1') & [dbo].[IPAddressToInteger]('255.255.255.0'))

    I guess the definition of "easily" is subjective...

    HansVE (8/19/2009)


    A "ip_int_to_ip_string computed column" will impede performance, as will the functions in the article.

    Although using multiple tinyint fields requires no computational conversion, it does require that every ORDER BY, UNION, WHERE, etc operation be done four times, whilst a single integer field requires only one operation in all instances.

    SQL Server is a set-based system, so it is only "one operation" in any case. As I wrote, the only part that must be done "four times" is the typing of the SQL code. The four tinyint columns would have a concatenated index, and the performance of this versus an indexed int column is virtually the same.

    Also, if you wish to submit your IP address as a varchar(15) to the stored procedure to be then stored as four tinyint field within the table, you will require the same string splitting conversion as a single integer.

    As you found, the PARSENAME() function makes this trivial.

    Selecting rows to return IP addresses as varchar(15) will also require similar conversions. The only computational difference is mostly the avoidance of a rather efficient bitmask operation, which in my opinion, far outweights the costs of multiple operations when joining, filtering or ordering tables by IP addresses stored as four tinyint fields.

    Again, no "multiple operations" involved--check an execution plan. Speaking of which, do not trust "query cost" when using UDFs in code. Take actual timings using tables with many rows. You will find reason to avoid UDFs and use inline functions whenever possible.

    My point is that you have chosen the wrong data type for this purpose. I have seen schemas where the designers used string data types for dates because they didn't understand the datetime data type. They also had to write much conversion code to make that work.

  • I use four tinyints for the IP, another tinyint for the CIDR mask, and a clustered index on the four tinyints. This design is fast and it's easy to display in the front end application -

    SELECT ip1 + '.' + ip2 + '.' + ip3 + '.' + ip4 + '/' + cidr_mask AS ip_address

  • I believe that another distinction that must be taken into account is the difference between working with individual IP addresses and IP networks. Design considerations are vastly different for each respective scenario. When dealing with networks, storing IP addresses as a single integer may not be the optimal solution. For example, web statistics and security logs would be good candidates for integer based IP addresses while IP network based ACLs would be better suited to a four tinyint approach.

Viewing 15 posts - 16 through 30 (of 101 total)

You must be logged in to reply to this topic. Login to reply