Storing IPv4 Addresses for Performance

  • Elliott W (10/29/2009)


    Tom,

    I take exception to several things here.

    Elliot, if you think I intended to offend the author or malign him in any way, you are quite wrong. If I gave that impression, I expressed myself badly.

    1. There was no confusion that the string version was for human readability and had nothing to do with storage.

    There was a mention at the front of the article that the standard published way back when specified the IP address as 32 bits (4 octets) and that the string of 7 to 15 characters was there to make it easy to read. Then the article dived off into how to convert this string representation back to 32 bits with giving any consideration as to whether the 32-bit IP address might be a perfectly good representation.

    2. The statement: "anyone experienced in data communications who sees a 32 bit int column used to represent IPv4 address will naturally assume that the integer is the one represented by the same bit pattern as the IP address itself" I find to be invalid. If I see a field that is not clear I assume VERY LITTLE, maybe I'm the minority, but I wouldn't just assume that it is the 4 octet bit pattern, I would ask some questions.

    I always hope people won't make unwarranted assumptions too; but after more than 4 decades of experience in computing I've reached (with sadness) the conclusion that although that is my hope I should not permit it to become my expectation.

    3. "It was as if the author knew that IP addresses were in theory 32-bit values but were not aware that that is what they are in practise, not just in theory, that the 32-bit value is what almost all software, other than user interface presentation software, actually uses, and therefor thought that any old 32-bit pattern would be just as good a representation as the 32-bit IP address itself as long as there was a one-to-one correspondance." We we reading the same article? I didn't see any issue here. I thought it was pretty clear, they do represent a 32-bit value BUT SQL can't hold the first octet without some manipulation.

    But it is so trivial to get the IP address from the string representation - just change

    (CAST(PARSENAME(@IP,4) AS INT) - 128) * 16777216

    to read instead

    CASE WHEN CAST(PARSENAME(@IP,4) AS INT) > 127

    THEN CAST(PARSENAME(@IP,4) AS INT) - 256

    ELSE CAST(PARSENAME(@IP,4) AS INT)

    END * 16777216

    and that changes the code in the paper to do it. I think if the author had considered using the natural representation and rejected it he would have mentioned it, so it appears to me that he didn't consider it. That's not terribly surprising, he is a DBA not a data communications developer, but in my view DBAs should ask people with application field expertise before taking decisions like the choice of representation of an IP address.

    As for comments about the risks of subsequent developers screwing up when the see this approach, I guess I have a greater awareness of how often developers quickly skim domcumentation and miss important points, leading to problems down the line; or perhaps I'm just more of a pessimist. Anyway, these risks are secondary to the risk that the real IP address will turn out to be needed once you have some sort of IP address in your DB.

    For a logging application I don't see any clear winner between binary(4) and int, especially with the use of the computed column to make the data readable when we query it for humans to read.

    I don't disagree - there isn't much to choose between them for a logging application - but maybe the natural int representation has some advantages over the weird one of the article?. For a non-logging application, whether the natural representation is there or not is more important that whether it is there in its int verson or its binary(4) version.

    IF I were working on an application that dealt with IP addresses a lot and I thought for a second that I would be doing any submasking, I would almost certainly go with the int type.

    That sounds as if you want to do subnet checking by arithmetic comparison rather than by bitwise XOR and AND; be warned that this will not work with the representation in the article - for example the subnet <126.0.0.0 MASK 249.0.0.0>, although it is a contiguous subrange of 32 bit iP address values (considered as 32 bit unsigned ints) is contiguous neither in the representation of the article nor in the natural representation as 32-bit signed ints. So it's better to use the bitwise masking operations (which are identical for binary(4) and for int). But the bitwise masking operations don't give the correct result if IP addresses and the netmask are all converted to 32 bit form using the article's algorithm - they do only if the natural representation is used for the netmask - in which case why not use it for the addresses as well? (I don't much care about binary(4) vs int, just about natural versus unnatural representation.)

    Tom

  • Tom.Thomson (10/29/2009)


    ... he is a DBA not a data communications developer, but in my view DBAs should ask people with application field expertise before taking decisions like the choice of representation of an IP address.

    How are you so sure that I am a DBA? I am, in fact, a Senior Systems Architect and Developer and simply extend my skill with SQL Server. I have a very solid grasp of networking down to the bit level. I have not blatantly ignored any of its fundamental principles. The perhaps unorthodox approach I took was the one that I believe had the best fit within SQL Server’s sometimes unexplainable limitations. It would appear that you’ve so closely focused on an IPv4 address’s natural representation that you’re ignoring the adaptations that are required within SQL Server to be able to do operations such as atomic subnet mask filtering.

    So I ask you, please show us how you would, using T-SQL, take the IP address 192.168.1.1 in binary form (0xC0A80101), apply a mask of 255.255.255.0 in binary form (0xFFFFFF00), and compare it to the network 192.168.1.0 in binary form (0xC0A80100) to see if the IP address is part of the network.

  • Jean-Sebastien Carle (10/29/2009)


    How are you so sure that I am a DBA? I am, in fact, a Senior Systems Architect and Developer and simply extend my skill with SQL Server. I have a very solid grasp of networking down to the bit level. I have not blatantly ignored any of its fundamental principles.

    It seemed to me to be probable that you were a DBA - after all, most of the articles posted on SQL ServerCentral are written by DBAs, and you certainly appeared to give no consideration to using the natural representation. Nor did you say anything about transforming masks from varchar(15) form, which suggested to me that you hadn't thought of the impact of flipping the MS bit in the mask (flipping that sign bit is what your code does). I didn't at any point suggest that you had blatantly ignored any fundamental principle, nor that you didn't have a solid grasp of networking; if I gave the impression that I was suggesting any of that I'm very sorry, I certainly didn't intend to. I did suggest that you were not a data comms programmer, which I still think is true (for example I think a data comms programmer would know that the first RFC to specify a 32 bit internet address was RFC 760 of Jan 1980, rather than the more recent document that you referenced - and even that wasn't where it was first introduced)- or have you written and/or debugged chunks of code to implement protocols in layer 4 (transport) and downward?

    The perhaps unorthodox approach I took was the one that I believe had the best fit within SQL Server’s sometimes unexplainable limitations. It would appear that you’ve so closely focused on an IPv4 address’s natural representation that you’re ignoring the adaptations that are required within SQL Server to be able to do operations such as atomic subnet mask filtering.

    Well, since T-SQL code to transform the string form to the natural representation is only trivially different from what you suggested (see snippet in the comment which has annoyed you) it doesn't seem to me that the choice of representation was really to do with SQL limitations; and I already explained in another comment how I would handle subnet checking, but as I just wrote that in algebraic notation, not in T-SQL, I've provided some T-SQL below. As to focus, well it's pretty trivial to store the natural representation so the question to me was why on earth anyone would want to do anything else.

    So I ask you, please show us how you would, using T-SQL, take the IP address 192.168.1.1 in binary form (0xC0A80101), apply a mask of 255.255.255.0 in binary form (0xFFFFFF00), and compare it to the network 192.168.1.0 in binary form (0xC0A80100) to see if the IP address is part of the network.

    It's easy enough - although I don't understand why you want me to use binary, since the code snippet in the comment that offended you is clearly based on it being int. But code for binary is close to identical to that for int, so here it is:

    -- first set up the addresses and mask specified by JS-C

    declare @netAddress binary(4) set @netAddress = 0xC0A80100

    declare @netMask binary(4) set @netMask = 0xFFFFFF00

    declare @candidate binary(4) set @candidate = 0xC0A80101

    -- now the one-liner test to see whether the two addresses are in the same subnet if the specified mask is used

    if ((@netAddress ^ cast(@candidate as int)) & @netMask ) = 0

    select 'The candidate address is in the target network'

    else

    select 'The candidate address is not in the target network'

    go

    This works for all valid network addresses and netmasks, and all valid candidate addresses. It is symmetrical in the two addresses - what we are actually checking is whether the two subnets each containing one of the given addresses and both having the given mask are identical, so the cast is an annoyance (and one that really is caused by a ludicrous restriction in SQL - why on earth can either argument of ^ be binary but not both) which can be avoided by using int instead of binary(4) (and the symmetry of course merans that the cast can be moved to the other address instead in contexts where that would be more efficient): the real issue is about what bit pattern should be used and not about how it should be typed. There's one thing that sometimes makes me (mistakenly) think that binary might have an advantage: machines I worked on a (very) long time ago required int to be word-aligned, which made it a pain in the butt for byte-stream (even worse for bit-stream, like HDLC and SDLC - but of course bit-stuffing is now done in hardware) protocols, but that can be dealt with at the DB interface, it's not relevant internally, so I should ignore that thought.

    It's not actually difficult to produce code that works when the numbers are transformed as you suggest: it just needs one extra XOR (and some brackets) to test whether 192.168.1.42 mask 255.255.255.0 and 192.168.1.192 mask 255.255.255.0 are both the same when you are representing the addresses and mask by the integers 0x40A8012A, 0x40A801C0, and 0x7FFFFF00, even while making sure that you don't accidentally include include for example 64.168.1.200 in that subnet - but the test is no longer "no bits are 1 in the mask where the two addresses have different bits in that position", so it no longer matches the definition of the subnet in a straightforward manner. And that, to me, seems to be pointless obscurity. One can fix it by saying "we'll use the mask as its own representation" but that would seem very strange when you've rejected using the IP address as its own representation because you have to work around the sign bit, since the masks and the IP addresses have identical structures and identical human-readable forms (with of course some constraints on which values are allowed to be used as masks).

    Tom

  • You know they say about assumptions..

    Also I think the statement:

    It seemed to me to be probable that you were a DBA - after all, most of the articles posted on SQL ServerCentral are written by DBAs, and you certainly appeared to give no consideration to using the natural representation.

    Is wrong, while I may be working as a DBA right now, that is not what I have been doing for the last several decades.. And very few of the DBAs that I know are only working in that role now, they all have a much wider range of experience.

    I had to think about whether I wanted to respond at all..

    That's not terribly surprising, he is a DBA not a data communications developer, but in my view DBAs should ask people with application field expertise before taking decisions like the choice of representation of an IP address.

    I am taking exception to the premise of "not a data communications developer". That makes the statement that the front-end guy should be setting the datatypes, and that is a BAD assumption. Most (and I know I'm generalizing) front-end developers get into SQL so they can get data for their front-ends, and a data guy can usually tell the databases that were built by front-end guys, the front-end guys just didn't know any better, involvement of a data guy is usually pretty clear. Now as a subject matter expert in SQL I would ask the "data communications developer" what he needed and how he was going to use it and I would set the type, now if the developer was unhappy with that we would likely haggle. But then end decision is MINE, because it is MY function to be a an expert in SQL.

    CEWII

  • Tom.Thomson (10/31/2009)


    It seemed to me to be probable that you were a DBA - after all, most of the articles posted on SQL ServerCentral are written by DBAs, and you certainly appeared to give no consideration to using the natural representation ... if I gave the impression that I was suggesting any of that I'm very sorry, I certainly didn't intend to.

    I'm not upset in any way, but there's a saying that I like to keep in the forefront of my mind. To assume makes an *** out of u and me.

    I did suggest that you were not a data comms programmer ...

    If you mean on the embedded level, that is most certainly true. I do hold my own when it comes to networking nonetheless because of the large quantity of networks I have deployed and managed as well as the amount of network focused applications I have developed that involved either subnetting or packet filtering.

    ... would know that the first RFC to specify a 32 bit internet address was RFC 760 of Jan 1980, rather than the more recent document that you referenced - and even that wasn't where it was first introduced ...

    The reference to the RFC wasn't meant as a technical accuracy to the first mention as such and, if you notice my wording, I don't specify as much either. I was simply being a writer through that passage in an attempt to make the article seem less monotonous. Even the driest of us nerds can get bored of looking at flat code and table summaries.

    ... have you written and/or debugged chunks of code to implement protocols in layer 4 (transport) and downward?

    I cannot say that I have had the luxury to do so, no.

    ... see snippet in the comment which has annoyed you ... the code snippet in the comment that offended ...

    I am not quite sure how you infered that I was annoyed. I appologize if I led you to believe as such.

    it's pretty trivial to store the natural representation so the question to me was why on earth anyone would want to do anything else

    Because storage is not the only thing to take into consideration. If, in your application or usage, the only requirement is to log and store IPv4 addresses, then I would perhaps even question if there was a need to store the data in SQL Server. Traditionally, SQL Server's role is not only for storage but also to act as a mechanism to interact, aggregate and analyse data as well.

    It's easy enough - although I don't understand why you want me to use binary, since the code snippet in the comment that offended you is clearly based on it being int.

    Because you yourself have repeated several times that the natural representation of an address is in its binary 32-bit unmodified form. I simply wanted to see how you would achieve the required mathematics to accomplish subnet filtering while working around SQL Server's limitations. Specifically the fact that "In a bitwise operation, only one expression can be of either binary or varbinary data type.".

    ... why on earth can either argument of ^ be binary but not both ...

    As much as I agree, it is outside of our control. What we can control is how we choose to work around those restrictions to provide the best solution to satisfy our goal. That goal may be different for different people. Some may choose performance as their target, others storage efficiency and perhaps someone like yourself may choose to negate either for the purpose of natural representation. (Not to say that doing so cannot be done without achieving either of the other goals as well.)

    ... one thing that sometimes makes me (mistakenly) think that binary might have an advantage: machines I worked on a (very) long time ago required int to be word-aligned, which made it a pain in the butt for byte-stream (even worse for bit-stream, like HDLC and SDLC - but of course bit-stuffing is now done in hardware) protocols, but that can be dealt with at the DB interface, it's not relevant internally, so I should ignore that thought.

    SQL Server's 32 bit integer is its most used and most optimized datatype. Everything from indexes to storage and aggregate functions to joins have all been tweaked to gills in terms of efficiency and performance for this datatype. There are other datatypes that will perform similarly but I do not know of a datatype that can beat the integer.

    It's not actually difficult to produce code that works when the numbers are transformed as you suggest: it just needs one extra XOR (and some brackets) to test ...

    I did not believe it as something that was difficult and difficulty is an arbitrary thing once wrapped inside a stored procedure or a function. It is mostly a curiosity to see your approach to the matter and, again, how you would work with SQL Server's limitations.

    ... it no longer matches the definition of the subnet in a straightforward manner. And that, to me, seems to be pointless obscurity. One can fix it by saying "we'll use the mask as its own representation" but that would seem very strange when you've rejected using the IP address as its own representation because you have to work around the sign bit, since the masks and the IP addresses have identical structures and identical human-readable forms (with of course some constraints on which values are allowed to be used as masks).

    I have to admit that at this point, I was no longer clear on the point you were trying to make as it is unclear what is pointlessly obscure, your XOR approach or my bitshifted approach.

    Now allow me to make some points of my own.

    I understand your point of view regarding storing IP addresses under their natural representations but being that SQL Server's integer datatype is its most efficient, I would still prefer to store the address as such regardless of the fact that it requires additional transformation to do so. At the end of the day, what is most important to me is how SQL Server performs with the data I choose to store within my database.

    Generally speaking, IP addresses exist within a database as a result of logging. The logging will often cumulate over time to generate hundreds of thousands if not millions of rows. When operations are done over those rows, they usually revolve around verifying if an IP address already exists, verifying the number of times a specific IP address has been logged or finding logged entries that match either a list of addresses or networks. The efficiency of doing those operations is based on the performance achieved when SQL Server executes the mathematics, the aggregate functions, the joins and/or sorts required to do so.

    Binary does not perform as well as the integer although the difference may be negligable if either your dataset is small or your operations are simplistic.

    To illustrate, I used my test database. Within it I have two IP address tables. The first one ([dbo].[IPAddresses]) contains 100,000 random IP addresses in three forms varchar(15), int and binary. The second one ([dbo].[ManyIPAddresses]) contains 1,000,000 random IP addresses (10 copies of the [dbo].[IPAddresses] table).

    If we look at simply filtering 1,000,000 rows to find IP addresses from a certain subnet, here is the setup for our comparison:

    DECLARE @BinaryNetwork binary(4) SET @BinaryNetwork = [dbo].[ParseToBinary]('192.168.1.0')

    DECLARE @BinaryMask binary(4) SET @BinaryMask = [dbo].[ParseToBinary]('255.255.255.0')

    DECLARE @IntegerNetwork int SET @IntegerNetwork = [dbo].[ParseToInt]('192.168.1.0')

    Storing the address as binary(4) and filtering using XOR:

    SET STATISTICS TIME ON

    SELECT [StringAddress]

    FROM [dbo].[ManyIPAddresses]

    WHERE ((@BinaryNetwork ^ CAST([BinaryAddress] AS int)) & @BinaryMask) = 0

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 140 ms, elapsed time = 162 ms.

    Storing the address as an int and filtering using AND with an int mask:

    SET STATISTICS TIME ON

    SELECT [StringAddress]

    FROM [dbo].[ManyIPAddresses]

    WHERE [IntegerAddress] & @BinaryMask = @IntegerNetwork

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 94 ms, elapsed time = 98 ms.

    Now if we compare the results between doing a binary JOIN vs an integer JOIN, this is what we achieve.

    Doing a binary JOIN:

    SET STATISTICS TIME ON

    SELECT dbo.IPAddresses.StringAddress

    FROM dbo.IPAddresses

    INNER JOIN dbo.ManyIPAddresses

    ON dbo.IPAddresses.BinaryAddress = dbo.ManyIPAddresses.BinaryAddress

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 1718 ms, elapsed time = 45913 ms.

    Doing an integer JOIN:

    SET STATISTICS TIME ON

    SELECT dbo.IPAddresses.StringAddress

    FROM dbo.IPAddresses

    INNER JOIN dbo.ManyIPAddresses

    ON dbo.IPAddresses.IntegerAddress = dbo.ManyIPAddresses.IntegerAddress

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 1032 ms, elapsed time = 45409 ms.

    As for sorting, this is what we achieve.

    Doing a binary SORT:

    SET STATISTICS TIME ON

    SELECT [StringAddress]

    FROM [dbo].[ManyIPAddresses]

    ORDER BY [BinaryAddress] ASC

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 3390 ms, elapsed time = 46790 ms.

    Doing an integer SORT:

    SET STATISTICS TIME ON

    SELECT [StringAddress]

    FROM [dbo].[ManyIPAddresses]

    ORDER BY [IntegerAddress] ASC

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 2625 ms, elapsed time = 45974 ms.

    And let's finish with some grouping.

    Doing a binary GROUP BY:

    SET STATISTICS TIME ON

    SELECT [BinaryAddress], COUNT([BinaryAddress])

    FROM [dbo].[ManyIPAddresses]

    GROUP BY [BinaryAddress]

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 610 ms, elapsed time = 3723 ms.

    Doing an integer GROUP BY:

    SET STATISTICS TIME ON

    SELECT [IntegerAddress], COUNT([IntegerAddress])

    FROM [dbo].[ManyIPAddresses]

    GROUP BY [IntegerAddress]

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 484 ms, elapsed time = 2911 ms.

    So as you can see, not only does binary have its mathematical limitations, it also has its performance limitations. That's not to mention the fact that certain operations simply cannot be done on binary fields. Attempting to perform an aggregate function like AVG on a binary field will yield the error "Operand data type binary is invalid for avg operator.". Although this is something that would never be performed on IP addresses, it is still to be noted. But I digress.

    With the obvious performance advantages offered by the integer datatype, I would not hesitate to store IP addresses in the same way as I described in my article. The cost of transformation at the time of insert is minimal at best. With the speed of computed columns, transformation from its stored integer form back to a human readable varchar(15) or a naturally represented binary(4) can be also be achieved at a minimal cost. So is storing an IP address in its natural representation more important then storing it for efficiency and performance? I don't believe so but perhaps that's a need you may have.

  • Thankyou (Ladies and) Gentlemen

    for this ongoing discussion. Recent items have shown me the need to make further comparions on what is the best way to store an IP Address.

  • I'm not sure if I already said this in the thread, and it's a minor point, but I am not sure that int is the best thing about SQL Server generally - because when I performed an experiment on encoding char(6) alphanumeric strings (A-Z 0-9) - six bytes - into four bytes int in SQL Server 2000 as more economical indexed keys, the int representations seemed to perform worse as keys than the strings. I expect I have the details on file somewhere. That suggests that Microsoft worked hard on getting char key indexing to work really well. However, I may have overlooked other factors.

    On another occasion I found that there was not much performance difference between int, smallint, and tinyint, presumably because a modern server's processor does 32 bit or 64 bit computation anyway. And bit stunk (they may have improved it since), and has use limitations as mentioned, so in practical programming I usually put bit data in tinyint - I've also considered using a user-defined 'tinybit' data type which would just be tinyint with or without a constraint of being <= 1 or IN (0, 1). Having said that, I don't know what CAST costs here. Maybe nothing.

    One factor that my experiments probably excluded is hard disk access time, because my data probably ended up cached in RAM. Because of that, int ought to do better than char(6) in real applications, if - if - query performance is limited by disk speed and not by processor. In other words, if it isn't a fancy query that requires a lot of calculation per unit of data. But I've decided not to rely too much on what I think is going to work well in SQL Server.

  • Jean-Sebastien Carle (10/31/2009)


    <much snipped>

    Tom.Thomson (10/31/2009)It's easy enough - although I don't understand why you want me to use binary, since the code snippet in the comment that offended you is clearly based on it being int.

    Because you yourself have repeated several times that the natural representation of an address is in its binary 32-bit unmodified form.

    <more snipped>

    I think we have managed to get ourselves misunderstanding each-other quite a lot.

    When I said "the natural representation" I meant one that use the bit pattern that is the IP address - whether it's typed as binary(4) or as int doesn't come into it. That's why I didn't (as I pointed out) understand why you wanted me to provide the code for the binary(4) version. Your integer representation flips the top bit, so it isn't a natural representation, and I already had posted a suggested mod to your transformation which delivers an integer without flipping the top bit, which was the natural representation I was advocating. This is particularly important when using the transformation to convert a mask from dotted form to a bit pattern, because if your original transformation is used for the mask the test

    WHERE [IntegerAddress] & @IntegerMask = @IntegerNetwork

    will sometimes give the wrong result: for example it will include 64.1.1.0 in the network <192.1.1.0 mask 255.255.255.255> because @IntegerMask will be positive instead of negative; and economy of code suggests that once you have the right transformation for the mask you may as well use it for all transformations from 3 dot format to int.

    Also, I dislike that version of the test for a simple reason: it assumes that the subnet is always specified using the lowest address that it contains. But in practise how the subnet ios specified tends to vary according to where you are in the network - in a subnet with mask M, a machine with address A will tend to view the subnet as <A MASK M> rather than as <A&M MASK M>. That's why I would use a test which checks whether there is any bit position where the mask is 1 and the network and candidate addresses differ, if the subnet information was coming from the outside world as parameters to a stored proc. This could be expresssed (assuming that @IntegerMask doesn't have the top bit flipped) by

    WHERE [IntegerAddress] & @IntegerMask = @IntegerNetwork & @IntegerMask

    as well as by the expression using XOR. Obviously when a subnet is stored in a table it makes a lot of sense to use its lowest address as the network address, but I would prefer not to impose that restriction at the API between the DB and the outside world.

    I agree completely with you that efficiency matters. Bothe performance and storage have to be considered, and if one has an address table with various forms in computed columns and only one form in storage, the best form for storage will be integer because it comes top for both storage efficiency (best equal with a few other forms) and performance. But I still think that the integer that doesn't have the flipped top bit is a better representation than the one that does.

    When one is going for efficiency, one extra case that needs to be considered is the case where a table with a lot of addresses in it has to be filtered to get the rows in a particular subnet. I'm not sure that the optimiser will manage to use an index on the address if the column is subject to an & operation or an ^ operation in the filter; so I would probably want to convert the subnet to a pair of addresses, the lower and upper bound, for this case; then the index can certainly be used. Also one may want to discover whether two subnets and disjoint, or whether one is a subnet of the other. In these cases too it's useful to convert to lower bound - upper bound form. Should the two bounds be the standard representation of the subnet if it has to be held in the table, or should it be lower bound and mask If it's lower bound and upper bound, the mask can be held (if wanted) as a computed colum easily enough. If it's lower cound and mask, the upper bound can be a computed column. So I guess one would have to look at the performace and all operations and the frequency of each and cdecide accordingly - difficult in a world with incremental requirement discovery, incremental development, and incremental release.

    Tom

  • Jean-Sebastien Carle (10/31/2009)


    Generally speaking, IP addresses exist within a database as a result of logging. The logging will often cumulate over time to generate hundreds of thousands if not millions of rows. When operations are done over those rows, they usually revolve around verifying if an IP address already exists, verifying the number of times a specific IP address has been logged or finding logged entries that match either a list of addresses or networks. The efficiency of doing those operations is based on the performance achieved when SQL Server executes the mathematics, the aggregate functions, the joins and/or sorts required to do so.

    Binary does not perform as well as the integer although the difference may be negligable if either your dataset is small or your operations are simplistic.

    To illustrate, I used my test database. Within it I have two IP address tables. The first one ([dbo].[IPAddresses]) contains 100,000 random IP addresses in three forms varchar(15), int and binary. The second one ([dbo].[ManyIPAddresses]) contains 1,000,000 random IP addresses (10 copies of the [dbo].[IPAddresses] table).

    If we look at simply filtering 1,000,000 rows to find IP addresses from a certain subnet, here is the setup for our comparison:

    DECLARE @BinaryNetwork binary(4) SET @BinaryNetwork = [dbo].[ParseToBinary]('192.168.1.0')

    DECLARE @BinaryMask binary(4) SET @BinaryMask = [dbo].[ParseToBinary]('255.255.255.0')

    DECLARE @IntegerNetwork int SET @IntegerNetwork = [dbo].[ParseToInt]('192.168.1.0')

    [snip]

    So as you can see, not only does binary have its mathematical limitations, it also has its performance limitations. That's not to mention the fact that certain operations simply cannot be done on binary fields. Attempting to perform an aggregate function like AVG on a binary field will yield the error "Operand data type binary is invalid for avg operator.". Although this is something that would never be performed on IP addresses, it is still to be noted. But I digress.

    With the obvious performance advantages offered by the integer datatype, I would not hesitate to store IP addresses in the same way as I described in my article. The cost of transformation at the time of insert is minimal at best. With the speed of computed columns, transformation from its stored integer form back to a human readable varchar(15) or a naturally represented binary(4) can be also be achieved at a minimal cost. So is storing an IP address in its natural representation more important then storing it for efficiency and performance? I don't believe so but perhaps that's a need you may have.

    I am not a network programmer, so I'll get that out of the way up front. 😉

    However, don't bitmasks always represent contiguous blocks of address space? Therefore, wouldn't you get better performance if you calculate the beginning and ending addresses in the range represented by the bitmask and then use BETWEEN rather than bitwise operators? This eliminates per-row implicit conversions within the query and it should be sargable. I'd be interested to see how it compares to the test results you posted.

    As for using AVG on IP addresses, I can't think of a case where that even makes sense. (Your comment indicates you agree.) As such, I don't see much point in noting it. In fact, it may be a VERY minor point, but I can see a case to be made for preferring binary(4) specifically because it doesn't even offer operations like AVG that have no meaning relevant to the data.

    Andrew

    --Andrew

  • I agree that aggregate functions make little sense for IP Addresses.

    This is a very important discussion for developers such as myself as originating IP addresses need to be logged quickly and efficiently with regard to space.

  • Tom.Thomson (11/3/2009)


    Your integer representation flips the top bit, so it isn't a natural representation, and I already had posted a suggested mod to your transformation which delivers an integer without flipping the top bit, which was the natural representation I was advocating.

    The problem is that you cannot represent the IP address in its natural representation in both integer form and binary form at the same time using the int datatype. I believe that we can both agree that 0.0.0.0 should be 0x00000000 and 0 but since we can't store 255.255.255.255 (0xFFFFFFFF) as an unmodified int without using the bigint datatype, we have no choice but to finish with bitshifting and playing with the sign bit to make our value fit within SQL Server's signed int.

    So the question is what's more important? Casting compatibility or integer ordering? If being able to cast to the binary form with no additional manipulation is better for you, then hands down, your variation is more faithful. However, your variation comes with its own setback that my variation does not incur. If you were to translate from 0.0.0.0 to 255.255.255.255 using your method, you'd find that you would not be able to sort your field without casting it to binary first as your translation results in the lowest integer value starting in the middle of the IP range. In my variation, that's not the cast as 0.0.0.0 yields the lowest integer and 255.255.255.255 yields the highest.

    To illustrate, here is the comparison. IP1 is my translation method with it's binary casted result, IP2 is yours.

    IP IP1 IP2

    --------------- ----------- ---------- ----------- ----------

    0.0.0.0 -2147483648 0x80000000 0 0x00000000

    1.0.0.0 -2130706432 0x81000000 16777216 0x01000000

    2.0.0.0 -2113929216 0x82000000 33554432 0x02000000

    3.0.0.0 -2097152000 0x83000000 50331648 0x03000000

    4.0.0.0 -2080374784 0x84000000 67108864 0x04000000

    5.0.0.0 -2063597568 0x85000000 83886080 0x05000000

    6.0.0.0 -2046820352 0x86000000 100663296 0x06000000

    7.0.0.0 -2030043136 0x87000000 117440512 0x07000000

    8.0.0.0 -2013265920 0x88000000 134217728 0x08000000

    9.0.0.0 -1996488704 0x89000000 150994944 0x09000000

    10.0.0.0 -1979711488 0x8A000000 167772160 0x0A000000

    11.0.0.0 -1962934272 0x8B000000 184549376 0x0B000000

    12.0.0.0 -1946157056 0x8C000000 201326592 0x0C000000

    13.0.0.0 -1929379840 0x8D000000 218103808 0x0D000000

    14.0.0.0 -1912602624 0x8E000000 234881024 0x0E000000

    15.0.0.0 -1895825408 0x8F000000 251658240 0x0F000000

    16.0.0.0 -1879048192 0x90000000 268435456 0x10000000

    17.0.0.0 -1862270976 0x91000000 285212672 0x11000000

    18.0.0.0 -1845493760 0x92000000 301989888 0x12000000

    19.0.0.0 -1828716544 0x93000000 318767104 0x13000000

    20.0.0.0 -1811939328 0x94000000 335544320 0x14000000

    21.0.0.0 -1795162112 0x95000000 352321536 0x15000000

    22.0.0.0 -1778384896 0x96000000 369098752 0x16000000

    23.0.0.0 -1761607680 0x97000000 385875968 0x17000000

    24.0.0.0 -1744830464 0x98000000 402653184 0x18000000

    25.0.0.0 -1728053248 0x99000000 419430400 0x19000000

    26.0.0.0 -1711276032 0x9A000000 436207616 0x1A000000

    27.0.0.0 -1694498816 0x9B000000 452984832 0x1B000000

    28.0.0.0 -1677721600 0x9C000000 469762048 0x1C000000

    29.0.0.0 -1660944384 0x9D000000 486539264 0x1D000000

    30.0.0.0 -1644167168 0x9E000000 503316480 0x1E000000

    31.0.0.0 -1627389952 0x9F000000 520093696 0x1F000000

    32.0.0.0 -1610612736 0xA0000000 536870912 0x20000000

    33.0.0.0 -1593835520 0xA1000000 553648128 0x21000000

    34.0.0.0 -1577058304 0xA2000000 570425344 0x22000000

    35.0.0.0 -1560281088 0xA3000000 587202560 0x23000000

    36.0.0.0 -1543503872 0xA4000000 603979776 0x24000000

    37.0.0.0 -1526726656 0xA5000000 620756992 0x25000000

    38.0.0.0 -1509949440 0xA6000000 637534208 0x26000000

    39.0.0.0 -1493172224 0xA7000000 654311424 0x27000000

    40.0.0.0 -1476395008 0xA8000000 671088640 0x28000000

    41.0.0.0 -1459617792 0xA9000000 687865856 0x29000000

    42.0.0.0 -1442840576 0xAA000000 704643072 0x2A000000

    43.0.0.0 -1426063360 0xAB000000 721420288 0x2B000000

    44.0.0.0 -1409286144 0xAC000000 738197504 0x2C000000

    45.0.0.0 -1392508928 0xAD000000 754974720 0x2D000000

    46.0.0.0 -1375731712 0xAE000000 771751936 0x2E000000

    47.0.0.0 -1358954496 0xAF000000 788529152 0x2F000000

    48.0.0.0 -1342177280 0xB0000000 805306368 0x30000000

    49.0.0.0 -1325400064 0xB1000000 822083584 0x31000000

    50.0.0.0 -1308622848 0xB2000000 838860800 0x32000000

    51.0.0.0 -1291845632 0xB3000000 855638016 0x33000000

    52.0.0.0 -1275068416 0xB4000000 872415232 0x34000000

    53.0.0.0 -1258291200 0xB5000000 889192448 0x35000000

    54.0.0.0 -1241513984 0xB6000000 905969664 0x36000000

    55.0.0.0 -1224736768 0xB7000000 922746880 0x37000000

    56.0.0.0 -1207959552 0xB8000000 939524096 0x38000000

    57.0.0.0 -1191182336 0xB9000000 956301312 0x39000000

    58.0.0.0 -1174405120 0xBA000000 973078528 0x3A000000

    59.0.0.0 -1157627904 0xBB000000 989855744 0x3B000000

    60.0.0.0 -1140850688 0xBC000000 1006632960 0x3C000000

    61.0.0.0 -1124073472 0xBD000000 1023410176 0x3D000000

    62.0.0.0 -1107296256 0xBE000000 1040187392 0x3E000000

    63.0.0.0 -1090519040 0xBF000000 1056964608 0x3F000000

    64.0.0.0 -1073741824 0xC0000000 1073741824 0x40000000

    65.0.0.0 -1056964608 0xC1000000 1090519040 0x41000000

    66.0.0.0 -1040187392 0xC2000000 1107296256 0x42000000

    67.0.0.0 -1023410176 0xC3000000 1124073472 0x43000000

    68.0.0.0 -1006632960 0xC4000000 1140850688 0x44000000

    69.0.0.0 -989855744 0xC5000000 1157627904 0x45000000

    70.0.0.0 -973078528 0xC6000000 1174405120 0x46000000

    71.0.0.0 -956301312 0xC7000000 1191182336 0x47000000

    72.0.0.0 -939524096 0xC8000000 1207959552 0x48000000

    73.0.0.0 -922746880 0xC9000000 1224736768 0x49000000

    74.0.0.0 -905969664 0xCA000000 1241513984 0x4A000000

    75.0.0.0 -889192448 0xCB000000 1258291200 0x4B000000

    76.0.0.0 -872415232 0xCC000000 1275068416 0x4C000000

    77.0.0.0 -855638016 0xCD000000 1291845632 0x4D000000

    78.0.0.0 -838860800 0xCE000000 1308622848 0x4E000000

    79.0.0.0 -822083584 0xCF000000 1325400064 0x4F000000

    80.0.0.0 -805306368 0xD0000000 1342177280 0x50000000

    81.0.0.0 -788529152 0xD1000000 1358954496 0x51000000

    82.0.0.0 -771751936 0xD2000000 1375731712 0x52000000

    83.0.0.0 -754974720 0xD3000000 1392508928 0x53000000

    84.0.0.0 -738197504 0xD4000000 1409286144 0x54000000

    85.0.0.0 -721420288 0xD5000000 1426063360 0x55000000

    86.0.0.0 -704643072 0xD6000000 1442840576 0x56000000

    87.0.0.0 -687865856 0xD7000000 1459617792 0x57000000

    88.0.0.0 -671088640 0xD8000000 1476395008 0x58000000

    89.0.0.0 -654311424 0xD9000000 1493172224 0x59000000

    90.0.0.0 -637534208 0xDA000000 1509949440 0x5A000000

    91.0.0.0 -620756992 0xDB000000 1526726656 0x5B000000

    92.0.0.0 -603979776 0xDC000000 1543503872 0x5C000000

    93.0.0.0 -587202560 0xDD000000 1560281088 0x5D000000

    94.0.0.0 -570425344 0xDE000000 1577058304 0x5E000000

    95.0.0.0 -553648128 0xDF000000 1593835520 0x5F000000

    96.0.0.0 -536870912 0xE0000000 1610612736 0x60000000

    97.0.0.0 -520093696 0xE1000000 1627389952 0x61000000

    98.0.0.0 -503316480 0xE2000000 1644167168 0x62000000

    99.0.0.0 -486539264 0xE3000000 1660944384 0x63000000

    100.0.0.0 -469762048 0xE4000000 1677721600 0x64000000

    101.0.0.0 -452984832 0xE5000000 1694498816 0x65000000

    102.0.0.0 -436207616 0xE6000000 1711276032 0x66000000

    103.0.0.0 -419430400 0xE7000000 1728053248 0x67000000

    104.0.0.0 -402653184 0xE8000000 1744830464 0x68000000

    105.0.0.0 -385875968 0xE9000000 1761607680 0x69000000

    106.0.0.0 -369098752 0xEA000000 1778384896 0x6A000000

    107.0.0.0 -352321536 0xEB000000 1795162112 0x6B000000

    108.0.0.0 -335544320 0xEC000000 1811939328 0x6C000000

    109.0.0.0 -318767104 0xED000000 1828716544 0x6D000000

    110.0.0.0 -301989888 0xEE000000 1845493760 0x6E000000

    111.0.0.0 -285212672 0xEF000000 1862270976 0x6F000000

    112.0.0.0 -268435456 0xF0000000 1879048192 0x70000000

    113.0.0.0 -251658240 0xF1000000 1895825408 0x71000000

    114.0.0.0 -234881024 0xF2000000 1912602624 0x72000000

    115.0.0.0 -218103808 0xF3000000 1929379840 0x73000000

    116.0.0.0 -201326592 0xF4000000 1946157056 0x74000000

    117.0.0.0 -184549376 0xF5000000 1962934272 0x75000000

    118.0.0.0 -167772160 0xF6000000 1979711488 0x76000000

    119.0.0.0 -150994944 0xF7000000 1996488704 0x77000000

    120.0.0.0 -134217728 0xF8000000 2013265920 0x78000000

    121.0.0.0 -117440512 0xF9000000 2030043136 0x79000000

    122.0.0.0 -100663296 0xFA000000 2046820352 0x7A000000

    123.0.0.0 -83886080 0xFB000000 2063597568 0x7B000000

    124.0.0.0 -67108864 0xFC000000 2080374784 0x7C000000

    125.0.0.0 -50331648 0xFD000000 2097152000 0x7D000000

    126.0.0.0 -33554432 0xFE000000 2113929216 0x7E000000

    127.0.0.0 -16777216 0xFF000000 2130706432 0x7F000000

    128.0.0.0 0 0x00000000 -2147483648 0x80000000

    129.0.0.0 16777216 0x01000000 -2130706432 0x81000000

    130.0.0.0 33554432 0x02000000 -2113929216 0x82000000

    131.0.0.0 50331648 0x03000000 -2097152000 0x83000000

    132.0.0.0 67108864 0x04000000 -2080374784 0x84000000

    133.0.0.0 83886080 0x05000000 -2063597568 0x85000000

    134.0.0.0 100663296 0x06000000 -2046820352 0x86000000

    135.0.0.0 117440512 0x07000000 -2030043136 0x87000000

    136.0.0.0 134217728 0x08000000 -2013265920 0x88000000

    137.0.0.0 150994944 0x09000000 -1996488704 0x89000000

    138.0.0.0 167772160 0x0A000000 -1979711488 0x8A000000

    139.0.0.0 184549376 0x0B000000 -1962934272 0x8B000000

    140.0.0.0 201326592 0x0C000000 -1946157056 0x8C000000

    141.0.0.0 218103808 0x0D000000 -1929379840 0x8D000000

    142.0.0.0 234881024 0x0E000000 -1912602624 0x8E000000

    143.0.0.0 251658240 0x0F000000 -1895825408 0x8F000000

    144.0.0.0 268435456 0x10000000 -1879048192 0x90000000

    145.0.0.0 285212672 0x11000000 -1862270976 0x91000000

    146.0.0.0 301989888 0x12000000 -1845493760 0x92000000

    147.0.0.0 318767104 0x13000000 -1828716544 0x93000000

    148.0.0.0 335544320 0x14000000 -1811939328 0x94000000

    149.0.0.0 352321536 0x15000000 -1795162112 0x95000000

    150.0.0.0 369098752 0x16000000 -1778384896 0x96000000

    151.0.0.0 385875968 0x17000000 -1761607680 0x97000000

    152.0.0.0 402653184 0x18000000 -1744830464 0x98000000

    153.0.0.0 419430400 0x19000000 -1728053248 0x99000000

    154.0.0.0 436207616 0x1A000000 -1711276032 0x9A000000

    155.0.0.0 452984832 0x1B000000 -1694498816 0x9B000000

    156.0.0.0 469762048 0x1C000000 -1677721600 0x9C000000

    157.0.0.0 486539264 0x1D000000 -1660944384 0x9D000000

    158.0.0.0 503316480 0x1E000000 -1644167168 0x9E000000

    159.0.0.0 520093696 0x1F000000 -1627389952 0x9F000000

    160.0.0.0 536870912 0x20000000 -1610612736 0xA0000000

    161.0.0.0 553648128 0x21000000 -1593835520 0xA1000000

    162.0.0.0 570425344 0x22000000 -1577058304 0xA2000000

    163.0.0.0 587202560 0x23000000 -1560281088 0xA3000000

    164.0.0.0 603979776 0x24000000 -1543503872 0xA4000000

    165.0.0.0 620756992 0x25000000 -1526726656 0xA5000000

    166.0.0.0 637534208 0x26000000 -1509949440 0xA6000000

    167.0.0.0 654311424 0x27000000 -1493172224 0xA7000000

    168.0.0.0 671088640 0x28000000 -1476395008 0xA8000000

    169.0.0.0 687865856 0x29000000 -1459617792 0xA9000000

    170.0.0.0 704643072 0x2A000000 -1442840576 0xAA000000

    171.0.0.0 721420288 0x2B000000 -1426063360 0xAB000000

    172.0.0.0 738197504 0x2C000000 -1409286144 0xAC000000

    173.0.0.0 754974720 0x2D000000 -1392508928 0xAD000000

    174.0.0.0 771751936 0x2E000000 -1375731712 0xAE000000

    175.0.0.0 788529152 0x2F000000 -1358954496 0xAF000000

    176.0.0.0 805306368 0x30000000 -1342177280 0xB0000000

    177.0.0.0 822083584 0x31000000 -1325400064 0xB1000000

    178.0.0.0 838860800 0x32000000 -1308622848 0xB2000000

    179.0.0.0 855638016 0x33000000 -1291845632 0xB3000000

    180.0.0.0 872415232 0x34000000 -1275068416 0xB4000000

    181.0.0.0 889192448 0x35000000 -1258291200 0xB5000000

    182.0.0.0 905969664 0x36000000 -1241513984 0xB6000000

    183.0.0.0 922746880 0x37000000 -1224736768 0xB7000000

    184.0.0.0 939524096 0x38000000 -1207959552 0xB8000000

    185.0.0.0 956301312 0x39000000 -1191182336 0xB9000000

    186.0.0.0 973078528 0x3A000000 -1174405120 0xBA000000

    187.0.0.0 989855744 0x3B000000 -1157627904 0xBB000000

    188.0.0.0 1006632960 0x3C000000 -1140850688 0xBC000000

    189.0.0.0 1023410176 0x3D000000 -1124073472 0xBD000000

    190.0.0.0 1040187392 0x3E000000 -1107296256 0xBE000000

    191.0.0.0 1056964608 0x3F000000 -1090519040 0xBF000000

    192.0.0.0 1073741824 0x40000000 -1073741824 0xC0000000

    193.0.0.0 1090519040 0x41000000 -1056964608 0xC1000000

    194.0.0.0 1107296256 0x42000000 -1040187392 0xC2000000

    195.0.0.0 1124073472 0x43000000 -1023410176 0xC3000000

    196.0.0.0 1140850688 0x44000000 -1006632960 0xC4000000

    197.0.0.0 1157627904 0x45000000 -989855744 0xC5000000

    198.0.0.0 1174405120 0x46000000 -973078528 0xC6000000

    199.0.0.0 1191182336 0x47000000 -956301312 0xC7000000

    200.0.0.0 1207959552 0x48000000 -939524096 0xC8000000

    201.0.0.0 1224736768 0x49000000 -922746880 0xC9000000

    202.0.0.0 1241513984 0x4A000000 -905969664 0xCA000000

    203.0.0.0 1258291200 0x4B000000 -889192448 0xCB000000

    204.0.0.0 1275068416 0x4C000000 -872415232 0xCC000000

    205.0.0.0 1291845632 0x4D000000 -855638016 0xCD000000

    206.0.0.0 1308622848 0x4E000000 -838860800 0xCE000000

    207.0.0.0 1325400064 0x4F000000 -822083584 0xCF000000

    208.0.0.0 1342177280 0x50000000 -805306368 0xD0000000

    209.0.0.0 1358954496 0x51000000 -788529152 0xD1000000

    210.0.0.0 1375731712 0x52000000 -771751936 0xD2000000

    211.0.0.0 1392508928 0x53000000 -754974720 0xD3000000

    212.0.0.0 1409286144 0x54000000 -738197504 0xD4000000

    213.0.0.0 1426063360 0x55000000 -721420288 0xD5000000

    214.0.0.0 1442840576 0x56000000 -704643072 0xD6000000

    215.0.0.0 1459617792 0x57000000 -687865856 0xD7000000

    216.0.0.0 1476395008 0x58000000 -671088640 0xD8000000

    217.0.0.0 1493172224 0x59000000 -654311424 0xD9000000

    218.0.0.0 1509949440 0x5A000000 -637534208 0xDA000000

    219.0.0.0 1526726656 0x5B000000 -620756992 0xDB000000

    220.0.0.0 1543503872 0x5C000000 -603979776 0xDC000000

    221.0.0.0 1560281088 0x5D000000 -587202560 0xDD000000

    222.0.0.0 1577058304 0x5E000000 -570425344 0xDE000000

    223.0.0.0 1593835520 0x5F000000 -553648128 0xDF000000

    224.0.0.0 1610612736 0x60000000 -536870912 0xE0000000

    225.0.0.0 1627389952 0x61000000 -520093696 0xE1000000

    226.0.0.0 1644167168 0x62000000 -503316480 0xE2000000

    227.0.0.0 1660944384 0x63000000 -486539264 0xE3000000

    228.0.0.0 1677721600 0x64000000 -469762048 0xE4000000

    229.0.0.0 1694498816 0x65000000 -452984832 0xE5000000

    230.0.0.0 1711276032 0x66000000 -436207616 0xE6000000

    231.0.0.0 1728053248 0x67000000 -419430400 0xE7000000

    232.0.0.0 1744830464 0x68000000 -402653184 0xE8000000

    233.0.0.0 1761607680 0x69000000 -385875968 0xE9000000

    234.0.0.0 1778384896 0x6A000000 -369098752 0xEA000000

    235.0.0.0 1795162112 0x6B000000 -352321536 0xEB000000

    236.0.0.0 1811939328 0x6C000000 -335544320 0xEC000000

    237.0.0.0 1828716544 0x6D000000 -318767104 0xED000000

    238.0.0.0 1845493760 0x6E000000 -301989888 0xEE000000

    239.0.0.0 1862270976 0x6F000000 -285212672 0xEF000000

    240.0.0.0 1879048192 0x70000000 -268435456 0xF0000000

    241.0.0.0 1895825408 0x71000000 -251658240 0xF1000000

    242.0.0.0 1912602624 0x72000000 -234881024 0xF2000000

    243.0.0.0 1929379840 0x73000000 -218103808 0xF3000000

    244.0.0.0 1946157056 0x74000000 -201326592 0xF4000000

    245.0.0.0 1962934272 0x75000000 -184549376 0xF5000000

    246.0.0.0 1979711488 0x76000000 -167772160 0xF6000000

    247.0.0.0 1996488704 0x77000000 -150994944 0xF7000000

    248.0.0.0 2013265920 0x78000000 -134217728 0xF8000000

    249.0.0.0 2030043136 0x79000000 -117440512 0xF9000000

    250.0.0.0 2046820352 0x7A000000 -100663296 0xFA000000

    251.0.0.0 2063597568 0x7B000000 -83886080 0xFB000000

    252.0.0.0 2080374784 0x7C000000 -67108864 0xFC000000

    253.0.0.0 2097152000 0x7D000000 -50331648 0xFD000000

    254.0.0.0 2113929216 0x7E000000 -33554432 0xFE000000

    255.0.0.0 2130706432 0x7F000000 -16777216 0xFF000000

    This is particularly important when using the transformation to convert a mask from dotted form to a bit pattern, because if your original transformation is used for the mask the test

    WHERE [IntegerAddress] & @IntegerMask = @IntegerNetwork

    will sometimes give the wrong result: for example it will include 64.1.1.0 in the network <192.1.1.0 mask 255.255.255.255> because @IntegerMask will be positive instead of negative; and economy of code suggests that once you have the right transformation for the mask you may as well use it for all transformations from 3 dot format to int.

    That's correct, which is why I didn't specify an integer mask as it simply wouldn't work.

    Also, I dislike that version of the test for a simple reason: it assumes that the subnet is always specified using the lowest address that it contains. But in practise how the subnet ios specified tends to vary according to where you are in the network - in a subnet with mask M, a machine with address A will tend to view the subnet as <A MASK M> rather than as <A&M MASK M>. That's why I would use a test which checks whether there is any bit position where the mask is 1 and the network and candidate addresses differ, if the subnet information was coming from the outside world as parameters to a stored proc. This could be expresssed (assuming that @IntegerMask doesn't have the top bit flipped) by

    WHERE [IntegerAddress] & @IntegerMask = @IntegerNetwork & @IntegerMask

    as well as by the expression using XOR. Obviously when a subnet is stored in a table it makes a lot of sense to use its lowest address as the network address, but I would prefer not to impose that restriction at the API between the DB and the outside world.

    Absolutely. The point I was trying to make, however, is that each operation you do compounds itself when multiplied over a large number of rows, which is often the case when it comes to IP addresses within a database. If you can do the same validation using two mathematical operations vs three mathematical operations "per row", the impact is still there, even if minimal. At the end of the day, the implementor must decide where to draw the line between the efficiency aquired and the flexibility needed.

    To further illustrate my point, the following tests show five different ways to test for IP addresses belonging to a network (including Clive Chinery's suggestion):

    Variables setup.

    DECLARE @BinaryNetwork binary(4) SET @BinaryNetwork = [dbo].[ParseToBinary]('192.168.1.0')

    DECLARE @BinaryMask binary(4) SET @BinaryMask = [dbo].[ParseToBinary]('255.255.255.240')

    DECLARE @IntegerNetwork int SET @IntegerNetwork = [dbo].[ParseToInt]('192.168.1.0')

    DECLARE @StartIntegerAddress int SET @StartIntegerAddress = @IntegerNetwork + 1

    DECLARE @EndIntegerAddress int SET @EndIntegerAddress = (@IntegerNetwork | CAST(~ CAST(@BinaryMask AS int) AS binary(4))) - 1

    XOR using stored binary addresses.

    SET STATISTICS TIME ON

    SELECT [StringAddress]

    FROM [dbo].[ManyIPAddresses]

    WHERE ((@BinaryNetwork ^ CAST([BinaryAddress] AS int)) & @BinaryMask) = 0

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 125 ms, elapsed time = 117 ms.

    XOR using stored integer addresses.

    SET STATISTICS TIME ON

    SELECT [StringAddress]

    FROM [dbo].[ManyIPAddresses]

    WHERE ((@IntegerNetwork ^ [IntegerAddress]) & @BinaryMask) = 0

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 109 ms, elapsed time = 108 ms.

    AND using stored integer addresses.

    SET STATISTICS TIME ON

    SELECT [StringAddress]

    FROM [dbo].[ManyIPAddresses]

    WHERE [IntegerAddress] & @BinaryMask = @IntegerNetwork

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 94 ms, elapsed time = 97 ms.

    BETWEEN using stored integer addresses.

    SET STATISTICS TIME ON

    SELECT [StringAddress]

    FROM [dbo].[ManyIPAddresses]

    WHERE [IntegerAddress] BETWEEN @StartIntegerAddress AND @EndIntegerAddress

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 63 ms, elapsed time = 73 ms.

    Greater then, Lesser then comparison using integer addresses.

    SET STATISTICS TIME ON

    SELECT [StringAddress]

    FROM [dbo].[ManyIPAddresses]

    WHERE [IntegerAddress] >= @StartIntegerAddress AND [IntegerAddress] <= @EndIntegerAddress

    SET STATISTICS TIME OFF

    SQL Server Execution Times:

    CPU time = 62 ms, elapsed time = 73 ms.

    Straight integer filtering is at a very clear advantage. I am undecided as to which method of integer translation is better (if there's such a thing as one being better then the other). Each of the variations offer advantages and disadvantages in equal proportions it seems.

Viewing 11 posts - 91 through 101 (of 101 total)

You must be logged in to reply to this topic. Login to reply