|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 8:46 AM
Points: 2,750,
Visits: 1,410
|
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, April 09, 2007 2:17 PM
Points: 1,
Visits: 1
|
|
You missed one obvious way to store IP addresses, which is both the smallest data size and yet still easy to interpret as an IP address -- binary(4).
127.0.0.1 becomes 0x7F000001
I wrote the following functions, and hereby declare them into the public domain:
create function dbo.ConvertIPAddressToBinary (@IPAddress varchar(15)) returns binary(4) as begin declare @Octet1 binary(1), @Octet2 binary(1), @Octet3 binary(1), @Octet4 binary(1);
select @Octet1 = cast(cast(substring(@IPAddress, 1, charindex('.', @IPAddress) - 1) as tinyint) as binary(1)); select @IPAddress = substring(@IPAddress, charindex('.', @IPAddress) + 1, len(@IPAddress)); select @Octet2 = cast(cast(substring(@IPAddress, 1, charindex('.', @IPAddress) - 1) as tinyint) as binary(1)); select @IPAddress = substring(@IPAddress, charindex('.', @IPAddress) + 1, len(@IPAddress)); select @Octet3 = cast(cast(substring(@IPAddress, 1, charindex('.', @IPAddress) - 1) as tinyint) as binary(1)); select @IPAddress = substring(@IPAddress, charindex('.', @IPAddress) + 1, len(@IPAddress)); select @Octet4 = cast(cast(@IPAddress as tinyint) as binary(1));
return @Octet1 + @Octet2 + @Octet3 + @Octet4; end go
create function dbo.ConvertBinaryToIPAddress (@IPAddress binary(4)) returns varchar(15) as begin declare @BigIP bigint; declare @Octet1 tinyint, @Octet2 tinyint, @Octet3 tinyint, @Octet4 tinyint; select @BigIP = cast(@IPAddress as bigint);
select @Octet4 = @BigIP % 256; select @BigIP = @BigIP / 256; select @Octet3 = @BigIP % 256; select @BigIP = @BigIP / 256; select @Octet2 = @BigIP % 256; select @BigIP = @BigIP / 256; select @Octet1 = @BigIP % 256; return cast(@Octet1 as varchar) + '.' + cast(@Octet2 as varchar) + '.' + cast(@Octet3 as varchar) + '.' + cast(@Octet4 as varchar); end go
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Tuesday, January 17, 2012 10:37 PM
Points: 14,
Visits: 42
|
|
Just a side comment. Depending on the purpose and expected lifetime of the application, you might have to deal with IPv6 addresses (http://en.wikipedia.org/wiki/Ipv6). In that case, you'd have to consider larger data types. If you wanted to stick to built-in types, uniqueidentifier could be an option (not very human readable, but atomic)-or you could go for binary(16) (or maybe varbinary(16) thinking also of ipv4) as implied by earlier post.
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Yesterday @ 4:24 AM
Points: 1,026,
Visits: 752
|
|
Surely your choice of index for the large range could have been better? Yes the highly specific reversed order may be better for a particular lookup, but for a ranged lookup over all 4 levels surely an index in order would be better? After all, in this example you will need to go through ALL of the top 3 levels before you acquire any level of usable index? Seems pretty inefficient to me.
|
|
|
|
|
SSCertifiable
       
Group: Moderators
Last Login: Thursday, May 09, 2013 12:38 PM
Points: 6,462,
Visits: 1,384
|
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Friday, May 17, 2013 9:18 AM
Points: 772,
Visits: 1,825
|
|
Good work, David. I agree with the comments about IPv6 though. Easily expanded to 6. Our need is to validate IP's and to count the number of distinct rows over time for licensing. Some of these techniques could be quite helpful speeding performance.
ATB
Charles Kincaid
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 4:39 PM
Points: 6,260,
Visits: 1,977
|
|
I have seen this approach before and it *is* quite fast but when you try to make the returned values into readable format is when you get hit *hard* by performance.
* Noel
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
The four octets are atomic, but not scalar. This is a subtle point that people miss -- the data element versus the reprsentation of the data element. Same thing for (longitude, latitude) pairs, (x,y,z) co-ordinates, etc.
Atomic means that if you cut out a part, the meaning of the whole is lost. Scalar means that it is a unit of measurement on some scale expressed as a single dimennsional value.
Also, I don't mind TINYINT. But I want to declare as
CREATE DOMAIN TinyInt AS INTEGER DEFAULT 0 CHECK (VALUE BETWEEN 0 AND 255);
BIGINT is in the SQL-200x stuff, but I wish we had used the old Algol convenmtion of the key LONG, so we could write "LONG INTEGER", "LONG LONG INTEGER", etc. for user defined precision
I have to walk a dog now, but the complex logic in the range comparison example should be reducable with CASE expressions, something like this
CASE WHEN ip1 NOT BETWEEN low_ip1 AND high_ip1 THEN 'F' ELSE CASE WHEN ip2 NOT BETWEEN low_ip2 AND high_ip2 THEN 'F' .. ELSE'T' END = 'T'
Anyone want to finish this and test it?
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Thursday, November 01, 2012 1:18 PM
Points: 110,
Visits: 261
|
|
I'm not sure I see the need to have the data be human readable. I guess it depends on the specifics of your use case.
My assumption is that the index lookup would be best with int or binary(4) rather than having the index over 4 columns. Each comparison would be very direct this way. Not sure if this small perf gain is worth much though, depends on the circumstances. I have about 300MM rows of web logs (recent data only, I keep a rolling window of data) in the DB at the moment, so it might add up.. but I very rarely need to analyze by IP. When I do, it is always after already filtering based on a time window.
I would think that a good solution might be to just use int / binary(4) for indexing your IP and have a non-persisted/indexed computed column that has your human readable value. You would want simple mapping functions with most of these solutions, so just use your mapping function in the computed column for human readable. The major downside clearly is if you routinely want to render a large dataset of IPs rather than just filter or aggregate data on them.
I just use varchar(15) at the moment because it makes it dead simple to BCP in the IIS logs. I care very little about disk space because this is data I exclude from my backup regimen.
It's always good to know how you actually use your data for choosing the best strategy.
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 8:46 AM
Points: 2,750,
Visits: 1,410
|
|
Joe,
How do you feel about the naming of the integer types?
When I started programming integer meant 16 bit, long integer meant 32 bit. Not everything has shifted up a gear (no pun intended).
I notice that the recommendations for .NET development say that the developer should used the explicit types such as int32 and int64.
Calling 64 bit integers BIGINT strikes me as the equivalent of having giant squid, collosal squid and whatever name they'll think of for the next size squid.
Does this mean that when we move up to 128 bit computing we are going to see HUGEINT, ENORMOUSINT etc?
LinkedIn Profile
|
|
|
|