Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

Storing IPs in SQL Server Expand / Collapse
Author
Message
Posted Thursday, February 15, 2007 4:42 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, July 28, 2014 3:45 PM
Points: 2,892, Visits: 1,784
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/dpoole/2871.asp

LinkedIn Profile
Newbie on www.simple-talk.com
Post #345342
Posted Monday, April 2, 2007 12:01 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, April 9, 2007 2:17 PM
Points: 1, Visits: 1
You missed one obvious way to store IP addresses, which is both the smallest data size and yet still easy to interpret as an IP address -- binary(4).

127.0.0.1 becomes 0x7F000001

I wrote the following functions, and hereby declare them into the public domain:


create function dbo.ConvertIPAddressToBinary (@IPAddress varchar(15))
returns binary(4)
as
begin
declare @Octet1 binary(1), @Octet2 binary(1), @Octet3 binary(1), @Octet4 binary(1);

select @Octet1 = cast(cast(substring(@IPAddress, 1, charindex('.', @IPAddress) - 1) as tinyint) as binary(1));
select @IPAddress = substring(@IPAddress, charindex('.', @IPAddress) + 1, len(@IPAddress));
select @Octet2 = cast(cast(substring(@IPAddress, 1, charindex('.', @IPAddress) - 1) as tinyint) as binary(1));
select @IPAddress = substring(@IPAddress, charindex('.', @IPAddress) + 1, len(@IPAddress));
select @Octet3 = cast(cast(substring(@IPAddress, 1, charindex('.', @IPAddress) - 1) as tinyint) as binary(1));
select @IPAddress = substring(@IPAddress, charindex('.', @IPAddress) + 1, len(@IPAddress));
select @Octet4 = cast(cast(@IPAddress as tinyint) as binary(1));

return @Octet1 + @Octet2 + @Octet3 + @Octet4;
end
go

create function dbo.ConvertBinaryToIPAddress (@IPAddress binary(4))
returns varchar(15)
as
begin
declare @BigIP bigint;
declare @Octet1 tinyint, @Octet2 tinyint, @Octet3 tinyint, @Octet4 tinyint;

select @BigIP = cast(@IPAddress as bigint);

select @Octet4 = @BigIP % 256;
select @BigIP = @BigIP / 256;
select @Octet3 = @BigIP % 256;
select @BigIP = @BigIP / 256;
select @Octet2 = @BigIP % 256;
select @BigIP = @BigIP / 256;
select @Octet1 = @BigIP % 256;

return cast(@Octet1 as varchar) + '.' + cast(@Octet2 as varchar) + '.' + cast(@Octet3 as varchar) + '.' + cast(@Octet4 as varchar);
end
go
Post #355274
Posted Monday, April 2, 2007 12:34 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, April 14, 2014 1:51 AM
Points: 14, Visits: 43
Just a side comment. Depending on the purpose and expected lifetime of the application, you might have to deal with IPv6 addresses (http://en.wikipedia.org/wiki/Ipv6). In that case, you'd have to consider larger data types. If you wanted to stick to built-in types, uniqueidentifier could be an option (not very human readable, but atomic)-or you could go for binary(16) (or maybe varbinary(16) thinking also of ipv4) as implied by earlier post.


Post #355277
Posted Monday, April 2, 2007 3:16 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: 2 days ago @ 3:07 AM
Points: 1,060, Visits: 876
Surely your choice of index for the large range could have been better?  Yes the highly specific reversed order may be better for a particular lookup, but for a ranged lookup over all 4 levels surely an index in order would be better?  After all, in this example you will need to go through ALL of the top 3 levels before you acquire any level of usable index?  Seems pretty inefficient to me.


Post #355298
Posted Monday, April 2, 2007 7:20 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: Moderators
Last Login: Yesterday @ 7:03 AM
Points: 6,779, Visits: 1,866
Interesting. I'd be curious to see the query plan on the bigint implementation, and a link to the UDT implementation if anyone has one (though I agree the overhead in storage is substantial).

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Post #355334
Posted Monday, April 2, 2007 8:14 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, February 6, 2014 12:59 PM
Points: 801, Visits: 1,962

Good work, David.

I agree with the comments about IPv6 though.  Easily expanded to 6.  Our need is to validate IP's and to count the number of distinct rows over time for licensing.  Some of these techniques could be quite helpful speeding performance.



ATB

Charles Kincaid

Post #355355
Posted Monday, April 2, 2007 9:12 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Tuesday, May 6, 2014 5:51 AM
Points: 6,266, Visits: 2,028
I have seen this approach before and it *is* quite fast but when you try to make the returned values into readable format is when you get hit *hard* by performance.



* Noel
Post #355384
Posted Monday, April 2, 2007 9:47 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 2:44 PM
Points: 1,945, Visits: 2,862
The four octets are atomic, but not scalar. This is a subtle point that people miss -- the data element versus the reprsentation of the data element. Same thing for (longitude, latitude) pairs, (x,y,z) co-ordinates, etc.

Atomic means that if you cut out a part, the meaning of the whole is lost. Scalar means that it is a unit of measurement on some scale expressed as a single dimennsional value.

Also, I don't mind TINYINT. But I want to declare as

CREATE DOMAIN TinyInt
AS INTEGER DEFAULT 0 CHECK (VALUE BETWEEN 0 AND 255);

BIGINT is in the SQL-200x stuff, but I wish we had used the old Algol convenmtion of the key LONG, so we could write "LONG INTEGER", "LONG LONG INTEGER", etc. for user defined precision

I have to walk a dog now, but the complex logic in the range comparison example should be reducable with CASE expressions, something like this

CASE WHEN ip1 NOT BETWEEN low_ip1 AND high_ip1
THEN 'F'
ELSE CASE WHEN ip2 NOT BETWEEN low_ip2 AND high_ip2
THEN 'F' ..
ELSE'T' END = 'T'

Anyone want to finish this and test it?


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #355396
Posted Monday, April 2, 2007 10:10 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, November 1, 2012 1:18 PM
Points: 110, Visits: 261
I'm not sure I see the need to have the data be human readable. I guess it depends on the specifics of your use case.

My assumption is that the index lookup would be best with int or binary(4) rather than having the index over 4 columns. Each comparison would be very direct this way. Not sure if this small perf gain is worth much though, depends on the circumstances. I have about 300MM rows of web logs (recent data only, I keep a rolling window of data) in the DB at the moment, so it might add up.. but I very rarely need to analyze by IP. When I do, it is always after already filtering based on a time window.

I would think that a good solution might be to just use int / binary(4) for indexing your IP and have a non-persisted/indexed computed column that has your human readable value. You would want simple mapping functions with most of these solutions, so just use your mapping function in the computed column for human readable. The major downside clearly is if you routinely want to render a large dataset of IPs rather than just filter or aggregate data on them.

I just use varchar(15) at the moment because it makes it dead simple to BCP in the IIS logs. I care very little about disk space because this is data I exclude from my backup regimen.

It's always good to know how you actually use your data for choosing the best strategy.
Post #355401
Posted Monday, April 2, 2007 10:14 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, July 28, 2014 3:45 PM
Points: 2,892, Visits: 1,784
Joe,

How do you feel about the naming of the integer types?

When I started programming integer meant 16 bit, long integer meant 32 bit. Not everything has shifted up a gear (no pun intended).

I notice that the recommendations for .NET development say that the developer should used the explicit types such as int32 and int64.

Calling 64 bit integers BIGINT strikes me as the equivalent of having giant squid, collosal squid and whatever name they'll think of for the next size squid.

Does this mean that when we move up to 128 bit computing we are going to see HUGEINT, ENORMOUSINT etc?


LinkedIn Profile
Newbie on www.simple-talk.com
Post #355404
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse