
Storing IPs in SQL Server
Posted Thursday, February 15, 2007 4:42 PM
SSCrazy

Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/dpoole/2871.asp

LinkedIn Profile
Newbie on www.simple-talk.com
Post #345342
Posted Monday, April 2, 2007 12:01 AM
Forum Newbie

You missed one obvious way to store IP addresses, one which has the smallest data size and yet is still easy to interpret as an IP address -- binary(4).

127.0.0.1 becomes 0x7F000001

I wrote the following functions, and hereby declare them into the public domain:


-- Convert a dotted-quad string such as '127.0.0.1' into binary(4)
create function dbo.ConvertIPAddressToBinary (@IPAddress varchar(15))
returns binary(4)
as
begin
    declare @Octet1 binary(1), @Octet2 binary(1), @Octet3 binary(1), @Octet4 binary(1);

    -- Peel off the first three octets, one '.' at a time
    select @Octet1 = cast(cast(substring(@IPAddress, 1, charindex('.', @IPAddress) - 1) as tinyint) as binary(1));
    select @IPAddress = substring(@IPAddress, charindex('.', @IPAddress) + 1, len(@IPAddress));
    select @Octet2 = cast(cast(substring(@IPAddress, 1, charindex('.', @IPAddress) - 1) as tinyint) as binary(1));
    select @IPAddress = substring(@IPAddress, charindex('.', @IPAddress) + 1, len(@IPAddress));
    select @Octet3 = cast(cast(substring(@IPAddress, 1, charindex('.', @IPAddress) - 1) as tinyint) as binary(1));
    select @IPAddress = substring(@IPAddress, charindex('.', @IPAddress) + 1, len(@IPAddress));
    -- Whatever remains is the fourth octet
    select @Octet4 = cast(cast(@IPAddress as tinyint) as binary(1));

    -- Concatenate the four single bytes into one binary(4) value
    return @Octet1 + @Octet2 + @Octet3 + @Octet4;
end
go

-- Convert a binary(4) value such as 0x7F000001 back into a dotted-quad string
create function dbo.ConvertBinaryToIPAddress (@IPAddress binary(4))
returns varchar(15)
as
begin
    declare @BigIP bigint;
    declare @Octet1 tinyint, @Octet2 tinyint, @Octet3 tinyint, @Octet4 tinyint;

    select @BigIP = cast(@IPAddress as bigint);

    -- Extract the octets from least significant to most significant
    select @Octet4 = @BigIP % 256;
    select @BigIP = @BigIP / 256;
    select @Octet3 = @BigIP % 256;
    select @BigIP = @BigIP / 256;
    select @Octet2 = @BigIP % 256;
    select @BigIP = @BigIP / 256;
    select @Octet1 = @BigIP % 256;

    return cast(@Octet1 as varchar) + '.' + cast(@Octet2 as varchar) + '.' + cast(@Octet3 as varchar) + '.' + cast(@Octet4 as varchar);
end
go
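
For anyone who wants a quick sanity check, a round trip with the two functions above should look like this (a minimal sketch, assuming they compile as posted):

select dbo.ConvertIPAddressToBinary('127.0.0.1') as AsBinary,   -- 0x7F000001
       dbo.ConvertBinaryToIPAddress(0x7F000001)  as AsString;   -- '127.0.0.1'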
Post #355274
Posted Monday, April 2, 2007 12:34 AM
Grasshopper

Just a side comment. Depending on the purpose and expected lifetime of the application, you might have to deal with IPv6 addresses (http://en.wikipedia.org/wiki/Ipv6). In that case, you'd have to consider larger data types. If you wanted to stick to built-in types, uniqueidentifier could be an option (not very human readable, but atomic), or you could go for binary(16) (or perhaps varbinary(16), to accommodate IPv4 as well), as implied by the earlier post.
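
A minimal sketch of that idea (the table and column names here are made up, not from the article): a single varbinary(16) column can hold either a 4-byte IPv4 address or a 16-byte IPv6 address.

create table dbo.ConnectionLog
(
    LogID     int identity(1,1) primary key,
    IPAddress varbinary(16) not null,          -- 4 bytes for IPv4, 16 bytes for IPv6
    LoggedAt  datetime not null default getdate()
);

-- IPv4 loopback: 127.0.0.1
insert dbo.ConnectionLog (IPAddress) values (0x7F000001);
-- IPv6 loopback: ::1
insert dbo.ConnectionLog (IPAddress) values (0x00000000000000000000000000000001);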


Post #355277
Posted Monday, April 2, 2007 3:16 AM
Ten Centuries

Surely your choice of index for the large range could have been better? Yes, the highly specific reversed order may be better for a particular lookup, but for a ranged lookup over all four levels surely an index in natural order would be better? After all, in this example you would need to work through ALL of the top three levels before you get any usable index at all. Seems pretty inefficient to me.
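
To illustrate the point with a hedged sketch (WebLog and the octet columns are hypothetical stand-ins, not taken from the article): an index in natural octet order lets a range query over a whole network block seek on its leading columns, whereas the reversed order cannot.

-- Natural order: a query over a /16 block can seek on the first two key columns
create index IX_WebLog_IP_Natural on dbo.WebLog (Octet1, Octet2, Octet3, Octet4);

select count(*)
from dbo.WebLog
where Octet1 = 192 and Octet2 = 168;   -- all of 192.168.0.0/16

-- Reversed order: the leading key column (Octet4) is unrestricted here,
-- so the same query has to scan rather than seek
create index IX_WebLog_IP_Reversed on dbo.WebLog (Octet4, Octet3, Octet2, Octet1);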


Post #355298
Posted Monday, April 2, 2007 7:20 AM
SSCertifiable

Interesting. I'd be curious to see the query plan on the bigint implementation, and a link to the UDT implementation if anyone has one (though I agree the overhead in storage is substantial).
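
For anyone wanting to look at that, a hedged sketch of the sort of range query whose plan would be interesting (WebLogBigInt and IPAsBigInt are hypothetical names): store the address as a bigint and filter with BETWEEN.

set showplan_text on;
go
-- 192.168.0.0/16 as a bigint range: 192*16777216 + 168*65536 = 3232235520
select count(*)
from dbo.WebLogBigInt
where IPAsBigInt between 3232235520 and 3232301055;
go
set showplan_text off;
go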

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Post #355334
Posted Monday, April 2, 2007 8:14 AM


SSC Eights!


Good work, David.

I agree with the comments about IPv6, though. This is easily expanded to v6. Our need is to validate IPs and to count the number of distinct rows over time for licensing. Some of these techniques could be quite helpful in speeding up performance.
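
As a rough sketch of that licensing count (table and column names are assumptions, not from the article):

-- Distinct client IPs per month
select datepart(year, LoggedAt)  as LogYear,
       datepart(month, LoggedAt) as LogMonth,
       count(distinct IPAddress) as DistinctIPs
from dbo.ConnectionLog
group by datepart(year, LoggedAt), datepart(month, LoggedAt)
order by LogYear, LogMonth;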



ATB

Charles Kincaid

Post #355355
Posted Monday, April 2, 2007 9:12 AM
SSCertifiable

I have seen this approach before and it *is* quite fast, but it's when you try to turn the returned values back into a readable format that you get hit *hard* on performance.
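
To make that concrete (WebLogBinary is a hypothetical table with a binary(4) IPAddress column, and the conversion function is the one posted earlier): the range filter itself is cheap, but calling a scalar UDF once per returned row is where the cost shows up.

-- Fast: the predicate compares the raw binary values
select IPAddress
from dbo.WebLogBinary
where IPAddress between 0x7F000000 and 0x7F0000FF;

-- Much slower on a big result set: the scalar UDF runs row by row
select dbo.ConvertBinaryToIPAddress(IPAddress) as ReadableIP
from dbo.WebLogBinary
where IPAddress between 0x7F000000 and 0x7F0000FF;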



* Noel
Post #355384
Posted Monday, April 2, 2007 10:10 AM
SSC-Enthusiastic

I'm not sure I see the need to have the data be human readable. I guess it depends on the specifics of your use case.

My assumption is that an index lookup would be best with int or binary(4) rather than having the index over four columns; each comparison is very direct that way. I'm not sure whether this small performance gain is worth much, though -- it depends on the circumstances. I have about 300 million rows of web logs (recent data only, I keep a rolling window) in the DB at the moment, so it might add up, but I very rarely need to analyze by IP, and when I do it is always after already filtering on a time window.

I would think a good solution might be to just use int / binary(4) for indexing your IP and add a non-persisted, non-indexed computed column that holds the human-readable value. You would want simple mapping functions with most of these solutions anyway, so just use your mapping function in the computed column for the human-readable form. The major downside, clearly, is if you routinely want to render a large dataset of IPs rather than just filter or aggregate on them.
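
A rough sketch of that, reusing the ConvertBinaryToIPAddress function posted earlier (the table name is made up):

create table dbo.WebHit
(
    HitID         int identity(1,1) primary key,
    IPAddress     binary(4) not null,
    -- non-persisted computed column: evaluated only when selected, costs no storage
    IPAddressText as dbo.ConvertBinaryToIPAddress(IPAddress)
);

-- Index the compact binary form for filtering and aggregation
create index IX_WebHit_IPAddress on dbo.WebHit (IPAddress);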

I just use varchar(15) at the moment because it makes it dead simple to BCP in the IIS logs. I care very little about disk space because this is data I exclude from my backup regimen.

It's always good to know how you actually use your data when choosing the best strategy.
Post #355401
Posted Monday, April 2, 2007 10:14 AM
SSCrazy

Joe,

How do you feel about the naming of the integer types?

When I started programming, integer meant 16 bits and long integer meant 32 bits. Now everything has shifted up a gear (no pun intended).

I notice that the recommendations for .NET development say that developers should use explicit types such as Int32 and Int64.

Calling 64-bit integers BIGINT strikes me as the equivalent of having giant squid, colossal squid and whatever name they'll think of for the next size of squid.

Does this mean that when we move up to 128-bit computing we are going to see HUGEINT, ENORMOUSINT, etc.?


LinkedIn Profile
Newbie on www.simple-talk.com
Post #355404
Posted Monday, April 2, 2007 11:58 AM


SSC Rookie


I think there has been a dangerous tendency to make things human readable, and I would argue that storage and retrieval should be optimized for the computer that is doing the job. Generating a report for humans would be useless with a million rows anyway, and I have found that the processing cost of rendering reports for human consumption has not been the bottleneck; the key is in applying general logic and business rules with explicit test cases, rather than the more common approach of applying eyeballs to long lists to figure out what was missed.

Since libraries to manipulate IP addresses in binary are readily available, I would stay away from char-based approaches. I would also avoid the conversion to tinyint, because it is a transformation of the bytes and therefore requires an extra processing step for logical manipulation. If text-based logs are the source for such data, why not turn them off and write a direct binary file for orders-of-magnitude faster ETL?

And following Mr Poole: with Intel's new 80-core processors, will we have OCTDECAint?

Cheers,

John R. Hanson

Post #355431