Indexing URLs and eMail address

Question

Post reply

Indexing URLs and eMail address

AeroNC

Old Hand

Points: 314
More actions
November 29, 2009 at 8:20 am

#228185

What is the best way to index on URLS's and email address? I looked into creating a CheckSum column but also read that problems may result from some characters found in domain names. Does SQL 2008 provide a better way of doing this?
Thanks!

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

RGDavis396 (11/29/2009)
What is the best way to index on URLS's and email address?

Same way you'd index anything else. Normal index on the column. Is there some reason you don't want to do this?

I looked into creating a CheckSum column but also read that problems may result from some characters found in domain names.

I suppose you could do a checksum (or hash) then index and filter on that. Question is, why go to all the added complexity just to index a string column. Are there some considerations that you haven't mentioned?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

AeroNC Old Hand Points: 314 More actions · Answer 2

Searches against string values tend to be slower than those against numerical values and we are looking for ways to speed up searches against commonly used fields such as an email address.

steveb. SSC-Forever Points: 46733 More actions · Answer 3

RGDavis396 (11/30/2009)
Searches against string values tend to be slower than those against numerical values and we are looking for ways to speed up searches against commonly used fields such as an email address.

Adding an index will speed up the search Email address are usually not that long so you should still get good results. I wouldn't bother using Checksum coumn.

If your URLS are really long then you could possibly use Full-text indexing, but this does have a lot of limitations and is better at searching specific phrases in very large strings such as a specific word in a document.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

RGDavis396 (11/30/2009)
Searches against string values tend to be slower than those against numerical values

Have you tested against large volumes of data under load? If so, are you seeing a measurable, significant performance difference? Is that performance difference significant enough to go to extreme lengths to solve it?

I did a quick test on the Adventureworks database, this is what I got over 10 runs (test code follows)

Runs

1: ID = 18 ms, Email = 22 ms

2: ID = 17 ms, Email = 29 ms

3: ID = 18 ms, Email = 24 ms

4: ID = 17 ms, Email = 25 ms

5: ID = 17 ms, Email = 24 ms

6: ID = 19 ms, Email = 23 ms

7: ID = 20 ms, Email = 22 ms

8: ID = 17 ms, Email = 28 ms

9: ID = 17 ms, Email = 24 ms

10: ID = 18 ms, Email = 24 ms

So a max of 12ms difference over 1000 executions, 0.012ms difference. You'd probably spend more than that in calculating the checksum/hash

Wasn't a completely fair test, the seek by ID is on the unique clustered index, I made the NC on the email column unique to make things a bit more even. Should have created a unique id column with NC index to be completely fair

DECLARE @id INT, @Email VARCHAR(5000)

SET NOCOUNT ON

DECLARE @i INT =0

DECLARE @StartTime DATETIME2

SET @StartTime = SYSDATETIME()

WHILE (@i<1000)

BEGIN

SELECT @id = BusinessEntityID, @Email = EmailAddress

FROM person.EmailAddress

WHERE BusinessEntityID = 42

SET @i = @i+1

END

PRINT ' By ID. Time elapsed ' + CAST(DATEDIFF(ms, @StartTime, SYSDATETIME()) AS VARCHAR(20))

GO

DECLARE @id INT, @Email VARCHAR(5000)

SET NOCOUNT ON

DECLARE @i INT =0

DECLARE @StartTime DATETIME2

SET @StartTime = SYSDATETIME ()

WHILE (@i<1000)

BEGIN

SELECT @id = BusinessEntityID, @Email = EmailAddress

FROM person.EmailAddress

WHERE EmailAddress = N'james0@adventure-works.com'

SET @i = @i+1

END

PRINT ' By Email. Time elapsed ' + CAST(DATEDIFF(ms, @StartTime, SYSDATETIME()) AS VARCHAR(20))

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Steve Jones - SSC Editor SSC Guru Points: 739997 More actions · Answer 5

I tend to agree with Gail that I'd just index the fields. If there are specific values that you need to pull out, you could add a computed column that just pulls out those values, like the domain from an email address, and index that column.

AeroNC Old Hand Points: 314 More actions · Answer 6

Thanks for your help everyone! It appears that it is as good as it is going to get so I think that we are going to leave things as they are.