November 29, 2009 at 8:20 am
What is the best way to index on URLS's and email address? I looked into creating a CheckSum column but also read that problems may result from some characters found in domain names. Does SQL 2008 provide a better way of doing this?
Thanks!
November 30, 2009 at 12:03 am
RGDavis396 (11/29/2009)
What is the best way to index on URLS's and email address?
Same way you'd index anything else. Normal index on the column. Is there some reason you don't want to do this?
I looked into creating a CheckSum column but also read that problems may result from some characters found in domain names.
I suppose you could do a checksum (or hash) then index and filter on that. Question is, why go to all the added complexity just to index a string column. Are there some considerations that you haven't mentioned?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 30, 2009 at 7:37 am
Searches against string values tend to be slower than those against numerical values and we are looking for ways to speed up searches against commonly used fields such as an email address.
November 30, 2009 at 8:04 am
RGDavis396 (11/30/2009)
Searches against string values tend to be slower than those against numerical values and we are looking for ways to speed up searches against commonly used fields such as an email address.
Adding an index will speed up the search Email address are usually not that long so you should still get good results. I wouldn't bother using Checksum coumn.
If your URLS are really long then you could possibly use Full-text indexing, but this does have a lot of limitations and is better at searching specific phrases in very large strings such as a specific word in a document.
November 30, 2009 at 8:09 am
RGDavis396 (11/30/2009)
Searches against string values tend to be slower than those against numerical values
Have you tested against large volumes of data under load? If so, are you seeing a measurable, significant performance difference? Is that performance difference significant enough to go to extreme lengths to solve it?
I did a quick test on the Adventureworks database, this is what I got over 10 runs (test code follows)
Runs
1: ID = 18 ms, Email = 22 ms
2: ID = 17 ms, Email = 29 ms
3: ID = 18 ms, Email = 24 ms
4: ID = 17 ms, Email = 25 ms
5: ID = 17 ms, Email = 24 ms
6: ID = 19 ms, Email = 23 ms
7: ID = 20 ms, Email = 22 ms
8: ID = 17 ms, Email = 28 ms
9: ID = 17 ms, Email = 24 ms
10: ID = 18 ms, Email = 24 ms
So a max of 12ms difference over 1000 executions, 0.012ms difference. You'd probably spend more than that in calculating the checksum/hash
Wasn't a completely fair test, the seek by ID is on the unique clustered index, I made the NC on the email column unique to make things a bit more even. Should have created a unique id column with NC index to be completely fair
DECLARE @id INT, @Email VARCHAR(5000)
SET NOCOUNT ON
DECLARE @i INT =0
DECLARE @StartTime DATETIME2
SET @StartTime = SYSDATETIME()
WHILE (@i<1000)
BEGIN
SELECT @id = BusinessEntityID, @Email = EmailAddress
FROM person.EmailAddress
WHERE BusinessEntityID = 42
SET @i = @i+1
END
PRINT ' By ID. Time elapsed ' + CAST(DATEDIFF(ms, @StartTime, SYSDATETIME()) AS VARCHAR(20))
GO
DECLARE @id INT, @Email VARCHAR(5000)
SET NOCOUNT ON
DECLARE @i INT =0
DECLARE @StartTime DATETIME2
SET @StartTime = SYSDATETIME ()
WHILE (@i<1000)
BEGIN
SELECT @id = BusinessEntityID, @Email = EmailAddress
FROM person.EmailAddress
WHERE EmailAddress = N'james0@adventure-works.com'
SET @i = @i+1
END
PRINT ' By Email. Time elapsed ' + CAST(DATEDIFF(ms, @StartTime, SYSDATETIME()) AS VARCHAR(20))
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 30, 2009 at 8:49 am
I tend to agree with Gail that I'd just index the fields. If there are specific values that you need to pull out, you could add a computed column that just pulls out those values, like the domain from an email address, and index that column.
December 1, 2009 at 10:53 am
Thanks for your help everyone! It appears that it is as good as it is going to get so I think that we are going to leave things as they are.
Viewing 7 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply