A Pattern for Email Address storage and retrieval

  • ChrisM@Work (9/29/2015)


    Wayne West (9/29/2015)


    ChrisM@Work (9/24/2015)


    Why are you using NVARCHAR for an email address?

    We now have Unicode URLs, so why not Unicode characters for email addresses. You can Google Unicode Email and there's also this Wikipedia article: https://en.wikipedia.org/wiki/International_email

    There's some interesting issues that we're going to have to cope with at some point, especially if you're a multi-national org.

    It's on the way - possibly - but it isn't here yet. Now bearing in mind that the point of the article was efficiency, I'd still argue in favour of VARCHAR.

    I seem to remember back in the '80s some argument about not needing to put the century in date representations - cos it wouldn't happen for a long time.

    Dealing with the problem when it comes may well be a case of "Not my Problem", but someone will have to fix it. Software often ends up being used long after its original expected lifetime.

    Yes it may be arrogant to assume that one's latest masterwork will still be going in 20 years time, but it seems short sighted to hope that it won't. Way back then, computers were expensive and seriously limited, storage, memory, cpu horsepower, even screen size/capacity were are all major hurdles to overcome. There were significant gains to be made by compressing data - though why people didn't store dates as day counts and convert for display escapes me.

    Nowadays we have cloud providers falling over themselves to throw free storage at us. 64-bit memory means that we can size our server to meet the caching needs. The performance hit from nvarchar/nchar can't be that significant. Space to store Unicode should rarely be a problem worth thinking about.

    In fact I am bemused that varchar and char are not on the deprecated list. I always use nvarchar unless something indicates that it might need a rethink. Developments nowadays need to consider their likely use across the world and in unexpected ways.

  • andrew_c_worth@yahoo.co.uk (9/30/2015)

    Dealing with the problem when it comes may well be a case of "Not my Problem", but someone will have to fix it. Software often ends up being used long after its original expected lifetime.

    Non-ascii characters in email addresses are implemented using "xn" encoding, just like IDN names. If your back end is getting emails with unicode values in them, I would suggest that you are are doing it wrong. I thought the xn decoding should be done on the presentation.

    Example:

    user sees: doug@??.com

    database stores: doug@xn--0zwm56d.com

    The recommended email address storage format remains varchar(254). But that's just window dressing. There is some CPU overhead, but the disk and memory footprints for storing ascii values in nvarchar vs varchar for SQL server >= 2008r2 is pretty much identical since Microsoft started using unicode compression.

    https://msdn.microsoft.com/en-us/library/ee240835.aspx

  • I never use email address as a PK. Typically I'll have a User, UserProfile, Person or ContactDetails table that would contain the email address field based upon the domain I was implementing and that table would have it's own ID column that would be the PK and be used as FK in other tables.

    I'd really like to know of a domain where you would have even considered using email address as the PK in the first place.

  • Rather than a checksum you should use a hash, which is built right into SQL Server. This type of application is exactly what hashes are for.

    http://www.sqlservercentral.com/articles/SQLCLR/102324/

  • select checksum(N'TheMostInterestingMan@dosequis.com') EmailCheckSum
    EmailCheckSum = 1720775697
    select checksum(N'2TheMostInterestingMan@dosequis.com') EmailCheckSum
    EmailCheckSum = -1728309547

  • SHA-2 is available in Hashbytes() - https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql
    CHECKSUM() is MD5 (I think), which has a higher probability of collissions. Use HASHBYTES() instead.

  • Surprised that no one has commented on the overhead of the table variable.

  • well.what's the benefit - IN NUMBERS - over having an integer PK and an additional index on full email address ?

Viewing 8 posts - 31 through 37 (of 37 total)

You must be logged in to reply to this topic. Login to reply