UTF-8 Problems with ODBC

  • Edit: It looks like this site doesn't store the Greek alpha character either.

    I have this stored in the database:

    3ß,7α-Dihydroxy-5-androsten-7-one

    When I query the row using Query Analyzer it comes out correctly. However, when I query the row using Perl and ODBC it comes out as:

    3ß,7a-Dihydroxy-5-androsten-7-one

    The ß remains but the α is converted to an a.

    Using:

    Perl 5.8.x (supports UTF-8)

    SQL Server 2005 Standard x64

    SQL Native Client (ODBC driver v. 2005.90.1399.00)

  • It appears that the lower case alpha character is only supported in Unicode. Look at this example (The SET statements have the correct alpha, just in case posting this converted them to 'a'):

    DECLARE @chem varchar(200)

    SET @chem = '3ß,7α-Dihydroxy-5-androsten-7-one'

    PRINT @chem

    DECLARE @nchem nvarchar(200)

    SET @nchem = N'3ß,7α-Dihydroxy-5-androsten-7-one'

    PRINT @nchem
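
The varchar/nvarchar difference above can be sketched outside SQL Server. The following is a minimal Python illustration, not the server's actual conversion code. It assumes the non-Unicode path uses the Windows-1252 code page (the thread does not state the collation), and `BEST_FIT` is a hypothetical one-entry table standing in for whatever substitution the platform applies.

```python
# Sketch: why a single-byte code page keeps ß but cannot keep α.
# Assumption: the non-Unicode (varchar / narrow) path uses Windows-1252.

def fits_code_page(ch: str, code_page: str = "cp1252") -> bool:
    """Return True if the character has a direct encoding in the code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical best-fit table: stands in for the substitution the platform makes.
BEST_FIT = {"\u03b1": "a"}  # Greek small alpha -> Latin a

def to_narrow(text: str, code_page: str = "cp1252") -> bytes:
    """Simulate a lossy conversion of Unicode text to a single-byte code page."""
    converted = "".join(
        ch if fits_code_page(ch, code_page) else BEST_FIT.get(ch, "?")
        for ch in text
    )
    return converted.encode(code_page)

name = "3\u00df,7\u03b1-Dihydroxy-5-androsten-7-one"  # 3ß,7α-...
print(fits_code_page("\u00df"))  # ß exists in Windows-1252
print(fits_code_page("\u03b1"))  # α does not
print(to_narrow(name))           # the alpha comes back as a plain 'a'
```

Under these assumptions the ß survives because it has a slot in the code page, while the α has none and is replaced, which matches the symptom described in the first post.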

  • The data is stored as Unicode (nvarchar).

    It's listed correctly in the DB. It's just that after it goes through the ODBC driver to Perl, it loses the encoding.

    Is there something I have to set in the ODBC driver?

  • Sorry, I don't use Perl, so I can't really help with that. I assume you've already searched for help on this, but here are some links that seemed helpful.

    http://perl.active-venture.com/pod/perlguts-unicode.html

    http://www.ahinea.com/en/tech/perl-unicode-struggle.html

    http://userpage.fu-berlin.de/~ram/pub/pub_jf47ht12Ht/perl_unicode_en

  • I don't know that it is a Perl problem. From Perl I can print UTF-8 characters through the web server just fine.

    The characters are correct in the table, and I can query them just fine from Query Analyzer.

    The problem is that when I read them through an ODBC connection I lose the characters. Is there something I have to set up in the DSN or connect string to force UTF-8?

    Here is an image of the problem:

    http://web.mitsi.com/cis.jpg

  • Windows and SQL Server use UCS-2 Unicode, which is a fixed-length 2-byte format. UTF-8 is a variable-length encoding that uses one byte for ASCII characters and two to four bytes for everything else. Maybe the characters you are referring to are getting "lost in translation". Queries may work because the characters are converted internally when the query is parsed.
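
To make the size difference concrete, here is a small Python sketch comparing the two encodings for the characters in this thread. It uses Python's `utf-16-le` codec as a stand-in for the 2-byte format described above, which is an assumption; for these characters UCS-2 and UTF-16 produce the same bytes.

```python
# Sketch: byte counts for the thread's characters in UTF-8 vs. a 2-byte encoding.
# Assumption: "utf-16-le" stands in for the UCS-2 format used by the server.

def byte_lengths(ch: str) -> tuple:
    """Return (UTF-8 length, UTF-16LE length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in ("a", "\u00df", "\u03b1"):  # a, ß, α
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16LE {utf16_len} byte(s)")

# The same character is a different byte sequence in each encoding, so bytes
# written in one and read as the other will not round-trip.
print("\u03b1".encode("utf-8"))      # b'\xce\xb1'
print("\u03b1".encode("utf-16-le"))  # b'\xb1\x03'
```

Both ß and α take two bytes in either encoding, but the bytes themselves differ, so every layer between the table and the script has to agree on which encoding is in use.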

  • SQL Server 2000 uses UCS-2. We specifically upgraded to SQL Server 2005 because it does support UTF-8.

    The queries did not work with SQL Server 2000 because of the UCS-2, but they do work with SQL Server 2005.

    I have updated to the ODBC driver that comes with SQL Server 2005, but that's where the problem seems to be happening.

    Thanks for your help so far.

  • A minor explanation is needed. SQL Server 2005 can natively support UTF-8 because of the .NET CLR in SQL Server 2005, but 2000 also accepts UTF-8 because, although the .NET FCL (Framework Class Library) is UTF-16 by default, you can change that in Visual Studio. I am not using Vista, but it may be UTF-16 instead of UCS-2, because .NET, as defined in the early 2000s, is UTF-16 and not UCS-2.

    Now to your problem: I think you need to set a Greek collation in SQL Server 2005 to generate UTF-8 Unicode. You should know there is no equivalent of the .NET Char in SQL Server, because .NET Char is the ninth integer and UTF-16 by default, so until you use nvarchar above 200 you are still using bytes. You can save your Perl code as Unicode in Notepad. The chart in the link below was created by the SQL Server team and is a very interesting read. Hope this helps.

    http://msdn2.microsoft.com/en-us/library/ms131092.aspx

    Kind regards,
    Gift Peddie

  • Apparently, using DBI and ADO is the only working way to do this right now.

    I was able to get it working.
