Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Japanese characters as question marks Expand / Collapse
Author
Message
Posted Wednesday, April 10, 2013 8:27 PM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, October 17, 2014 12:32 PM
Points: 467, Visits: 1,879
Hi,

I'm having a bit of a strange issue with Japanese characters.

1) They're in the Excel file I'm importing.

2) They show up in the raw table I first load the file to, when I select results to text or file, but not to grid. In grid they show up as black rectangles that are hollow in the center.

3) They don't show up at all in the secondary staging table, only question marks.

I will post anything I can to help, but I'm not sure how to generate dummy data. I can't post what I have, because it's from a client file.

I can post my table creation code, but they're kind of ridiculous stored procedures full of cursors (which will probably give some of you a twitch) that have been in place since before I ever had an inkling I'd be touching SQL at work, and a year and a half into my SQL life I have no idea how to improve upon.

What I'm really looking for are some things to check on to see why this might be happening. Maybe a setting somewhere, but I'm not counting on being that lucky.

I've tried various collate selects, but they don't change how the data displays in either table. I've put in a request for the MS OLEDB driver to be updated (it's currently 4.0), but I'm really not even sure if that could be an issue. Windows *seems* to have the language packs I would need, since I can see the characters in some places.

Anyway, I'd appreciate any suggestions.

Thanks

Post #1441088
Posted Thursday, April 11, 2013 12:49 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Thursday, October 9, 2014 9:10 PM
Points: 442, Visits: 1,146
First thing that comes to mind is Unicode conversion problems. That data must be stored as nvarchar() or nchar(). If any table application, column, temp table column, or variable touched that data along its path to the table was char() and not nchar(), then it would get mangled.

Eddie Wuerch
MCM: SQL
Post #1441126
Posted Thursday, April 11, 2013 6:36 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 5:14 PM
Points: 7,125, Visits: 12,719
The ? marks are a dead giveaway that non-printable ANSI characters are being pushed into an ANSI column or variable (i.e. VARCHAR or CHAR). The hollow squares might be a problem if some of your characters are multi-byte Unicode characters, of which I think Japanese has a few. SQL Server (at least 2005) does not handle multi-byte character sets. SQL 2012 introduced some support for them through Collations. PS The squares could also be a simple display issue depending on what you're viewing the data with. You can always check by using the SUBSTRING() and UNICODE() or ASCII() functions to see what is actually stored.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1441212
Posted Thursday, April 11, 2013 8:24 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, October 17, 2014 12:32 PM
Points: 467, Visits: 1,879
opc.three (4/11/2013)
The ? marks are a dead giveaway that non-printable ANSI characters are being pushed into an ANSI column or variable (i.e. VARCHAR or CHAR). The hollow squares might be a problem if some of your characters are multi-byte Unicode characters, of which I think Japanese has a few. SQL Server (at least 2005) does not handle multi-byte character sets. SQL 2012 introduced some support for them through Collations. PS The squares could also be a simple display issue depending on what you're viewing the data with. You can always check by using the SUBSTRING() and UNICODE() or ASCII() functions to see what is actually stored.


Hm. All the columns I have are nvarchar. I can SEE them as Japanese characters correctly using SSMS 2008 in the raw table. 2005 is hopeless, I'm convinced. However, they're both still the same in the second table.


Using ascii() I get a value of 63. Using unicode() I get a 5 digit string.

So, I guess I have to see if the columns that the characters are going into are ANSI, and change them to Unicode, if that's possible?

Thanks
Post #1441279
Posted Thursday, April 11, 2013 8:53 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 5:14 PM
Points: 7,125, Visits: 12,719
For terminology's sake so we're speaking the same language:

ANSI = VARCHAR/CHAR
Unicode = NVARCHAR/NCHAR
In SQL Server, Unicode = UCS-2

Still not clear if all your columns are Unicode or whether you have some ANSI columns. If all Unicode then you can stop using the ASCII() function and stick with the UNICODE() function.

SQL Server will not munge data inserted into Unicode columns. If you're seeing ? marks in a Unicode column then those characters were munged prior to entering the table.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1441299
Posted Thursday, April 11, 2013 9:05 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, October 17, 2014 12:32 PM
Points: 467, Visits: 1,879
opc.three (4/11/2013)
For terminology's sake so we're speaking the same language:

ANSI = VARCHAR/CHAR
Unicode = NVARCHAR/NCHAR
In SQL Server, Unicode = UCS-2

Still not clear if all your columns are Unicode or whether you have some ANSI columns. If all Unicode then you can stop using the ASCII() function and stick with the UNICODE() function.

SQL Server will not munge data inserted into Unicode columns. If you're seeing ? marks in a Unicode column then those characters were munged prior to entering the table.


All I know is that when I bypass the stored procedure (kind of) and just do a straight update from the raw table matching on email address, the Japanese characters display properly in SSMS 2008, and as squares in 2005. Which makes about as much sense as I can ask for.

If you're feeling charitable, I can post the stored procedure that copies data from raw to process table. Maybe you can make heads or tails of where it "munges" things. If not, I absolutely understand.

Thanks again
Post #1441313
Posted Thursday, April 11, 2013 9:34 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 5:14 PM
Points: 7,125, Visits: 12,719
Sure. Post it. If you want to attach some sample UPDATE statements and the table defs too as a text file so the browser doesn't get in the way that might be helpful too.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1441332
Posted Thursday, April 11, 2013 11:53 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, October 17, 2014 12:32 PM
Points: 467, Visits: 1,879
opc.three (4/11/2013)
Sure. Post it. If you want to attach some sample UPDATE statements and the table defs too as a text file so the browser doesn't get in the way that might be helpful too.


I've zipped up the five procedures I run to import a file and set up tables. I've also included the fairly innocuous Prefix column from the sample file I have so you can see some data that I'm working with. Let me know if you need anything else.

https://www.dropbox.com/s/xcbn1gx0blbj55p/erikd.zip
Post #1441401
Posted Thursday, April 11, 2013 1:36 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 5:14 PM
Points: 7,125, Visits: 12,719
Cool. I can't get to dropbox from here but I'll have a look when I am home.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1441465
Posted Thursday, April 11, 2013 8:16 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 5:14 PM
Points: 7,125, Visits: 12,719
I need the contents of Sample.dbo.sample_fields in order to know the second table definition. Can you simply script out the resulting table definitions for the project_id in question?

This query in dbo.CreateTargetTable looks like it cannot allow for a Unicode column to be created:

DECLARE Column_Cursor CURSOR
FOR
SELECT source_name,
coltype = CASE WHEN source_type = 1 THEN 'varchar(' + CAST(field_varSize AS VARCHAR(4)) + ')'
WHEN source_type = 2 THEN 'int'
WHEN source_type = 3 THEN 'datetime'
/* Code 4 converts Float to Varchar to get around Excel opendatasource */
WHEN source_type = 4 THEN 'varchar(' + CAST(field_varSize AS VARCHAR(4)) + ')'
END
FROM sample_fields
WHERE ID_Project = @ProjectID
AND source_name IS NOT NULL
ORDER BY source_sequence

I am assuming that either source_type 1 or 4 should generate an NVARCHAR column? If the second table is ANSI then you will lose the Unicode characters on INSERT and that is likely where the ? marks are coming from.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1441535
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse