SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Japanese characters as question marks


Japanese characters as question marks

Author
Message
sqldriver
sqldriver
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1018 Visits: 2519
Hi,

I'm having a bit of a strange issue with Japanese characters.

1) They're in the Excel file I'm importing.

2) They show up in the raw table I first load the file to, when I select results to text or file, but not to grid. In grid they show up as black rectangles that are hollow in the center.

3) They don't show up at all in the secondary staging table, only question marks.

I will post anything I can to help, but I'm not sure how to generate dummy data. I can't post what I have, because it's from a client file.

I can post my table creation code, but they're kind of ridiculous stored procedures full of cursors (which will probably give some of you a twitch) that have been in place since before I ever had an inkling I'd be touching SQL at work, and a year and a half into my SQL life I have no idea how to improve upon.

What I'm really looking for are some things to check on to see why this might be happening. Maybe a setting somewhere, but I'm not counting on being that lucky.

I've tried various collate selects, but they don't change how the data displays in either table. I've put in a request for the MS OLEDB driver to be updated (it's currently 4.0), but I'm really not even sure if that could be an issue. Windows *seems* to have the language packs I would need, since I can see the characters in some places.

Anyway, I'd appreciate any suggestions.

Thanks
Eddie Wuerch
Eddie Wuerch
SSC Eights!
SSC Eights! (958 reputation)SSC Eights! (958 reputation)SSC Eights! (958 reputation)SSC Eights! (958 reputation)SSC Eights! (958 reputation)SSC Eights! (958 reputation)SSC Eights! (958 reputation)SSC Eights! (958 reputation)

Group: General Forum Members
Points: 958 Visits: 2203
First thing that comes to mind is Unicode conversion problems. That data must be stored as nvarchar() or nchar(). If any table application, column, temp table column, or variable touched that data along its path to the table was char() and not nchar(), then it would get mangled.

Eddie Wuerch
MCM: SQL
Orlando Colamatteo
Orlando Colamatteo
SSCoach
SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)

Group: General Forum Members
Points: 15611 Visits: 14396
The ? marks are a dead giveaway that non-printable ANSI characters are being pushed into an ANSI column or variable (i.e. VARCHAR or CHAR). The hollow squares might be a problem if some of your characters are multi-byte Unicode characters, of which I think Japanese has a few. SQL Server (at least 2005) does not handle multi-byte character sets. SQL 2012 introduced some support for them through Collations. PS The squares could also be a simple display issue depending on what you're viewing the data with. You can always check by using the SUBSTRING() and UNICODE() or ASCII() functions to see what is actually stored.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
sqldriver
sqldriver
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1018 Visits: 2519
opc.three (4/11/2013)
The ? marks are a dead giveaway that non-printable ANSI characters are being pushed into an ANSI column or variable (i.e. VARCHAR or CHAR). The hollow squares might be a problem if some of your characters are multi-byte Unicode characters, of which I think Japanese has a few. SQL Server (at least 2005) does not handle multi-byte character sets. SQL 2012 introduced some support for them through Collations. PS The squares could also be a simple display issue depending on what you're viewing the data with. You can always check by using the SUBSTRING() and UNICODE() or ASCII() functions to see what is actually stored.


Hm. All the columns I have are nvarchar. I can SEE them as Japanese characters correctly using SSMS 2008 in the raw table. 2005 is hopeless, I'm convinced. However, they're both still the same in the second table.


Using ascii() I get a value of 63. Using unicode() I get a 5 digit string.

So, I guess I have to see if the columns that the characters are going into are ANSI, and change them to Unicode, if that's possible?

Thanks
Orlando Colamatteo
Orlando Colamatteo
SSCoach
SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)

Group: General Forum Members
Points: 15611 Visits: 14396
For terminology's sake so we're speaking the same language:

ANSI = VARCHAR/CHAR
Unicode = NVARCHAR/NCHAR
In SQL Server, Unicode = UCS-2

Still not clear if all your columns are Unicode or whether you have some ANSI columns. If all Unicode then you can stop using the ASCII() function and stick with the UNICODE() function.

SQL Server will not munge data inserted into Unicode columns. If you're seeing ? marks in a Unicode column then those characters were munged prior to entering the table.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
sqldriver
sqldriver
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1018 Visits: 2519
opc.three (4/11/2013)
For terminology's sake so we're speaking the same language:

ANSI = VARCHAR/CHAR
Unicode = NVARCHAR/NCHAR
In SQL Server, Unicode = UCS-2

Still not clear if all your columns are Unicode or whether you have some ANSI columns. If all Unicode then you can stop using the ASCII() function and stick with the UNICODE() function.

SQL Server will not munge data inserted into Unicode columns. If you're seeing ? marks in a Unicode column then those characters were munged prior to entering the table.


All I know is that when I bypass the stored procedure (kind of) and just do a straight update from the raw table matching on email address, the Japanese characters display properly in SSMS 2008, and as squares in 2005. Which makes about as much sense as I can ask for.

If you're feeling charitable, I can post the stored procedure that copies data from raw to process table. Maybe you can make heads or tails of where it "munges" things. If not, I absolutely understand.

Thanks again
Orlando Colamatteo
Orlando Colamatteo
SSCoach
SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)

Group: General Forum Members
Points: 15611 Visits: 14396
Sure. Post it. If you want to attach some sample UPDATE statements and the table defs too as a text file so the browser doesn't get in the way that might be helpful too.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
sqldriver
sqldriver
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1018 Visits: 2519
opc.three (4/11/2013)
Sure. Post it. If you want to attach some sample UPDATE statements and the table defs too as a text file so the browser doesn't get in the way that might be helpful too.


I've zipped up the five procedures I run to import a file and set up tables. I've also included the fairly innocuous Prefix column from the sample file I have so you can see some data that I'm working with. Let me know if you need anything else.

https://www.dropbox.com/s/xcbn1gx0blbj55p/erikd.zip
Orlando Colamatteo
Orlando Colamatteo
SSCoach
SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)

Group: General Forum Members
Points: 15611 Visits: 14396
Cool. I can't get to dropbox from here but I'll have a look when I am home.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Orlando Colamatteo
Orlando Colamatteo
SSCoach
SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)

Group: General Forum Members
Points: 15611 Visits: 14396
I need the contents of Sample.dbo.sample_fields in order to know the second table definition. Can you simply script out the resulting table definitions for the project_id in question?

This query in dbo.CreateTargetTable looks like it cannot allow for a Unicode column to be created:

DECLARE Column_Cursor CURSOR
FOR
SELECT source_name,
coltype = CASE WHEN source_type = 1 THEN 'varchar(' + CAST(field_varSize AS VARCHAR(4)) + ')'
WHEN source_type = 2 THEN 'int'
WHEN source_type = 3 THEN 'datetime'
/* Code 4 converts Float to Varchar to get around Excel opendatasource */
WHEN source_type = 4 THEN 'varchar(' + CAST(field_varSize AS VARCHAR(4)) + ')'
END
FROM sample_fields
WHERE ID_Project = @ProjectID
AND source_name IS NOT NULL
ORDER BY source_sequence



I am assuming that either source_type 1 or 4 should generate an NVARCHAR column? If the second table is ANSI then you will lose the Unicode characters on INSERT and that is likely where the ? marks are coming from.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search