ASCII to UNICODE

  • Hi All

    I got a bad feeling about this one.

    I've taken over a web based app and sql db. The original app and db were written to cope with latin based languages only (FR & EN) and the text data types are varchar throughout the database, now the drum role.....

    .....the client needs to extend this app to all cultures.

    Change the web pages (500+) to utf-8, ok, can do this.

    Change all existing text throughout the database to nvarchar.

    Q: What's the most efficient way of achieving this?

    Q: Is this a job for CAST/CONVERT with an update statement and a couple of variables repeated over each column?

    Q: Would I have to change the collation for the db from the standard SQL2000 Latin?

    The database contains the entire website text as well as users submissions

    Any pointers would be greatly appreciated as I have no idea where to start with this.

    Thanks in advance.

    Lee

  • I think the first step is to try to find a smarter client.  It is amazing how much the return diminishes trying to support multi-language systems as soon as you get away from latin based languages.

    If that fails, from the database end, this is not too complicated.  The data in the tables is already fine, so you don't have to do anything there.  Just change all of your data types.  If you have a lot of stored procedures, this may be complicated, but tables should not be hard to do.

    Look at the numbers of objects.  If you have less than 1000 objects in your database, manually opening each one in a editor and changing the data types will take you a day, but not much more.

    If you have 10,000 objects, you will have to be a bit smoother.  In this case, my first suggestion would be to script the entire database (all objects), find and replace VARCHAR with NVARCHAR, and run the script on an empty database or another server.  Then, deal with the errors.  Once the objects have been created, DTS your data from one database to the other.

    From the database end, this is not a complicated task.  It is the application code, DTS packages, and anything else connecting to the database that will be complicated.  Take it a piece at a time.

  • Also, don't forget... that if you have any VARCHAR columns > 4000 characters, you could have a serious problem with truncation...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Michael

    Thanks for your reply.

    Makes sense to me and a lot less involved than I expected to be honest, I had a real dread about the current data. The client is being billed for this so the return is still good and the RedGate tools I've got will help me sort this in double time, well the db anyway, the app is a different story.

    Again many thanks,

    Lee

  • Hi Jeff

    Yeah I got that one too, with the changes to nvarchar I'd have to double the storage size specified on the field so the change varchar(25) would be nvarchar(50). Anything over 4000 and I'll shift it ntext and check the app and or sp's have the new ntext datatype specified as the last item in the select clause.

    Thanks,

    Lee

  • You will also have to watch out for your record sizes.  SQL 2000 can only handle a record up to 8k in total length.  You may get warnings about this if you script the database with new data types.  It will only be a problem when you actuall put that much data into the table, so don't be fooled by it being just a warning.  If you get one of these record length warnings, fix it.

  • Hi Michael

    Yep that too, thanks for the reminder I appreciate it.

    Lee

  • Don't double the size of the field: MyCharField varchar(25) --> MyCharField nvarchar(50) NO.  It simply goes from MyCharField varchar(25) --> MyCharField nvarchar(25).  The change will double the amount of space required for the field from 25 bytes to 50 bytes.  That is why Jeff warned you about any varchar fields defined varchar(4001) and greater.

  • Hi Lynn

    Thanks for that, I get it, feeling a bit of a dope now. 8=) So I'd calculate full row storage requirements (max of 8k) by doubling the stated sizes of any new nvarchar fields and not doubling the actual stated size when changing from varchar to nvarchar, sql will manage that for me.

    Thanks again, you just stopped me falling down a black hole.

    Lee

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply