Corrupt Record - Non-relational database Structure

  • We have a system with proprietary algorithm that converts the data entered in an user interface into a concatenated string. The database is SQL Server 2008R2 (Sp3) but it is not conventional relational database by design.

    The system is divided into multiple sections. For example, General section data has Company name and address. Once the data is saved from the user interface it gets saved in the database in the following manner. The system is written by our sister office in another country and we do not have the source code.

    (ref - attachment -1)

    In the past few days we are experiencing a problem wherein we are getting the corrupt data after it is saved in the database

    (ref - attachment -2)

    As suggested by the core developer we checked hexadecimal version of the text which is saved in the database using following query

    Select CONVERT(varbinary(max),

    Coded_Data),*

    from

    Table_name

    where

    Primary_key = <numeric value>

    and section_name = 'General'

    and we get the hex representation like below

    0x02072616468696B612E74686F7261744072696C2E636F6D496D706F72746572206F662054656C65636F6D6D756E69636174696F6E202620456C656374726F6E69632050726F647563742C2052656C69616E63656469676974616C2073746F72657320746F2052656C69616E6365A

    In the above representation when we remove leading zero after 0x and trailing “A” we get the correct english representation.

    But this pattern of data corruption / invalid character is not the same for all records. Many records have hex representation as given below where we are not sure how to conver that back to correct version of english data

    0x6167656D656E7420636F6E73756C74616E74456C65637472696320706F776572206C696E6520636F6E737472756374696F6E496E647573747269616C20706C616E7420636F6E737472756374696F6E4D616E75666163747572657320646573616C696E6174696F6E2065717569706D656E744F696C20616E642067617320706970656C696E6520636F6E737472756374696F6E5175616C6974

    79206173737572616E636520636F6E73756C74616E74456E676167656420696E20636F6E737472756374696F6E202620636976696C20656E67696E656572696E679

    The system have fields with different data types like date, text, varchar, numbers etc. Text fields may have data copied from any source like a web page an email etc.

    The column that stores entire data can hold one records of 255 characters data type – varchar(255)

    Collation - SQL_1xCompat_CP850_CI_AS

    Server Collation - SQL_Latin1_General_CP1_CI_AS

    This database server is recently upgraded from SQL Server 2000 to SQL Server 2008R2. We have started observing this issue immediately after migrating to SQL Server 2008R2

    Can we identify which are the digits cuasing data corruption from hex version and remove them at once?

    Thanks

    Rohit Chitre

  • wow that's an interesting puzzle.

    so is the varbinary string stored as a string datatype, or is it a varbinary(max)?

    for a solution , we need the datatype as it is stored right now.

    i have an idea, where if the converted string is not like %[a-Z,0-9]%, you would STUFF the two characters you identified with empty strings ,and see if THAT converts nicely.

    that depends on whether it's a varbinary that needs to be converted to a string, tweak it , and cvonvert it back to varbinary, or if it is already a string representation of a hex value

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks for the reply. The datatype of the column is varchar(255) where junk characters are seen.

    if we convert that to varbinary(max) in select statement we get the pattern

  • If I try to remove leading zero from 0x0231 with

    STUFF(column_name,1,1,'') then it removes 02 as that is the first character in binary and makes the binary pattern as 0x31 instead of 0x231

  • SQLRO (5/6/2016)


    If I try to remove leading zero from 0x0231 with

    STUFF(column_name,1,1,'') then it removes 02 as that is the first character in binary and makes the binary pattern as 0x31 instead of 0x231

    i think you wanted STUFF(val,3,1,''), right?

    /*

    --Results

    0x231

    */

    With MyCTE

    AS

    (

    SELECT '0x0231' As Val

    )

    SELECT STUFF(val,3,1,'') FROM MyCTE

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • The val is actually a binary value. So I tried doing the below as per your suggestion.

    declare @varBin as varbinary(max)

    set @varBin = 0x06E3231322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A

    SELECT convert(varbinary(max),STUFF(@varBin,3,1,''))

    --Result

    0x06E31322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A

  • ok, here's how i would tackle it;

    assuming your analysis on the two chars is correct, this produces both "versions"

    of the data;

    now i would simply test for high ascii, and take only the one that does not contain high ascii in a case statement

    let me see if i can slap together a function to test for that, but this gets you really close already, so i wanted to post it:

    ;WITH MyCTE(val)

    AS

    (

    --bad

    SELECT 0x02072616468696B612E74686F7261744072696C2E636F6D496D706F72746572206F662054656C65636F6D6D756E69636174696F6E202620456C656374726F6E69632050726F647563742C2052656C69616E63656469676974616C2073746F72657320746F2052656C69616E6365A

    UNION ALL

    --bad

    SELECT 0x06C6F77656C6C4073746F726D726167652E636F6D20646F206E6F74207370616D206D6520696620796F75206465636F64652074686973B

    UNION ALL

    --good

    SELECT 0x2053514C205365727665722032303038204461746120436F7272757074696F6E205353324B3820205353324B382052322020436F7272757074205265636F7264202D204E6F6E2D72656C6174696F6E616C20646174616261736520

    ),

    ConvertedToString

    AS

    (

    SELECT val As OriginalBinary,

    CONVERT(varchar(max),val) As Converted,

    master.sys.fn_varbintohexstr(val) As OriginalBinaryToString,

    LEFT(STUFF(master.sys.fn_varbintohexstr(val),3,1,''),LEN(STUFF(master.sys.fn_varbintohexstr(val),3,1,'')) -1) As TrimmedAndStuffed,

    CONVERT(varbinary(max),LEFT(STUFF(master.sys.fn_varbintohexstr(val),3,1,''),LEN(STUFF(master.sys.fn_varbintohexstr(val),3,1,'')) -1),1) As BackToBinary,

    CONVERT(varchar(max),CONVERT(varbinary(max),LEFT(STUFF(master.sys.fn_varbintohexstr(val),3,1,''),LEN(STUFF(master.sys.fn_varbintohexstr(val),3,1,'')) -1),1)) As ConvertedTrimmed

    FROM myCTE

    )

    SELECT * FROM ConvertedToString

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • SQLRO (5/6/2016)


    The val is actually a binary value. So I tried doing the below as per your suggestion.

    declare @varBin as varbinary(max)

    set @varBin = 0x06E3231322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A

    SELECT convert(varbinary(max),STUFF(@varBin,3,1,''))

    --Result

    0x06E31322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A

    In addition to Lowell's suggestion, you could also just use CONVERT with a style to turn the varbinary to a string:

    DECLARE @varBin as varbinary(max);

    SET @varBin = 0x06E3231322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A;

    SELECT @varBin, CONVERT(varchar(max),@varbin); --Unaltered binary and converted

    DECLARE @binary_as_string varchar(max);

    SET @binary_as_string=CONVERT(varchar(max),@varbin,1); --Use style 1 to preserve the binary as a literal string

    SET @binary_as_string=STUFF(@binary_as_string,3,1,''); --Remove the leading 0

    SET @binary_as_string=LEFT(@binary_as_string,LEN(@binary_as_string)-1);--Remove the trailing A

    SET @varBin=CONVERT(varbinary(max),@binary_as_string,1); --Use style 1 again to preserve the literal string

    SELECT @varBin, CONVERT(varchar(max),@varbin); --Altered binary and converted

    Cheers!

  • Thanks. Both the methods working fine. I am updating as many as I get in this range. The others would be still a challenge

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply