SSDT import into a table from .csv corruption

  • Hello everyone,

    I am importing from a csv file into a table.

    Everything works fine.

    One filed is imported as varchar(50) and the values are 600000000000243 , 600000000000272 etc

    when I do a comparison using implicit conversion it fails only for 600000000000243

    If I cast as varchar(15) the comparison works

    Though all the other values of the same field work fine with implicit conversion.

    I did replicate the issue with a test table.

    Unfortunately I cannot post a SQL Fiddle because I have to access the original table to extract the value.

    here is my test:

    create table test(ID int not null identity(1,1), charFld varchar(50))
    insert into test(charFld) values('600000000000243')
    insert into test(charFld) select top 2 Tenant_ID FROM TenantID_CURRENT where cast(Tenant_ID as varchar(15)) = 600000000000243

     

     

    select charFld FROM Test where charFld = '600000000000243'

    returns only one line

    select * FROM Test

    returns 3 lines (as it should)

    How can I identify the extra character(s)

  • If you suspect that it contains a special character you could try a numbers table and cross join.

    The query would look something like this:

    SELECTn.num, ASCII(SUBSTRING(t.charFld, n.num, 1)) AS AsciiCharValue
    FROMTest AS t
    CROSS JOIN Numbers AS n
    ;
  • The explicit cast to varchar(15) is potentially truncating longer values, with either normal or special characters after the 15th position.

    If the Tenant_ID is always supposed to be a number, you could TRY_CAST(Tenant_ID AS BIGINT) to find any values that fail the conversion.

    DECLARE @This TABLE (Tenant_ID VARCHAR(50))
    INSERT @This VALUES ('600000000000243'), ('600000000000243XYZ'), (CONCAT('600000000000243', CHAR(9)))

    -- 1 row
    SELECT *
    FROM @This
    WHERE Tenant_ID = '600000000000243'

    -- 3 rows
    SELECT *
    FROM @This
    WHERE CAST(Tenant_ID AS VARCHAR(15))= '600000000000243'

    SELECT ASCII(RIGHT(Tenant_ID,1)),
    CHAR(ASCII(RIGHT(Tenant_ID,1)))
    FROM @This

    SELECT TRY_CAST(Tenant_ID AS BIGINT) AS NumericTenant_ID,
    Tenant_ID
    FROM @This

    Sometimes an import is set to use line feed only as the end of line indicator, but the file contains carriage return line feed, so the carriage return is imported as part of the final column. Or the other way around.

  • I am not sure what the numbers table is.

    The CAST works but the question is why other values do not require CAST.

    Hw can I remove the extra lf or cr during or after the load?

  • atzvei wrote:

    I am not sure what the numbers table is.

    In SQL Server 2022, it would be the "GENERATE_SERIES()" function.

    atzvei wrote:

    Hw can I remove the extra lf or cr during or after the load?

    REPLACE(REPLACE(REPLACE(REPLACE(SomeString,CHAR(9),''),CHAR(10),''),CHAR(13),''),CHAR(160),'')

    From left to right, those are the Tab, Linefeed, Carriage-return, and Hard-space characters in the ASCII and most other character sets.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you Jeff,

    this did the trick.

    I cannot understand why inΒ  a table with about 160,000 rows only one value (not field, but value) displayed this issue.

    It must have been a copy and paste in the original csv.

  • atzvei wrote:

    Thank you Jeff,

    this did the trick.

    I cannot understand why inΒ  a table with about 160,000 rows only one value (not field, but value) displayed this issue. It must have been a copy and paste in the original csv.

    The answer is because some other humans were involved. πŸ˜€Β  We have this problem with imports all the time.Β  Either some piece of code picks up the keystroke(s) in a screen field and then the code saves it that way without checking or someone thought they be smart with a last minute edit (like you say) or there was a planetary alignment of Venus, Mars, and Jupiter with Uranus. πŸ˜€ πŸ˜€ πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply