Problem Importing CSV Files into SQL 2008 using SSIS

  • Hi All

    I have a CSV file which needs importing into SQL 2008, a row has 28 columns, text has quotes around and sometimes commas within the text, when importing the file sql imports the data up to column 8 where column 8 has the rest of the data in it ( doesn't seem to split on the columns.

    This is the same problem I had it sql 2005, the only way I got around the problem was to script code that goes through row by row (RBAR) looking for a pattern matching and substituting the comma's within text for a |, when working with a file that over a million rows this is very time consuming, (sometimes SSIS crashes when the file size is large).

    I have tried all the usual routes but to no avail

    Excel 2008 has no problem importing the CSV file.

    Any solutions would be much appreciated.:-D

  • Check this link: http://msdn.microsoft.com/en-us/library/ms174393.aspx for info about QUOTED_IDENTIFIER - I've not tried what you're describing before but I think that by setting it ON you can avoid the quotes problem. Not sure how you'd get around the comma issue though, maybe change your delimiter to | or TAB or something else?

    ---

    Note to developers:
    CAST(SUBSTRING(CAST(FLOOR(NULLIF(ISNULL(COALESCE(1,NULL),NULL),NULL)) AS CHAR(1)),1,1) AS INT) == 1
    So why complicate your code AND MAKE MY JOB HARDER??!:crazy:

    Want to get the best help? Click here https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help (Jeff Moden)
    My blog: http://uksqldba.blogspot.com
    Visit http://www.DerekColley.co.uk to find out more about me.

  • If you're using SSIS to import the data via a flat file connection manager, then there is a setting in it's properties that may help. On the "general" section - under Format - change the "text qaulifier" field to a quote (e.g. change it to " instead of <none>).

    This will force SSIS to ignore commas in each quote delimited column, or rather treat them as part of the text.

  • Right click the connection manager and select Edit and play with the settings. You can add more columns using the Advanced tab.

    Tip: you don't have to stick to the delimiters in the dropdown, you can paste your own over it. We now use ยง as users kept mistyping | when they hit shift!

  • one thing i forgot to mentation is that rows in the data don't have the same number of delimiters(,), one row might have 20 delimiters and another row might have 4 delimiters, i don't this is helping the situation. Excel doesn't have problem dealing with the data.:-)

  • clucasi (7/3/2012)


    one thing i forgot to mentation is that rows in the data don't have the same number of delimiters(,), one row might have 20 delimiters and another row might have 4 delimiters, i don't this is helping the situation. Excel doesn't have problem dealing with the data.:-)

    When the rows are short, are they in the same positions as they are when the rows are long?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have the tables below, I have not included scripts to insert the data as that is what the problem is, the data is provided in one text file and I have to use a SSIS conditional split to import it into the correct tables( prefix 11 imports into STREET_RECORD_TYPE11

    prefix 15 imports into STREET_DESCRIPTOR_RECORD-TYPE15

    prefix 21 imports into BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21

    prefix 24 imports into LAND_AND_PROPERTY_IDENTIFIER_RECORD24 )

    Hope tis is enough information:-)

    11,"I",5,33449328,2,4230,2,2009-11-06,1,4,0,2009-11-06,,2009-11-06,2009-11-06,379982.00,401724.00,379994.00,401728.00,1

    15,"I",74784,33440849,"CLEAVLEY STREET PSSGE GBLE 9/10 WORSLEY ROAD PSSGE R/O 173-173D","","ECCLES","SALFORD","ENG"

    24,"I",285165,10070915241,"4205L000144961","ENG",1,2008-04-28,,2008-11-11,2008-04-28,,"",,"","",6,"",,"","",3400512,"1","","GROUND, FIRST AND SECOND FLOOR",""

    21,"I",285151,10070745433,1,2,2009-01-26,10004690408,382013,399704,1,4230,2009-01-26,,2009-01-26,2009-01-26,"C","M6 6BY",0

    /****** Object: Table [dbo].[LAND_AND_PROPERTY_IDENTIFIER_RECORD24] Script Date: 07/04/2012 08:43:54 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[LAND_AND_PROPERTY_IDENTIFIER_RECORD24](

    [RECORD_IDENTIFIER] [int] NULL,

    [CHANGE_TYPE] [nvarchar](1) NULL,

    [PRO_ORDER] [bigint] NULL,

    [UPRN] [bigint] NULL,

    [LPI_KEY] [nvarchar](14) NOT NULL,

    [LANGUAGE] [nvarchar](3) NULL,

    [LOGICAL_STATUS] [smallint] NULL,

    [START_DATE] [smalldatetime] NULL,

    [END_DATE] [smalldatetime] NULL,

    [ENTRY_DATE] [smalldatetime] NULL,

    [LAST_UPDATE_DATE] [smalldatetime] NULL,

    [SAO_START_NUMBER] [smallint] NULL,

    [SAO_START_SUFFIX] [nvarchar](2) NULL,

    [SAO_END_NUMBER] [smallint] NULL,

    [SAO_END_SUFFIX] [nvarchar](2) NULL,

    [SAO_TEXT] [nvarchar](90) NULL,

    [PAO_START_NUMBER] [smallint] NULL,

    [PAO_START_SUFFIX] [nvarchar](2) NULL,

    [PAO_END_NUMBER] [smallint] NULL,

    [PAO_END_SUFFIX] [nvarchar](2) NULL,

    [PAO_TEXT] [nvarchar](90) NULL,

    [USRN] [int] NULL,

    [LEVEL] [nvarchar](30) NULL,

    [POSTALLY_ADDRESSABLE] [nvarchar](1) NULL,

    [POSTCODE] [nvarchar](8) NULL,

    [POST_TOWN] [nvarchar](30) NULL,

    [OFFICIAL_FLAG] [nvarchar](1) NULL,

    [CUSTODIAN_ONE] [smallint] NULL,

    [CUSTODIAN_TWO] [smallint] NULL,

    [CAN_KEY] [nvarchar](14) NULL,

    CONSTRAINT [PK_LAND_AND_PROPERTY_IDENTIFIER_RECORD24] PRIMARY KEY CLUSTERED

    (

    [LPI_KEY] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    /****** Object: Table [dbo].[BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21] Script Date: 07/04/2012 08:44:35 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21](

    [RECORD_IDENTIFIER] [int] NULL,

    [CHANGE_TYPE] [nvarchar](1) NULL,

    [PRO_ORDER] [bigint] NULL,

    [UPRN] [bigint] NOT NULL,

    [LOGICAL_STATUS] [smallint] NULL,

    [BLPU_STATE] [smallint] NULL,

    [BLPU_STATE_DATE] [smalldatetime] NULL,

    [BLPU_CLASS] [nvarchar](4) NULL,

    [PARENT_UPRN] [bigint] NULL,

    [X_COORDINATE] [decimal](18, 2) NULL,

    [Y_COORDINATE] [decimal](18, 2) NULL,

    [RPA] [smallint] NULL,

    [LOCAL_CUSDODIAN_CODE] [smallint] NULL,

    [START_DATE] [smalldatetime] NULL,

    [END_DATE] [smalldatetime] NULL,

    [LAST_UPDATE_DATE] [smalldatetime] NULL,

    [ENTRY_DATE] [smalldatetime] NULL,

    [ORGANISATION] [nvarchar](100) NULL,

    [WARD_CODE] [nvarchar](10) NULL,

    [PARISH_CODE] [nvarchar](10) NULL,

    [CUSTODIAN_ONE] [smallint] NULL,

    [CUSTODIAN_TWO] [smallint] NULL,

    [CAN_KEY] [nvarchar](14) NULL,

    [new] [geometry] NULL,

    CONSTRAINT [PK_BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21] PRIMARY KEY CLUSTERED

    (

    [UPRN] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21] ADD CONSTRAINT [DF_BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21_test_RECORD_IDENTIFIER] DEFAULT ((0)) FOR [RECORD_IDENTIFIER]

    GO

    /****** Object: Table [dbo].[STREET_RECORD_TYPE11] Script Date: 07/04/2012 08:45:09 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[STREET_RECORD_TYPE11](

    [RECORD_IDENTIFIER] [int] NULL,

    [CHANGE_TYPE] [nvarchar](1) NULL,

    [PRO_ORDER] [bigint] NULL,

    [USRN] [int] NOT NULL,

    [RECORD_TYPE] [smallint] NULL,

    [SWA_ORG_REF_NAMING] [smallint] NULL,

    [STATE] [smallint] NULL,

    [STATE_DATE] [smalldatetime] NULL,

    [STREET_SURFACE] [smallint] NULL,

    [STREET_CLASSISFICATION] [smallint] NULL,

    [VERSION] [smallint] NULL,

    [RECORD_ENTRY_DATE] [smalldatetime] NULL,

    [LAST_UPDATE_DATE] [smalldatetime] NULL,

    [STREET_START_DATE] [smalldatetime] NULL,

    [STREET_END_DATE] [smalldatetime] NULL,

    [STREET_START_X] [decimal](18, 2) NULL,

    [STREET_START_Y] [decimal](18, 2) NULL,

    [STREET_END_X] [decimal](18, 2) NULL,

    [STREET_END_Y] [decimal](18, 2) NULL,

    [STREET_TOLERANCE] [smallint] NULL,

    CONSTRAINT [PK_STREET_RECORD-TYPE11] PRIMARY KEY CLUSTERED

    (

    [USRN] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    /****** Object: Table [dbo].[STREET_DESCRIPTOR_RECORD_TYPE15] Script Date: 07/04/2012 08:45:39 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[STREET_DESCRIPTOR_RECORD_TYPE15](

    [RECORD_IDENTIFIER] [smallint] NULL,

    [CHANGE_TYPE] [nvarchar](1) NULL,

    [PRO_ORDER] [bigint] NULL,

    [USRN] [int] NOT NULL,

    [STREET_DESCRIPTOR] [nvarchar](100) NULL,

    [LOCALITY_NAME] [nvarchar](35) NULL,

    [TOWN_NAME] [nvarchar](30) NULL,

    [ADMINISTRATIVE_AREA] [nvarchar](30) NULL,

    [LANGUAGE] [nvarchar](3) NULL,

    CONSTRAINT [PK_STREET_DESCRIPTOR_RECORD-TYPE15] PRIMARY KEY CLUSTERED

    (

    [USRN] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

  • Divide and conquer by extracting each set of records to its own file. Then import each file into the relevant table.

    You can use a utility to extract each set of data out from the combined file. Shouldn't be too hard since your table type identifier is at the start of each line - e.g. from command line:

    findstr /B "11" Input_Data.csv > Input_Data_11.csv

    findstr /B "15" Input_Data.csv > Input_Data_15.csv

    ...

    Would that work for you or do you have to use SSIS to split the records for some reason?

    (Note: findstr does not support unicode, so if your text file is unicode then you'll need to use another util that does support it.)

  • thanks for your reply the only problem with that is in column3 is the process order, they have to be done in that order.

  • Hi clucasi

    I do not know if you resolved your issue; your scenario can be handled by a flat file data source and one script component; It is written with BIDS 2008 using C#.

    Your flat files are multi ragged records; your sample has four different record's types and each one with a different columns count.

    This is the solution explanation:

    DATA FLOW

  • Flat file data source
  • A flat file connection manager; generated at the time you configure the flat file data source
  • Script Transformation Component
  • It also shows the Row Count component with a data viewer just to test-validate the output.

  • Row Count component
  • Data Viewer between the Script Transformation and the Row Count components
  • IMPLEMENTATION

  • Add the Flat File Data Source in your solution data flow
  • Click on its New button in order to add a new flat file connection manager
  • Give it a name
  • Navigate to the location of your flat file, selecting it
  • 1. Select the flat file format as Ragged right as shown in the figure below

    2. Now click at the Columns node and select the Record Length; I used 300 characters, so you should scroll the screen to the right to select that length.

    3. As a verification click on the advance node and look at the Column 0 length; it should show 300 bytes.

    NOTE It is up to you to select the record length, I am using 300 bytes because I am not familiar with your data ๐Ÿ™‚

  • Add the Script Component into your data flow
  • Connect the Script Component with the Flat File Data Source component
  • Double Click on the Script Component
  • 4. Click on its Input Columns node and select the Column 0 assigning it the alias Column_SrcRec

    NOTE The alias name Column_SrcRec is used later in the script (attached), so be sure you type the name as it appears in the image below:

    5. Step 5

  • Click on the Inputs and Outputs node
  • Expands the Input 0 and Output 0 nodes as shown on the image below
  • You should be able to confirm the name of the Input Columns row as Column_SrcRec
  • Click on the Output Columns node, and click 28 or 29 times on the Add Column button (as many times as the maximum number of columns on the longest record)
  • Change the Data Type and Length of each columns but Column 1 to String (DT_STR) and 100
  • If the column length is not long enough the script will fail with an error, setting them to 100 is safer but you should use more realistic lengths, I choose 100 because I am not familiar with your data
  • The first column data type is its default type four-byte signed integer [DT_I4] because based on your data, the first column is an integer
  • Now, click on the Script node, then on the Edit Script button; download the attached code ScriptCode.txt copy everything in this file and replace everything in the script code.

    You can add a Row Count component with a data viewer, so you can monitor the results, anyway the next image shows it.

    FINAL NOTES

    I did not use your columns names, you can fine-tune this solution by using more meaningful columns names, if you do, be aware you will have to change the script as well.

    You will notice by the last picture, that the columns appeared backwards, I mean, Column 1 with the first column of information in the flat file is the very last on the pipe, it should not be a problem.

    Let us know if this solution works for you.

    Cheers,

    Hope this helps,
    Rock from VbCity

  • Thanks for your solution, iam am getting the following error

    :w00t:

  • It seems there is not script code in your script component double click on it and:

    1. make sure you selected C# code, if you did not, or if you are using VB.Net, then the code attached to my previous reply will not work.

    2. Click on the Edit Script button, then download the ScriptCode.txt file attached to my previous reply, copy its content, and replace everything inside the script code with the code from the ScriptCode.txt, in other words, copy and paste.

    3. Click on the OK button, that should be it.

    Would it be possible to confirm if you are using BIDS 2008, or BIDS 2008 R2?

    Cheers,

    Hope this helps,
    Rock from VbCity

  • Cheers that seem to solve that problem,

    now for the next error:-)

  • Its worked:-):hehe:

    I found the error, the text file had a blank row at the end of the text file.

    thanks

  • Just a quick question will the solution you supplied sort the problem of comma's inside text eg '21,2000,"5 Bomley,way",xxxxx

  • Viewing 15 posts - 1 through 15 (of 17 total)

    You must be logged in to reply to this topic. Login to reply