How to remove the first few rows (the number varies from file to file) of an Excel file before loading it into a SQL table

  • I have an Excel file as my source,

    and I am working on loading the Excel file (source) data into a SQL Server (destination) table.

    The first 20 or 25 rows of the Excel file (the exact number varies from file to file) contain information about the file itself:

    they describe the filters that were selected to create the Excel file, who created it, and when it was created.

    So I need to eliminate all of these rows until I reach the row containing the column name ActName,

    and then start loading the data into the SQL table.

    How can I do this? Please help me.

    My second question is this:

    my Excel file has 10 columns, and when two rows have the same values in a particular set of 6 of those columns, they are considered duplicates and the extra rows need to be removed.

    How can I delete those duplicate rows? Is there a way to do it at load time, or do I have to do it at the SQL Server table level (after the data is loaded)?

    Please kindly help me.

    Thanks in advance

    asita

  • If the destination table is an accumulation of these Excel files and you are using DTS to do the loading, consider loading the Excel data into a staging table first. This table should have an Identity column as the key in order to keep the records in the correct sequence. Then you can search for the row containing your column header value, and pull out the records with an identity key higher than that. The staging table also lets you remove your dups prior to loading them into your final destination table. So...

    1) Truncate Staging table

    2) Load all rows from Excel to the staging table

    3) Select the rows to insert into the final destination by using IdentityKey > (select min(IdentityKey) from Staging where Column1 = 'Column Header')

    If the column data types are not accommodating, you may have to bring everything in as a string, and then convert to the destination data type while inserting from Staging to the Destination. A rough T-SQL sketch of this pattern follows.
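
    For illustration only, here is a minimal T-SQL sketch of that pattern (steps 1 and 3, plus the duplicate removal), assuming SQL Server 2005 or later and made-up names: a staging table dbo.ExcelStaging with an IdentityKey identity column and ten string columns Col1 through Col10, a destination table dbo.Destination (its column names are placeholders too), 'ActName' as the header value that shows up in the first column, and Col1 through Col6 as the six columns that define a duplicate. Adjust all of these to your own schema.

    -- 1) Truncate the staging table.
    TRUNCATE TABLE dbo.ExcelStaging;

    -- 2) Load all rows from the Excel file into dbo.ExcelStaging with the DTS/SSIS
    --    package, bringing every column in as a string; the IdentityKey identity
    --    column preserves the original row order.

    -- Remove duplicates while the data sits in staging: among the rows below the
    -- header row, keep only the first occurrence of each combination of the six
    -- columns that define a duplicate.
    WITH Ranked AS
    (
        SELECT IdentityKey,
               ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3, Col4, Col5, Col6
                                  ORDER BY IdentityKey) AS RowNum
        FROM dbo.ExcelStaging
        WHERE IdentityKey > (SELECT MIN(IdentityKey)
                             FROM dbo.ExcelStaging
                             WHERE Col1 = 'ActName')
    )
    DELETE FROM Ranked
    WHERE RowNum > 1;

    -- 3) Insert only the rows below the header row into the destination; wrap the
    --    columns in CONVERT(...) here if the destination data types are not strings.
    INSERT INTO dbo.Destination (ActName, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10)
    SELECT Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10
    FROM dbo.ExcelStaging
    WHERE IdentityKey > (SELECT MIN(IdentityKey)
                         FROM dbo.ExcelStaging
                         WHERE Col1 = 'ActName');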

  • Using a staging table is a good idea; it's especially handy when you're trying to diagnose why a particular record either did or did not load the way you wanted it to.

    Some things I've done, and seen done, to handle similar problems with both Excel and flat source files are ...

    1) Use a staging file. It's similar to using a staging table except that you write out to a file instead of a table. We typically do this when there is a lot of back and forth with our customers about what should or should not be allowed to load.

    2) Use a precedence constraint expression on the Control Flow tab. This can be used to do different things with your source file depending on a certain "error" condition you trap for; an example would be checking for a specific file name.

    3) Use a Conditional Split transformation on the Data Flow tab. This can be used to load certain rows to a "good" file or table and other rows to a "bad" file or table.

    4) Use a Script Component transformation on the Data Flow tab. This is most handy for us when we're trying to do things like verify that each row in a file has the right number of columns in it or is a "valid" data row. You read in each row of the file as one big long character string with basically no delimiters, and then loop through each character in the line. You can do things like count commas (and even add them in if you want to). You can also filter out any number of header and footer lines if you see a particular data value, like the word "Totals" or a column header value or something like that; a rough T-SQL equivalent of that kind of filtering is sketched below.
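
    If you would rather express that filtering in T-SQL than in a Script Component, here is a rough equivalent, assuming (purely for illustration) that the file has been landed one line per row into a raw staging table dbo.RawStaging with an IdentityKey identity column and a single RawLine varchar column, that the data rows are comma-delimited with 10 columns, that the header row starts with ActName, and that footer lines start with the word Totals. Every name and value here is an assumption to adapt to your own file.

    -- Keep only the lines that look like real data rows: below the header row,
    -- not a footer line, and containing exactly 10 comma-separated columns
    -- (i.e. exactly 9 commas).
    SELECT IdentityKey, RawLine
    FROM dbo.RawStaging
    WHERE IdentityKey > (SELECT MIN(IdentityKey)
                         FROM dbo.RawStaging
                         WHERE RawLine LIKE 'ActName%')       -- skip the header block
      AND RawLine NOT LIKE 'Totals%'                          -- skip footer lines
      AND LEN(RawLine) - LEN(REPLACE(RawLine, ',', '')) = 9;  -- 9 commas = 10 columns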
