Getting Data from MSExcel

  • All-

    I have xls files put in a share that I need to access with SSIS. The formatting of the excel files is weird. The first 8 or 9 rows are all headings or random text, but about the 10th row starts the data I actually need to extract. The problem is that when I try to use the import export wizard, SSIS thinks the datatypes of all the columns are strings, bc the numberical data I need doesn't start until about the 10th row. That results in the import/export wizard thinking all of the numerical data is nulls, so I can't use that option.

    Anyone here know any ways to just extract the data that I need?

  • Do I have a problem with the way I post information? I've probably posted 4 or 5 things in this forum that get views, but never any replies. Is there something I should do different or some format I should use?

  • Follow the link at the bottom of my post for help on getting help. I am sorry I don't have an answer on your Excel issue.

    For help on getting help, follow the link below:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Thanks...I had seen/read that post and made sure I was following that advice. Maybe my questions just suck, though I thought they were relevant and probably stuff others would need help on.

  • Sounds to me that you will need to prep you excel files by stripping the useless data from the beginning of the file and then maybe create a standard set of column headers to place before the data rows. Then I bet you could get an SSIS package to run. I think you can set the data type in the wizard during creation of the package.

    I bet someone smarter than me could help you with a script that could trim the unwanted data rows from the beginning of the file, as long as the data is consistent.

    For help on getting help, follow the link below:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • That's what I've been doing as a workaround. The problem is that we're working on building a system that needs to be fully automated by July, meaning that I need to figure out a way to automate that workaround. I was thinking about/have been researching the script component, but am not really sure what I should be looking for.

    Any other suggestions?

  • I know on the flat file source you can specify to ignore/skip n lines, where you get to set n. Is this on the Excel source as an option? If you manually trim the top n from the file and then use it as the template for the connection, the datatypes should read ok. You'd then have to have something like the 'skip/ignore n rows' I mentioned above to work in conjunction with this too.

    HTH,

    Steve.

  • Steve-

    Thanks for the reply. As far as I can find, the Excel connection manager doesn't have that option. It just allows for choice of file, and that's about it. I'm thinking maybe I need to use OpenQuery (or OpenRowSet...I can't remember what its called). I just want to see if anyone has any other ideas before I jump into having to learn that for this project.

  • I know it sounds like an obvious question, but any way to get your supplier of the file to mod their output? Let's assume that you work through OPENQUERY or use the script task, what consistent indicator are you going to use to determine that you're now at the data part of the file?

    Steve.

  • Getting the supplier to mod the file would be nice and solve tons of problems that we have, however, they established that we will adapt to the data as they provide before I started on the project.

    As far as a consistent indicator goes, I'm not exactly sure what I need. I can use a row number, as the headings are the same everytime, they just last for about 10 rows. I can probably also use the first row of numerical data, as well. What qualifications do I need to meet to use something as a consistent indicator?

  • The row count would/should be ok, i'm just anal about the difference between

    about 10 rows

    and "always 10 rows" 😀

    Just FYI, this link may offer some insight on using the script component of SSIS with Excel files, seems to be all about opening, reading etc.

    Steve.

  • I should have been clearer. It is always the same number of rows, I just don't have the file open in front of me to say the exact number. I just meant its about 10 rows as opposed to like 25 rows in case that would make a difference. If the row number is an option, then I can use that. I'll take a look at the link you gave me. Thanks a lot for answering!!

  • you could use the sqlcommand within the excel data source something like select * from [sheet1$], i think you should be able to get the data in that way.

  • Could you use the ISNUMERIC function in SQL to test whether the cell contents are seen as numbers? For example, if all the cells in the first row of data have numeric values you should be able to use WHERE (ISNUMERIC(CELL1) = 1) AND (ISNUMERIC(CELL2) = 1) AND so on etc... I admittedly have not built a lot of SSIS packages, so this may be more difficult than it sounds...

  • If i understand your problem, the solution is pretty simple.

    Use a select with a range:

    SELECT F1,F2,F3,F4,F5,F6,F7,F8,F9

    ,f10,f11,f12,f13,f14,f15,f16,f17

    ,f18,f19,f20,f21,f22,F23,F24,f25

    FROM

    [Sheet1$A7:Y9000]

    Set the range to the first valid row through the max of whatever possible last valid row might be.

    Bad data skipped over.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply