Powershell and Excel and SQL Server

  • Is there a way to read an entire Excel sheet into a Powershell variable, including the values as well as the cell formatting (specifically, Interior.ColourIndex), as opposed to the RBAR-CBAC (Cell By Agonizing Cell - to extend Jeff's nomenclature), method?

    Thanks!

    P

  • You'll have to clarify a little.

    My most simpleminded reading of your request sounds like you want all the contents of all the cells (including some description of the cell formats) all strung together and stored into a single Powershell variable?

    Perhaps you mean some sort of array or hashtable?

    Or do you mean into a singe Object of some sort that you can later manipulate?

    If the last, and your source file is in .xlsx format, and you are comfortable looking at C# examples and translate into powershell, then look at

    http://closedxml.codeplex.com/

    which I have used to import excel tables.

    A simpler one to use is

    http://exceldatareader.codeplex.com/discussions/432523

  • I'm doing this in a loop: it's taking about .5 seconds/row to access these cells, + another .5 seconds for processing.

    $nId = $UserWorksheet.Cells.Item($intRow, 1)

    $cE_Description= $UserWorksheet.Cells.Item($intRow, 2)

    $cF_Description = $UserWorksheet.Cells.Item($intRow, 3)

    $nParent_Id= $UserWorksheet.Cells.Item($intRow, 4)

    $cOther= $UserWorksheet.Cells.Item($intRow, 5)

    $cActive = $UserWorksheet.Cells.Item($intRow, 6)

    $nExclusive= $UserWorksheet.Cells.Item($intRow, 7)

    $cDefaultChecked = $UserWorksheet.Cells.Item($intRow, 8)

    Not great performance on 100 rows, downright painful when the number of rows starts to get up there.

    So what I was hoping for is a way to get the entire sheet's data read into an array -- or an object -- all at once, I'm thinking that should speed up processing.

  • Yes. If you can guarantee that the spreadsheet will be in .xlsx form it makes things a lot simpler. Remember, an .xlsx document is really just a suite of XML files. Change the .xlsx extension to .zip on some spreadsheet, and you can explore how the data is arranged.

    Microsoft has an SDK that lets you manipulate these files with a .net object model.

    Welcome to the Open XML SDK 2.5 for Office

    http://msdn.microsoft.com/en-us/library/bb448854(v=office.15).aspx

    But most people find it very difficult to use.

    Someone has written a "wrapper" for this that makes it more like the the Excel Object Model that you used in your example

    ClosedXML

    http://closedxml.codeplex.com/

    Most of the examples are in C# or VB -- but you should be able to translate it.

    I used this successfully at a prior employer and I can't track down any sample code of what I did - but it does work.

    And since you are just reading text files that are XML formatted - it is pretty fast - AND you don't need Excel installed on that machine.

    Also Microsoft really discourages use of the Excel Object Model in a server environment. Every time you do something it spawns an instance of Excel - and try as you might you can't guarantee that the instance is always terminated - so you wind up with dozens or hundreds of Excel instances gumming up memory.

  • Thanks Andy.

    I had a look at the closed XML thing, seems rather complex for my needs.

    Let me get down to specifics: I have a worksheet object, got the range I in question defined.

    I want a list of the rows where any of the cells on that row have cell.Interior.ColorIndex <> -4142.

  • AFAIK all the strategies to speed up reading Excel involve finding ways to avoid using the COM model.

    You can cause excel to save the data as .cvs and then read the .cvs file.

    If the spreadsheet is laid out as a nice table, you can use ODBC to read the file.

    But I don't know of a way to get at the color data using these methods.

    If you control the layout of the spreadsheets, you could write a Function in VBA in the spreadsheet. Said function would return a number indicating what color was found in the referenced cell. Insert a "helper" column next to the column you are interested in. Refer to the function in the helper column - then when you read the file with ODBC or save it as .cvs, you will have a number value you can inspect.

    An example is here:

    http://stackoverflow.com/questions/5110963/excel-get-cell-color

  • Thanks again Andy,

    I had arrived at the same solution: a VBA function to, er, highlight the rows of interest.

    Fortunately this is an ad-hoc type job that only needs to be done every couple of months, so I do have the luxury of the RBAR solution, in that I can set it to run and fuhgetabouddit.

    Cheers!

  • Here's some feedback:

    I've found that the code below is considerably faster than my previous RBAR-CBAC solution -- although I think that's what's happening in the background of the where clause.

    $Rows = $UserWorksheet.Range("A2:H$LastRow").Cells | ?{$_.Interior.ColorIndex -ne -4142} | Sort-Object -Property row -Unique | select row

    I have 8 columns in my table, 7 of which need to be searched for ColorIndex, and as soon as I find one -ne -4142 field on a row, I can stop looking at that row. So I tried to iterate through the columns where rowid not in changed rows. Bad idea: took 7 times longer to go through 51 rows of data than the above code.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply