Devin Knight

Devin is a BI consultant at Pragmatic Works Consulting. He previously tech edited the book Professional Microsoft SQL Server 2008 Integration Services and was an author of the book Knight's 24-Hour Trainer: Microsoft SQL Server 2008 Integration Services. Devin has spoken at conferences like PASS and at several SQL Saturday events. He is a contributing member of the Business Intelligence Special Interest Group (SIG) for PASS as a leader in the SSIS Focus Group. Making his home in Jacksonville, FL, Devin is a participating member of the local users’ group (JSSUG).

Extracting Data From Multiple Files with Power Query

In this post I’d like to demonstrate another way Power Query can solve simple data extraction problems without much time or effort. If you’re still learning the basics of Power Query, please refer back to an earlier post here.

This demonstration shows how to use Power Query to scan a folder for a set of files, load the contents of all of those files at once, and use transformations to format the data appropriately.

Step by Step

  • Launch Excel 2010 or higher.
  • Select the Power Query tab.
  • Select From File under the Get External Data section of the tab.
  • Choose the option From Folder, which allows you to load more than one file at once.

  • Browse to the folder that contains the files you want to load, then click OK. These files should all be formatted alike, with the same column names and data types.
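
Because the combine step works best when every file shares the same layout, it can help to check the header rows before loading. The following is a minimal Python sketch of that check, performed outside Power Query. It assumes the files are comma-delimited text with a `.csv` extension and a header in the first row; the function names `read_header` and `files_with_different_header` are hypothetical and chosen only for illustration.

```python
import csv
from pathlib import Path


def read_header(path: Path) -> list[str]:
    """Return the first row of a delimited text file as a list of column names."""
    with path.open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def files_with_different_header(folder: Path, pattern: str = "*.csv") -> list[str]:
    """Return the names of files whose header differs from the first file's header.

    The "*.csv" pattern is an assumption; adjust it to match your own files.
    """
    paths = sorted(folder.glob(pattern))
    if not paths:
        return []
    expected = read_header(paths[0])
    return [p.name for p in paths[1:] if read_header(p) != expected]
```

An empty result suggests the files line up and are safe to combine; any names returned are worth inspecting first.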

  • This opens the Query Editor, which lists all the files available in the folder you selected in the previous step. The initial view shows metadata about the files, such as the file name, extension, relevant dates, and folder path. If the folder contained other files that should not be loaded, you could filter them out at this point. The other interesting columns are Content and Attributes. The Attributes column has additional metadata that can be queried, and the Content column stores the actual data of each file. Click the down arrows next to the Content column header.
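
To make the folder listing concrete, here is a rough Python analogue of that metadata view, including the optional filter on file type. It is only a sketch of the idea, not what Power Query does internally. The dictionary keys and the `.csv` default are assumptions, and `list_files` is a hypothetical helper name.

```python
from datetime import datetime
from pathlib import Path


def list_files(folder: Path, extension: str = ".csv") -> list[dict]:
    """List metadata for the files in a folder, keeping only one extension."""
    rows = []
    for path in sorted(folder.iterdir()):
        # Skip subfolders and any file that should not be loaded.
        if not path.is_file() or path.suffix.lower() != extension:
            continue
        info = path.stat()
        rows.append({
            "Name": path.name,
            "Extension": path.suffix,
            "Date modified": datetime.fromtimestamp(info.st_mtime),
            "Folder Path": str(path.parent),
            "Size": info.st_size,
        })
    return rows
```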

  • The data from all of the files is now combined into a single query, but there are still transformations that must be completed. Start by right-clicking the Column1 header and selecting Split Column > By Delimiter.
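
Conceptually, combining the Content column stacks the lines of every file into one long single-column list, which is why everything lands in Column1 before the split. A minimal Python sketch of that stacking, assuming UTF-8 text files matched by a `*.csv` pattern (both are assumptions, as is the helper name `combine_lines`):

```python
from pathlib import Path


def combine_lines(folder: Path, pattern: str = "*.csv") -> list[str]:
    """Concatenate the lines of every matching file into one list.

    Each element is one raw, unsplit line, similar to a single text column.
    """
    lines: list[str] = []
    for path in sorted(folder.glob(pattern)):
        lines.extend(path.read_text(encoding="utf-8").splitlines())
    return lines
```

Note that each file's header line is carried along with its data, which matters for the later cleanup step.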

  • Click OK to accept the default configuration, which splits the column at each comma.
  • Next, right-click on either column header and select Use First Row As Headers to give the query appropriate column names.
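
The split and the header promotion can be pictured with a short Python sketch. It assumes the combined data is a list of comma-delimited lines whose first line holds the column names; `split_and_promote` is a hypothetical name used only for this illustration.

```python
import csv


def split_and_promote(lines: list[str]) -> list[dict[str, str]]:
    """Split each line at commas, then use the first row as column names."""
    rows = list(csv.reader(lines))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Every remaining row becomes a record keyed by the promoted header names.
    return [dict(zip(header, row)) for row in body]
```

Only the first line is treated as a header here, so header lines that came from the other files remain in the output as ordinary rows.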

  • You will notice the previous step only took column headers from the first file. If you look through the results, you will find that the header rows from the other files are still listed as data. To remove these extra header rows, select the down arrow next to the Name column, uncheck the Name value, then click OK.
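
The filter in this step amounts to dropping every row where a column's value is identical to that column's own name. A minimal Python sketch of the same rule, assuming the rows are dictionaries keyed by column name and that the column is called Name as in this walkthrough (`drop_repeated_headers` is a hypothetical helper):

```python
def drop_repeated_headers(records: list[dict[str, str]], column: str = "Name") -> list[dict[str, str]]:
    """Remove rows that are really leftover header lines.

    A row counts as a leftover header when its value in `column`
    equals the column name itself.
    """
    return [row for row in records if row.get(column) != column]
```

This relies on no genuine data row having the literal column name as its value, which is the same assumption the filter in the Query Editor makes.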

  • You have now successfully combined the files using Power Query. Click Done to bring this data into Excel. Once in Excel, it can easily be added to tools like Power Pivot for additional analysis.

Hope this helps!
