Blog Post

Iterate Over Web Pages Using Power Query

,

Power Query makes it easy to pull data from a web page. As easy as deflating a football. Poof. Gone. However, you’d need a few additional steps to iterate through and get data from multiple pages. Let’s take a look.

A use case of this case can be seen when I retrieved financials for restaurant stocks from a website.

Reading from a web page

First have the URL that contains the data you need handy. Then select “Get Data” and select “Web” from the “Other” category as shown below.

data from web power query

In the “Enter a Web Page URL” window, enter your URL. For example, I use “http://www.marketwatch.com/investing/stock/cmg/financials”. This URL shows financial data for the past five years for CMG, stock symbol for Chipotle Mexican Grill.

web page data power query

If the web page has multiple tables, they’ll be shown in the “Navigator” as below. Select the table you want – you can select more than one – and click “Load” at the bottom.

web data using power query

The table should appear as shown below.

web source power query

You can add additional steps/transformations to the query to clean this table, if you want. For example, I’ll introduce the following steps:

  • Remove the last column “5-year trend” as that’s unwanted
  • Filter to include only “Sales/Revenue” row since this is the only row I want
  • Rename the first column to “Metric”

After applying the above steps, my table looks like the following

web data filtered

Iterating over multiple pages

So far, we saw how to read and transform data from a web page. As you’ve seen earlier, the URL used here contains financial data for a stock. Now imagine you’ve to get similar data for multiple stocks. You can get this by supplying the correct stock symbol in the URL. In other words, replace “cmg” with the corresponding symbol.

Wrong way

One way to get data for multiple stocks is to create multiple queries – one query for each stock – and append all of them. Append works similar to Union All in T-SQL. But as you can probably guess, this method is inefficient and difficult to maintain because you’d have multiple queries basically created by copy-pasting and changing the symbol.

Correct way

An efficient alternative is to use a function and custom column. We need to convert the query we crated earlier to pull data from one web page into a function. To do this, add the following first line and replace “cmg” in the URL with the input argument.

web data iterate pages function

Then, create another text file with the list of stocks that you need to pull data for. If you’re Power Query in Excel, you can create a new table instead of a text file. I’m working with Power BI Designer, hence I prefer a new text file. The contents of the text file are shown below:

tickers file

Importing the text file using “Get Data” –> “CSV” will create a table as shown below.

tickers table

Next, add a custom column by selecting “Add Column” –> “Add Custom Column”. And invoke the function we created earlier by passing the input argument “Ticker” as shown below.

invoke function power query web data

Since we’re attempting to access external data, you’ll be prompted to configure privacy levels. Select the appropriate options and click continue.

After adding the custom column and invoking the function, your table should appear as shown below.

web data source power query table

Finally, you can expand the Custom column to display data for all stocks.

expanded table

Scripts

The following are the M scripts if you want to try this out.

Function: ReadMarketWatch

let

ReadMarketWatch = (ticker as text) as table =>

let

Source = Web.Page(Web.Contents(“http://www.marketwatch.com/investing/stock/”&(ticker)&”/financials”)),

Data0 = Source{0}[Data],

#”Changed Type” = Table.TransformColumnTypes(Data0,{{“Fiscal year is January-December. All values USD millions.”, type text}

, {“2010”, type text}, {“2011”, type text}, {“2012”, type text}, {“2013”, type text}, {“2014”, type text}

, {“5-year trend”, type text}}),

#”Removed Columns” = Table.RemoveColumns(#”Changed Type”,{“5-year trend”}),

#”Filtered Rows” = Table.SelectRows(#”Removed Columns”, each [#”Fiscal year is January-December. All values USD millions.”] = “Sales/Revenue”),

#”Renamed Columns” = Table.RenameColumns(#”Filtered Rows”,{{“Fiscal year is January-December. All values USD millions.”, “Metric”}})

in

#”Renamed Columns”

in

ReadMarketWatch

Full Query:

let

Source = Csv.Document(File.Contents(“C:\Users\sam\Downloads\rest\Tickers.txt”),[Delimiter=”,”,Encoding=1252]),

#”Changed Type” = Table.TransformColumnTypes(Source,{{“Column1”, type text}, {“Column2″, type text}}),

#”Promoted Headers” = Table.PromoteHeaders(#”Changed Type”),

#”Added Custom” = Table.AddColumn(#”Promoted Headers”, “Custom”, each ReadMarketWatch([Ticker])),

#”Expanded Custom” = Table.ExpandTableColumn(#”Added Custom”, “Custom”, {“Metric”, “2010”, “2011”, “2012”, “2013”, “2014”}, {“Custom.Metric”, “Custom.2010”, “Custom.2011”, “Custom.2012”, “Custom.2013”, “Custom.2014″})

in

#”Expanded Custom”

Summary

In this post, you learned how to iterate over multiple web pages when the data you need is spread across multiple pages. This example can also be used when you need to iterate over web pages using page numbers.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating