Power BI Embedded: Stress Testing & Capacity Planning

When deciding whether Power BI is the right reporting platform for your business’s needs, price is a large factor. Power BI pricing models are built around SKUs and v-cores, but how do you know how many you will need? Unfortunately, the ratio of capacity to expected usage is not easily determined.

For example, one business may have a single very large data model that takes a lot of memory and CPU time to refresh, 20 users at peak viewing times, hourly refreshes, and very simple queries that all allow for query folding. Another business may have six smaller data models, 950 users at peak viewing times, daily refreshes, and very complex queries populating the data model. All of these elements impact usage at any given time, which makes predicting overall CPU needs nearly impossible. Thankfully, stress testing your capacity gives us an option that is not purely reactive.

This blog will walk through how to stress test your capacity, the elements of capacity planning, and how to understand the results of the stress test.

Prerequisites

Access to the Premium Capacity Utilization and Metrics App/Report (Utilization and Metrics Report Instructions)
Access to the reports you wish to test against
Decent CPU and memory to run concurrent browser windows
Elevated PowerShell console to run the script (i.e. “Run As Administrator”)
Load Test Tool (instructions below)

Building the Test

1. Navigate to the GitHub repository for the Load Test Tool and download the zip file containing the PowerShell script. There are two load testing tools. The default tool tests the “worst case scenario” where all of your users log on at the same time and continually change filters so Power BI is forced to bypass its cache.
The Realistic Load Test Tool operates similarly to the default load test tool, but instead of testing the initial load time, it tests programmable actions that end users would likely perform, such as changing slicers, filters, and navigating through bookmarks, with some “think time” between operations. For this demo, we’ll walk through the standard load test tool to keep it simple. Please refer to the ReadMe file for further instructions on using the realistic load test tool.
NOTE: The zip file contains an unsigned PowerShell script. You must first run the Set-ExecutionPolicy Unrestricted command to allow unsigned scripts to run. The tool also requires the “MicrosoftPowerBIMgmt” Power BI PowerShell modules to be installed.

2. Unzip the file into a folder on your desktop (or within a VM) and navigate to the Initiate_Load.ps1 file. Right-click on the file and run it with PowerShell.

3. The PowerShell script will walk you through a number of prompts:

How many reports do you want to configure?
- This will determine how many reports are pinged concurrently. Keep in mind, you can ping reports from different workspaces in the same PowerShell run. Type a number then hit enter.
Authentication
- A pop-up will appear. Use the login that has access to the workspaces you wish to test.
Select workspace index from above
- Type the number next to the workspace where the report you wish to stress test resides then hit enter.
Select report index from above
- Type the number next to the report you wish to stress test then hit enter.
Filters require FilterTable, FilterColumn, MinimumValue, MaximumValue in FilterColumn
- This is how the script circumvents the caching feature in the Power BI service. By providing a table, a column, and min and max values, the script is able to pass different filter contexts to the loading reports. I recommend using a date table or fact table that interacts with the entire data model, and a numerical field (minimum and maximum do not make sense for text fields).
- Make sure there are no spaces between the variables and the commas.
- Example: I have a table called “Invoice Date”, a column called “MonthsAgo” that I would like to filter on. I would type “Invoice Date,MonthsAgo,0,4”
Enter number of instances to initiate for this report
- This is the number of browser windows that will open and ping the report(s) you have designated. Type in a number then hit enter. I recommend starting with 10 then moving up to make sure you won’t crash your machine with browser windows.
Do you want to launch configured reports?[y/n]
- If you are satisfied with the parameters set for the script, type “y” then enter. If you wish to abort the test run, type “n” then enter.
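
The filter prompt is the easiest one to get wrong, so here is a minimal sketch of how you might assemble that input before pasting it into the console. This is not part of the Load Test Tool. The helper name build_filter_input is my own, and it only encodes the rules described above (four comma-separated parts, no spaces around the commas, numeric min and max).

```python
def build_filter_input(table: str, column: str, minimum: int, maximum: int) -> str:
    """Return a 'FilterTable,FilterColumn,MinimumValue,MaximumValue' string.

    Hypothetical helper: it mirrors the prompt format described in the post,
    not any function inside the Load Test Tool itself.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    parts = []
    for name in (table, column):
        cleaned = name.strip()  # drop stray spaces next to the commas
        if not cleaned or "," in cleaned:
            raise ValueError(f"invalid table or column name: {name!r}")
        parts.append(cleaned)   # spaces inside a name ("Invoice Date") are kept
    parts.extend([str(minimum), str(maximum)])
    return ",".join(parts)


print(build_filter_input("Invoice Date", "MonthsAgo", 0, 4))
# prints: Invoice Date,MonthsAgo,0,4
```

Spaces inside a table name are preserved on purpose, since only the spaces between the values and the commas cause trouble.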

4. To test how refreshes may impact the user experience (and vice versa), go into the Power BI online service and manually refresh the Power BI datasets that power the reports you are stress testing. While the Load Testing Tool is great for testing interactive operations, testing background operations will need to be done outside this tool. There are REST APIs that can trigger Power BI dataset refreshes (Refresh PBI Dataset REST API), but we won’t cover that in this blog post.

5. If the number of browser windows exceeds your computer’s capabilities, the windows will time out and you will need to refresh the browser pages individually to get them to run again. I recommend only allowing 10 pages to be active at a time. You may notice there is a little counter in the upper left hand corner of the windows. This tells you how many times that window has pinged the Power BI service. To stop each window from pinging, you will need to close each window individually. All the windows will time out an hour after the initial script is run, since the token the API generates expires after one hour.

Tests are now completed! You will need to wait roughly 45 minutes before manually refreshing the dataset attached to the Capacity & Metrics report.
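
If you run several rounds, it helps to write down when each one started and finished. The sketch below is just a convenience calculation under the two timings mentioned in this post (the one-hour token lifetime and the rough 45-minute wait). The function name and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(hours=1)    # browser windows time out after this
METRICS_DELAY = timedelta(minutes=45)  # rough wait before refreshing the metrics dataset


def plan_test_run(started: datetime, finished: datetime) -> dict:
    """Return the two times worth noting for a single test round."""
    return {
        "windows_time_out_at": started + TOKEN_LIFETIME,
        "refresh_metrics_after": finished + METRICS_DELAY,
    }


# Hypothetical round: started at 14:00, last window closed at 14:20.
run = plan_test_run(datetime(2024, 1, 15, 14, 0), datetime(2024, 1, 15, 14, 20))
for name, when in run.items():
    print(f"{name}: {when:%H:%M}")
```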

Considerations for Power BI Capacity Planning

Power BI capacity planning and management is no small task. Microsoft recommends setting your capacity size based on the size of your Power BI items (datasets, dataflows, etc.) because this will directly impact the speed of operations within a SKU (planning capacity in advance). This is a good rule of thumb, but unfortunately this method does not account for extremely high levels of interaction against a small number of items.

For example, let’s say you have one certified dataset and eight reports that use that dataset. There are 20-30 people viewing each report because it’s month end and all the analysts, account managers, and executives are prepping their presentations with screenshots from these reports. The strain on this dataset (and your capacity) will be as if there were 160-240 users interacting with the dataset. Now scale this up – imagine there are 100 people looking at each report. The usage against this dataset/item quickly grows, so your (hopefully) optimized certified dataset impacts the CPU much more than a dataset with only one report on it. That is why we must consider both background and interactive operations when estimating optimal SKU/CPU.
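
The arithmetic behind that example is simple enough to sketch. It assumes, as a simplification, that every viewer of every report hits the shared dataset at the same time. The function name is mine.

```python
def effective_viewers(reports: int, viewers_per_report: int) -> int:
    """Viewers hitting one shared dataset, assuming all reports are in use at once."""
    return reports * viewers_per_report


# Eight reports on one certified dataset, at three audience sizes per report.
for viewers in (20, 30, 100):
    print(f"{viewers} per report -> {effective_viewers(8, viewers)} against the dataset")
```

With 20 to 30 viewers per report this gives the 160 to 240 range above, and 100 viewers per report puts 800 people on a single dataset.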

Capacity in Power BI is determined by a few variables:

Schema and size of data models used (background & interactive operation)
The number and complexity of required queries (background operation)
The hourly distribution of the usage of your application (interactive operation)
Data refresh rates and duration (background operation)

Thankfully, the Gen2 metrics app provides insight into the current state of the capacity based on each of these variables.

Reading the Results

The Premium Capacity Utilization And Metrics report measures your capacity usage for the previous 14 or 28 days depending on the visual. This report, much like capacity metrics themselves, can be complex and hard to understand. Microsoft has provided some documentation on the contents of this report, but let’s turn our focus to the elements that relate to the results of the Load Testing Tool stress test we ran above.

To start out, make sure you have refreshed the dataset powering the Capacity Metrics report (wait until 45 minutes have passed since running the tests to ensure the test results will be in the report).

Once you enter the report, to narrow the report down to the latest tests, pop open the filter pane and adjust the “Filters on all pages” Date filter to the date of the tests.

If you did multiple rounds of testing, it will be important to know what time you ran each test. The chart in the upper right hand side of the overview page is extremely useful for identifying if any of your tests resulted in a spike of CPU above the CPU limit. For example, I did 3 tests but only one resulted in a spike of 168% CPU. To learn more about this spike, we can right click on the spiked bar and drill through to the time point detail.

This is by far my favorite portion of the report. In this drill through, you are able to see the number of interactive operations that occurred within this 30 second interval on top of all the background operations in the past 24 hours. Let’s walk through what it all means.

The limitations of your current SKU are important to know when estimating the available workload. In our example, the SKU assigned is A1, which allows for 30 seconds of capacity CPU in each 30-second interval. Keeping that in mind, let’s see how many seconds of CPU our 611 operations resulted in.

50.6 seconds, resulting in 169.98% of the capacity. Wow! Because we used the Load Testing Tool, the user for all these interactions will be the Power BI User. However, in a production environment, you should be able to tell if there is a specific user crazily clicking and filtering the report within a 30 second window. Thankfully, the user was able to successfully load the report with each filter and experienced no failures from the service, but we should likely scale our capacity if this behavior is expected consistently.
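
As a rough check on that percentage, the calculation is just CPU seconds used divided by CPU seconds allowed in the interval. The sketch below uses the rounded figures from this post, so it lands near, but not exactly on, the value the report displays (which presumably comes from unrounded numbers). The function name is my own.

```python
def utilization_pct(cpu_seconds: float, allowed_cpu_seconds: float) -> float:
    """Share of the interval's CPU budget consumed, as a percentage."""
    return 100.0 * cpu_seconds / allowed_cpu_seconds


A1_ALLOWED_SECONDS = 30.0  # CPU seconds available to the A1 SKU per interval, per this post

pct = utilization_pct(50.6, A1_ALLOWED_SECONDS)
print(f"{pct:.1f}% of capacity")  # prints: 168.7% of capacity
```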

In our use case, the background operations (refreshing the data model) only accounted for 0.53% of the capacity over the past 24 hours. It’s important to keep in mind that while their total CPU seconds is much larger than that of the interactive operations, the background operations are calculated over a 24 hour time frame.
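
To see why a large background total can still be a tiny percentage, here is a back-of-the-envelope sketch. It assumes the background CPU time is spread evenly across every 30-second interval in 24 hours (2,880 of them), which is my reading of how the smoothing works rather than something taken from the report itself. It then works backwards from the 0.53% figure to the 24-hour total that would produce it.

```python
INTERVALS_PER_DAY = 24 * 60 * 60 // 30  # 2,880 thirty-second intervals in 24 hours


def implied_background_cpu_seconds(pct_of_capacity: float,
                                   allowed_cpu_seconds: float) -> float:
    """24-hour background CPU total implied by a per-interval percentage.

    Assumes even smoothing across all intervals (an assumption, see above).
    """
    per_interval = pct_of_capacity / 100.0 * allowed_cpu_seconds
    return per_interval * INTERVALS_PER_DAY


total = implied_background_cpu_seconds(0.53, 30.0)
print(f"about {total:.0f} CPU seconds over 24 hours")
```

Under that assumption, several hundred CPU seconds of refresh work shows up as well under 1% of any single interval, while 50.6 seconds of interactive work in one interval blows past the limit.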

If you tested multiple datasets at a time, I recommend sorting by the artifact then holding down shift and cross sorting by the CPU(s) field. That way, you can tell if one dataset is consuming more resources than another (look at the average CPU seconds to determine that).

Using the three dots in the upper right hand corner of the visual, you can export these results into an Excel or CSV file to examine further or to stack results from multiple tests and compare.
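
If you go the CSV route, stacking a few exports and comparing average CPU seconds per dataset takes only a few lines. The sketch below is illustrative only: the column names "Artifact" and "CPU (s)" are assumptions about the export, and the sample rows are made up so the example runs on its own.

```python
import csv
import io
from collections import defaultdict


def average_cpu_by_artifact(exports: dict) -> dict:
    """Average CPU seconds per (test label, artifact) across exported CSV text."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of CPU seconds, row count]
    for label, text in exports.items():
        for row in csv.DictReader(io.StringIO(text)):
            key = (label, row["Artifact"])
            totals[key][0] += float(row["CPU (s)"])
            totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up sample exports standing in for two test rounds.
run_a = "Artifact,CPU (s)\nSales,0.10\nSales,0.30\n"
run_b = "Artifact,CPU (s)\nSales,0.20\nFinance,0.40\n"

averages = average_cpu_by_artifact({"run A": run_a, "run B": run_b})
for (label, artifact), avg in sorted(averages.items()):
    print(f"{label} / {artifact}: {avg:.2f} s average")
```

In practice you would read each exported file from disk instead of using inline strings, and rename the columns to match whatever your export actually contains.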

For the example above, I would recommend moving up to a larger SKU for more CPU to accommodate the large number of interactive users that are expected. I would also recommend looking closely at the data models tested and ensuring query folding is occurring and a good star schema is in place.