Blog Post

Power BI VertiPaq Engine: Optimizing Column Encoding

,

Power BI encoding is a powerful optimizing option that is often overlooked because it’s not visible in neither the Power BI Desktop tool nor in Power BI Service. Natively, the VertiPaq engine in Power BI investigates all columns in the data model and determines how it can store that data most efficiently. To achieve maximum compression, the VertiPaq engine starts by encoding each column which determines the method of compression applied to that column. There are a couple types of encoding – value and hash.

Value and hash encoding work differently and can drastically impact the size and performance of your data model. Koen Verbeeck defines these two types of encoding very well (blog):

Value encoding – Here the value of the cell itself is used. It’s possible a mathematical operation is included. For example, if a column contains the values 10,000 and 10,001, SSAS can subtract 10,000 from the values and keep the results 0 and 1, which can be stored using fewer bits. Whenever the data is needed, SSAS only needs to add 10,000 again to retrieve the original values. Typically columns used in aggregations benefit from value encoding.

Hash encoding – With this type of encoding, the values are transformed into meaningless integers. This transformation step is then kept in a dictionary, so SSAS can translate the integers back to the original value. Hash encoding is used for string columns and for every column where value encoding is less efficient (for example if the column is very sparsely distributed). Typically columns used as group-by columns or foreign keys benefit from hash encoding.

In general, your data model will consume less storage and populate visuals faster if you have more columns that are compressed using value encoding. Value encoding is only available for numerical columns, which is part of the reason star schema is so powerful in Power BI. If you have a standard star schema, the table with the greatest number of rows/records will be a fact table which will have entirely numerical column types. Ideally, all of these numerical columns will be value encoded by default, but occasionally the VertiPaq engine requires a little extra guidance. While going through the steps below, you can monitor your dataset size and column encoding using the VertiPaq Analyzer.

Unfortunately, there’s not currently a way to check or even change the encoding for a column within the Power BI desktop. Thankfully, the guru’s over at Tabular Editor have exposed this for report developers. If you’re unfamiliar with Tabular Editor, I highly recommend downloading the free version (Tabular Editor 2) and checking out the getting started information on their website. To enable changing the Encoding Hint Type, you will need to go to File > Preferences and ensure that “Allow unsupported Power BI features (experimental)” box is checked. Without this box checked, it will appear that your code is running but no changes will be made within the model.

Once inside Tabular Editor, navigate to the C# script window and input the following code to see the current encoding type. If the encoding type is default (like the example below), this means the engine will determine the encoding on the fly.

Templated code snippet - all caps values are variables
var table_TABLENAME = Model.Tables["TABLENAME"];
var column_nameid = table_TABLENAME.Columns["COLUMNNAME"];
Output(column_nameid.EncodingHint);
Example in screenshot
var table_allocatedcharges = Model.Tables["Product"];
var column_acudfid = table_allocatedcharges.Columns["ProductKey"];
Output(column_acudfid.EncodingHint);

Unfortunately, you cannot set the encoding type itself, but you can set the encoding hint to encourage that column to be encoded one way or another. To change this encoding hint type, use the code below within that same C# editor window and hit the green run arrow. One of the major benefits of using a tool like Tabular Editor is programmability. In the script below, we are using a foreach statement to loop through all of the Int64 columns within our data model and setting the encoding hint to Value.

// for all Int64 columns set EncodingHint
foreach(var column in Model.Tables.SelectMany(t => t.Columns))
{
if(column.DataType == DataType.Int64)
column.EncodingHint = EncodingHintType.Value;
}

Let’s run the first script again and see if that EncodingHintType changed. And boom! Now it says Value! Please note, if yours still says Default, navigate to File > Preferences and ensure that “Allow unsupported Power BI features (experimental)” box is checked. Unfortunately, changing the encoding hint type falls under the unsupported Power BI feature category.

If you are using the VertiPaq Analyzer, you may notice that not all Int64 columns may be encoded as Values. Unfortunately, there are limitations to which columns will be encoded even after running the script. Keep in mind, the encoding hint type is just a hint. This community post covers some of the exceptions that fellow developers have run into: VertiPaq Decimal Columns Encoding Type Help. Marco Russo, Microsoft MVP & co-author of SQLBI, summarizes this phenomenon by saying “The hint is not enforcement. If the column cannot be represented as a value encoded column, it’s compressed as a hash encoded column. For example, the value encoded is 32-bit, if you have values outside the 32-bit range, it will be always hash-encoded. Also keep in mind that encoding hints are applied to the following refresh.”

In the sample script, we only looped through Int64 whole number columns. You can adjust this script to also look for decimal type columns for further optimization. Don’t forget to use VertiPaq Analyzer and enjoy watching that dataset shrink!

Additional Resources

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating