Stairway to DAX and Power BI

Stairway to DAX and Power BI - Level 1: Getting Started with PowerPivot and DAX

,

DAX, which stands for Data Analysis Expressions, comprises the PowerPivot formula language that supports custom calculations in PowerPivot tables and Excel PivotTables. While many of the functions used in Excel are included within the language, DAX also offers additional functions for carrying out dynamic aggregation and other operations with your data.  DAX was introduced with PowerPivot, a free add-in for Excel 2010 that promises true “BI for the masses.”  PowerPivot holds much promise for analysts, reporting specialists and other information consumers, particularly those who are “at home” with Excel, and who serve the enterprise outside traditional IT and development roles.

PowerPivot offers those who adopt it the capability of creating BI solutions rapidly, and often without the need for the resources required to construct traditional BI solutions.  Moreover, my experience has shown that, even in environments where traditional solutions (including everything from ETL mechanisms, underlying star-schema data sources, Analysis Services cubes, and the languages required to support all these components) are desired, PowerPivot can be used effectively to make the design, development and testing of the ultimate solution more efficient and reliable.

The PowerPivot / DAX combination insulates Excel users from many of the complexities of other components of the integrated Microsoft BI solution.  The DAX functions are intuitive, in large part, for those that work routinely with existing Excel functions, and enable knowledgeable users to rapidly extract and present the information needed to support enterprise decision makers in a timely, reliable manner.  In working with columns and tables (even very large ones) in relational data sources, we can enjoy high-speed lookups and calculations via an in-memory engine that DAX is designed to leverage in an optimal manner

In this series, our primary objective is to become comfortable with creating useful queries, within a business context, using the PowerPivot / DAX combination. As a means of achieving our objective, we will gain hands-on exposure to using PowerPivot in general, while becoming familiar with the DAX language, using formulas we construct from the DAX functions we introduce within the levels of this Stairways series. For each function we introduce, we will discuss what it is designed to produce and its operation in doing so, the syntax with which it is employed, and the data that it retrieves and presents.  The objective will be to encourage you to take the example code as you read the ‘steps’, and try it out, making changes and experimenting, so as to get the feel for the capabilities of PowerPivot  and DAX. We’ll illustrate every point with practice examples of business needs that we address with practical syntax. Try them out.

To get the most out of the series, you need to have installed either 64- or 32-bit Excel 2010, matched with the respective version of PowerPivot for Excel 2010PowerPivot for Excel 2010 is, at this writing, a free add-in that is available for download at www.powerpivot.com.  For the lion’s share of our practice exercises, which focus upon the Microsoft BI stack, we will use the Adventure Works relational OLTP and Data Warehouse database samples, as well as the Adventure Works Analysis Services database (and the Adventure Works cube it contains) sample, the installation of which we discuss in the section below.

You will also need the appropriate access rights to the sample data sources provided for SQL Server 2008R2.  Installation of the Standard edition of SQL Server 2008R2 will be adequate for the vast majority of our activities, although the Developer / Enterprise edition is certainly ideal.  I will provide references for step-by-step installation of SQL Server 2008R2 Developer / Enterprise in the section that follows, and the vast majority of the images presented in this series will reflect those environments.

It is also assumed that the computer(s) involved meet the system requirements, including hardware and operating systems, of the applications I have mentioned.

Important Note:  If you have no alternative except to work with SQL Server 2005 or 2008, the practice exercises of this series can perhaps be meaningfully completed with modification of the queries to compensate for differences in the data structures of the sample cube among the versions – although you may find this requirement cumbersome and distracting.  Because both the relational databases and the Analysis Services samples for 2008R2 differ somewhat from those of previous releases (a good example of this is that the Analysis Services date dimension, as well as the supporting relational data, has been advanced into later operating years of the Adventure Works organization), the sample formula syntax that we construct together will, when executed, deliver results which may differ between 2008R2 data sources and those of the previous releases.  While you can adjust your own steps to make up for these differences (perhaps by “checking your answers” independently), you will not have the added comfort of the “instant corroboration” available in simply comparing your results to those presented in the images and explanations I present in the exercises.

Consider working, therefore, with 2008R2, if at all possible:  learning the basics of a new language is typically challenging enough for most that are new to it, without the additional distractions imposed by working with older releases.

Additional Note: The screen captures in this series, unless otherwise noted, are made from a Windows 7 or Windows Server 2008 R2 environment, so what you see on your own machine may differ, somewhat, if you are working within another environment.

Installing PowerPivot, Analysis Services 2008R2 and Samples

To install PowerPivot, simply perform the steps as outlined at the download site (www.powerpivot.com, at this writing), on a PC with Excel 2010 installed.

SQL Server 2008 and 2008R2 each provide a virtually identical single Setup program from which you can install any or all of its components, including Analysis Services. Using the unified Setup, you can install Analysis Services with or without other SQL Server components on a single computer.  It is important, however, to understand that Analysis Services relies upon other components of SQL Server: for example, the Adventure Works cube (which resides within the Adventure Works DW 2008R2 Analysis Services database), uses the AdventureWorksDW2008R2 relational data mart in SQL Server as its data source, so, if we want to process the Analysis Services database and its cube (we cannot query an unprocessed cube), we will need to have access to its underlying relational data source and, therefore, to SQL Server and the associated sample database.

There are many possible considerations in the installation of Analysis Services, depending upon the version(s) you intend to install, the hardware in your local environment, applications you may already have in place, and so forth. Rather than trying to reproduce them all in this article, we provide the following link, which covers this subject thoroughly, yet efficiently.

Considerations for Installing Analysis Services

(http://technet.microsoft.com/en-us/library/ms143708.aspx)

Once you have determined the components you need to install, you can follow step-by-step instructions on how to start Setup, and to select the components you want to install, by following this link:

Quick-Start Installation of SQL Server 2008

(http://technet.microsoft.com/en-us/library/bb500433.aspx)

Once you have successfully installed Analysis Services (along with any other components you have chosen from the Setup program), you are ready to download and install the samples that we will be working with in this series.  A great summary of the options that are available (based upon your SQL Server version and other considerations) can be found at:

How to install Adventure Works SQL DW and Analysis Services 2005/2008 sample database and project

http://www.ssas-info.com/analysis-services-faq/29-mgmt/242-how-install-adventure-works-dw-database-analysis-services-2005-sample-database

Getting Started with PowerPivot

Once we’ve got the PowerPivot for Excel 2010 add-in, SQL Server 2008R2 and the samples noted above installed, we’re ready to get started with PowerPivot.  Opening PowerPivot by taking the steps below will put us in position to begin working with DAX formulas.  An open PowerPivot window, the point at which we will be at the end of this section, will become the beginning point of each of the levels of this series.

  1. From the Start menu, select Microsoft Excel 2010.
  2. Above the Excel ribbon, click the PowerPivot tab, as shown.

Illustration 1:  Click the PowerPivot Tab …

The PowerPivot ribbon appears.

  1. Click the PowerPivot Window button, appearing at the left of the newly appearing PowerPivot toolbar, as shown.

Illustration 2:  The PowerPivot Ribbon Appears

The PowerPivot window, containing its own ribbon, opens atop the existing Excel spreadsheet.  We can tell to which workbook the window is linked because it assumes the name of the workbook as part of its own name, as depicted below.

Illustration 3:  The PowerPivot Window, Associated with the Workbook, Appears

It is in the PowerPivot window that we load and prepare the data with which we will be working (or will continue working, with data already added to the workbook). We will typically build a relational model here.  As we’ll see shortly, the PowerPivot window displays the tables on individual, tabbed sheets, and is the central place where we import tables, create relationships, maintain column data types and formats, and view, as needed, the data that underlies our data model.

Next, we’ll designate a source from which to import data.

  1. Click the From Database button on the PowerPivot window.
  2. Select From SQL Server on the drop-down menu that appears next.

Illustration 4:  Select “From SQL Server” in the From Database Dropdown

The Table Import Wizard dialog opens next.

  1. In the top input box, titled Friendly connection name, type (or copy and paste) the following:

AdventureWorksDW2008R2

  1. Click the selector (downward pointing arrow) on the right side of the box titled Server name.

PowerPivot begins a scan of the machine to detect, and return to the selector, the available server choices.

  1. Select the appropriate server for your local environment, or type in the server name / localhost, as appropriate.
  2. Enter the authentication as required for the local environment (ideally selecting Use Windows Authentication).
  3. Select AdventureWorksDW2008R2 using the dropdown selector to the right of the box titled, Database name, at the bottom of the dialog.

The Table Import Wizard dialog, with our input, appears similar to that depicted below.

Illustration 5:  The Table Import Wizard Dialog, with Our Input

  1. Click the Test Connection button underneath the Database name selector.

A message box appears, indicating that the test connection has succeeded.

Illustration 6:  “Test Connection Succeeded”

  1. Click OK to dismiss the message box.
  2. Click the Next button at the bottom of the Table Import Wizard dialog.
  3. Leave the radio button on the dialog that appears next, labeled Choose How to Import the Data, at its default of Select from a list of tables and views to choose the data to import, as shown.

Illustration 7:  Choose “Select from a list of tables…” Option for Data Import

  1. Click the Next button.

PowerPivot loads the tables and views from the AdventureWorksDW2008R2 database into the Select Tables and Views dialog that appears next.

  1. Select the following tables,  by clicking the checkbox to the immediate left of the respective  table listing in the dialog:
  • DimAccount
  • DimCurrency
  • DimCustomer
  • DimDate
  • DimGeography
  • DimProduct
  • DimPromotion
  • DimSalesReason
  • DimSalesTerritory
  • FactInternetSales
  • FactInternetSalesReason
  1. Click the Select Related Tables button next.

The Select Tables and Views dialog appears, as partially shown, with our selection and related tables checked.

Illustration 8:  Selecting Tables (Partial View) …

The idea in these immediate steps is to import enough information into our model to allow us to do some illustrative analysis surrounding the business activities conducted by our hypothetical client, the Adventure Works organization.

  1. Click the Finish button.

The import process runs, and then we see a “Success” message, complete with a Details pane that indicates population by our choices.

Illustration 9:  Success is Indicated, Along with a List of Tables Imported

  1. Click Close.

The Table Import Wizard is dismissed, and we arrive at the PowerPivot window once again, where we see the imported data as partially depicted.

Illustration 10:  Imported Data, with a Tab for Each Table

Note that a tab for each imported table has been created in the PowerPivot window (the tabs appear in the bottom left of the window as seen above).  What we are now seeing is not an Excel table, but a view of the efficiently compressed columnar database that PowerPivot uses to store imported tables in memory.

The central pane contains the data and looks very similar to – but is not - an Excel table inside a worksheet. Keep in mind that PowerPivot tables and Excel tables are completely different objects: PowerPivot uses far less memory than the Excel tables with which we have become familiar, due to its superior data compression capability.  The database generated by PowerPivot saves space and supports highly efficient querying.

Getting Started with DAX

We learned within the introduction to this level that PowerPivot for Excel 2010 offers the DAX language as a means for extending our capabilities with PowerPivot.  Let’s get some hands-on exposure to some basic DAX expressions as a way of getting up to speed and positioning ourselves to learn about individual functions in the levels of this series.

We’ll get started with some simple exercises.  To begin, let’s assume that we have a need to create a combined Product code / name label, primarily for report parameter picklist support and report labeling, but perhaps for other uses, as well, over time.  (We will use the ProductAlternateKey for this purpose, as the ProductKey in the table is a surrogate key, and not the actual product identifier.) This simple concatenation will serve as a great starting point in our introduction to DAX formulas and expressions.

  1. Click the tab named DimProduct in the PowerPivot window.
  2. Scroll over to, and click, the header of the far right column on the tab, which is labeled Add Column.

The Formula bar (fx) is activated, upon selection of the column.

  1. Type (or copy and paste)  the following into the Formula bar:

= [ProductAlternateKey]  & " - " &  [EnglishProductName]

NOTE:  We can also use the AutoComplete feature to save typing and / or avoid typographical or syntax errors.

We can use simple column names in the above syntax because the columns reside in the same table as our calculated column.  (We have to affix the table name if the column resides elsewhere, as we’ll see in later exercises.)

  1. Press the ENTER key.

The affected area of the DimProduct tab of the PowerPivot window appears, with our new addition, as shown.

Illustration 11:  The Newly Populated Calculated Column Appears …

We note that, upon creation of the calculated column, a new column is created to its immediate right, again with “Add Column” as its label, for easy addition of the next calculated column.  Our new calculated column, containing the new combined Product label, is named CalculatedColumn1 by default.  Let’s modify this to something more meaningful.

  1. Double click the column header of the newly added calculated column, currently containing the CalculatedColumn1 label.
  2. Type (or copy and paste)  the following into the column heading:

ProductLabel

  1. Press the ENTER key.

The column header now displays the desired title.

Illustration 12:  The New Title Appears …

In our example above, we used the “&” operator to concatenate two columns, together with a text string in between.  As we will see throughout the levels of Stairway to DAX, DAX provides many operators, which, with only a few exceptions, perform just like they do in Microsoft Excel.  These operators include those shown in Table 1.

Operator

How Employed

Arithmetic Class

+

Addition

-

Subtraction (or negation)

*

Multiplication

/

Division

^

Exponentiation
Parenthetic Class

()

Grouping / Precedence

 Comparison Class

=

Equal to

<

Less than

<=

Less than or equal to

>

Greater than

>=

Greater than or equal to

<>

Not equal to

Text Concatenation Class

&

Concatenation

Logic Operator Class

&&

And

||

Or

!

Not / negation

Table 1:  DAX Operators

We will become familiar with these operators as we work through the levels of our series.

Next, as a prelude to moving into the core focus of our series, we’ll introduce DAX formulas and functions.  We will then be ready to examine our first function.

DAX Formulas and Functions

We can exploit DAX formulas to create either calculated columns or measures. Let’s briefly introduce each of these “destinations” before kicking off the examination of our first function.  We define calculated columns in the PowerPivot window, as we discovered in our last section.  By contrast, we create measures within an Excel worksheet, where one or more pivot tables / charts reside which are supported by a PowerPivot table.  The difference in where calculated columns and calculated measures are created is important to understand.

We create a calculated column from a PowerPivot window, either like we did above, by moving to and clicking the column with the header labeled “Add Column,” or by clicking the Add button in the Columns group within the Design ribbon.  (We will create calculated columns via the Add button later.)  Using either option, we input the DAX formula into the formula bar (by typing it, using Autosense, or a combination of both), as we did earlier, and then press ENTER. The PowerPivot Field List for any pre-existing pivot tables will notify us of changes in the underlying PowerPivot window, enabling a Refresh button we can use to update both the field list and the pivot table.

A measure is created from within an Excel workbook, within which a PivotTable / Pivot report must already exist.  Once we have focus within the Pivot Table / report, or within the corresponding PowerPivot Field List, we can either right-click one of the tables in the field list or choose Add New Measure.  Either action launches the Measure Settings dialog, as we shall see, into whose Formula text box we can input a desired DAX formula.  We will get plenty of hands-on exposure to creating measures, as well as opportunities to learn about the importance of measure context, as we move through the levels of our series.  For now, it’s enough to understand that we can use DAX formulas to create calculated columns and measures, and the basic differences between the two.

Next, let’s take a look at our first DAX function.

The RELATED Function

According to the Microsoft DAX Function Reference for PowerPivot, the RELATED() function “returns a related value from another table.”

The source column (“<column>”) is placed within the parentheses to the right of the word “RELATED,” as shown below:

RELATED(<column>)

The <column> placeholder represents the column containing the values that we wish to retrieve via the function.  A single value related to the current row is returned.  RELATED() requires that a relationship is in place between the current table (from which we are using the function) and the table with the related data.  Using RELATED() we specify the column containing the desired data, and RELATED() traces the relevant relationship (many-to-one) to retrieve the value from the column we specify in the related table.

RELATED() performs a lookup, therefore, based upon the relationship in place, and examines all the values within the table we specify (ignoring any filters we have put in place, if applicable).  A relationship must exist, of course, for the function to work.  If there is no relationship, we have to create one to be able to use the function.  (We work with relationships in other steps of our series, so we will become familiar with the straightforward process at a later time.)  RELATED() works much like the Excel VLOOKUP() function, but is significantly more flexible in various ways.  We will encounter various uses of RELATED(), often in conjunction with other DAX functions, throughout this series, where we will get a feel for the natural hierarchies it renders and so forth.

RELATED() can be used in a calculated column expression (the way we’ll work with it in this level), where the current row context is meaningful (we’ll talk about row context shortly).  RELATED() can also be employed as a nested function within an expression using a table scanning function, which gets the current row value and then scans another table for instances of that value.  (SUMX(), which we will examine early in the steps of this series, is one example of a popular table scanning function).  RELATED() allows us to leverage relationships easily and transparently – and in a way that the information consumer is insulated from the underlying database components.

Let’s take a look at an example of the DAX RELATED() function at work within the AdventureWorksDW2008R2 sample we’ve established.

  1. Click the Design tab atop the PowerPivot window.

Illustration 13:  The Design Tab atop the PowerPivot Window

  1. Click the Manage Relationships button on the Design tab, as shown.

Illustration 14:  The Design Tab atop the PowerPivot Window

The Manage Relationships dialog opens, and we see various relationships between tables that were defined within PowerPivot when we clicked the Select Related Tables button in our PowerPivot setup steps.

  1.  Click the fifth row from the top in the table list of the dialog, labeled DimProduct [ProductSubcategoryKey], as depicted, to select it.

Illustration 15:  Our Selection in the Manage Relationships Dialog

  1. Click the Edit button atop the dialog next.

The Edit Relationship dialog opens, and we are presented with the definition of the relationship between the DimProduct and DimProductSubcategory tables, as shown.

Illustration 16:  The Edit Relationship Dialog …

The Edit Relationship dialog tells us that the related columns are DimProduct.ProductSubcategoryKey and DimProductSubcategory.ProductSubcategoryKey.  We are therefore aware of the details of the relationship between the two tables via these columns.  The existence of a relationship will allow us to use the RELATED() function in a way that we can be certain that the data it retrieves is accurate.

  1. Click OK to dismiss the Edit Relationship dialog.
  2. Click the Close button in the Manage Relationships dialog.

We return to the PowerPivot window – DimProduct tab.  Now let’s get some exposure to the RELATED() function.

  1. Scroll over to, and click, the header of the column labeled Add Column, this time to the immediate right of the calculated column we added earlier, ProductLabel.

The Formula bar (fx) is activated, once again, upon selection of the column.

  1. Type (or copy and paste)  the following into the Formula bar:

=RELATED(D

  1. Select the following from the Autosense-enabled selector that appears:

DimProductSubcategory[ProductSubcategoryAlternateKey]

as shown in this partial view:

Illustration 17:  Selecting a Column in a Related Table …

  1. Once the selection appears in the formula bar, add a right parenthesis symbol ( “ )” )  to close the expression, which will then appear in the formula bar as shown:

Illustration 18:  The Completed Expression …

Note that we provide the table name before the column name, as the column resides in another table.

  1. Press the ENTER key.

The DAX formula containing the RELATED() function is accepted and the calculation populates the rows where a SubCategoryAlternateKey exists, leaving blank those where one does not (we can scroll down to see populated rows lower in the tab).

  1. Double click the column header of the calculated column, as we did with our first calculated column.
  2. Type (or copy and paste)  the following into the column heading:

SubCatAltKey

  1. Press the ENTER key.

The column header now displays the desired title.

Illustration 19:  The SubCatAltKey Calculated Column Appears …

Let’s add a couple more columns to reinforce our understanding of basic DAX formula creation and the RELATED() function.

  1. Click the column header labeled Add Column to the immediate right of the newly added SubCatAltKey calculated column.
  1. Type (use Autosense, or copy and paste, as desired)  the following into the Formula bar:

=RELATED(DimProductSubcategory[EnglishProductSubcategoryName])

  1. Press the ENTER key, once again.

The DAX formula containing the RELATED() function is again accepted and the calculation populates the rows where an instance of EnglishProductSubcategoryName exists, leaving blank those where one does not, as before.

  1. Double click the column header of the newly added calculated column, once again.
  2. Type (or copy and paste)  the following into the column heading:

SubCatName

  1. Press the ENTER key.

The column header now displays the desired title, as in previous steps.

Finally, let’s insert column to create a label combining the new SubCatAltKey and SubCatName columns in a manner similar to the way we created our first calculated column, ProductLabel.

  1. Click the column labeled Add Column to the immediate right of the newly added SubCatName calculated column.
  1. Type (or copy and paste, as desired)  the following into the Formula bar:

 = [SubCatAltKey]  & " - " &  [SubCatName]

  1. Press the ENTER key, once again.

The DAX formula containing the RELATED() function is again accepted and the calculation populates the rows, leaving only a solitary hyphen ( “-“ ) in the rows where the SubCatAltKey and SubCatName columns are empty.

  1. Double click the column header of the newly added calculated column, once again.
  2. Type (or copy and paste)  the following into the column heading:

SubCatLabel

  1. Press the ENTER key.

The column header now displays the desired title, as in previous steps.  The new calculated column appears, alongside the calculated columns combined to create it, as partially depicted (for a sample of populated rows) below.

Illustration 20:  The SubCatLabel Calculated Column Appears …

And so we see that our new calculated column delivers the desired results.   Let’s save the PowerPivot work to date, along with spreadsheet with which it is associated.

  1. Using the downward pointing selector in the upper left corner of the PowerPivot window (to the immediate left of the Home and Design tabs), select Save As …, as shown.

Illustration 21:  Saving Our Work to Date …

  1. Name the file ST_DAX01-1.xlsx, and save it in a meaningful location.
  2. Exit Excel 2010 as desired.

We are now positioned to pull our PowerPivot data into Excel and perform analysis via a PivotTable, present data via a Pivot report, and so forth.

We will explore many functions, within the context of various formulas that we craft to meet representative business needs, as we move into subsequent steps’ coverage of DAX, and PowerPivot in general. A grasp of the functions, as well as agile practices surrounding the use of PowerPivot in query and modeling data, will be vital to success in our taking advantage of the vast opportunities that PowerPivot offers us.  Practice with these components will assure that their use comes as second nature, and will create a foundation from which the power and elegance of PowerPivot can be fully exploited.

Summary …

With this article, I introduced the Stairway to PowerPivot and DAX series.  I began by noting that the series is designed to provide hands-on introduction to the basics of PowerPivot and the Data Analysis Expression (DAX) language, with each level progressively exposing an individual function, technique or other component designed to help you to meet specific real-world needs.  We noted that our primary objective within this series is to make those new to PowerPivot more comfortable with creating useful queries, within a business context, using the PowerPivot / DAX combination. Our means to that end will be to obtain hands-on exposure to using PowerPivot in general, while achieving an understanding of the DAX language, using formulas we construct from the DAX functions we introduce within the levels of this Stairways series.

For each function we introduce within the levels of the series, we will discuss what it is designed to produce and its operation in doing so, the syntax with which it is employed and the data that it retrieves and presents.  I hope that this approach will encourage you to take the example code I present in the levels, and try it out, making changes and experimenting, so as to get the feel for the capabilities of DAX within PowerPivot. We’ll illustrate every point with practice examples of  business uses along with useful queries.

In this, our first level of Stairway to PowerPivot and DAX, we introduced PowerPivot and DAX, discussing what you need to have in place to get started with the series.  In preparation for the exercises to follow, we walked through the performance of an import, bringing in several tables from a sample SQL Server database that is readily available via sources we noted on the web.  We next discussed DAX formulas and functions in general, and moved rapidly into our first exposure to DAX.  We looked at the RELATED() function, and exposed a basic approach to using it in a simple scenario.  Finally, we discussed the results we obtained from using the function in our practice examples, and looked forward to similar steps with the many DAX functions in future levels.

This article is part of the parent stairway Stairway to DAX and Power BI

Rate

5 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (1)

You rated this post out of 5. Change rating