Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Salvo(z) SQL

Adam and Jennifer Salvo are IT professionals with over 10 years of diverse experience. Jennifer is a Business Intelligence developer focusing on the Microsoft BI stack (SSIS, SSAS and SSRS). Her prior work experience includes software development, systems analysis, end-user support, training, and SharePoint administration. Adam is a .NET technical lead with a current emphasis on Dev Ops and Windows Azure. His prior work experience includes .NET development, SQL Server administration, and BizTalk development. They also maintain a personal blog at salvoz.com.

Querying and Optimizing DAX

Over the past few weeks, I’ve been watching some videos from this year’s PASS Summit.  One of the presentations I watched is ‘Querying and Optimizing DAX’ (BIA-321-S) by Alberto Ferrari.  I took some notes during this presentation and am sharing them here.

DAX as a Query Language

  • DAX queries can be written in SSMS; select new MDX query and then write DAX.  There is no dedicated editor for DAX in SSMS at this point in time.
  • Every DAX query starts with an EVALUATE statement followed by any DAX table expression
  • Any DAX function can be used in a query
  • Query returns result as a dataset
  • Measures can be added to queries
    • Do not use memory since they are calculated at query time.
    • Useful during development: define measures in query, debug, deploy code on server.

MDX Calculated Members versus DAX Query Measures

  • Multidimensional (MDX)
    • Calculated members defined in the query can be very slow
    • Limitations in the cache usage
  • Tabular (DAX)
    • Query measures are FAST
    • No performance loss
    • Cache always active, no query versus global cache

Optimizing Tabular versus Multidimensional

  • Tabular is a simple model (no aggregations, no IO concerns – all in memory, fewer options than Multidimensional model)
  • Since there are few options available to optimize the model, optimizing DAX is the key to gain best performance
  • Multidimensional has no query plans; optimizing MDX is difficult to near impossible
  • Tabular makes query plans visible; optimizing DAX is feasible

DAX Query Architecture

  • Engine receives query
    • Is tabular model configured as ‘In-Memory’ or ‘Direct Query’?
      • If ‘Direct Query’
        • Nothing you can do to optimize it. DAX query is translated into SQL and SQL is executed against SQL Server engine. Optimizing happens in SQL Server. Result is returned to caller.
        • To see how the translation happens, use Profiler
      • If ‘In-Memory’
        • Optimization is feasible by analyzing query plan and understanding role of Formula Engine and Storage Engine
        • Formula Engine
          • Rich
          • Single threaded
          • Handles complex expressions, designed for expressivity
        • Storage Engine (VertiPaq / xVelocity)
          • Simple
          • Multi threaded, one core per segment
          • Optimized for speed
          • To optimize DAX, try to push calculations to storage engine instead of formula engine.

DAX Query Evaluation Flow

  • Build DAX Expression Tree
  • Build DAX Logical Query Plan
  • Simplify DAX Logical Query Plan
    • Some VertiPaq queries may be executed in order to gather additional information about data that is to be queried
  • Fire Logical Plan Event
  • Build DAX Physical Plan
  • Fire Physical Plan Event
  • Execute DAX Physical Plan

SQL Server Profiler

  • Catches events from SSAS
    • Query events
    • Query processing

Storage Engine Example Query

EVALUATE

ROW(“Sum”,

               SUMX(‘Internet Sales’,

               [Sales Amount] / [Order Quantity]

               )

  )

Note: ROW returns a table with one row. SUMX does leaf level calculation.

Translated to:

SELECT

       SUM(

          [Internet Sales].[SalesAmount]

           /

         [Internet Sales].[OrderQuantity]

)

FROM

[Internet Sales];

This can be calculated in the Storage Engine because it is a simple division.

Formula Engine and Storage Engine Example Query

EVALUATE

     ROW(“Sum”,

                SUMX(‘Internet Sales’,

                            IF([Sales Amount] > 0,

                                          [Sales Amount] / [Order Quantity]

                                        )

                              )

                   )

Translated to:

SELECT

          SUM([CallbackDataID(

                        IF([Sales Amount] > 0,

                               [Sales Amount] / [Order Quantity]}

                         )

                 )]

                  (   

                    PFDATAID( [Internet Sales].[OrderQuantity] ),

                    PFDATAID( [Internet Sales].[SalesAmount])

                   )

            )

FROM [Internet Sales];

This is too complex for the Storage Engine and involves the Formula Engine as well.  Storage Engine and Formula Engine work in unison to calculate the results, indicated by the CallbackDataID.

  • CallbackDataID Performance
    • Slower than pure VertiPaq
    • Faster than pure Formula Engine
      • Highly parallelized
      • Works on compressed data
    • Does not require materialization
      • No spooling of temp results
      • Less memory used
    • Materializes a single row

Slow Measures and Filters Example Query

DEFINE

MEASURE ‘Internet Sales’[Sales] =

CALCULATE (ROUND (SUM (‘Internet Sales’[Sales Amount] ), 0 ) )

MEASURE ‘Internet Sales’[YTD Sales] =

TOTALYTD ( [Sales] , ‘Date’[Date] )

MEASURE ‘Internet Sales’[QTD Sales] =

TOTALQTD ( [Sales], ‘Date’[Date] )

EVALUATE

FILTER (

     ADDCOLUMNS (

          CROSSJOIN (

                      VALUES (‘Date’[Calendar Year] ),

                      VALUES(‘Date’[Month]),

                      VALUES(‘Date’[Month Name])

            ),

           “Sales”, [Sales],

           “YTD Sales”, [YTD Sales],

           “QTD Sales”, [QTD Sales]

       ),

      NOT ISBLANK( [Sales] )

)

ORDER BY ‘Date’[Calendar Year], ‘Date’[Month]

Issues:

  • The YTD Sales and QTD Sales measures are slow to compute
  • The NOT ISBLANK filter is applied after the computation of the YTD Sales and QTD Sales

Re-Structure the Query to do the filtering before the computation of YTD Sales and QTD Sales:

DEFINE

MEASURE ‘Internet Sales’[Sales] =

CALCULATE (ROUND (SUM (‘Internet Sales’[Sales Amount] ), 0 ) )

MEASURE ‘Internet Sales’[YTD Sales] =

TOTALYTD ( [Sales] , ‘Date’[Date] )

MEASURE ‘Internet Sales’[QTD Sales] =

TOTALQTD ( [Sales], ‘Date’[Date] )

EVALUATE

ADDCOLUMNS (

     FILTER(

          CROSSJOIN (

                   VALUES (‘Date’[Calendar Year] ),

                  VALUES(‘Date’[Month]),

                 VALUES(‘Date’[Month Name])

          ),

         NOT ISBLANK( [Sales] )

     ),

“Sales”, [Sales],

“YTD Sales”, [YTD Sales],

“QTD Sales”, [QTD Sales]

)

ORDER BY ‘Date’[Calendar Year], ‘Date’[Month]

This query runs 7x’s faster than the other!

Joins are Expensive

  • Eliminate/remove if possible
    • From snowflakes
    • From lookup tables
    • From junk dimensions (don’t build junk dimensions, store in fact table)

Cache Usage

  • DAX caches all the VertiPaq query results, doesn’t cache Formula Engine query results
  • Only one level of cache in tabular
  • If your expression hits the Formula Engine heavily, then the cache is not very useful.

General Guidelines

  • Use the simplest formulas possible
    • Prefer ADDCOLUMNS to SUMMARIZE
    • Avoid complex calculations in measures
  • Don’t use error handling functions extensively
    • They are slow
    • Only good in measures
  • Build a correct data model
    • Reducing distinct count of values for columns is the main target
    • Keep simple and clear
  • Push calculations down to VertiPaq engine when possible

Top Three Fears of UDM Developers.  How Does Tabular Compare?

  • Distinct Count:  MUCH faster in tabular and easier to implement. Don’t need to modify data model to do Distinct Count in tabular.
  • Many-to-Many:  Implemented via DAX in tabular.  Exceptional performance at leaf level, better than Multidimensional.
  • Leaf-Level Calculations: No difference between leaf level or aggregate calculation, as long as you just touch storage engine. (SUMX is an iterator, this function was used in demo to analyze leaf level calculation).

Final Thoughts

  • DAX is a simple language  (simple <> easy)
  • Very effective at expressing complex queries
  • Easier to learn than MDX, faster learning curve
  • Optimizing tabular means optimizing DAX
  • Optimizing DAX means pushing calculations to VertiPaq/xVelocity (Storage Engine)
  • Sometimes the data model may need to be changed to achieve optimal performance

Comments

Leave a comment on the original post [salvoz.com, opens in a new window]

Loading comments...