Design considerations for a tabular model

  • I have a star schema already in existence (SQL Server fact table and numerous dimension tables). For a given tabular model I have used that as the basis of my design, grabbing a subset of the tables (the fact plus a few dimensions) and within those tables only selecting those columns that the users are going to need. Each table in my tabular design is a SELECT statement from an underlying table.

    The discussion around the water cooler this morning is that all this could have been done by creating a single view in the source database of the relevant columns in the star schema, and surfacing that view in the tabular model instead of all the individual tables. I did some googling about best practices for data preparation for tabular and didn't seem to find strong views either way. What are the considerations for deciding which is the better way to go about this?
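    To make the comparison concrete, the two options look roughly like this (the table, view and column names below are just placeholders, not our real schema):

        -- What we do now: one statement per model table, star kept intact
        SELECT DateKey, CalendarDate, CalendarYear FROM dbo.DimDate;
        SELECT ProductKey, ProductName, Category   FROM dbo.DimProduct;
        SELECT DateKey, ProductKey, SalesAmount    FROM dbo.FactSales;

        -- The alternative being debated: a single view that flattens the star
        CREATE VIEW dbo.vwSalesFlat AS
        SELECT f.SalesAmount,
               d.CalendarDate, d.CalendarYear,
               p.ProductName,  p.Category
        FROM dbo.FactSales  f
        JOIN dbo.DimDate    d ON d.DateKey    = f.DateKey
        JOIN dbo.DimProduct p ON p.ProductKey = f.ProductKey;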

    Cheers

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.... Bertrand Russell

  • GPO - Tuesday, October 24, 2017 8:08 PM

    I have a star schema already in existence (SQL Server fact table and numerous dimension tables)... What are the considerations for deciding which is the better way to go about this?

    I've mostly done SSAS multidimensional, but I prefer to use views as the source objects for my cubes. Views offer a lot of advantages over direct access to the tables: I can change the column names in the view to be more user friendly, I can filter data, and I can return only the relevant columns to SSAS. They also offer some protection if the underlying table changes; of course, that depends on the change.
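    As a rough sketch of what I mean (object and column names are made up), a source view can rename columns to friendly labels, filter rows, and expose only what the model needs:

        -- Hypothetical dimension view used as the source object for the model
        CREATE VIEW cube.DimCustomer AS
        SELECT CustomerKey,
               CustAcctNm AS CustomerName,    -- friendlier name for end users
               CustSegCd  AS CustomerSegment
        FROM dbo.Customer
        WHERE IsActive = 1;                   -- only surface current customers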

  • I had this same conversation last week about tabular models. The biggest problem I can think of with making a single view is when you need to alter the filter context on related columns, particularly with the date dimension. You cannot do something simple like ALL('DimDate'); you have to list all the columns individually. Also think about functions like ISCROSSFILTERED(): most of those are going to give you problems, or just be unusable, if you don't group attributes into dimensions. In theory it should compress slightly better in one table, but this is only really relevant if you have large dimensions. In some cases I may take a single large dimension and add it to the fact table as degenerate dimensions, as long as I know that there won't be any cross-filtering checks needed.
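    For example (the measure and column names here are hypothetical), clearing the date filter is one line against a separate date dimension, but against a single flattened table you have to name every date column so you don't also clear product, customer, and so on:

        -- Star schema: one dimension table to clear
        Sales All Dates :=
            CALCULATE ( [Total Sales], ALL ( 'DimDate' ) )

        -- Flattened table: each date column has to be listed individually
        Sales All Dates Flat :=
            CALCULATE (
                [Total Sales],
                ALL ( 'SalesFlat'[CalendarDate] ),
                ALL ( 'SalesFlat'[CalendarYear] ),
                ALL ( 'SalesFlat'[CalendarMonth] )
            )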

    This is an opinion and not hard fact. As with all situations in engineering, "it depends" is the true answer. lol.

  • Although you'll get better performance by flattening everything into one table / view, the users' experience might not be as great.
    The data might make better sense if attributes are grouped into their logical dimensions. Just something to consider.

  • @Tom_Hogan Sorry for my lack of clarity Tom. I think I led you to misunderstand my question. I was referring to the tabular (i.e. vertipaq) technology specifically, and not about tables versus views, but more about a single view (with all the star schema joins contained) compared to a view of the fact plus a view of dim1 plus a view of dim2 plus a view of dim3, etc.

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.... Bertrand Russell

  • Paulo de Jesus - Wednesday, October 25, 2017 8:34 AM

    Although you'll get better performance...

    Thanks Paulo
    I was thinking as I wrote this that it should be something we can test empirically. Other considerations might include memory footprint. Does one approach occupy more memory than another? How have you tested the performance difference?

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.... Bertrand Russell

  • GPO - Wednesday, October 25, 2017 6:15 PM

    ...Does one approach occupy more memory than another? How have you tested the performance difference?

    I haven't done any detailed testing of this, but from the tabular models that I've created in Azure I noticed that processing was faster with a flattened structure.
    Presumably this is because there is less overhead from not having to join.
    It would be very interesting to record the actual differences. I will definitely report back on this in more detail the next time I get a tabular project.

  • GPO - Wednesday, October 25, 2017 6:12 PM

    @Tom_Hogan Sorry for my lack of clarity Tom. I think I led you to misunderstand my question. I was referring to the tabular (i.e. vertipaq) technology specifically, and not about tables versus views, but more about a single view (with all the star schema joins contained) compared to a view of the fact plus a view of dim1 plus a view of dim2 plus a view of dim3, etc.

    OK, I probably misread the question then.

    It's been a year or so since I last played with Tabular. I expect you'd lose flexibility in how you can relate the data if you went with one view.

  • From my own experience with a few Tabular projects, performance will depend mostly on the volume, the type of data and the cardinality of the data (which also affects compression).

    A dimension-like structure (a fact of 50 million rows, 20 dimensions of 500 rows each) will most likely compress better and perform faster than a flattened version of the same. But with lower volumes it may not be significantly slower (or faster) - although the query user experience will not necessarily be the same.

    And although a flattened structure may work for some cases, it will also cause "issues" with users and/or software used to report on the data. In most cases, but not all, having a dimensional structure will make it easier to understand and to apply filters.

    Having said that, it is considered good practice in some areas to denormalize the higher hierarchy tables into the lower dimensions - this reduces the number of joins required (which improves performance) at a slight cost of higher storage, which on the dimensions should not be significant.
    e.g. denormalize the snowflakes - but leave them as dimensions.
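    As a sketch of what I mean by that (table names invented for the example), fold the snowflaked subcategory and category tables into the product dimension view, but keep product as its own dimension in the model:

        -- Snowflake: DimProduct -> DimProductSubcategory -> DimProductCategory
        -- Denormalized into one join-free dimension for the tabular model
        CREATE VIEW rpt.DimProduct AS
        SELECT p.ProductKey,
               p.ProductName,
               s.SubcategoryName,
               c.CategoryName
        FROM dbo.DimProduct            p
        JOIN dbo.DimProductSubcategory s ON s.SubcategoryKey = p.SubcategoryKey
        JOIN dbo.DimProductCategory    c ON c.CategoryKey    = s.CategoryKey;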

  • frederico_fonseca - Monday, October 30, 2017 2:14 PM

    ...fact of 50 Million rows, 20 dimensions of 500 rows each...

    Thanks Frederico. Some good suggestions there. Out of interest how much RAM do you allow for the above schema?

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.... Bertrand Russell

  • GPO - Tuesday, October 31, 2017 3:24 AM

    ...Out of interest how much RAM do you allow for the above schema?

    There is no straight answer there - 50M rows can require 10GB or 50GB depending on the number of columns, data types, cardinality, the order the rows are loaded in, and the number of rows per segment.

    Example of two completely different models I have at the moment
    Model 1
    - tabular database is 4GB in size
    1 fact table - 20 million rows - 60 columns (20 are dimension keys, remaining are measures) - final tabular model has 350 measures calculated on the fly
    20 dimensions - very small - average 15 rows per dimension with 2 or 3 with over 50 rows.

    Model 2
    - tabular database is 20GB in size
    6 tables
    2 tables with around 1 billion rows each - less than 10 columns on each table but high cardinality in some columns
    remaining tables quite small - fewer than 100k rows each and few columns each also
    a few dozen measures and a few calculated columns (these are the ones that cause issues while processing the cube and also the ones that require more memory for the model)

    For either of them the memory requirement is 1.5 to 3 times the size of the final database if processing is done while there aren't users accessing the models - if there are, more memory will be required.

    Model 1 above is running on a crap VM - 12GB of RAM with SQL Server also running on it - Tabular memory is restricted to 8GB, so a process clear is required before a process full or it blows up the memory.
    Query performance is very fast - users don't even notice a delay when querying the data.

    Model 2 is running on 256GB of RAM - but it could be done with a lot less memory.

  • GPO - Tuesday, October 31, 2017 3:24 AM

    ...Out of interest how much RAM do you allow for the above schema?

    If you want to know the compression ratio you're getting, you can use Vertipaq Analyzer (https://www.sqlbi.com/tools/vertipaq-analyzer/). I recently ran this on a DB and it was compressed to 1.87% of the original size. I was surprised, but then realized this was a test DB and the data was not all that diverse, so the compression ratio was quite good. It's said that the compressed size is normally about 10% of the uncompressed size.
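    If you don't want to install anything, the same numbers come from the DMVs that Vertipaq Analyzer reads behind the scenes. Something along these lines (column list from memory, so treat it as a starting point), run against the Tabular instance, shows the dictionary size per column:

        -- SSAS Tabular DMV: approximate memory used by each column's dictionary
        SELECT DIMENSION_NAME, ATTRIBUTE_NAME, DICTIONARY_SIZE
        FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
        WHERE COLUMN_TYPE = 'BASIC_DATA'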
