Best practice for dealing with dead rows in dimensions

  • Good day,

    My question is whether there is a way to deal with slow, user-facing performance in Excel pivot tables.
    The main example is the material dimension, in which my users do analysis based on a single material.
    They search for a unique material_id in the dimension, which has around 9.5 million rows.

    This causes waiting time because, I guess, Excel has to search through all 9.5 million rows to find the correct material?
    I can't judge from experience, which is why I put the term "large" in quotes; there must be (especially in bigger companies) dimensions far bigger than mine where people do the same.

    The only alternative I could think of would be adding a hierarchy made of left(material_id, 3), left(material_id, 6), ... where users click through the hierarchy in Excel to find their material_id; a sketch of what I mean follows below.
    Since my users mostly copy-paste the id into the search/filter, it's probably not the best solution.
    What other options/alternatives are there?
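    Roughly what I have in mind for the hierarchy, as a sketch (assuming a hypothetical dbo.DimMaterial view with a fixed-width material_id; all names here are placeholders):

        -- Sketch: derive coarser grouping levels from the id itself, e.g. for
        -- an SSAS user hierarchy level1 -> level2 -> material_id.
        -- dbo.DimMaterial and the column names are placeholders.
        SELECT material_id,
               LEFT(material_id, 3) AS material_level1,
               LEFT(material_id, 6) AS material_level2
        FROM   dbo.DimMaterial;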

    So, in general, there are quite long wait times for analysis regarding material because of the number of rows (at least that is what I've been told by a consultant).

    Thanks

  • First of all, Excel can only handle 1,048,576 "rows". That doesn't mean you couldn't fill 9 or 10 columns with that much data, but even if you could have an Excel spreadsheet with 10 million rows, expecting it to search that many rows quickly is absurd. It has no "indexing" methodology by which to conduct such a search. That much data belongs in a relational database, where it can be indexed and made searchable in a much more reasonable time frame. Expecting Excel to do what it takes an RDBMS to do is not practical. You could, however, develop some VBA code to run a SQL Server query that returns information based on the typed-in value.
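    On the SQL Server side, something along these lines would do (a sketch only; the table, columns, and procedure name are made up for illustration, and the VBA would pass the typed-in id as the parameter via ADO):

        -- Hypothetical lookup procedure; dbo.DimMaterial and its columns
        -- are placeholder names, not the actual schema.
        CREATE PROCEDURE dbo.usp_GetMaterial
            @material_id varchar(18)
        AS
        BEGIN
            SET NOCOUNT ON;

            SELECT m.material_id,
                   m.material_name,
                   m.material_type
            FROM   dbo.DimMaterial AS m
            WHERE  m.material_id = @material_id;  -- an index seek, not a scan,
                                                  -- given an index on material_id
        END;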

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I guess I failed to express myself properly, unless it's still the same issue.
    I'm not talking about searching for a material_id IN the Excel file itself, but in the filter option.

  • If you're talking about a filter on an Excel column, then yes, it's still basically the exact same problem, and you still can't possibly have all 9.5 million rows in Excel because it just doesn't allow that many. If you're talking about just selecting values in the Power Query editor, it doesn't necessarily generate very good SQL; you might be able to index columns to help somewhat, but it may be better to use stored procedures to supply the data instead.
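    For example (a sketch, with dbo.DimMaterial as a placeholder table name):

        -- Hypothetical: index the business key so that per-material lookups
        -- can seek instead of scanning all 9.5 million rows.
        CREATE NONCLUSTERED INDEX IX_DimMaterial_MaterialId
            ON dbo.DimMaterial (material_id);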

  • andycadley - Thursday, September 13, 2018 12:34 AM

    If you're talking about a filter on an Excel column, then yes, it's still basically the exact same problem

    This.
    So this is basically an issue with the multidimensional model itself?
    Multidimensional is not made for databases of this size / this number of rows?

    andycadley - Thursday, September 13, 2018 12:34 AM

    you still can't possibly have all 9.5 million rows in Excel because it just doesn't allow that many.

    I still don't know what in my writing so far hinted that I was talking about having all 9.5 million rows in my Excel.
    It's about the performance of working with specific ones within the 9.5 million.

  • Excel has a lot of things called filters, and it's not at all clear which one you are talking about. How is the data getting into Excel? Where are the filters being specified?

  • andycadley - Thursday, September 13, 2018 2:40 AM

    Excel has a lot of things called filters, and it's not at all clear which one you are talking about. How is the data getting into Excel? Where are the filters being specified?

    Oh, sorry.

    I just added some random columns, but changing (the first selection of) the material_id is what takes a lot of time.

    How does the data get into Excel?
    All I can say is that users add Analysis Services as a data source by adding the OLAP data source and the cube.

  • ktflash - Thursday, September 13, 2018 5:02 AM

    andycadley - Thursday, September 13, 2018 2:40 AM

    Excel has a lot of things called filters, and it's not at all clear which one you are talking about. How is the data getting into Excel? Where are the filters being specified?

    Oh, sorry.

    I just added some random columns, but changing (the first selection of) the material_id is what takes a lot of time.

    How does the data get into Excel?
    All I can say is that users add Analysis Services as a data source by adding the OLAP data source and the cube.

    Then that's a cube performance issue, and not necessarily an Excel issue, unless the process that you use delivers that much data to Excel and expects Excel to filter it...

    SSAS performance is a difficult nut to crack, and large dimensions make it much more challenging. It seems to me an MDX query might do a better job of retrieving the data for a given material_id value. You may also still have to worry about the limits of Excel if the amount of data being returned is equally voluminous.
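    For instance, something like this pulls back just the one member's data rather than the whole attribute (purely a sketch: it assumes a linked server named SSAS_CUBE pointing at the Analysis Services instance, and the cube, dimension, and measure names are all placeholders):

        -- Run an MDX query against SSAS from SQL Server via a linked server.
        -- SSAS_CUBE, [SalesCube], the dimension, and the measure are assumptions.
        SELECT *
        FROM OPENQUERY(SSAS_CUBE, '
            SELECT { [Measures].[Sales Amount] }             ON COLUMNS,
                   { [Material].[Material Id].&[000123456] } ON ROWS
            FROM   [SalesCube]
        ');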

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Ok, I found a huge contributor to my performance issues, and I wonder what the best practice is.
    The situation is the following:
    - there is a huge number of rows in the dimension that have no corresponding measures (no fact rows reference them), so they are dead weight
    --> so I restricted the dimension by creating a procedure that inserts the distinct key values from each measure group view into an additional table
    --> then I added an additional join into the view for the dimension, to only select those members that actually have measures in the cube (a sketch of this is below)

    I could reduce the number of rows by a huge margin, up to 80%, and thereby improve performance.
    --> Is there a best practice for this situation? How do you keep the number of rows in a dimension to a minimum?
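    Roughly what I did, as a simplified sketch (all object names here are placeholders for my real views and tables, and dbo.UsedMaterialKeys is a one-column key table created beforehand; CREATE OR ALTER VIEW needs SQL Server 2016 SP1 or later):

        -- 1) Collect the distinct material keys that actually occur in the
        --    measure group views (vFactSales / vFactInventory are placeholders).
        TRUNCATE TABLE dbo.UsedMaterialKeys;

        INSERT INTO dbo.UsedMaterialKeys (material_id)
        SELECT material_id FROM dbo.vFactSales
        UNION                    -- UNION (not UNION ALL) keeps the keys distinct
        SELECT material_id FROM dbo.vFactInventory;
        GO

        -- 2) The dimension view now returns only members that have facts.
        CREATE OR ALTER VIEW dbo.vDimMaterial
        AS
        SELECT d.material_id,
               d.material_type,
               d.material_name
        FROM   dbo.DimMaterial AS d
               INNER JOIN dbo.UsedMaterialKeys AS k
                       ON k.material_id = d.material_id;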

    Another step I'll be trying, in talks with the controllers, is to reduce the number of rows further by aggregating values past a certain age:
    - for example: members older than 4 years don't need to be kept at the id level; the aggregated level above, the type, is enough
    --> basically, all ids older than 4 years will be collapsed down to the number of types (see the sketch after this list)
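    As a sketch of the idea (placeholder names again, including a valid_to column as the age indicator; the cutoff and the matching remapping of the fact keys would still have to be agreed with the controllers):

        -- Members newer than 4 years keep their real material_id; older ones
        -- collapse onto one synthetic member per type.  The fact table keys
        -- would have to be remapped the same way.
        SELECT DISTINCT
               CASE WHEN d.valid_to >= DATEADD(year, -4, GETDATE())
                    THEN d.material_id
                    ELSE 'TYPE_' + d.material_type
               END AS material_key,
               d.material_type
        FROM   dbo.DimMaterial AS d;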
