Get "latest" value for client data from cube

  • I built a demographics cube based on our client data and this is the first analysis cube I have created and is in production, so forgive my ignorance about this question. We ran an analysis the other day on fiscal year end data and the numbers were not matching up. So I drilled the data down further going to the granular level of the actual client. What I found is that some clients were being counted twice because their "population identifier" changed during the fiscal year. For instance they could have been classified as a minor and then changed to an adult. Now the one client is being counted twice. I remember reading somewhere online about the "SCD" or slowly changing dimension. Is this scenario what that is referring to? I started to build a query using the ROW_NUMBER to get the latest data based on when the query is run and it works, but in the business process I am not sure where to include it. Should it be a named query and a separate dimension? I am getting confused on this issue and any help I would appreciate. If someone can point me in the right direction.

    Thanks

  • An SCD (Slowly Changing Dimension) is usually applied in the back-end database, and it means that if certain attributes change you will track the new and prior values. There are different ways in which you can track these changes, and I'd encourage you to read up about the different types. The most commonly used ones are type 1, type 2 and type 3 changes.

    What you're describing here can typically be dealt with by implement a type 2 SCD, meaning that you will have effective/termination dates within the dimension table and you will insert a new record when certain attributes change (tracking attributes). It is important to note that with all SCD changes, you still retain the business key (customer number in your case) in order for you to uniquely identify the entity. A distinct count of the business key will then give you the correct numbers in the cube.

  • This is a slowly changing dimension scenario. Your implementation depends on whether or not you want to record history (broadly speaking a Type 2 SCD) or not (broadly speaking a Type 1 SCD). An SCD starts at the database design level but if you want to use the logic for SCD 2 in a cube then you need to set that up too.

    It's been covered here before: http://www.sqlservercentral.com/Forums/Topic1065481-17-1.aspx

    Edit: It seems Martin and I are answering questions at the same time 😀 All yours Martin, I'm off for a nap.


    I'm on LinkedIn

  • Thanks guys for the information (I marked both answers as solutions) The business here determined to go with the latest historical data for the time period they are running. So if a client was a child early in the fiscal year, but then turned to an adult mid-way through the year and they are running data for a year-end report, then they will go with the Adult value. This would be SCD Type 1 I know, which helps because right now the actual database warehouse table that this data is coming from, I cannot change. I would need to put in a request to those in charge of that and have it done. So I am left with the cube and its design. My issue (and still not sure how to resolve this) is that when they are analyzing the data for the whole year, their clientID is only counted once, but their population identifier is being counted twice because they are marked as 1 Child and 1 Adult. What I need it to do is to ignore that Child identifier and just count the "latest" value.

    Would that be done in the dimension itself? I don't think it would be a calculated value.

  • You will need a calculated measure to perform the distinct count over that attribute, because what you are trying to do is count at a level different from the grain of that dimension...so even creating a measure group from that dimension table will not give you the ability to do that.

  • Thanks and you have been a tremendous help. Although due to other projects on my plate, I will not be able to revisit this one until tomorrow morning, but I will look at creating a calculated measure for distinct counts

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply