ETL and data model design advice needed

  • I am often required to provide my organization with performance metrics (it is a health care organization so the metrics can be complex and change often). I would like to design something that can be flexible so I can avoid hard-coding values into my process. Often there are groups of departments, lists of diagnoses, and other common conditions that must be met to set up the population for the denominator. For example, the patient must have been discharged between date 1 and date 2 and be of a certain age range at the time of their hospital visit. Beyond those commonalities there are more complex requirements to set up the numerator and denominator based on the specific measure definition. One thing I don't want to do is set up these little silos of measures just because criteria for measures for one purpose don't neatly align with those for others (e.g. measures for patients in the ICU differ from measures for patients in the ER in a number of ways, although they both share the need for a denominator population).

    Question 1: Does it make sense to create a process that sets up the population (patient list) for all related measures that use common initial criteria (like discharge date, age, etc.) and to build upon that with separate processes that do the more unique, complex filtering and such? Or does it make sense to create the population and processes for each measure separately in their own stored procedures?
    Question 2: Is there a configuration structure that anyone has set up that would allow me to define criteria which could then be used with dynamic SQL? And is dynamic SQL even a good idea, since it is much harder to debug and tune, never mind that it is "hackable" (open to SQL injection)? (I know, that is two questions...) I also won't be here forever, and not everyone is comfortable reading dynamic SQL.

    I know there are some really good process designers out there who can weigh in here and I look forward to your ideas.

    Thanks in advance!

  • Based on what you've provided so far, here is some general advice:

    #1   The process should be data-driven: all the diagnosis codes, procedure codes, intervention and triage levels, etc. should be contained in normalized lookup tables that the end users themselves can maintain (see the sketch after #3).

    #2   Whatever you do, don't implement a monolithic 10,000-line stored procedure. Instead, consider creating parameter-driven stored procedures for each process. Reusable calculations and other common logic can be implemented in user-defined functions. Reusable procedural logic (logging, error trapping, etc.) can be implemented in smaller stored procedures.

    #3   If you do find a need for dynamic SQL, then refer to the following article by Erland Sommarskog as a guide. He also writes about related topics, like error handling and sharing data between stored procedure calls, that will probably be relevant to what you're trying to do.
    http://www.sommarskog.se/dyn-search.html
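
    To make #1 and #2 concrete, here is a minimal sketch. Every table, column, and procedure name below is just a placeholder, and dbo.PatientVisit stands in for whatever your visit-level fact table actually is:

        -- #1: a normalized lookup table the end users can maintain
        CREATE TABLE dbo.DiagnosisGroup (
            DiagnosisGroupID   int IDENTITY(1,1) PRIMARY KEY,
            GroupCode          varchar(20) NOT NULL,   -- e.g. 'SEPSIS'
            DiagnosisCode      varchar(10) NOT NULL,   -- ICD diagnosis code
            EffectiveStartDate date        NOT NULL,
            EffectiveEndDate   date        NULL
        );
        GO

        -- #2: a parameter-driven procedure; the criteria come from the
        -- lookup table, not from hard-coded lists
        CREATE PROCEDURE dbo.GetDenominatorPopulation
            @GroupCode     varchar(20),
            @DischargeFrom date,
            @DischargeTo   date
        AS
        BEGIN
            SET NOCOUNT ON;

            SELECT v.PatientID, v.VisitID
            FROM dbo.PatientVisit AS v          -- assumed visit-level fact table
            JOIN dbo.DiagnosisGroup AS g
              ON  g.DiagnosisCode = v.PrimaryDiagnosisCode
              AND g.GroupCode     = @GroupCode
            WHERE v.DischargeDate >= @DischargeFrom
              AND v.DischargeDate <  @DischargeTo;
        END;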

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric, this is exactly what I was thinking... I want to do something "data driven". I just couldn't come up with the term! And thank you for confirming that the monolithic 10,000 line stored procedure that does it all is not appropriate. What I'm struggling with is how to work with the lookup tables and criteria without using dynamic SQL. Here is an example: 

    Two measures for ICU: Average length of stay in the ICU, Mortality rate in the ICU. 
    Each measure is calculated monthly
    Definition of an ICU patient is common to both: Patient who spent 60+ minutes in an ICU department
    I have a lookup table that defines groups of values. It has a Group_ID (surrogate key), a GroupCode, a GroupName, a string value, a numeric value, and effective dates (unique by GroupCode, value, and effective start date). For this purpose I have a bunch of records that hold the ICU department IDs, so I can look up whether a patient was in one of those departments. In this example, the GroupCode = 'ICU' and the name is 'ICU Unit'. Each record has a unique Unit ID in the numeric value column. I also have a lookup table that ties individual measures to the GroupCode records they use, so my measures for ICU length of stay and ICU mortality would both be tied to the ICU group code.

    I also have a dimension for my measure definitions that has a Measure_ID (surrogate key), a MeasureCode (business key), a MeasureCategory (for grouping in dashboards), and, for reference, a MeasureValueDataType (numeric, integer, varchar, date, etc.) and a MeasureValueType (number, percent, money, etc.). I also have a reference column that spells out the basic calculation of the measure in pseudocode.
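
    In rough DDL, the three tables look something like this (simplified; the exact names and data types here are approximations of what I described above):

        -- Group lookup: one row per value in a named group of values
        CREATE TABLE dbo.GroupValue (
            Group_ID           int IDENTITY(1,1) PRIMARY KEY,
            GroupCode          varchar(20)   NOT NULL,  -- e.g. 'ICU'
            GroupName          varchar(100)  NOT NULL,  -- e.g. 'ICU Unit'
            StringValue        varchar(100)  NULL,
            NumericValue       decimal(18,4) NULL,      -- e.g. the department/unit ID
            EffectiveStartDate date          NOT NULL,
            EffectiveEndDate   date          NULL,
            CONSTRAINT UQ_GroupValue UNIQUE (GroupCode, NumericValue, EffectiveStartDate)
        );

        -- Measure dimension
        CREATE TABLE dbo.MeasureDim (
            Measure_ID           int IDENTITY(1,1) PRIMARY KEY,
            MeasureCode          varchar(20)   NOT NULL UNIQUE,  -- business key
            MeasureCategory      varchar(50)   NOT NULL,         -- for dashboard grouping
            MeasureValueDataType varchar(20)   NOT NULL,         -- numeric, integer, varchar, date, ...
            MeasureValueType     varchar(20)   NOT NULL,         -- number, percent, money, ...
            CalculationNotes     varchar(1000) NULL              -- pseudocode for the calculation
        );

        -- Bridge: which group codes each measure uses
        CREATE TABLE dbo.MeasureGroupCode (
            Measure_ID int         NOT NULL REFERENCES dbo.MeasureDim (Measure_ID),
            GroupCode  varchar(20) NOT NULL,
            PRIMARY KEY (Measure_ID, GroupCode)
        );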

    One measure uses patients who were discharged in the previous month, but the other uses patients who were admitted in the previous month. 
    How would I drive which column (admit date vs. discharge date) to use while maintaining a single process? Should I add a "DateBasisCode" column to my Measures dimension ('A' = admit date, 'D' = discharge date) and drive it that way? If I do, then I can't have one list of patients that both measures work off of.
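
    Something like this is what I'm picturing (a fragment from inside a procedure that takes @MeasureCode, @PeriodStart, and @PeriodEnd; the table names are made up for illustration):

        -- The measure row decides which date column anchors the reporting period
        SELECT v.PatientID, v.VisitID
        FROM dbo.PatientVisit AS v
        JOIN dbo.MeasureDim AS m
          ON m.MeasureCode = @MeasureCode      -- single-row filter on the measure dimension
        WHERE CASE m.DateBasisCode             -- proposed new column: 'A' or 'D'
                  WHEN 'A' THEN v.AdmitDate
                  WHEN 'D' THEN v.DischargeDate
              END >= @PeriodStart
          AND CASE m.DateBasisCode
                  WHEN 'A' THEN v.AdmitDate
                  WHEN 'D' THEN v.DischargeDate
              END <  @PeriodEnd;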
    Your thoughts?

    Well, what kind of reporting tool are you using for the front end? Like Eric said, don't try to have one giant anything that does everything for you. Identify the elements that are going to be common across most or all of the reports, build what you can around those, then identify the elements that are going to be different. How you handle those will likely vary based on what the end users are doing with them, how technically savvy they are, and whether they're going to need everything handed to them in a neat little bundle 🙂 How granular your part will have to be will depend on those answers.

  • Not sure why that matters. I am using SQL Server 2016 to develop a data mart for these measures which will probably be brought into PowerBI.

    I wouldn't say this is impossible, Diana, but it might be impractical. Reasoning: updates to ICD-9, CPT, and practice guidelines may mean that you have to redesign your "generic" solution anyway, because now we're looking for risk factors that we didn't care about last year (maybe Medtronic had another recall, or studies now show that interferon is doing something wacky it wasn't before, or the population mix changes completely because, oh, I don't know, the government decides to get rid of Medicaid completely). Particular metrics like HEDIS/NCQA change so much year over year that you'd spend a ton of time customizing your generic solution anyway.

    So I would say either get a commitment from management that their metrics will stay standard for some period of years, to make the investment worth it, or just spend as little time as you can building the specific metrics each year.

    But I wish you luck in whatever solution(s) you come up with.

    -------------------------------------------------------------------------------------------------------------------------------------
    Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses

    LOL, Jonathan! You clearly have walked in my shoes. Yeah, the fact that these change so much over time is problematic. This is really only for internal dashboards, not for government or payor metrics. And the main part of this that is going to be data-driven is setting up the denominator data (i.e. the population). Once I have the population, you are correct that the determination of the numerator will be a pain in the butt and will change as often as not. If I use the groupers for things like Dx codes and units, though, at least I don't have to change a hard-coded list in every stored procedure. I hope this turns out to be more useful than not, but we will certainly see.

  • diana.bodell 56517 - Tuesday, January 23, 2018 7:16 AM

    Could you please be a little more precise? I'm unable to come to a conclusion from what you've conveyed, but I can help with a diagram to explain a bit.

    Basically, the traditional MSBI setup has always followed the same process:

    Data sources --> ETL --> Database / Data warehouse / Data mart ---> Reporting

    Nowadays PowerBI / Tableau / QlikView have emerged and are competing with, and often winning over, traditional BI tools.

    For your requirement I'd suggest (partially):

    1) Create an SSAS cube with appropriate dims and measures.

    2) Deploy and process the SSAS cube.

    3) Use SSIS to process the cube daily (SSAS Processing Task).

    4) Once the dims / measures have been processed, connect PowerBI to SSAS to pull the records and create dashboards to present the data to clients.

    5) PowerBI supports 70+ data sources.

    Kindly read through and reply with any queries.

  • Thank you Subramaniam. I already make heavy use of cubes but in order to do my measures and dashboard I have to first create a data model, which I don't want to do in the cube. The measures are also very different in their scope beyond the definition of the patient population and so a single cube would be unable to do this. Some measures are based on readmissions within 48 hours, some are based on time from ED to ICU bed. These are massive data sets to begin with and, although we use tabular cubes, the memory usage would be exorbitant. While tools like PowerBI, Qlik and Tableau are wonderful, they don't replace the good old data mart and ETL entirely.

    Just to give an update: I successfully created a flexible stored procedure which creates the denominator data set for each of the 4 measures I have to do first for the ICU dashboard. Based on the measure being processed, it creates a specific data set with both the patients and visits needed, plus some measure-specific additional data, such as whether the patient returned to the ICU during the same hospital stay (which is needed for one of the measures but not all of them). A set of staging tables is created with names based on which measure is being processed (entered by parameter). There are data-driven elements in the process, such as department lists, discharge dispositions, date options (previous year, previous month, previous day) that work off of an anchor date, and a date type (admission vs. discharge). These are all elements that I could tie to a specific measure in my measure dimension. I used some minor dynamic SQL to truncate and insert into the tables, because the naming of the table is based on parameters from the data. Not too risky.
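
    The dynamic part is only the object name, roughly like this (a simplified fragment; @MeasureCode is a procedure parameter and #Population stands in for the working data set):

        -- Only the table name is dynamic; QUOTENAME guards the generated name
        DECLARE @TableName sysname = N'stg_Denominator_' + @MeasureCode;
        DECLARE @sql nvarchar(max);

        SET @sql = N'TRUNCATE TABLE dbo.' + QUOTENAME(@TableName) + N';';
        EXEC sys.sp_executesql @sql;

        SET @sql = N'INSERT INTO dbo.' + QUOTENAME(@TableName)
                 + N' (PatientID, VisitID)'
                 + N' SELECT PatientID, VisitID FROM #Population;';
        EXEC sys.sp_executesql @sql;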
    Seems to be working well. Now I will be doing separate processes to take the population and determine the numerators for counts and rates, etc. Thanks to all above for advice and guidance.

  • diana.bodell 56517 - Wednesday, January 24, 2018 6:40 AM

    Sure, always come to SQL Server Central for ideas 😀. It's good to hear that you succeeded in your own way.

  • null

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric, was this post in response to mine? Not sure if you meant it for something else since performance stats was not an issue for me. BTW, we find SQL Sentry One to be a fabulous resource for our DBA.

  • diana.bodell 56517 - Wednesday, January 24, 2018 8:05 AM

    Right, I accidentally posted it to the wrong forum. We use SentryOne too.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • diana.bodell 56517 - Wednesday, January 24, 2018 8:05 AM

    It's hard to tell what Eric means sometimes, because.....wait for it.....null is undefined....

    ..sorry

    -------------------------------------------------------------------------------------------------------------------------------------
    Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses
