Views vs. UDFs vs. Stored Procedures for Data Warehouse Reports

  • We are working on a data warehouse with a star schema.  Our reporting tools will be SSRS and Tableau.  However, we are having a debate within the team.  Some people want to use views that query the star-schema tables as the source of the reports.  Others are opposed.

    What are the advantages and disadvantages of using views in this situation?  By the way, we are using SQL Server 2016.

  • First, as a general rule: when considering user defined functions (UDFs) the most important thing is if the function is inline or not. If performance is important, the only viable type of function is inline. Non-inline functions (ones that include BEGIN and END) are not the correct answer to anything you want to accomplish IMO.  In SQL Server 2016 the only T-SQL inline UDFs available are inline Table Valued Functions (iTVF).  Views are iTVFs that don't accept parameters. If you need parameters then an iTVF is the way to go, otherwise you can use views.

    In data warehouse environments I like to create a combination of views and indexed views available to the end user via iTVF. Creating a view allows me to force the callers to query the data using best practices I applied to the view. Once the views are in place I make them available only via an ITVF; this forces the end-user to include filters (e.g. start & end dates).

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • This is good information.  My question now is, why even use a UDF as a source for a report?  Why not just have the report user access a view and that's it?

  • imani_technology wrote:

    This is good information.  My question now is, why even use a UDF as a source for a report?  Why not just have the report user access a view and that's it?

     

    This disadvantage of this is you now have to maintain those views in the database and the data warehouse developers might also end up supporting report development.

  • I would recommend to use the Stored Procedure as that restricts caller to use it as is without giving them opportunity to mess with the filter clause that can end up the query performing very badly.

    Views are better option if you simply want all the records and filtration is not required. Kindly note, if you are using multiple tables in view using JOIN then first of all all the rows using the respective join will be returned and then if you are filtering the output of view using where clause then the output shall be filtered out. Also, you cannot perform DML operations in case needed. You cannot leverage the temporary tables as you cannot create them etc.

    Whereas, in Stored Procedure you can define the filter clause and only filtered records shall be used in join in other tables in the query. Other benefit of stored procedure would be, you can define and restrict the WHERE clause which the caller can only use and cannot change which is not possible in views. Your view can be abused by the caller by messing with the filter clause and there is high probability that you have the index and it would not make sense. Also, you can create and use temporary tables, perform DML operations etc.

    One more important thing to consider is always avoid View to View join.

    As far as Table Valued UDF are concerned, you can use it but always note the downside and limitations to use it.

  • Some good info, let me add a couple of more wrinkles.

    First, an inline UDF has a nickname, a parameterized view. It's way to set up a query, with parameters, to retrieve data in a very specified way, functioning like it was a table (or view). This can be really useful. That's why they exist and can be used instead of stored procedures (which don't act like tables) in some situations.

    The weakness to using views is that they will be treated as tables instead of what they are, queries. By this I mean, that people will start to join one view to another instead of creating a new one when a new data requirement entails getting a bit of data from two different views. In this scenario, a query against 20 tables and another query against 20 tables has now been turned into a query against 40 tables. The optimizer starts to have problems. You'll also see people start to nest views, one within another, within another, while joining them to each other too of course. All because they look like tables. The problem is, they are not tables. Treating them as such can lead to very serious performance issues, not to mention maintenance headaches.

    So, yes, views for reporting, generally a good thing. However, discipline in the use of the views is a must, which means educating your users on problematic approaches. For good or for ill, T-SQL (and most other SQL flavors) does not lend itself to code reuse like a proper programming language. So people see views as a way to simply reuse code. "I have my twenty table join. No need to ever write that again. I'll just reference it like it's an object." The optimizer starts to break down as the complexity of queries increases radically through this approach.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • imani_technology wrote:

    This is good information.  My question now is, why even use a UDF as a source for a report?  Why not just have the report user access a view and that's it?

    Views give me the ability to make data available to data consumers using queries developed using best practices. As Grant and I mentioned earlier - inline table valued UDFs are parameterized views. This is true except that you can't add indexes to a function. In a data warehouse environment I often index my views to pre-join and pre-aggregate data.

    You can also use indexed views to clean up non-SARGable queries... For example, everyone has seen this:

    WHERE RTRIM(LTRIM(sometable.column1)) = @SomeParameter

    This is often due to poor app/DB design that I don't have the authority or time to address. Here an index (or table) scan is guaranteed. This is a huge problem if <sometable> has millions of rows and you only need a small subset of rows from this table. I can use an indexed view to "pre-trim" this column, call it column1_clean, add an index to it, then make column1_clean available to end-users (SQL and Report developers.) Once that's in place I could then create an iTVF that accepts @someparameter and a WHERE clause that leverages the new column1_clean. This is how I correct bad behavior without having to yell at end-users and developers.

    In short, an iTVF on top of an indexed view is a very powerful combination.

    Lastly, something I started doing a few years back, which has been a real game changer, is leveraging a tally table for indexed views. This allows me to pre-split strings in poorly designed vendor table, pre-parsing JSON and much more.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • I see your point.  However, isn't the point of a star schema to simplify things so you don't need complex joins?  Why would you need to have a view joined to another view in that situation?

    Also, which would be preferable for report sources -- UDF or stored proc?

  • imani_technology wrote:

    Also, which would be preferable for report sources -- UDF or stored proc?

    Not a comparison I would make. They do different things and will have different uses. Stored procedures are called differently and behave differently. They just don't have a direct comparison.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Okay, that makes sense.

    If my clients mention they want to use a stored procedure as a source for star schema data warehouse reporting, I can explain why that might not be the best idea.  It sounds like views or ITVFs are much better options (or a combination or both).  Is that correct?

    Thanks for the help.

  • Stored procs are fine. Great way to exercise MORE control and avoid all the issues we've listed with joining and nesting views. I don't object to using stored procs. I just don't compare them directly with views and UDFs because each does different things. Think of it like this, you can call a view or a UDF from a stored procedure, but you can't call a stored procedure from a view or UDF. They just operate differently, even if, at the end of the day, any one of these things just has a query inside of it.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey wrote:

    Stored procs are fine. Great way to exercise MORE control and avoid all the issues we've listed with joining and nesting views. I don't object to using stored procs. I just don't compare them directly with views and UDFs because each does different things. Think of it like this, you can call a view or a UDF from a stored procedure, but you can't call a stored procedure from a view or UDF. They just operate differently, even if, at the end of the day, any one of these things just has a query inside of it.

     

    Not all reporting tools will pull from a SP as easily as they'll pull from a table or a view or a tvf, or might not at all.

  • Grant Fritchey wrote:

    imani_technology wrote:

    Also, which would be preferable for report sources -- UDF or stored proc?

    Not a comparison I would make. They do different things and will have different uses. Stored procedures are called differently and behave differently. They just don't have a direct comparison.

    As a source for an SSRS report - a stored procedure will be much better.  If there are any code changes to the query (without changing the parameters or columns returned) then it can be done without having to re-deploy the report.

    Embedding queries in SSRS means you must re-deploy that report for every change, even if that change is as simple as adding additional filtering to the query.

    A stored procedure can be entirely re-engineered if needed without having to touch the report(s) - as well as being utilized for multiple reports with slightly different requirements but the same general set of data.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • But could the same thing be done with a UDF?

  • imani_technology wrote:

    But could the same thing be done with a UDF?

    Depends on how you write the query using the UDF - because it is still an embedded query in the report and any changes to that query require redeployment of the report.

    I prefer stored procedures because I can easily modify the logic and maintain the interface - so I don't have to redeploy any reports for those types of changes.  Of course, if you add or remove columns - or add/remove parameters you also have to update the report but that is the same as any other method.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply