Stored Procedure Analysis and Design

  • I’m pretty new to Sql and I’ve been given a set of reports I need to develop.

    Specifically, I’ve got an Excel file that says that report x needs to show y and so on. All this data is across various tables. Due to the fact that it’s my first time gathering actual data I would like to know if anybody recommends a way to start this off.

    I’ve seen some people on the Internet say that they SELECT the first column then depending on what they need after that whether it’s another column, a join etc the query gradually grows.

    I would like to know some recommendations for an approach or any rules of thumb for gathering/querying the data that can possibly make things easier so time is spent wisely.

    Thanks

  • There are various ways, all with there own benefits and after a while you just get a feel for it. That said, an approach that often works for for beginners is to start with the "driving table" - that is the one which contains the root of all the data you want in a report. Then one-by-one add joins etc to slowly add in bits from other tables, testing frequently as you go. Don't get too hung up on performance to start with, getting something that works is most important, getting something that works efficiently is the next step.

  • I’ve also heard about sub querying and using temp tables. When do you know when you need these?

  • NikosV - Tuesday, August 7, 2018 7:49 AM

    I’ve also heard about sub querying and using temp tables. When do you know when you need these?

    Basically, as you build the query starting with base or driving table.  I would ask questions as you go along.  If you have a more experienced DBA/Database Developer at your work use them to mentor you.  They can see what you are working and make better suggestions than we might.

  • Makes sense.

    I don't really have anybody to "mentor" me but that's not a problem. What I am trying to do to really grasp these SQL concepts is read my team leaders code. 
    Having done this, I've seen all sorts. CTE's, Subqueries, Temp tables etc. I think I see a "steps" pattern. For example:

    Step 1: SELECT Column1, Column2, Column3,
    FROM (
                 Step 2: Find X from this SET and return X
               )
    WHERE Column3 = 'Value'

    I'm finding that all these nested queries and pretty much most of SQL I've seen at this level is basically saying something like 


    SELECT Columns
    FROM (Select Columns of outer query when you return what I say here)
    WHERE Whatever

    I'm not sure actually I think my examples are crap but I'm getting a "chain" feeling going on. All queries are somehow connected to each other and the final result is specified based on what I've said in my code. So what the first select returns is dependent on what the nested select returns, and that's dependent on what the rest of the code says. I think coding in SQL is like having a bucket of data and just taking out what you want.

    Am I even remotely on track when I say this? Obviously actually coding up solutions is the ultimate teacher but I'm just trying to grasp the concept as a whole for now.

  • The advise of starting with the driver table is good stuff.
    As far as when to use temp tables, CTEs, derived tables, sub-queries... ect... Well... "it depends" on the specific query.

    As a general rule, start with the driver table and keep an eye on the row counts. As you join new tables to the query, you'll notice that your row counts will go up, go down or stay the same.
    It's up to you to determine whether or not the row count change is appropriate.

    For example, let's say that dbo.Orders is your driver table. You then join the dbo.OrderLineItems table to dbo.Orders on OrderID...
    Since each order is made up of one or more line items, you'd expect that joining dbo.OrderLineItems would increase the number of rows being returned (you've gone from returning a single row per order to returning a single row for each line item).
    So, now you need to ask yourself, "do I need line item level detail or do I just want an aggregation of some line item information?".  If the answer is, "yes, I need the line item level detail", then the straight join is appropriate. If, on the other hand, the answer is, "no, I just need a count of items included in a given order", then it may make sense to use a derived table to aggregate the line item information up to the order level and then join to that.

    Now, lets say you need to add shipping information to your query. In this case you'll be joining the dbo.ShippingInfo table to the query. Of course not every order has shipped so not every order has a corresponding row in the shipping table. As a result, adding the shipping table to the query will reduce the number of rows being returned (the shipping table is acting as a filter that reduces the orders being returned to only those that have been shipped).
    Again, you need to ask yourself, "Do I want to see ALL orders or am I only interested in orders that have shipped?". If the answer is, "I need ALL orders", then you'll want to switch from an INNER JOIN to a LEFT OUTER JOIN. In this case, a LEFT OUTER JOIN will prevent the filtering action and simply return NULL shipping values for unshipped orders.

  • Just starting out, stick as much as possible to straight T-SQL. Then, as you start to see odd bits of behavior or weird requirements or odd structures, you can start building derived tables and CTEs and table variables and temporary tables. Some really vague rules of thumb:

    A derived table: SELECT ... FROM (SELECT ... FROM ... WHERE...) WHERE ...
    is best used where you need to define a sub-set of the data that you're going to be joining to the super-set of the data from the driver table.

    A CTE is best used where you need to use the sub-set of the data more than once with different join or filter criteria.

    Temporary tables are when you literally need to break down the code into smaller chunks AND you need to use statistics on the tables as part of JOINs and filtering.

    Table variables are when you need to  do the same thing as temporary tables, but you don't need, or want, the statistics (which can lead to recompiles).

    Those are really general rules just so you have an idea of how to apply things. Everyone else's advice so far is very good. Yeah, this stuff gets thick fast. I'd suggest getting a copy of Itzik Ben-Gan's book on querying T-SQL. Don't bother with the fundamentals book, just go straight for the other one.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Nikos,

    Welcome to the boards and the world of SQL.  You will find the folks on this forum super-helpful but they are all busy people.  I would recommend looking at the staircase series from the homepage to get a quick overview of SQL and it's structure and capabilities.  Try and write some code and when you get stuck, post the real code (or as close to as Intellectual Property will allow) and sample data and you will get really useful and speedy responses.

  • I'm on it thanks.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply