September 20, 2012 at 3:37 am
Hi,
Our company has ERP software, and recently I've been appointed to manage the database, but the developers still write their own SQL code for everyday needs (adding a field to a table, changing a condition in an SP, ...).
I'm trying to write a document to give them so they can "follow" some rules and avoid basic mistakes...
Since it's a developers' guide I'm not planning on writing anything about files, filegroups, tempdb, ... since managing those is a DBA's job (that's me... :-D).
My plan was to have some topics like this (most from a document I got from SQLAuthority and adapted to our reality):
- SQL Server Naming Conventions and Standards
* Use Pascal notation for SQL objects
* SP name patterns (not sure whether to use usp_... or sp[AppName]_[Action][Table])
* Trigger name patterns (tr__[action])
* Index name patterns (IX_..., PK_...)
* Foreign key name patterns (FK_...)
* Default name patterns (DF_...)
* Format SQL syntax in upper case and indent it
* Use the correct datatype for the correct data
* NVARCHAR and NCHAR use twice the space that VARCHAR and CHAR do... are they necessary?!
- SQL Server Programming Guidelines
* Avoid SELECT * ...
* Avoid temp tables; prefer CTEs (unless needed for use in nested SPs or triggers)
* Use SET NOCOUNT ON
* Always create FKs on referenced columns to enforce referential integrity, instead of triggers or IF (SELECT .. FROM ..)
* Avoid CURSORs; try to replace them with a WHILE loop (this may be a bit slower, but in a multiprocessing environment it's better)
* Comment all code!!!
* Explain LEFT JOIN and how to use the ON clause for filtering data, showing the examples below and explaining the differences
SELECT ... FROM p LEFT JOIN o ON p.ID = o.ID WHERE o.Date > '20100101' (equals INNER JOIN)
SELECT ... FROM p LEFT JOIN o ON p.ID = o.ID WHERE o.Date > '20100101' OR o.Date IS NULL (not good...)
SELECT ... FROM p LEFT JOIN o ON p.ID = o.ID AND o.Date > '20100101' (use this instead)
* Declare all variables at the top of an SP and use SET afterwards (the query optimizer reuses query plans better this way)
* When an SP has multiple parameters that can be NULL, use WHERE (@prm1 IS NULL OR tbl.Col = @prm1) AND (@prm2 IS NULL OR tbl.Col2 = @prm2) ... instead of IF @prm1 IS NOT NULL .... IF @prm2 IS NOT NULL .... (makes the code simpler and easier to read)
* Use (SELECT TOP 1 1 FROM ...) when you want to check whether a record exists and don't need the resultset
* Avoid NTEXT, TEXT, ... use NVARCHAR(MAX), VARCHAR(MAX), VARBINARY(MAX)...
* If necessary to store binary data use FILESTREAM
* Use dynamic SQL only if strictly necessary
* ALWAYS specify the columns list in an INSERT statement
* Avoid using functions in the WHERE clause on table columns: WHERE LEFT(col, 2) = 'Ad' => WHERE col LIKE 'Ad%'
* Avoid IN and NOT IN; use EXISTS or EXCEPT instead
* Use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, or NOLOCK, if the SP has no need to lock records
* For small XML data use nodes() instead of sp_xml_preparedocument.
- SQL Server Query Tuning
* Check query performance
* Use the execution plan
* Don't forget the SET STATISTICS IO ON !!!
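For the LEFT JOIN point, here's a small self-contained sketch of the kind of example I'd put in the document (the Persons/Orders table and column names are made up for illustration):

```sql
-- Filtering the outer table in the WHERE clause removes the NULL rows the
-- LEFT JOIN produced, silently turning it into an INNER JOIN:
SELECT p.ID, o.OrderDate
FROM Persons p
LEFT JOIN Orders o ON p.ID = o.PersonID
WHERE o.OrderDate > '20100101';   -- behaves like an INNER JOIN

-- Filtering in the ON clause keeps every row from Persons and only
-- restricts which Orders rows are allowed to match:
SELECT p.ID, o.OrderDate
FROM Persons p
LEFT JOIN Orders o ON p.ID = o.PersonID
                  AND o.OrderDate > '20100101';  -- a true outer join
```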
I think this is it.... What do you think? Has anyone made anything like this before, or can you give me some hints?!
Thanks,
Pedro
September 20, 2012 at 4:35 am
I have built a document like this before. It was a mixed success. Those who I was able to get to use it, loved it, and it made a huge difference in their code. Those who resisted, resisted a lot and I was constantly having to prove why I made the choices I did.
My advice: start small and grow organically. Pick the hill you want to die on. For example, how important is it really that all procedures start with usp, sp, or anything else? And as far as that goes, remember that you're making sorting and filtering procedures harder because everything starts with usp. So, is that one really, really important to you? Be sure your advice is 100% accurate and clear. A couple of examples from your list that might not be clear enough:
Avoid using functions in the WHERE clause on table columns: WHERE LEFT(col, 2) = 'Ad' => WHERE col LIKE 'Ad%'
Actually, the LIKE in this example would use indexes very well, so it's not something to be avoided. However, move or add the wildcard to the front of the pattern and suddenly it is.
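To illustrate (table and column names hypothetical):

```sql
-- An index on Customers(Name) can seek on a pattern with no leading wildcard:
SELECT Name FROM Customers WHERE Name LIKE 'Ad%';      -- index seek possible

-- Wrapping the column in a function, or leading with a wildcard,
-- makes the predicate non-SARGable and forces a scan of every row:
SELECT Name FROM Customers WHERE LEFT(Name, 2) = 'Ad'; -- scan
SELECT Name FROM Customers WHERE Name LIKE '%Ad%';     -- scan
```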
Use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, or NOLOCK, if the SP has no need to lock records
This implies that these are OK to use, no down side. This hill, I'd die on. I'd tell them to never use these because of the missing or extra rows of data they can cause.
Avoid temp tables, prefer CTE (unless needed to use in nested SP or triggers)
These are not equivalent. There are lots of places where breaking code down into multiple steps using a temp table will perform much better than trying to cram everything into a CTE. A CTE is just a query; a temp table has statistics. And then you need to address table variables and their appropriate use too.
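A quick sketch of the distinction (table names hypothetical):

```sql
-- A CTE is just an inlined query; it is re-expanded wherever it's referenced
-- and has no statistics of its own:
WITH Totals AS (
    SELECT CustomerID, SUM(Amount) AS Total
    FROM Orders
    GROUP BY CustomerID
)
SELECT * FROM Totals WHERE Total > 1000;

-- A temp table materialises the intermediate result once, and the optimizer
-- gets statistics on it for the later steps:
SELECT CustomerID, SUM(Amount) AS Total
INTO #Totals
FROM Orders
GROUP BY CustomerID;

SELECT * FROM #Totals WHERE Total > 1000;

DROP TABLE #Totals;
```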
Anyway, that's the approach I'd take. Go slow, pick your fights, be sure you got everything right.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 20, 2012 at 7:28 am
PiMané (9/20/2012)
* SP name patterns (not sure whether to use usp_... or sp[AppName]_[Action][Table])
* Trigger name patterns (tr__[action])
Why use prefixes at all? They don't add any clarity as to what an object is (you can't EXEC views or select from procedures), and they cause problems later if you want to split a table apart and put a view in its place: now you have to rename everything or have a view called TBL...
* When an SP has multiple parameters that can be NULL, use WHERE (@prm1 IS NULL OR tbl.Col = @prm1) AND (@prm2 IS NULL OR tbl.Col2 = @prm2) ... instead of IF @prm1 IS NOT NULL .... IF @prm2 IS NOT NULL .... (makes the code simpler and easier to read)
Simpler, easier to read, worse performance. What's important, ease of coding, readability or performance?
* Avoid IN and NOT IN, use EXIST or EXCEPT instead
Why?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 20, 2012 at 7:37 am
GilaMonster (9/20/2012)
PiMané (9/20/2012)
* SP name patterns (not sure whether to use usp_... or sp[AppName]_[Action][Table])
* Trigger name patterns (tr__[action])
Why use prefixes at all? They don't add any clarity as to what an object is (you can't EXEC views or select from procedures), and they cause problems later if you want to split a table apart and put a view in its place: now you have to rename everything or have a view called TBL...
OK, I'll go along with this one 🙂 SQL Server already has its objects grouped by type, so there's no need to add a prefix to keep them all together.
How would you suggest it? [Table]_[Action]?
GilaMonster (9/20/2012)
* When an SP has multiple parameters that can be NULL, use WHERE (@prm1 IS NULL OR tbl.Col = @prm1) AND (@prm2 IS NULL OR tbl.Col2 = @prm2) ... instead of IF @prm1 IS NOT NULL .... IF @prm2 IS NOT NULL .... (makes the code simpler and easier to read)
Simpler, easier to read, worse performance. What's important, ease of coding, readability or performance?
If I have 2 parameters I can have 4 combinations... and if I need to add another parameter I'll have 8... It's 2^[number of parameters]... that's a very, very big IF statement...
Would it be better to build the query and use dynamic SQL?
GilaMonster (9/20/2012)
* Avoid IN and NOT IN, use EXIST or EXCEPT instead
Why?
Is this good?
SELECT Id, Name FROM Employee WHERE Department IN (SELECT Id FROM Departments WHERE Country = 'US')
Or is this better?
SELECT Id, Name FROM Employee e WHERE EXISTS (SELECT 1 FROM Departments d WHERE d.Id = e.Department AND d.Country = 'US')
September 20, 2012 at 7:41 am
PiMané (9/20/2012)
GilaMonster (9/20/2012)
* When an SP has multiple parameters that can be NULL, use WHERE (@prm1 IS NULL OR tbl.Col = @prm1) AND (@prm2 IS NULL OR tbl.Col2 = @prm2) ... instead of IF @prm1 IS NOT NULL .... IF @prm2 IS NOT NULL .... (makes the code simpler and easier to read)
Simpler, easier to read, worse performance. What's important, ease of coding, readability or performance?
If I have 2 parameters I can have 4 combinations... and if I need to add another parameter I'll have 8... It's 2^[number of parameters]... that's a very, very big IF statement...
Would it be better to build the query and use dynamic SQL?
Yes. http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
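The basic shape of the dynamic catch-all approach is roughly this (the @Department/@Country parameter names are made up for illustration): build only the predicates you actually need, then execute with sp_executesql so the query stays parameterised (plan reuse, no injection).

```sql
-- Assume @Department INT and @Country CHAR(2) are the SP's parameters,
-- either of which may be NULL:
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT Id, Name FROM dbo.Employee WHERE 1 = 1';

IF @Department IS NOT NULL
    SET @sql += N' AND Department = @Department';
IF @Country IS NOT NULL
    SET @sql += N' AND Country = @Country';

-- Unused parameters are simply ignored by the generated statement:
EXEC sys.sp_executesql
    @sql,
    N'@Department INT, @Country CHAR(2)',
    @Department = @Department,
    @Country    = @Country;
```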
GilaMonster (9/20/2012)
* Avoid IN and NOT IN, use EXIST or EXCEPT instead
Why?
Is this good?
SELECT Id, Name FROM Employee WHERE Department IN (SELECT Id FROM Departments WHERE Country = 'US')
Or is this better?
SELECT Id, Name FROM Employee e WHERE EXISTS (SELECT 1 FROM Departments d WHERE d.Id = e.Department AND d.Country = 'US')
Test them and see?
Or http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
Oh, and I missed a couple ...
Use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, or NOLOCK, if the SP has no need to lock records
How is a developer going to know better than the SQL engine whether he needs locks or not?
See - http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx
SELECT ... FROM p LEFT JOIN o ON p.ID = o.ID WHERE o.Date > '20100101' OR o.Date IS NULL (not good...)
SELECT ... FROM p LEFT JOIN o ON p.ID = o.ID AND o.Date > '20100101' (use this instead)
Why is the first not good and are these two queries identical?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 20, 2012 at 7:42 am
Ooh, just spotted this: unless you're trying to set @parm1 etc. as a standard (and if so, no, no, no), I wouldn't use those in any examples. Make all examples follow your standard. Personally, I'd strongly recommend that parameter names be descriptive so that they're clear.
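For instance, a hypothetical procedure signature whose parameter names document themselves:

```sql
-- Illustrative only; procedure, table and column names are made up.
-- Compare @CustomerID/@OrderedSince with opaque names like @p1/@p2:
CREATE PROCEDURE dbo.Orders_GetByCustomer
    @CustomerID   INT,    -- not @p1
    @OrderedSince DATE    -- not @p2
AS
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = @CustomerID
  AND OrderDate >= @OrderedSince;
```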
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 20, 2012 at 7:45 am
Grant Fritchey (9/20/2012)
Ooh, just spotted this: unless you're trying to set @parm1 etc. as a standard (and if so, no, no, no), I wouldn't use those in any examples. Make all examples follow your standard. Personally, I'd strongly recommend that parameter names be descriptive so that they're clear.
Yep... this is just a draft of the topics... what's in parentheses won't go on the topics list; they're just notes to get your opinions... 😀
September 20, 2012 at 8:01 am
GilaMonster (9/20/2012)
Oh, and I missed a couple ...
Use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, or NOLOCK, if the SP has no need to lock records
How is a developer going to know better than the SQL engine whether he needs locks or not?
See - http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx
If I have a list operation and an insert/update on the table takes too long (for whatever reason), can't I use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, if I'm not concerned with dirty reads, so the list doesn't have to wait for the insert/update?
Not for "major" operations, but for simple list operations like paging and the like...
GilaMonster (9/20/2012)
SELECT ... FROM p LEFT JOIN o ON p.ID = o.ID WHERE o.Date > '20100101' OR o.Date IS NULL (not good...)
SELECT ... FROM p LEFT JOIN o ON p.ID = o.ID AND o.Date > '20100101' (use this instead)
Why is the first not good and are these two queries identical?
Doesn't the OR add a Filter operator to the execution plan? I ran some tests, and if the date filter is in the ON clause it's twice as fast as having it in the WHERE clause with the OR ... IS NULL added.
Thanks for the great links... awesome.. 🙂
September 20, 2012 at 8:49 am
PiMané (9/20/2012)
GilaMonster (9/20/2012)
Oh, and I missed a couple ...
Use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, or NOLOCK, if the SP has no need to lock records
How is a developer going to know better than the SQL engine whether he needs locks or not?
See - http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx
If I have a list operation and an insert/update on the table takes too long (for whatever reason), can't I use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, if I'm not concerned with dirty reads, so the list doesn't have to wait for the insert/update?
Not for "major" operations, but for simple list operations like paging and the like...
Sure you can, if you don't mind getting rows duplicated and/or not reading rows at all. It's not just about 'dirty' reads. Read the link I posted. Nolock is NOT a go-faster switch and should not be in a 'standard' as a recommendation.
GilaMonster (9/20/2012)
SELECT ... FROM p LEFT JOIN o ON p.ID = o.ID WHERE o.Date > '20100101' OR o.Date IS NULL (not good...)
SELECT ... FROM p LEFT JOIN o ON p.ID = o.ID AND o.Date > '20100101' (use this instead)
Why is the first not good and are these two queries identical?
Doesn't the OR add a Filter operator to the execution plan? I ran some tests, and if the date filter is in the ON clause it's twice as fast as having it in the WHERE clause with the OR ... IS NULL added.
You missed half of my question. Are those two queries identical?
Having a query return incorrect data fast is perhaps not desired.
As for the OR adding a filter: not always, not if the indexes are done properly. And it shouldn't in the case you have above even if the indexes aren't done properly, because both sides of the OR are on the same column.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 20, 2012 at 9:12 am
GilaMonster (9/20/2012)
You missed half of my question. Are those two queries identical?
Having a query return incorrect data fast is perhaps not desired.
As for the OR adding a filter: not always, not if the indexes are done properly. And it shouldn't in the case you have above even if the indexes aren't done properly, because both sides of the OR are on the same column.
I didn't notice that till I made the test and analyzed the data... Thanks for the hint.
A couple of questions:
If I have a table with an index on (col1, col2, col3) and another on (col1, col2), is the 2nd one necessary? Will it ever be used, since the 1st one also has its columns?
Are subqueries bad?! I read somewhere to avoid subqueries, but I made a test:
SELECT p.Product, p.Description, SUM(ps.Stock) FROM Products p LEFT JOIN ProdutStorages ps ON p.Product = ps.Product AND ps.Storage > 60
GROUP BY p.Product, p.Description
SELECT p.Product, p.Description, (SELECT SUM(ps.Stock) FROM ProdutStorages ps WHERE p.Product = ps.Product AND ps.Storage > 60) FROM Products p
The execution plans are pretty much the same; the 2nd query has a Compute Scalar with 0% cost... The STATISTICS IO output is the same...
What should we use?
Thanks,
Pedro
September 20, 2012 at 9:18 am
You are trying to make design decisions before any coding is actually done. The answer is "it depends." What should be done is to develop several alternative solutions and test them, repeatedly. Also, test them against production loads and beyond, if possible, to test for scalability. What works well for a small set of data may not work for a larger set, or the other way around.
September 20, 2012 at 9:22 am
PiMané (9/20/2012)
A couple of questions:
If I have a table with an index on (col1, col2, col3) and another on (col1, col2), is the 2nd one necessary? Will it ever be used, since the 1st one also has its columns?
The second is not necessary in most cases. It'll still be used, but it's not necessary.
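The idea in index terms (table, column and index names hypothetical):

```sql
-- The wider index can serve every query the narrower one can, because
-- they share the same leading columns:
CREATE INDEX IX_Orders_Cust_Date_Status
    ON dbo.Orders (CustomerID, OrderDate, Status);

-- Redundant in most cases: seeks on (CustomerID) or (CustomerID, OrderDate)
-- can use the wider index above. The narrower one only earns its keep if
-- its smaller row size measurably matters for a hot query.
CREATE INDEX IX_Orders_Cust_Date
    ON dbo.Orders (CustomerID, OrderDate);
```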
What should we use?
Whatever you like.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 20, 2012 at 9:24 am
Lynn Pettis (9/20/2012)
You are trying to make design decisions before any coding is actually done. The answer is "it depends." What should be done is to develop several alternative solutions and test them, repeatedly. Also, test them against production loads and beyond, if possible, to test for scalability. What works well for a small set of data may not work for a larger set, or the other way around.
Unfortunately the coding has already been done... a 10-year legacy of bad practices!!
My point is to have them use good practices as they change what's already written... The new stuff goes through me so I can "control" it, but I can't review a database with 1118 tables, 548 SPs and 105 triggers... There's also a lot of business rules and logic in the software code...
I'm trying to "do the right thing" as "it goooessss"...
September 20, 2012 at 9:30 am
I would also add a bullet point regarding scalar functions vs. inline table-valued functions (iTVFs) vs. multi-statement table-valued functions (msTVFs). Try to explain how a scalar function and an msTVF are used in an execution plan vs. an iTVF, and recommend using an iTVF over a scalar function or msTVF where it can be done.
I showed our developers the performance and execution-plan impact between these types of functions, and they have never used anything but iTVFs since 🙂
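A rough sketch of the rewrite (function, table and column names are made up for illustration):

```sql
-- Scalar UDF: invoked row by row, and its body is opaque to the optimizer:
CREATE FUNCTION dbo.fn_TotalStock (@Product INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT SUM(Stock)
            FROM dbo.ProductStorages
            WHERE Product = @Product);
END;
GO

-- Inline TVF: a single SELECT the optimizer can expand into the main plan:
CREATE FUNCTION dbo.itvf_TotalStock (@Product INT)
RETURNS TABLE
AS
RETURN (SELECT SUM(Stock) AS TotalStock
        FROM dbo.ProductStorages
        WHERE Product = @Product);
GO

-- Usage with CROSS APPLY:
SELECT p.Product, t.TotalStock
FROM dbo.Products p
CROSS APPLY dbo.itvf_TotalStock(p.Product) t;
```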
September 20, 2012 at 9:38 am
PiMané (9/20/2012)
Lynn Pettis (9/20/2012)
You are trying to make design decisions before any coding is actually done. The answer is "it depends." What should be done is to develop several alternative solutions and test them, repeatedly. Also, test them against production loads and beyond, if possible, to test for scalability. What works well for a small set of data may not work for a larger set, or the other way around.
Unfortunately the coding has already been done... a 10-year legacy of bad practices!!
My point is to have them use good practices as they change what's already written... The new stuff goes through me so I can "control" it, but I can't review a database with 1118 tables, 548 SPs and 105 triggers... There's also a lot of business rules and logic in the software code...
I'm trying to "do the right thing" as "it goooessss"...
You are missing my point. You can't say, for example, that using correlated subqueries is bad all the time. Different ways of accomplishing the same task should be explored and tested, and the best solution selected.
Even the post above that basically says don't use scalar or multi-statement TVFs is not always correct. The different solutions need to be tested and the best one selected. This testing should include scalability testing, or, as Jeff Moden would do, the million-row test (sometimes you may need more!).