Performance Tips (Part 1)
There is no shortage of articles and books about improving the overall performance of a database, of SQL statements, and other types of performance (for example, http://www.mssqlcity.com/Tips/tipTSQL.htm). Most of the time these sources focus on general recommendations; I can also refer you to my previous article, “Overview of performance”. Here I would like to review some interesting aspects of performance with regard to Transact-SQL and batch process design. My material is based on real work I have done as a developer, data modeler, and process architect. Most, if not all, of the tips discussed here may sound obvious to some of you. However, judging by the number of widely used applications out there in which these guidelines were not followed, there are a lot of people who can benefit from reviewing these tips one more time.
This article will proceed with a number of handy examples, situations, and solutions that look at the performance of a query, stored procedure, or process. The described solutions are based on code changes or design changes. In all cases I am not discussing why the initial design or coding was done in a specific way, but simply trying to show some tips that may help you in your daily work. Some of the situations and solutions may look like simple or unnecessary changes, but I chose them because I have experienced problems with exactly those scenarios. You would be right to point out that some of the situations may never present any performance issues. To that I would answer: if you have no life (or car, or house) insurance, you may never experience a problem until the day an incident happens, and that does not mean you should not obtain insurance. So, let’s dive into the real scenarios.
Let’s start with a simple example of an infrequently used type of join: CROSS JOIN. Let’s assume there is a requirement to produce a Cartesian product of the tables involved in the join.
SELECT t1.col1, t1.col2, t2.col1, t2.col2
FROM Customer t1
CROSS JOIN GeneralInfo t2
WHERE t1.status > 0
While analyzing this statement I found out some additional details. First, the table GeneralInfo has only one row. My question to the developer was: “What if there is more than one row?” His answer: “Then only the first row should be in the cross join.” Second, the table GeneralInfo has no relationship to the table Customer, and the values in GeneralInfo are independent.
Obviously, the above SELECT statement would produce a wrong result if the table GeneralInfo did indeed have more than one row, and, in addition to that, the redundant join would just harm performance. A correct solution is to get the first row’s values into variables and use the variables in the SELECT statement. It can be done by using the TOP keyword or by specifying the PK value. For example:
declare @var1 varchar(50), @var2 varchar(50)

select @var1 = col1, @var2 = col2
from GeneralInfo
where PKvalue = FirstPKValue
-- or, another way:
-- select top 1 @var1 = col1, @var2 = col2 from GeneralInfo

select col1, col2, @var1, @var2
from Customer
where status > 0
I call the next tip “One statement vs. many statements in a batch or stored procedure”. In many cases when a stored procedure or a batch is developed, I observe a tendency to create one SQL statement no matter how big it is, how many joins it requires, or how many conditions there are in the WHERE clause. It very often leads to one of the following negative outcomes: the optimizer becomes “confused” and incapable of choosing the best execution plan, or the tables are kept locked longer than necessary. The solution is to split one query into a few smaller ones. There are many situations where this will reduce the running time of a procedure (process) significantly.
Let’s make an assumption that a query runs for 30 seconds, and that when the same query is split into 4 smaller statements, with partial outputs directed to a temporary table, each statement runs for 10 seconds. In general, this means that the process (stored procedure) runs longer (4*10 = 40 seconds vs. 30 seconds for the original query) but has less impact on each individual table. And it ultimately means that the procedure performs better, since performance is not only a function of the elapsed time a process (stored procedure) takes but, in a lot of cases, of the way the process interacts with other processes in a multi-user environment. In addition, I can assure you, based on my experience, that in many cases one complex SQL statement runs longer than the multiple statements with temporary tables (or table variables) it is split into.
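As a sketch of this splitting technique (the table and column names below are hypothetical, invented for illustration, and are not from any example in this article), a large multi-join query might be staged like this:

```sql
-- Hypothetical single-statement version:
--   select o.order_id, o.order_dt, c.cust_name, p.prod_name
--   from Orders o
--   join Customer c on c.cust_id = o.cust_id
--   join OrderLine l on l.order_id = o.order_id
--   join Product p on p.prod_id = l.prod_id
--   where o.order_dt >= '20240101' and c.region_cd = 'NE'

-- Split version: stage the small filtered set first, then join the
-- staged rows to the remaining tables. Each statement is simpler for
-- the optimizer, and locks are held on each table for a shorter time.
create table #ord (order_id int, order_dt datetime, cust_name varchar(50))

insert into #ord (order_id, order_dt, cust_name)
select o.order_id, o.order_dt, c.cust_name
from Orders o
join Customer c on c.cust_id = o.cust_id
where o.order_dt >= '20240101' and c.region_cd = 'NE'

select t.order_id, t.order_dt, t.cust_name, p.prod_name
from #ord t
join OrderLine l on l.order_id = t.order_id
join Product p on p.prod_id = l.prod_id

drop table #ord
```

Whether the split actually wins depends on the data volumes involved, so it is worth comparing the execution plans of both versions before committing to either.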
Let’s take a look at a rather common situation, when some historical data must be archived. In our databases, history tables don’t have any physical relationships to any other tables; they are just stand-alone tables. The following are the steps of a typical process many developers have faced at least once in their career:
1. Create an archiving table.
2. Insert records from the history table into the archiving table based on specific condition(s) (usually a date).
3. Delete records from the history table based on the same condition(s), or based on a join between the archiving and history tables.
Let’s say we need to back up the Call_Log_Event table, which has 3,000,000 rows. There is data for 120 days, and we should keep the rows inserted during the last 10 days. Based on the scenario above, suppose the Call_Log_Event table has the following structure:
create table Call_Log_Event (
  call_log_id int identity(1,1),
  event_desc varchar(255),
  event_cd char(6),
  eventdt datetime)

create table Call_Log_Event_Archive (
  archive_id int identity(1,1),
  call_log_id int,
  event_desc varchar(255),
  event_cd char(6),
  eventdt datetime,
  archivedt datetime)

insert into Call_Log_Event_Archive
  (call_log_id, event_desc, event_cd, eventdt, archivedt)
select call_log_id, event_desc, event_cd, eventdt, getdate()
from Call_Log_Event
where datediff(dd, eventdt, getdate()) > 10

delete from Call_Log_Event
where datediff(dd, eventdt, getdate()) > 10
Since the number of records to be archived is much larger than the number of records remaining in the history table(s) (otherwise it would be senseless to archive the data), the following logic is going to significantly reduce the processing time:
1. Create an archiving table.
2. Insert all records from the history table into the archiving table.
3. Truncate the history table
4. Insert the necessary number of records from the archiving table into the history table based on the specified conditions.
5. Delete the copied-back records from the archiving table based on the same condition, or based on a join between the archiving and history tables (if necessary).
create table Call_Log_Event_Archive (
  archive_id int identity(1,1),
  call_log_id int,
  event_desc varchar(255),
  event_cd char(6),
  eventdt datetime,
  archivedt datetime)

-- Step 2: copy all rows into the archiving table
insert into Call_Log_Event_Archive
  (call_log_id, event_desc, event_cd, eventdt, archivedt)
select call_log_id, event_desc, event_cd, eventdt, getdate()
from Call_Log_Event

-- Step 3: empty the history table (minimally logged, very fast)
truncate table Call_Log_Event

-- Step 4: copy the last 10 days back from the archive,
-- preserving the original identity values
set identity_insert Call_Log_Event on
insert into Call_Log_Event (call_log_id, event_desc, event_cd, eventdt)
select call_log_id, event_desc, event_cd, eventdt
from Call_Log_Event_Archive
where datediff(dd, eventdt, getdate()) <= 10
set identity_insert Call_Log_Event off

-- Step 5: remove the copied-back rows from the archive
delete from Call_Log_Event_Archive
where datediff(dd, eventdt, getdate()) <= 10
Steps 4 and 5 may be switched. I placed the insert as step 4 because step 5 may not always be required, but it is highly recommended in order to avoid future confusion.
Many sources suggest avoiding cursors whenever possible. The reasoning behind this is that cursors take a lot of SQL Server resources and can potentially impact performance negatively. In addition, you have to control cursor deallocation in the case of both successful and unsuccessful statement completion. Here are implementations of a solution with and without a cursor.
-- Solution with a cursor
begin
  declare @name varchar(50)

  declare cur1 cursor for
  select name from master..sysdatabases

  open cur1
  fetch next from cur1 into @name
  while @@fetch_status = 0
  begin
    -- process @name here
    fetch next from cur1 into @name
  end
  close cur1
  deallocate cur1
end

-- Solution without a cursor
begin
  declare @cur1 table (name varchar(50), tid int identity(1,1))
  declare @maxid int, @minid int, @name varchar(50)

  insert into @cur1 (name)
  select name from master..sysdatabases

  select @maxid = max(tid), @minid = min(tid) from @cur1
  while (@minid <= @maxid)
  begin
    select @name = name from @cur1 where tid = @minid
    -- process @name here
    set @minid = @minid + 1
  end
end
Very often I see procedures that start a transaction at the very top of the procedure and commit/rollback at the very end. I did talk about it in my previous article (http://www.sqlservercentral.com/columnists/lPeysakhovich/overviewofperformance.asp)
Now, let’s review a similar situation. In this case the process has a loop inside and some logic to control processing based on certain criteria. Let’s assume that the loop is a necessity of the process.
Create procedure proc1 as
Step1 – begin transaction
Step2 – loop
Step3 – make calculations based on special criteria
Step4 – insert/update row
Step5 – end loop
Step6 – commit/rollback
Is there a problem?
The procedure keeps the transaction active, and the tables locked, at run time much longer than it should. Even though this approach may be justified at times (for example, when one needs to preserve the data changes while the loop runs), in most situations this is not the case; for instance, when the verifications/calculations are made against values in tables that are not part of the modification statements. The second procedure has slightly modified logic. Even though some additional steps are required, it interferes less with other users because the locking time is shorter: the transaction itself is shorter.
Create procedure proc1 as
Step1 – create temporary table (or table variable) with the structure of the permanent table
Step2 – loop
Step3 – make calculations based on special criteria
Step4 – insert row into temporary storage
Step5 – end loop
Step6 – begin transaction
Step7 – insert/update rows in the permanent table in one statement
Step8 – commit/rollback
The result of the process will be the same (either all rows are inserted/updated or none of them), but the logic performs better. The only added complication is that the error handler must differentiate between errors that occur inside and outside the transaction context. This can be easily achieved by creating a transaction flag variable, for example.
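As a minimal sketch of such a transaction flag (the table, column, and variable names are illustrative, not from the original procedure, and the error handling uses the classic @@ERROR style of that era):

```sql
-- Hypothetical sketch: the loop does its work outside the transaction,
-- and a flag tells the error handler whether a rollback is needed.
create procedure proc1
as
begin
  declare @in_tran tinyint
  set @in_tran = 0

  declare @work table (col1 int, col2 varchar(50))

  -- Steps 2-5: loop / calculations filling the table variable.
  -- No transaction is open here, so no locks are held on the
  -- permanent table while the calculations run.
  insert into @work (col1, col2) select 1, 'a'

  -- Steps 6-8: short transaction around the single set-based statement
  begin transaction
  set @in_tran = 1

  insert into PermanentTable (col1, col2)
  select col1, col2 from @work

  if @@error <> 0
  begin
    -- the flag distinguishes an in-transaction failure,
    -- which requires a rollback, from an earlier one
    if @in_tran = 1 rollback transaction
    return 1
  end

  commit transaction
  set @in_tran = 0
  return 0
end
```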
Performance is not only a function of the elapsed time a process (stored procedure) takes; in a lot of cases it is also a function of the way the process interacts with other processes in a multi-user environment and of the load it produces on the server. This is especially true for batch processes, data load processes, and reports, which can run for a long time with multiple processes in parallel. Different techniques can be used to minimize time and improve the way processes affect users and/or other processes.