Tips (part 2)
In a recent article (Performance Tips Part 1) I discussed some potential performance issues and tips to improve performance. Let’s proceed with performance tips and check another 4 scenarios.
Database has ITEM table with the next structure.
Create table item (item_id int identity(1,1) not null, internal_item_cd char(6) not null , external_item_cd char(6) null, item_name varchar(50) null )
This scenario is required to provide a validation of external_item_cd while inserting a set of records from the process/application to make sure that there are no duplicate records. A unique constraint or a unique index can not be established because the value can be NULL and NULL is not considered as a duplicate. Table size can be in the millions. Initially, the following trigger has been constructed:
CREATE TRIGGER tr1 on ITEM FOR INSERT as Declare @cnt int, @cnt1 int Select @cnt1 = count(*) from inserted where external_item_cd is not null Select @cnt = count(*) FROM inserted itr inner join item i ON itr.external_item_cd = i.external_item_cd WHERE i.external_item_cd is not null and itr.external_item_cd is not null IF (@cnt <> @cnt1) RAISERROR('Duplicate Records', 16, 1) END
The logic always requires a joint between the table inserted and the table ITEM. Can this solution be improved? First, duplicate records can be present in the inserted set. Second, duplicate records can be between table Item and the inserted set of records. Logically, if duplicate records are in the inserted set then by adding a small overhead we may save processing and locking time by the verifying inserted set first. These considerations lead to the following solution:
CREATE TRIGGER tr1 on ITEM FOR INSERT as Declare @cnt int, @cnt1 int Declare @tmp table (external_item_cd char(6) , cnt int) Insert into @tmp(external_item_cd, cnt) Select external_item_cd, count(*) from inserted where external_item_cd is not NULL Group by external_item_cd Select @cnt1 = count(*) from @tmp where cnt > 1 IF (@cnt1 > 0 ) begin RAISERROR('Duplicate Records', 16, 1) rollback Return End Select @cnt1 = count(*) from inserted where external_item_cd is not null Select @cnt = count(*) FROM inserted itr inner join item i ON itr.external_item_cd = i.external_item_cd WHERE i.external_item_cd is not null and itr.external_item_cd is not null IF (@cnt <> @cnt1) begin RAISERROR ('Duplicate Records', 16, 1) Rollback end END
This example shows that there are situations where by adding some additional but very simple logic we may in fact save processing time, system resources, and reduce locking if an error will be detected or in some other situations. I found this technique to be very helpful, especially for the data load processes where there is no way to provide a preliminary verification or when the process has to be tuned to some changed business requirements. And I am not discussing here why a process may have a poor design and/or why management does not want to make the right decision when business dictates the changes. You know – this is the reality. I am simply described how a small overhead may have a big value.
I call it “reversed logic”. This is a simple one. For example, the logic in the trigger has 2 steps:
Step1. Set some variables by querying some permanent tables
Step 2. Main part: IF (update(field1) or update(Field2) – do some processing
Based on the logic, if field1 or field2 is not updated the trigger should do nothing. That means there is no sense to execute Step1. Therefore the logic should be reversed:
Step 1. Verify: IF (update(field1) or Update(Field2)
Step2. Set some variables by selecting from permanent tables
Step 3. Main part
By reversing the logic some unnecessary processing may be eliminated. With certain situations it will increase performance, reduce locking, and minimize usage of server resources. It looks very simple and obvious, but I have seen a number of triggers guilty of imposing some unnecessary processing. Similar situations can be found in stored procedures, where all necessary variables become assigned first by querying the database and only then the input parameters and/or assigned parameters verification is done. Very often it happens when developers use some generically designed trigger or stored procedure templates or simply copying the old code.
When I was working for a telemarketing company we had tables to store call information. The tables, as you can imagine, were very large – 30,000,000 – 50,000,000 rows. One of the tasks was to verify if call for the phone (person) has been already done today or it is a new phone (person). If information was inserted today then the information must be updated or additional data inserted based on existing CALLID. If this is a first call today to the phone/person (even if the phone/person is in a database), then it is treated as a new call. The performance requirement for all verification processes (not only this one, but some others) at a time was defined as no more than 2 seconds.
To make the process perform quicker a permanent temporary table Daily_Calls was created. This table was keeping CALLID, phone, person name, and some other information for each daily call. Verification was always done against this daily table, which would be truncated every night. When the process was invented the daily table growth was 2,000-5,000 rows. Everything was fine up to the point when that figure reached 50,000 -100,000 per day. And suddenly performance for all verification processes became a really serious problem. The research showed that all queries against the table Daily_Calls were making a table scan. Why?
Because the nightly job which used to update statistical information was running at times when record count in the table Daily_Calls was 0, right after it has been truncated. During the next day the table was growing slow enough for the statistical information to remain about the same, which stopped to be the case when the daily growth accelerated.
The solution: schedule a job to update statistical information about the distribution of key values in the specified table every hour with sample of 25%.
UPDATE STATISTICS Daily_Calls WITH SAMPLE 25 PERCENT
This trick eliminated all the issues. Again, I am talking about a practical situation and not about the theory
of what else can be done to solve the problem or how to redesign a process.
Usually this type of situation is called “Load balancing”
The company I am working for has multiple customers and each customer requires that some 7-10 reports refreshed every 2 hours. Each report is scheduled by the Crystal Enterprises product. Periodically, certain users have a complaint about server performance. My analyses revealed that the report developers schedule report runs at will, which created a situation when reports would not be distributed evenly across the time line. This means that at one period of time server may be heavy loaded and another period of time load is very low.
After an analysis next solution was proposed.
- Wrap all customer/database reports into 1 ”bag”. Such as that if we have 20 databases it will be 20 bags. The sequence of reports between bags should stay the same, i.e. longer running reports go first (2-3 minutes), and then shorter running ones (up to 1 min).
- Shift the bags against one another to have a balanced number of reports with similar load power running on the server at any given time.
This simple load balancing solution stopped a reporting scheduling anarchy and increased server’s
Basically, the conclusion is an easy formulated slogan but very often may not be completely understand (or forgotten) by many developers. Performance is not only a function of elapsed time a process (stored procedure) is taking but, in a lot of cases, of the way a process interacts with the other processes in a multi-user environment. Especially if we are talking about batch processes, data load processes and reports. They can run for a long time with multiple processes in parallel. Different techniques can be used to minimize time and improve the way processes affect users and/or other processes. This article (part 1 and part 2) shows some practical situations and methods of performance improvement that were used at a time. There are many other solutions that can be deployed to improve environment and infrastructure performance that did not make it to this article.