The DefaultBufferMaxRows and DefaultBufferSize Properties in SSIS

Shubs, 2018-03-22

Based on my experience as an ETL developer, DefaultBufferMaxRows and DefaultBufferSize are perhaps the most underused Data Flow Task properties for optimizing data extractions. SSIS uses a buffer-based architecture. Data extracted from the source is held in memory structures called buffers, where transformations are performed before the data is sent to the destinations. The DefaultBufferMaxRows property sets the maximum number of rows a buffer can hold, while DefaultBufferSize sets the size of the buffer in bytes. These properties have default values of 10,000 rows and 10,485,760 bytes (10 MB) respectively, as shown below (Fig 1).

For smaller data volumes, there is no need to change these properties; the default values usually suffice. For large data volumes, however, the defaults do not necessarily give the best performance, so some tuning is needed to improve data flow performance. To tune these properties, we need to know the size of a row. With it, we can estimate how many rows the buffer can hold for a given DefaultBufferSize, or how large the buffer must be to hold a given number of rows.
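As a rough starting point, the row size can be estimated by summing the maximum width of each column. The sketch below is only an approximation: it assumes SQL Server storage widths for a handful of common types and uses the declared maximum length for string columns, which gives an upper-bound estimate. The table schema and the helper names (`column_width`, `estimate_row_size`) are hypothetical, and SSIS's own in-buffer representation of a row may differ.

```python
# Assumption: SQL Server storage widths in bytes, not SSIS's in-buffer layout.
FIXED_WIDTHS = {
    "tinyint": 1,
    "smallint": 2,
    "int": 4,
    "bigint": 8,
    "date": 3,
    "datetime": 8,
    "float": 8,
}


def column_width(sql_type: str, length: int = 0) -> int:
    """Return the estimated width of one column in bytes (hypothetical helper)."""
    t = sql_type.lower()
    if t in FIXED_WIDTHS:
        return FIXED_WIDTHS[t]
    if t in ("char", "varchar"):
        return length          # one byte per character, at the declared maximum
    if t in ("nchar", "nvarchar"):
        return 2 * length      # two bytes per character, at the declared maximum
    raise ValueError(f"unsupported type: {sql_type}")


def estimate_row_size(columns) -> int:
    """Sum the estimated widths of (type, length) pairs."""
    return sum(column_width(t, n) for t, n in columns)


# Hypothetical table: int key, datetime, varchar(20), bigint.
schema = [("int", 0), ("datetime", 0), ("varchar", 20), ("bigint", 0)]
print(estimate_row_size(schema))  # 4 + 8 + 20 + 8 = 40
```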

For example, if the buffer size were kept at the default of 10 MB and the row size were calculated to be 100 bytes, DefaultBufferMaxRows could be increased to about 104K rows (10,485,760 bytes / 100 bytes per row); otherwise the SSIS engine would continue to fill buffers in batches of 10,000 rows. Conversely, if you wanted to keep DefaultBufferMaxRows at its default of 10,000 and the row size were 1,700 bytes, DefaultBufferSize would need to be increased to around 17 MB (10,000 rows × 1,700 bytes) to hold 10,000 rows; otherwise the data would be processed in batches of around 6,000 rows.
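The two worked examples above reduce to simple arithmetic: a buffer is full when either the row limit or the byte limit is reached, whichever comes first. The sketch below reproduces both calculations. It is a simplified model that ignores any per-buffer overhead and internal sizing details of the SSIS engine, so the real engine may fit slightly fewer rows; the function names are hypothetical.

```python
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # 10,485,760 bytes (10 MB)
DEFAULT_BUFFER_MAX_ROWS = 10_000


def rows_that_fit(buffer_size: int, row_size: int) -> int:
    """Rows a buffer of buffer_size bytes can hold at row_size bytes per row."""
    return buffer_size // row_size


def effective_rows(buffer_max_rows: int, buffer_size: int, row_size: int) -> int:
    """Rows per buffer: whichever limit is reached first wins."""
    return min(buffer_max_rows, rows_that_fit(buffer_size, row_size))


def buffer_size_needed(rows: int, row_size: int) -> int:
    """Bytes required to hold `rows` rows of row_size bytes each."""
    return rows * row_size


# Example 1: 100-byte rows in the default 10 MB buffer.
print(rows_that_fit(DEFAULT_BUFFER_SIZE, 100))                             # 104857
print(effective_rows(DEFAULT_BUFFER_MAX_ROWS, DEFAULT_BUFFER_SIZE, 100))   # 10000

# Example 2: 1,700-byte rows.
print(buffer_size_needed(DEFAULT_BUFFER_MAX_ROWS, 1700))                   # 17000000
print(effective_rows(DEFAULT_BUFFER_MAX_ROWS, DEFAULT_BUFFER_SIZE, 1700))  # 6168
```

In the first case the row limit (10,000) binds even though 104,857 rows would fit; in the second the byte limit binds at 6,168 rows.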

DefaultBufferSize values can range from 1 MB to 100 MB. A larger buffer means more rows per buffer. However, it also means a lower commit frequency, since more rows must be loaded into each buffer before it is flushed to the destination. For optimal performance, we therefore need to balance larger buffers against a healthy commit rate. In this article, I will demonstrate that varying DefaultBufferMaxRows and DefaultBufferSize improves data flow performance, but that beyond a certain threshold value for both properties there is no significant further improvement.

In the test, I created a simple data flow task that extracts data from a source table to a target table. The source table has around 77 million rows. Based on the table properties, the row size was calculated to be 40 bytes. This implies that at the default DefaultBufferSize of 10 MB, a buffer can hold around 262K rows (10,485,760 bytes / 40 bytes per row). I therefore first increased DefaultBufferMaxRows to 100,000 while keeping DefaultBufferSize constant, executed the package, and noted how long it took to run successfully. I then re-ran the package each time I changed either DefaultBufferSize or DefaultBufferMaxRows so that the buffer could hold more rows.
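Plugging the test's numbers into the same arithmetic shows how the number of buffers the engine must fill and pass downstream changes with DefaultBufferMaxRows. This is a back-of-the-envelope sketch, assuming exactly 77,000,000 rows and ignoring per-buffer overhead; `buffers_needed` is a hypothetical helper, not part of SSIS.

```python
import math

ROW_SIZE = 40                            # bytes per row, as calculated in the test
TOTAL_ROWS = 77_000_000                  # assumption: "around 77 million", rounded
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # 10,485,760 bytes


def buffers_needed(total_rows: int, buffer_max_rows: int,
                   buffer_size: int, row_size: int) -> int:
    """Number of buffers needed to move total_rows through the data flow."""
    # A buffer is full when either the row limit or the byte limit is reached.
    rows_per_buffer = min(buffer_max_rows, buffer_size // row_size)
    return math.ceil(total_rows / rows_per_buffer)


print(DEFAULT_BUFFER_SIZE // ROW_SIZE)  # 262144 rows fit in the default buffer
for max_rows in (10_000, 100_000):
    print(max_rows, buffers_needed(TOTAL_ROWS, max_rows, DEFAULT_BUFFER_SIZE, ROW_SIZE))
# 10000 7700
# 100000 770
```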

From the plot (Fig 2), you can clearly see a sharp drop in processing time when DefaultBufferMaxRows was changed from the default of 10,000 rows to 100,000 rows. But as I increased DefaultBufferMaxRows further, there was no significant change in processing time. This is because the buffer could not hold the increasing number of rows, so the package continued to process data at the upper limit set by the buffer size. Increasing the buffer size to twice the default value gave a slight improvement, as the buffer could then hold more rows. However, as I kept altering DefaultBufferSize or DefaultBufferMaxRows to hold more rows, there was little to no further improvement, as more time was required to fill the buffers.
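Part of the plateau can be explained by the fact that a buffer is capped by whichever of the two properties is reached first. The sketch below is a simplified model that ignores per-buffer overhead and uses hypothetical configurations rather than the exact values tested. It shows that once DefaultBufferMaxRows exceeds what DefaultBufferSize can hold, raising it further leaves the rows per buffer unchanged, while doubling the buffer size raises the ceiling.

```python
MB = 1024 * 1024
ROW_SIZE = 40  # bytes per row, as in the test


def effective_rows(buffer_max_rows: int, buffer_size: int, row_size: int) -> int:
    """Rows per buffer: the smaller of the row limit and the byte limit."""
    return min(buffer_max_rows, buffer_size // row_size)


for size_mb in (10, 20):
    for max_rows in (100_000, 300_000, 600_000):
        rows = effective_rows(max_rows, size_mb * MB, ROW_SIZE)
        print(f"{size_mb} MB, DefaultBufferMaxRows={max_rows}: {rows} rows/buffer")
# 10 MB, DefaultBufferMaxRows=100000: 100000 rows/buffer
# 10 MB, DefaultBufferMaxRows=300000: 262144 rows/buffer
# 10 MB, DefaultBufferMaxRows=600000: 262144 rows/buffer
# 20 MB, DefaultBufferMaxRows=100000: 100000 rows/buffer
# 20 MB, DefaultBufferMaxRows=300000: 300000 rows/buffer
# 20 MB, DefaultBufferMaxRows=600000: 524288 rows/buffer
```

This model only captures the capacity ceiling; the time spent filling larger buffers, which the test also points to, is not modeled.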

Conclusion

It is therefore worth tuning the DefaultBufferSize and DefaultBufferMaxRows properties, since allowing the package to process more data per buffer can yield performance gains. However, beyond a certain value of the two properties a saturation point is reached, after which there is little to no further improvement. Depending on the complexity of the package and environment constraints, the saturation point might be reached much sooner.

SQL Server 2016 introduced a new Data Flow Task property, AutoAdjustBufferSize², which, when set to True, automatically adjusts the buffer size to accommodate the number of rows specified in DefaultBufferMaxRows. This lets us change DefaultBufferMaxRows without worrying about DefaultBufferSize.
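To make the difference concrete, here is a simplified model reusing the 1,700-byte row example from earlier. It assumes, following the description above, that with AutoAdjustBufferSize set to True the buffer is sized to hold exactly DefaultBufferMaxRows rows. The engine's actual row-size estimate and any internal size limits are not modeled, and the function names are hypothetical.

```python
def rows_per_buffer(buffer_max_rows: int, buffer_size: int, row_size: int,
                    auto_adjust: bool) -> int:
    """Simplified model of rows per buffer with and without AutoAdjustBufferSize."""
    if auto_adjust:
        # Assumption: the buffer is resized to buffer_max_rows * row_size,
        # so DefaultBufferMaxRows alone determines the row count.
        return buffer_max_rows
    # Without auto-adjust, the fixed byte size can cap the row count.
    return min(buffer_max_rows, buffer_size // row_size)


def buffer_bytes(buffer_max_rows: int, buffer_size: int, row_size: int,
                 auto_adjust: bool) -> int:
    """Bytes allocated per buffer under the same simplified model."""
    if auto_adjust:
        return buffer_max_rows * row_size
    return buffer_size


# 1,700-byte rows, DefaultBufferMaxRows = 10,000, default 10 MB DefaultBufferSize.
for auto in (False, True):
    print(auto,
          rows_per_buffer(10_000, 10 * 1024 * 1024, 1700, auto),
          buffer_bytes(10_000, 10 * 1024 * 1024, 1700, auto))
# False 6168 10485760
# True 10000 17000000
```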

References

1. http://www.sql-server-performance.com/2009/ssis-an-inside-view-part-3/

2. https://www.mssqltips.com/sqlservertip/4221/improving-data-flow-performance-with-ssis-autoadjustbuffersize-property/
