During my SSWUG webcast I was asked to give some examples on how to tune the DefaultBufferMaxRows and DefaultBufferSize, so here goes.
When a source is pulling in data, it places them into buffers; one row=one buffer. The default setting for DefaultBufferMaxRows=10,000. This means that the source will collect 10k records and then 'dump' them into the pipeline for processing. Now ignoring asynchronous/synchronous components, until all of these buffers are processed by your destination, no more rows will be pulled in. While this sounds like a batch, and you can think of it like this, don't confuse this with the batching at a destination, they don't have to match. I don't believe there is a stated upper limit (I've done 1M).
DefaultBufferSize tells SSIS how much memory to set aside for those 10k buffers. They are structures (don't know what language SSIS is built in) and they need to have memory allocated, but just how much; 10MB (10485760 bytes). The maximum size is 100MB (104857600 bytes). How this parameter works is as follows. As SSIS starts filling buffers it keeps track of the memory allocated. If you have really wide rows (over 10485760/10000=1048 bytes) then you will reach 10MB right around 10k buffers. If you have a row that is 2096 bytes in size, you will hit the limit around 5k buffers.
Now you've probably had the 'Ah Ha!' moment. Now, how do you use this information to improve the performance of your Data Flow Task?
So given the max memory I can allow per task, setting my DefaultBufferMaxRows>200493 will be a waste. There you have it, a simple mechansm to determine the proper settings. More often than not I set my task to 100MB and reset my buffers to fill it. Yes there is a possibility that if you have many packages running simultaneously and they have many Data Flow Tasks and all of them are at 100MB, you will run out of memory on your SSIS machine rather quickly causing the packages to push buffers to disk. If this happens and you notice performance degrading, but it happens randomly, look to lower this number. While it might seem counter intuitive to pull fewer records in each round, if I don't have to write these buffers to disk, it will be a faster total run time.