SSIS best practices - Use partitionaing feature

PDF
One of the main tenets of scalable computing is to partition problems into smaller, more manageable chunks. This allows you to more easily handle the size of the problem and make use of running parallel processes in order to solve the problem faster.

For ETL designs, you will want to partition your source data into smaller chunks of equal size. This latter point is important because if you have chunks of different sizes, you will end up waiting for one process to complete its task. For example, looking at the graph below, you will notice that for the four processes executed on partitions of equal size, the four processes will finish processing January 2008 at the same time and then together continue to process February 2008. But for the partitions of different sizes, the first three processes will finish processing but wait for the fourth process, which is taking a much longer time. The total run time will be dominated by the largest chunk.
partitioninig
To create ranges of equal-sized partitions, use time period and/or dimensions (such as geography) as your mechanism to partition. If your primary key is an incremental value such as an IDENTITY or another increasing value, you can use a modulo function. If you do not have any good partition columns, create a hash of the value of the rows and partition based on the hash value. For more information on hashing and partitioning, refer to the Analysis Services Distinct Count Optimization white paper; while the paper is about distinct count within Analysis Services, the technique of hash partitioning is treated in depth too.

Partitioning tips:

  • Use partitioning on your target table. This way you will be able to run multiple versions of the same package, in parallel, that insert data into different partitions of the same table. When using partitioning, the SWITCH statement is your friend. It not only increases parallel load speeds, but also allows you to efficiently transfer data.
  • As implied above, you should design your package to take a parameter specifying which partition it should work on. This way, you can have multiple executions of the same package, all with different parameter and partition values, so you can take advantage of parallelism to complete the task faster.
  • From the command line, you can run multiple executions by using the “START” command.
 
Home SSIS Tutorial SSIS best practices - Use partitionaing feature