SSIS best practices - Perform tasks ONLY where it performs better

PDF
There are some things that Integration Services does well – and other tasks where using another tool is more efficient. Your tool choice should be based on what is most efficient and on a true understanding of the problem. To help with that choice, consider the following points:

Sorting: Do not sort within Integration Services unless it is absolutely necessary. In order to perform a sort, Integration Services allocates the memory space of the entire data set that needs to be transformed. If possible, presort the data before it goes into the pipeline. If you must sort data, try your best to sort only small data sets in the pipeline. Instead of using Integration Services for sorting, use an SQL statement with ORDER BY to sort large data sets in the database – mark the output as sorted by changing the Integration Services pipeline metadata on the data source.

Set-based operations: There are times where using Transact-SQL will be faster than processing the data in SSIS. As a general rule, any and all set-based operations will perform faster in Transact-SQL because the problem can be transformed into a relational (domain and tuple) algebra formulation that SQL Server is optimized to resolve. Also, the SQL Server optimizer will automatically apply high parallelism and memory management to the set-based operation – an operation you may have to perform yourself if you are using Integration Services. Typical set-based operations include:
  • Set-based UPDATE statements - which are far more efficient than row-by-row OLE DB calls.
  • Aggregation calculations such as GROUP BY and SUM. These are typically also calculated faster using Transact-SQL instead of in-memory calculations by a pipeline.
Delta selection: Delta detection is the technique where you change existing rows in the target table instead of reloading the table. To perform delta detection, you can use a change detection mechanism such as the new SQL Server 2008 Change Data Capture (CDC) functionality. If such functionality is not available, you need to do the delta detection by comparing the source input with the target table. This can be a very costly operation requiring the maintenance of special indexes and checksums just for this purpose. Often, it is fastest to just reload the target table. A rule of thumb is that if the target table has changed by >10%, it is often faster to simply reload than to perform the logic of delta detection.
 
Home SSIS Tutorial SSIS best practices - Perform tasks ONLY where it performs better