Business Intelligence Tutorial, Data Warehouse guide

Ab Initio ETL Software

Ab Initio ETL software tutorial, best practices and tips

Ab Initio is a fourth-generation, graphical user interface (GUI)-based parallel processing tool for data analysis, batch processing and data manipulation. It is commonly used to extract, transform and load (ETL) data.

The Ab Initio software is a suite of products that together provide a platform for data processing applications. The core Ab Initio products are:

  • Co>Operating System
  • The Component Library
  • Graphical Development Environment
  • Enterprise Meta>Environment
  • Data Profiler
  • Conduct>It

This section includes useful tips on Ab Initio usage, best-practice guidelines and some optimization methods.




Ab Initio - Tips to Improve the Performance of Graphs

There are many ways to improve the performance of graphs in Ab Initio. Optimize performance as far as possible while building a graph. The following tips can help:
  1. Avoid pulling large tables from the database to the Ab Initio server wherever possible.
  2. Keep the number of components in a graph to a minimum.
  3. Use lookups, local variables and global variables where appropriate for better efficiency.
  4. Use partition and gather components for parallel processing.
  5. Consult the manuals, which also provide information on improving performance.
  6. It is advisable to execute a graph from the command prompt on the Ab Initio server. The scripted version of the graph is available in the run folder of the sandbox.
  7. When testing a graph that processes a huge volume of data, first run it on a few records (using the Leading Records component or the Sample component) before running it on all the records.
  8. To view data as it passes through intermediate components, use the watcher facility provided by Ab Initio. This helps with error correction.
  9. In every graph that uses RDBMS tables as input, ensure that the join condition is on indexed columns. If it is not, create the missing indexes. This is very important because without indexes the database performs a full table scan, which results in very poor performance. Before executing a graph, use Oracle's EXPLAIN PLAN utility to check the execution path of the query.
  10. If there are indexes on the target table, drop them before running the graph and recreate them after the graph has run (if the database is Oracle, one can use SKIP_INDEX_MAINTENANCE).
  11. If possible, perform sort and aggregation operations on the source tables at the database server itself (provided the source is an RDBMS and not a flat file). An SQL ORDER BY or GROUP BY clause will usually be faster than the equivalent step in Ab Initio, because the database server is typically more powerful than the Ab Initio server. Even when it is not, ORDER BY and GROUP BY tend to run efficiently compared with an ETL tool, because Oracle optimizes the statement before running it.
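
Tips 9 and 11 can be illustrated with a small, self-contained sketch. It is not Ab Initio code and it does not use Oracle: Python's built-in sqlite3 module stands in for the source database, and SQLite's EXPLAIN QUERY PLAN stands in for Oracle's EXPLAIN PLAN. The orders table, its columns and the index name idx_orders_customer are hypothetical names chosen for the example.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical 'orders' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    rows = [(i, i % 100, float(i % 7)) for i in range(1000)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def query_plan(conn, sql):
    """Return the 'detail' text of each step SQLite plans for the query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = build_demo_db()
lookup_sql = "SELECT * FROM orders WHERE customer_id = 42"

# Tip 9: without an index the plan is a full table scan.
plan_before = query_plan(conn, lookup_sql)

# After the index is created, the plan searches via the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup_sql)

# Tip 11: let the database aggregate and sort, so that only the
# summarised rows are pulled across to the ETL side.
totals = conn.execute(
    "SELECT customer_id, SUM(amount) FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

print(plan_before)
print(plan_after)
print(len(totals), "aggregated rows instead of 1000 detail rows")
```

The exact wording of the plan text varies between SQLite versions, so the useful check is only whether the plan mentions a scan or the index. On Oracle the same idea applies, but the syntax and plan output differ.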
 
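The idea behind tip 7 (try a graph on a few records before running the full volume) can be sketched in plain Python. The functions leading_records and sample_records below are hypothetical stand-ins written for this example; they only mimic the general behaviour of taking the first few records or a random fraction of them, and they are not the Ab Initio components themselves.

```python
import itertools
import random

def leading_records(records, n):
    """Return only the first n records without reading the rest of the input."""
    return list(itertools.islice(records, n))

def sample_records(records, fraction, seed=0):
    """Keep each record with the given probability; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]

def transform(record):
    """Placeholder for the logic under test (assumed for illustration)."""
    return record * 2

# Smoke-test the transform on five records before committing to the full run.
source = iter(range(1_000_000))
first_five = leading_records(source, 5)
print([transform(r) for r in first_five])

# Alternatively, test on a random fraction drawn from across the input.
subset = sample_records(range(10_000), 0.01)
print(len(subset), "records sampled for the test run")
```

Taking leading records is cheap because it stops reading early, while sampling reads the whole input but gives a subset that is more representative of the data.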