Data Warehouse Guide

DataStage Tutorial

IBM WebSphere DataStage tips, best practices, optimization tutorials

IBM InfoSphere DataStage is an ETL tool that is part of the IBM Information Platforms Solutions suite and the IBM InfoSphere family. It uses a graphical notation to construct data integration solutions and is available in several editions, such as the Server Edition and the Enterprise Edition.

IBM WebSphere DataStage TX's Solutions-Oriented Architecture is open and scalable, which means we can rapidly adapt our technology to meet specific industry needs - so you can accelerate implementation, reduce risks, and increase operational efficiencies.

This section of DataWG.com presents DataStage best practices, including tips and tricks, optimizations and more.




Debugging DataStage parallel jobs

Some useful tips on how to debug parallel jobs in DataStage.
  • Enable the following environment variables in DataStage Administrator:
      * APT_PM_PLAYER_TIMING – shows how much CPU time each stage uses
      * APT_PM_SHOW_PIDS – shows the process ID of each stage
      * APT_RECORD_COUNTS – shows record counts in the log
      * APT_CONFIG_FILE – selects the configuration file, so you can switch between one-node and multi-node runs
      * OSH_DUMP – shows the OSH code generated for your job, which reveals any unexpected settings made by the GUI
      * APT_DUMP_SCORE – shows all processes and inserted operators in your job
      * APT_DISABLE_COMBINATION – prevents multiple stages from being combined into one process. With combination disabled, it is easier to see where your errors are occurring.
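  • Use a Copy stage to dump out data to intermediate Peek stages or sequential debug files. A Copy stage with a single input and a single output is optimized out when the job runs (unless its Force property is set to True), so leaving one in place as a tap point adds no overhead.
  • Use a Row Generator stage to generate sample data.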
  • Look at the phantom files in the project's &PH& directory for additional error messages: c:\datastage\project_folder\&PH&
  • To catch partitioning problems, run your job with a single-node configuration file and compare the output with your multi-node run. A quick check is to compare the file sizes; for a more detailed comparison, sort both outputs and diff them (Unix sort + diff commands).
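
The sort + diff comparison from the last tip could be scripted roughly as follows. This is a minimal sketch, assuming both runs write their results to plain text files with one record per line; the function name compare_runs is hypothetical and not part of DataStage.

```shell
#!/bin/sh
# compare_runs FILE_A FILE_B   (hypothetical helper, not a DataStage command)
# Sorts both output files and diffs the sorted copies, so that row order,
# which can legitimately differ between node counts, is ignored.
# Exit status is that of diff: 0 if the sorted contents match, 1 if they differ.
compare_runs() {
    single_node_out=$1
    multi_node_out=$2

    # Work in a private temporary directory so the originals are untouched.
    tmp_dir=$(mktemp -d) || return 2

    # LC_ALL=C gives a byte-wise sort order that is the same on every machine.
    LC_ALL=C sort "$single_node_out" > "$tmp_dir/single.sorted"
    LC_ALL=C sort "$multi_node_out"  > "$tmp_dir/multi.sorted"

    # diff prints the differing records, if any.
    diff "$tmp_dir/single.sorted" "$tmp_dir/multi.sorted"
    status=$?

    rm -rf "$tmp_dir"
    return $status
}
```

A call such as `compare_runs out_1node.txt out_4node.txt && echo "outputs match"` (file names are placeholders) prints any records that appear in only one of the two runs, which is usually the first sign of a partitioning problem.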
 