Sunday, March 17, 2013

Informatica Transformation Performance Tuning Tips:

Lookup Tips

1) If a lookup contains a great deal of data (so much that it would take far too long to cache and/or would not fit in memory), consider moving the fields to the Source Qualifier and joining with the main table. The only exception is when very little of the data is actually processed through the lookup; in such a case an uncached lookup may be considered.

2) If the cache files need to be saved and reused, the transformation can be configured to use a persistent cache. A persistent cache can be used when the lookup table does not change between session runs. The first time the Informatica Server runs a session using a persistent lookup cache, it saves the cache files to disk instead of deleting them. The next time the Informatica Server runs the session, it builds the memory cache from the cache files. If the lookup table changes occasionally, the session properties need to be overridden to re-cache the lookup from the database.

3) Speaking of initial loading overhead, all data read into a cache is read in the order of the fields listed in the lookup ports. If there is an index that is even partially in this order, loading the lookup can be sped up substantially when there is a large amount of data. Also, only the ports that are actually used in the mapping need to be kept.

Deleting unused columns can save a substantial amount of time.

4) If a lookup is used several times in a mapping and has not changed, it will, by default, reuse the cache built earlier for subsequent lookups in the mapping. If this behavior is not wanted (for example, the first transformation may have updated the data that the second now looks up), the ‘Recache from Database’ option needs to be checked.

5) A dynamic lookup cache can be used when the target table is also the lookup table. When a dynamic cache is used, the Informatica Server updates the lookup cache as it passes rows to the target. The Informatica Server builds the cache when it processes the first lookup request. It queries the cache based on the lookup condition for each row that passes into the transformation. When the Informatica Server reads a row from the source, it updates the lookup cache by inserting the row into the cache, updating the row in the cache, or making no change to the cache.

6) If the lookup table is on the same database as the source table in the mapping and there is a lot of data which is not feasible to cache, the tables can be joined in the source database rather than using a Lookup transformation.
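As a sketch of this approach, a large lookup against a customer dimension could be replaced by a join written in the Source Qualifier's SQL override. All table and column names below are illustrative assumptions, not from any particular mapping:

```sql
-- Hypothetical SQL override in the Source Qualifier, replacing a large lookup.
-- A LEFT OUTER JOIN preserves lookup semantics: source rows with no match
-- still flow through, with NULLs in the former lookup ports.
SELECT s.ORDER_ID,
       s.CUST_ID,
       s.ORDER_AMT,
       c.CUST_NAME,              -- columns that were previously lookup ports
       c.CUST_SEGMENT
FROM   ORDERS s
LEFT OUTER JOIN CUSTOMER_DIM c
       ON c.CUST_ID = s.CUST_ID  -- the former lookup condition
```

The database then performs the join once, instead of the Informatica Server caching the entire lookup table.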

7) The Informatica Server generates an ORDER BY statement for a cached lookup by default that contains all lookup ports. To increase performance, the default ORDER BY statement can be suppressed and an override ORDER BY with fewer columns can be written.
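As an illustrative sketch (table and column names are assumptions), the override is typically written with its own ORDER BY followed by a comment marker, so that the ORDER BY the server appends is ignored:

```sql
-- Hypothetical Lookup SQL override. The ORDER BY names only the lookup
-- condition column; the trailing "--" turns the ORDER BY that the
-- Informatica Server generates for all lookup ports into a comment.
SELECT CUST_ID, CUST_NAME, CUST_SEGMENT
FROM   CUSTOMER_DIM
ORDER BY CUST_ID --
```

The columns in the override ORDER BY should match the lookup condition ports, in the same order.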

8) If a Lookup transformation specifies several conditions, the lookup performance can be improved by placing all the conditions that use the equality operator first in the list of conditions that appear under the Condition tab.

Joiner Tips

1) A Joiner transformation can be used only when:

  • both input pipelines begin with different data sources,
  • both input pipelines originate from different Source Qualifier transformations,
  • both input pipelines originate from different Normalizer transformations,
  • both input pipelines originate from different Joiner transformations,
  • neither input pipeline contains an Update Strategy transformation, and
  • neither input pipeline contains a connected or unconnected Sequence Generator transformation.

2) Performing a join in the database is much faster than performing a join in the mapping. A database join may not be possible when the sources are flat files or when the data comes from different databases. In all other cases, a database join can be implemented in one of the following ways, depending on the requirement.

  • Writing all the join conditions in the Source Qualifier; this can join more than two input tables at once.
  • Creating a pre-session stored procedure to join the tables in a database.
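As a sketch of the first option, a single Source Qualifier override can join three tables at once, replacing two chained Joiner transformations in the mapping (all names are illustrative):

```sql
-- Hypothetical user-defined join in the Source Qualifier: three tables from
-- the same database joined in one statement instead of two Joiner
-- transformations in the mapping.
SELECT o.ORDER_ID, o.ORDER_AMT, c.CUST_NAME, p.PRODUCT_NAME
FROM   ORDERS    o
JOIN   CUSTOMERS c ON c.CUST_ID    = o.CUST_ID
JOIN   PRODUCTS  p ON p.PRODUCT_ID = o.PRODUCT_ID
```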

3) Always make the source that is expected to return the smaller number of records the master source. This reduces the size of the data cache, and thus the time taken to fetch the data.

Filter Tips:

1) To maximize session performance, a Filter transformation should be placed as close as possible to the sources in the mapping. Rather than processing rows through the mapping only to discard them, it is recommended to filter unwanted data early in the flow from sources to targets.

2) The Source Qualifier transformation can be used as an alternative to the Filter transformation. Rather than filtering rows from within a mapping, the Source Qualifier transformation can filter rows as they are read from a source. The main difference is that the Source Qualifier limits the row set extracted from a source, while the Filter transformation limits the row set sent to a target. Since a Source Qualifier reduces the number of rows used throughout the mapping, it provides better performance. However, the Source Qualifier can only be used to filter rows from relational sources, while the Filter transformation filters rows from any type of source. Also, standard SQL queries need to be used in the Source Qualifier since it runs in the database. The Filter transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE value.
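As an illustration, the same restriction can be pushed into the generated query as a source filter (column names here are assumptions):

```sql
-- Hypothetical Source Filter entered on the Source Qualifier; the Informatica
-- Server appends it to the WHERE clause of the generated SELECT, so the
-- unwanted rows are never read from the database.
ORDERS.ORDER_STATUS = 'OPEN'
```

The equivalent Filter transformation condition, evaluated row by row inside the mapping, could instead use any transformation function, e.g. IIF(ORDER_STATUS = 'OPEN', TRUE, FALSE).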

3) Complex expressions should be avoided; using simple integer or true/false expressions in the filter condition greatly improves the Filter transformation’s performance.

4) A Filter transformation can be used to drop rejected rows from an Update Strategy transformation if it is not needed to keep rejected rows.

Aggregator Tips

1) When using Aggregator transformations, the group-by columns should be as simple as possible. Numbers should be used as GROUP BY ports instead of string or date ports where possible, since this increases performance. Complex expressions in the Aggregator should also be avoided.

2) Aggregator performance can be improved by sorting the input data and switching the Aggregator's Sorted Input option on. Sorted Input decreases the use of aggregate caches, since the Informatica Server assumes all data is sorted by group. As the Informatica Server reads rows for a group, it performs aggregate calculations and stores group information in memory. The Sorted Input option reduces the amount of data cached during the session and improves performance. An ORDER BY can be specified in the Source Qualifier to pass sorted data to the Aggregator transformation.
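For instance, the Source Qualifier override below (with illustrative names) sorts the rows by the same columns, in the same order, as the Aggregator's GROUP BY ports:

```sql
-- Hypothetical Source Qualifier override feeding a Sorted Input Aggregator.
-- The ORDER BY columns must match the Aggregator's GROUP BY ports and their
-- order, so each group arrives as one contiguous run of rows.
SELECT DEPT_ID, JOB_CODE, SALARY
FROM   EMPLOYEES
ORDER BY DEPT_ID, JOB_CODE
```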

3) If the captured changes from the source affect less than half the target, Incremental Aggregation can be used to optimize the performance of Aggregator transformations. With incremental aggregation, captured changes in the source are applied to the aggregate calculations in a session. The Informatica Server updates the target incrementally, rather than processing the entire source and recalculating the same aggregations every time the session is run.

4) The number of connected input/output or output ports should be as small as possible, to reduce the amount of data the Aggregator transformation stores in the data cache.

5) If a Filter transformation is being used in the mapping, it should be placed before the Aggregator transformation to avoid unnecessary aggregation.

6) An Aggregator transformation can be replaced with an Expression transformation and an Update Strategy transformation for certain types of aggregation.
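One common sketch of this pattern, assuming the input is sorted by the group key and using purely illustrative port names, keeps a running total in variable ports of an Expression transformation; a downstream transformation then keeps only the last row of each group:

```
-- Hypothetical Expression transformation ports (Informatica expression
-- language). Variable ports are evaluated top to bottom for each row, so
-- v_PREV_ID still holds the previous row's CUST_ID when v_SUM is computed.
v_SUM     = IIF(CUST_ID = v_PREV_ID, v_SUM + ORDER_AMT, ORDER_AMT)
v_PREV_ID = CUST_ID
o_SUM     = v_SUM    -- output port carrying the running group total
```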

Sorter Tips

1) The Sorter transformation should be avoided whenever possible by sorting in the Source Qualifier transformation instead, using an ORDER BY clause in the generated SQL.

2) The Sorter transformation can be configured to remove duplicates by selecting the Distinct Output Rows option. The Mapping Designer then automatically configures all ports as part of the sort key and discards duplicate rows encountered during the sort operation.

3) The Sorter Cache Size property indicates the maximum amount of memory to be allocated for the sort operation and can be set to any value between one megabyte and four gigabytes. The Informatica Server needs to pass all the incoming data into the Sorter transformation before performing the sort, and will fail if it cannot allocate enough memory for the operation. Informatica recommends allocating at least 8,000,000 bytes of physical memory to sort data using the Sorter transformation, which is also the default size. If the amount of incoming data is greater than the Sorter cache size, the data is temporarily stored in the Sorter transformation's work directory. The Informatica Server requires disk space of at least twice the amount of incoming data when storing data in the work directory. If the amount of incoming data is significantly greater than the Sorter cache size, the Informatica Server may require much more than twice the amount of disk space available to the work directory.

Update Strategy Tips:

1) Rejected rows from an Update Strategy are logged to the Bad File. Filtering of these rows should be considered if retaining these rows is not critical because logging causes extra overhead on the engine.

2) When all the input data needs to be inserted into a target table, Insert option should be selected for the Treat
Source Rows As session property. Also, Insert option should be selected for all target instances in the session.

3) When all the input data needs to be deleted, Delete option should be selected for the Treat Source Rows As session property. Also, Delete option should be selected for all target instances in the session.

4) When the input data needs to update the contents of a target table, the Update option should be selected for the Treat Source Rows As session property. Also, an Update option should be selected for each target instance.

5) When different database operations need to be performed with different rows destined for the same target table, either the DECODE or IIF function can be used to flag rows for different operations (insert, delete, update, or reject). Also, Data Driven should be selected for the Treat Source Rows As session property. Insert, Delete, or one of the Update options should be selected for each target table instance.
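As a sketch, a flagging expression in a Data Driven mapping might look like the following, where CHANGE_FLAG is an assumed input port and DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT are the built-in Update Strategy constants:

```
-- Hypothetical Update Strategy expression using the DECODE(TRUE, ...) idiom:
-- the first condition that evaluates to TRUE selects the row operation.
DECODE(TRUE,
       CHANGE_FLAG = 'I', DD_INSERT,
       CHANGE_FLAG = 'U', DD_UPDATE,
       CHANGE_FLAG = 'D', DD_DELETE,
       DD_REJECT)
```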

6) When input data needs to be rejected, the DECODE or IIF function can be used to specify the criteria for rejecting rows. Also, the Data Driven option should be selected for the Treat Source Rows As session property.

Sequence Generator Tips:

1) Whenever the CURRVAL and NEXTVAL ports are both connected to downstream transformations, CURRVAL holds NEXTVAL plus the Increment By value. If NEXTVAL is not used and CURRVAL alone is connected to a target, CURRVAL will return the same constant value for each output record.

2) An update object fed by a Sequence Generator should generally go to one target table. If it is attached to two tables, no matter what objects intervene, each table will receive its own unique numbers. If a different table should carry the same sequence numbers, one table should be read to get the numbers for the other tables.

3) If more than one Informatica mapping writes to the same table (where a sequential value is needed), they should use the Sequence Generator as a reusable object or a shortcut. If non-Informatica routines also write to the same table, using a trigger or some other database method (such as the Transact-SQL identity option) should be considered.

Stored Procedure Tips:

1) Each time a stored procedure runs during a mapping, the session must wait for it to complete in the database. To reduce this time, the number of rows sent to the stored procedure should be reduced. An active transformation can be placed before the Stored Procedure transformation to cut down the number of rows that must be passed to it, or an expression can be created that tests values before passing them on, to make sure a value really needs to be passed at all.

2) Instead of computing complex logic in a stored procedure, an Expression transformation can be created; most of the logic used in stored procedures can easily be replicated using expressions in the Designer.

3) Whenever changes are made to the Stored procedure in the database or if the input/output parameters or return value is changed in a stored procedure, the Stored Procedure transformation becomes invalid. In such cases the stored procedure definition should either be imported again or should be configured manually by adding, removing, or modifying the appropriate ports.

4) Whenever a stored procedure runs, it issues a status code that notifies whether or not the stored procedure completed successfully. This code is used by the Informatica Server to determine whether to continue running the session or stop. The Workflow Manager can be configured to continue or stop the session in the event of a stored procedure error.
