Thursday, March 21, 2013

What is partition in informatica? Types of partitions

Partition:

  •   A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
  •   By default, the Integration Service creates one partition in every pipeline stage.
  •   Partitioning means configuring how the Integration Service reads data from the source, distributes rows of data to each transformation, and writes data to the target. You can configure the number of source and target connections to use.
  •   When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
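
That parallel execution is the point of partitioning. A minimal Python sketch (invented data; not PowerCenter code) of one pipeline run per partition, executing in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

# Three partitions of invented source rows, one list per partition.
partitions = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

def run_pipeline(rows):
    extracted = list(rows)                      # stage 1: extract
    transformed = [r * 10 for r in extracted]   # stage 2: transform
    return transformed                          # stage 3: load (returned here)

# One worker per partition: the extract, transform, and load for
# each partition run in parallel.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    results = list(pool.map(run_pipeline, partitions))

print(results)  # [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
```
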
Partition points:

  •   Partition points mark thread boundaries and divide the pipeline into stages. The Integration Service redistributes rows of data at partition points.
  •   If a partition point is set at a Filter transformation and multiple partitions are defined, the Integration Service can redistribute rows of data among the partitions before the Filter transformation processes the data, as sketched after this list.
  •   By default, the Integration Service sets partition points at various transformations in the pipeline.
  •   The partition type set at a partition point controls the way the Integration Service passes rows of data to each partition.
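
To picture that redistribution, here is a minimal Python sketch (invented data; round-robin redistribution is assumed purely for illustration):

```python
# Rows held by two upstream partitions before the Filter stage (invented data).
upstream = [
    [("item", 5), ("item", 12)],
    [("item", 7), ("item", 3), ("item", 9)],
]
NUM_PARTITIONS = 2

# At the partition point, rows are redistributed (round-robin here)
# so each downstream Filter thread receives a comparable share.
flat = [row for part in upstream for row in part]
downstream = [flat[i::NUM_PARTITIONS] for i in range(NUM_PARTITIONS)]

# Each downstream partition then applies the filter condition independently.
filtered = [[r for r in part if r[1] > 4] for part in downstream]
print(filtered)  # [[('item', 5), ('item', 7), ('item', 9)], [('item', 12)]]
```
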
Partition Points – Transformation Partition Points:

  •   Source Qualifier / Normalizer
      Description: Controls how the Integration Service extracts data from the source and passes it to the source qualifier.
      Restriction: Cannot delete this partition point.
  •   Rank / Unsorted Aggregator
      Description: Ensures that the Integration Service groups rows properly before it sends them to the transformation.
      Restriction: Can delete these partition points if the pipeline contains only one partition, or if the Integration Service passes all rows in a group to a single partition before they enter the transformation.
  •   Target Instances
      Description: Controls how the writer passes data to the targets.
      Restriction: Cannot delete this partition point.
  •   Multiple Input Group
      Description: The Workflow Manager creates a partition point at a multiple input group transformation when it is configured to process each partition with one thread, or when a downstream multiple input group Custom transformation is configured to process each partition with one thread. This ensures that the Integration Service uses one thread to process each partition at a Custom transformation that requires one thread per partition.
      Restriction: Cannot delete this partition point.


Partition Points – Rules and Guidelines:

The following guidelines apply to adding and deleting partition points:
  •    Cannot create a partition point at a source instance.
  •   Cannot create a partition point at a Sequence Generator transformation or an unconnected transformation.
  •    Can add a partition point at any other transformation provided that no partition point receives input from more than one pipeline stage. 
  •    Cannot delete a partition point at a Source Qualifier transformation, a Normalizer transformation for COBOL sources, or a target instance. 
  •   Cannot delete a partition point at a multiple input group Custom transformation that is configured to use one thread per partition.
  •  Cannot delete a partition point at a multiple input group transformation that is upstream from a multiple input group Custom transformation that is configured to use one thread per partition.
The following partition types have restrictions with dynamic partitioning: 
  •   Pass-Through: When you use dynamic partitioning, if you change the number of partitions at a partition point, the number of partitions in each pipeline stage changes.
  •   Key Range: To use key range with dynamic partitioning you must define a closed range of numbers or date keys. If you use an open-ended range, the session runs with one partition.
Types of Partition:

The partition type controls how the Integration Service distributes data among partitions at partition points

Following are the types of Partition:
  •  Database Partitioning
  •  Hash Auto-Keys
  •  Hash User-Keys
  •  Key Range
  •  Pass Through
  •  Round Robin
  •  Dynamic Partitioning
Database Partitioning: 
  •   Session performance can be optimized by using the database partitioning partition type for source and target databases.
  •   In source database partitioning, the Integration Service queries the database system for table partition information and fetches data into the session partitions. 
  •   In target database partitioning, the Integration Service loads data into corresponding database partition nodes.
  •   Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets. Use any number of pipeline partitions and any number of database partitions.
  •   Session performance improves when the number of pipeline partitions equals the number of database partitions.
The Integration Service uses the following rules for source database partitioning:
  •   With database partitioning for a database other than Oracle or IBM DB2, the Integration Service reads the data in a single partition and writes a message to the session log.
  •   If the number of session partitions is more than the number of partitions for the table in the database, the excess partitions receive no data. The session log describes which partitions do not receive data.
  •   If the number of session partitions is less than the number of partitions for the table in the database, the Integration Service distributes the data equally to the session partitions. Some session partitions receive data from more than one database partition, as sketched after this list.
  •   When database partitioning is used with dynamic partitioning, the Integration Service determines the number of session partitions when the session begins.
  •   Session performance with partitioning depends on the data distribution in the database partitions. The Integration Service generates SQL queries to the database partitions. The SQL queries perform union or join commands, which can result in large query statements that have a performance impact.
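
The "fewer session partitions than database partitions" case can be pictured as a simple assignment. A minimal Python sketch (the partition names are invented, and the even round-robin assignment is an assumption for illustration, not the Integration Service's internal logic):

```python
# Five invented table partitions feeding three session partitions.
db_partitions = ["P1", "P2", "P3", "P4", "P5"]
NUM_SESSION_PARTITIONS = 3

# Assign database partitions to session partitions in turn,
# spreading the load as evenly as possible.
assignment = {i: [] for i in range(NUM_SESSION_PARTITIONS)}
for idx, dbp in enumerate(db_partitions):
    assignment[idx % NUM_SESSION_PARTITIONS].append(dbp)

for session_part, dbps in assignment.items():
    print(f"session partition {session_part} reads from {dbps}")
# Two of the three session partitions read from more than one
# database partition, matching the rule above.
```
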
Rules and Guidelines for Source Database Partitioning:
  •   Cannot use database partitioning when the session is configured to use source-based or user-defined commits, constraint-based loading, or workflow recovery.
  •   When a source qualifier is configured for database partitioning, the Integration Service reverts to pass-through partitioning under the following circumstances:
               1. The database table is stored on one database partition.
               2. The session runs in debug mode.
               3. Database partitioning is specified for a session with one partition.
               4. Pushdown optimization is used. Pushdown optimization works with the other partition types.
  •   When an SQL override is created to read database tables and database partitioning is set, the Integration Service reverts to pass-through partitioning and writes a message to the session log.
  •   If a user-defined join is created, the Integration Service adds the join to the SQL statements it generates for each partition.
  •   If a source filter is created, the Integration Service adds it to the WHERE clause in the SQL query for each partition.
Rules and Guidelines for Target Database Partitioning:
  •   Cannot use database partitioning when the session is configured to use source-based or user-defined commit, constraint-based loading, or session recovery.
  •   The target table must contain a partition key, and must link all not-null partition key columns in the target instance to a transformation in the mapping.
  •   Must use high precision mode when the IBM DB2 table partitioning key uses a Bigint field. The Integration Service fails the session when the IBM DB2 table partitioning key uses a Bigint field and low precision mode is used.
  •   If multiple partitions are created for a DB2 bulk load session, use database partitioning for the target partition type. If any other partition type is used, the Integration Service reverts to normal load and writes a message to the session log.
  •   If a session is configured for database partitioning, the Integration Service reverts to pass-through partitioning under the following circumstances:
               1. The DB2 target table is stored on one node.
               2. The session runs in debug mode using the Debugger.
               3. The Integration Service is configured to treat the database partitioning partition type as pass-through partitioning, and database partitioning is used for a non-DB2 relational target.

Hash User Keys Partitioning:

  •   A hash function is used to group rows of data among partitions, based on a user-defined partition key.
  •   For example, a Sorter transformation sorts items by ITEM_DESC, so ITEM_DESC can be specified as the hash key in hash user keys partitioning.
  •   If ITEM_DESC is too long, ITEM_ID, which is unique for each item, can be used instead.
  •   The hash function usually processes numeric data more quickly than string data, which also favors ITEM_ID; the grouping behavior is sketched below.
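
A minimal sketch of that grouping behavior (plain Python with invented rows; the md5-modulo scheme stands in for the Integration Service's actual hash function): every row with the same ITEM_ID lands in the same partition.

```python
import hashlib

NUM_PARTITIONS = 3

def hash_partition(key):
    # Stable hash of the key, reduced modulo the partition count.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Invented (ITEM_ID, ITEM_DESC) rows; some items repeat.
rows = [(101, "mountain bike"), (205, "tent"), (101, "mountain bike"),
        (317, "kayak"), (205, "tent")]

buckets = {i: [] for i in range(NUM_PARTITIONS)}
for item_id, item_desc in rows:
    # ITEM_ID as the hash key: all rows for an item land together.
    buckets[hash_partition(item_id)].append((item_id, item_desc))

print(buckets)
```
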
Key Range Partitioning:

  •   Distributes rows of data based on a port or set of ports that is defined as the partition key, and sends each row to the appropriate partition.
  •   For example, key range partitioning at the target can be partitioned by ITEM_ID.
  •   Specify key ranges for the partitioned port; the routing is sketched below.
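
A minimal sketch of key range routing on ITEM_ID (the ranges are invented for illustration; in a session they are entered for the partitioned port): each row goes to the partition whose range contains its key.

```python
# Invented key ranges for three partitions: (start inclusive, end exclusive).
RANGES = [(1, 1000), (1000, 2000), (2000, 3000)]

def key_range_partition(item_id):
    # Route the row to the partition whose range contains the key.
    for part, (start, end) in enumerate(RANGES):
        if start <= item_id < end:
            return part
    raise ValueError(f"ITEM_ID {item_id} is outside every key range")

for item_id in (42, 1500, 2999):
    print(f"ITEM_ID {item_id} -> partition {key_range_partition(item_id)}")
```
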
Pass Through Partitioning:

  •   In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
  •   Rows stay in the same partition across the three pipeline stages: the reader thread (first stage), the transformation thread (second stage), and the writer thread (third stage), as the sketch below illustrates.
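
As a contrast with the redistribution examples above, a minimal sketch of pass-through behavior (invented data): each partition's rows flow through all three thread stages without ever moving to another partition.

```python
partitions = [["a", "b"], ["c", "d"], ["e", "f"]]  # invented rows

def reader(rows):        # first stage: read rows as-is
    return list(rows)

def transformer(rows):   # second stage: a stand-in transformation
    return [r.upper() for r in rows]

def writer(rows):        # third stage: pretend to write; return the output
    return rows

# No redistribution between stages: partition i stays partition i.
for i, rows in enumerate(partitions):
    print(i, writer(transformer(reader(rows))))
```
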
Round Robin Partitioning:

  1. A mapping reads item information from three source files:
         Source file 1: 80,000 rows
         Source file 2: 5,000 rows
         Source file 3: 15,000 rows
  2. Use round-robin partitioning when each partition has to process approximately the same number of rows.
  3. Round-robin partitioning at the Filter transformation distributes the data, and thereby the workload, evenly, as the sketch below illustrates.
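
A minimal sketch of that balancing effect, using the row counts above (rows are represented only by their source file name): dealing the 100,000 rows out in turn leaves each of three partitions with roughly a third.

```python
import itertools

# Row counts from the example: three uneven source files.
files = {"file1": 80_000, "file2": 5_000, "file3": 15_000}
all_rows = itertools.chain.from_iterable(
    (name,) * count for name, count in files.items()
)

NUM_PARTITIONS = 3
counts = [0] * NUM_PARTITIONS
for i, _row in enumerate(all_rows):
    counts[i % NUM_PARTITIONS] += 1  # deal each row to the next partition

print(counts)  # [33334, 33333, 33333] -- an even split of 100,000 rows
```
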

Dynamic Partitioning:

  •   With dynamic partitioning, the Integration Service determines the number of session partitions at run time, rather than using a fixed, manually configured partition count.

Note: Do not configure dynamic partitioning for a session that contains manual partitions. If dynamic partitioning is set to a value other than disabled and the session is manually partitioned, the session is invalid.

The following tasks have to be completed to scale session partitions with dynamic partitioning:
  •   Set the partitioning: The Integration Service increases the number of partitions based on the partitioning method chosen (see the sketch after this list).
  •   Set session attributes for dynamic partitions: Set session attributes that identify source and target file names and directories. The session uses the session attributes to create the partition-level attributes for each partition it creates at run time.
  •   Configure partition types and partition points: Edit partition points and partition types using the Partitions view on the Mapping tab of the session properties.
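
A hedged sketch of that first step (the function and its method names are invented for illustration; the methods themselves, namely the number of nodes in the grid, the source database partitions, and the $DynamicPartitionCount parameter, are the ones this section describes):

```python
def resolve_partition_count(method, *, nodes_in_grid=1,
                            source_db_partitions=1, param_value=None):
    # Illustrative paraphrase of the dynamic partitioning methods
    # described in this section; not a real PowerCenter API.
    if method == "disabled":
        return 1
    if method == "based_on_nodes_in_grid":
        return nodes_in_grid
    if method == "based_on_source_partitioning":
        return source_db_partitions
    if method == "based_on_parameter":           # $DynamicPartitionCount
        return int(param_value)
    raise ValueError(f"unknown method: {method}")

print(resolve_partition_count("based_on_nodes_in_grid", nodes_in_grid=4))  # 4
print(resolve_partition_count("based_on_parameter", param_value="8"))      # 8
```
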
Using Dynamic Partitioning with Partition Types:
  •   Pass-through partitioning: If the number of partitions at a partition point is changed, the number of partitions in each pipeline stage changes. If pass-through partitioning is used with a relational source, the session runs in one partition in the stage.
  •   Key range partitioning: A closed range of numbers or date keys must be defined to use dynamic partitioning. The keys must be numeric or date datatypes. Dynamic partitioning does not scale partitions with key range partitioning on relational targets.
  •   Database partitioning: When database partitioning is used, the Integration Service creates session partitions based on the source database partitions. Use database partitioning with Oracle and IBM DB2 sources.
  •   Hash auto-keys, hash user keys, or round-robin: Use hash user keys, hash auto-keys, and round-robin partition types to distribute rows with dynamic partitioning. Use hash user keys and hash auto-keys partitioning when the Integration Service has to distribute rows to the partitions by group. Use round-robin partitioning when the Integration Service has to distribute rows evenly to partitions.
Rules and Guidelines for Dynamic Partitioning:
  •   Dynamic partitioning uses the same connection for each partition.
  •   Cannot use dynamic partitioning with XML sources and targets.
  •   Cannot use dynamic partitioning with the Debugger.
  •   When dynamic partitioning is set to a value other than disabled, and the session is partitioned manually on the Mapping tab, the session is invalidated.
  •   The session fails if a parameter other than $DynamicPartitionCount is used to set the number of partitions.
The following dynamic partitioning configurations cause a session to run with one partition:
  •   The default cache directory for an Aggregator, Joiner, Lookup, or Rank transformation is overridden. The Integration Service partitions a transformation cache directory when the default is $PMCacheDir.
  •   The Sorter transformation default work directory is overridden. The Integration Service partitions the Sorter transformation work directory when the default is $PMTempDir.
  •   An open-ended range of numbers or date keys is used with the key range partition type.
  •   Datatypes other than numbers or dates are used as keys in key range partitioning.
  •   Key range relational target partitioning is used.
  •   A user-defined SQL statement or a user-defined source filter is created.
  •   Dynamic partitioning is set to the number of nodes in the grid, but the session does not run on a grid; a message is logged in the session log.
  •   Pass-through relational source partitioning is used.
  •   Dynamic partitioning is used with an Application Source Qualifier.
  •   SDK or PowerConnect sources and targets are used with dynamic partitioning.

Pipeline Partitioning:

  •   Partitioning the pipeline means performing the extract, transformation, and load for each partition in parallel.
  •   Each mapping contains one or more pipelines, where a pipeline consists of a source qualifier and all the transformations and targets that receive data from it.
  •   When three partitions are defined across the mapping, the master thread creates three threads at each pipeline stage; with four stages, that is a total of 12 threads (checked in the sketch below).
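
A trivial check of that thread arithmetic (the four-stage breakdown is an inference from the numbers, since the original figure is not reproduced here):

```python
partitions = 3
# Four pipeline stages, which 12 threads over three partitions implies:
# a reader stage, two transformation stages, and a writer stage (assumed).
stages = 4
total_threads = partitions * stages
print(total_threads)  # 12 threads created by the master thread
```
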
Cache Partitioning:

  •   When a session with multiple partitions is created, the Integration Service may use cache partitioning for the Aggregator, Joiner, Lookup, Rank, and Sorter transformations, and allocates the configured cache size to each partition.
  •   To improve performance, configure separate directories for each partition.
  •   Each cache contains only the rows needed by that partition. As a result, the Integration Service requires only a portion of total cache memory for each partition, as the sketch below illustrates.
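
A toy illustration of that memory point (invented lookup data; modulo routing stands in for the real partitioning scheme):

```python
NUM_PARTITIONS = 2

# Invented lookup source: ITEM_ID -> ITEM_DESC.
lookup_source = {1: "bike", 2: "tent", 3: "kayak", 4: "stove"}

# With cache partitioning, partition i caches only the keys routed to it,
# instead of every partition caching the whole table.
caches = [
    {k: v for k, v in lookup_source.items() if k % NUM_PARTITIONS == i}
    for i in range(NUM_PARTITIONS)
]

for i, cache in enumerate(caches):
    print(f"partition {i} cache: {cache}")
# Each cache holds half the rows, so each partition needs only a
# fraction of the memory a single monolithic cache would require.
```
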
The transformation-specific guidelines for cache partitioning:

  •   Aggregator transformation: A partition point does not have to be set at the Aggregator transformation when multiple partitions are created in a session with an Aggregator transformation.
  •   Joiner transformation: Create a partition point at the Joiner transformation.
  •   Lookup transformation: Create a hash auto-keys partition point at the Lookup transformation.
  •   Rank transformation: A partition point does not have to be set at the Rank transformation when multiple partitions are created in a session with a Rank transformation.
  •   Sorter transformation: A partition point does not have to be set at the Sorter transformation when multiple partitions are created in a session with a Sorter transformation.

Partitioning Rules:

Partition Restrictions for Editing Objects: When object properties are edited, the ability to create multiple partitions in a session, or to run an existing session with multiple partitions, can be impacted.

Before Session Creation: When a session is created, the Workflow Manager checks the mapping properties. Mappings dynamically pick up changes to shortcuts, but not to reusable objects. Therefore, if a reusable object is edited in the Designer after a mapping is saved and before a session is created, the mapping must be opened and resaved for the Workflow Manager to recognize the changes to the object.

After Session Creation with Multiple Partitions: When a mapping is edited after a session with multiple partitions is created, the Workflow Manager does not invalidate the session, even if the changes violate partitioning rules. The Integration Service fails the session the next time it runs, unless the session is edited so that it no longer violates partitioning rules.

Partition Restrictions for PowerExchange: Multiple partitions can be specified for PowerExchange and PowerExchange Client for PowerCenter.

The following changes to mappings can cause session failure:
  •   Deleting a transformation that was a partition point.
  •   Adding a transformation that is a default partition point.
  •   Moving a transformation that is a partition point to a different pipeline.
  •   Changing a transformation that is a partition point in any of the following ways:
         - The existing partition type is invalid.
         - The transformation can no longer support multiple partitions.
         - The transformation is no longer a valid partition point.
  •   Disabling partitioning or changing the partitioning between a single node and a grid in a transformation, after a pipeline with multiple partitions is created.
  •   Switching the master and detail source for the Joiner transformation, after a pipeline with multiple partitions is created.
