Process Control / Audit of Workflows in Informatica

Posted by Badri Narayanan
Comments (19)
September 2nd, 2009

1. Process Control – Definition

Process control or Auditing of a workflow in an Informatica is capturing the job information like start time, end time, read count, insert count, update count and delete count. This information is captured and written into table as the workflow executes

2. Structure of Process Control/Audit table

The table structure of process control table is given below,

Table 1: Process Control structure

PROCESS_RUN_ID Number(p,s) 11 A unique number used to identify a specific process run.
PROCESS_NME Varchar2 120 The name of the process (this column will be populated with the names of the informatica mappings.)
START_TMST Date 19 The date/time when the process started.
END_TMST Date 19 The date/time when the process ended.
ROW_READ_CNT Number(p,s) 16 The number of rows read by the process.
ROW_INSERT_CNT Number(p,s) 16 The number of rows inserted by the process.
ROW_UPDATE_CNT Number(p,s) 16 The number of rows updated by the process.
ROW_DELETE_CNT Number(p,s) 16 The number of rows deleted by the process
ROW_REJECT_CNT Number(p,s) 16 The number of rows rejected by the process.
USER_ID Varchar2 32 The etl user identifier associated with the process.

3.  Mapping Logic and Build Steps

The process control flow has two data flows, one is an insert flow and the other is an update flow. The insert flow runs before the main mapping and update flows runs after the main mapping, this option is chosen in “Target Load Plan”. The source for both the flows could be a dummy source which will return one record as output, for example select ‘process’ from dual or select count(1) from Table_A. The following list of mapping variable is to be created,

Table 2: Mapping Parameter and variables


Steps to create Insert flow:

  • 1. Have “select ‘process’ from dual” as Sequel in source qualifier
  • 2. Have a sequence generator to create running process_run_Id ‘s
  • 3. In an expression SetVariable ($$PROCESS_RUN_ID,NEXTVAL), $$PROCESS_NAME to o_process_name, a output only field
  • 4. In an expression assign $$SessionStarttime to o_Starttime, an output only field
  • 5. In an expression accept the sequence id from sequence generator
  • 6. Insert into target’ process control table’ with all the above three values

Table 3: Process Control Image after Insert flow

START_TMST 8/23/2009 12:23

Steps in main mapping,

  • 1. After the source qualifier, increment the read count in a variable (v_read_count) for each record been read in an expression and SetMaxVariable ($$READ_COUNT,v_read_count)
  • 2. Before the update strategy of target instances, do the same for Insert/Update/Delete counts; all the variables are now set with all their respective counts

Steps to create Update flow:

  • 1. Have “select ‘process’ from dual” as Sequel in source qualifier
  • 2. Use SetMaxvariable to get the process_run_id created in insert flow
  • 3. In an expression assign $$INSERT_COUNT to an o_insert_count, a output only field, assign all the counts in the same way
  • 4. In an expression assign $$SessionEndtime to o_Endtime, an output only field
  • 5. Update the target ‘Process Control Table’ with all the above three values where process_run_id equals the process_run_id generated in Insert flow

Table 4: Process Control Image after Update flow

START_TMST 8/23/2009 12:23
END_TMST 8/23/2009 12:30

4. Merits over Informatica Metadata

This information is also available in Informatica metadata, however maintaining this within our system has following benefits,

  • Need not write complex query to bring in the data from metadata tables
  • Job names need not be mapping names and can be user friendly names
  • Insert/Delete/Update counts of all as well as individual target can be audited
  • This audit information can be maintained outside the metadata security level and can be used by other mappings in their transformations
  • Can be used by mappings that build parameter files
  • Can be used by mappings that govern data volume
  • Can be used by Production support to find out the quick status of load

Comments (19)

Badrinarayanan - July 19th, 2011

Venkat, thanks for trying out, good that you have understood the framework, they way you have assigned"SetVariable" to null before assigning current counts makes sense.. thanks

venkat - June 20th, 2011

Ok I dont know how many of them have really used this...but we used this but there are changes though...Bhadri I think we need to write SETVARIABLE($$VARIABLE,NULL) for all the output ports else we are going to get previous repository saved values.

venkat - June 20th, 2011

Ok The setmaxvariable is going to insert the previous run details when the workflow runs second time. The current row count is inserted the next time.

venkat - June 20th, 2011

Hi Bhadri, How are you using a SETMAX variiable.....this will not work if the session properties are UPDATE ELSE INSERT or UPDATE as UPDATE. And in your main mapping you do not have UPDATE transfromation. yOur screenshots are confusing ...could you please clarify how are you capturing UPDATE and INSERT counts with SETMAXVARIABLE....

khan - September 17th, 2010

Hello Badri, Could you shed some light on the structure of Pre and Post source tables. Are they physical tables ? If not, how do we suppose to create them? Another problem i have been facing is to get the current time of Informatica Server. After every run, the server retains the time and spits out the next time. It continues on and everytime it reflects the timestamp of last run. Is there a way to override that? Thanks.

Gunja - September 13th, 2010

Hi Badri, Thanks for the beautiful post. Its true we often don't get access to the metadata due to security reasons. Just wanted to ask do we have options of capturing rejected rows information like Tony asked above "constraints violation rejects " or any rejects by Oracle??

Badri Narayanan - September 13th, 2010

Yes Ragu, 1) Second pipeline is your original mapping. 2) v_Insert_count port is to keep increment the records marked for insert and assign the v_insert_count value into $$INSERT_COUNT

Ragu - September 13th, 2010

Hi Badri Is it true that second pipeline is our original mapping? wats use of creating variable port of v_insert_count port ?are we doing any calculation there?

Ragu - September 13th, 2010

Hi wats my source structure here for all of my pipelines..

anshuman - August 31st, 2010

thanks for good blog Badri

tony - May 27th, 2010

How can we capture rejects like records rejectd by constraits voilation by oracle if target table is in oracle.

Badri Narayanan - May 20th, 2010

To Bala, There are two choices, Use OPB_WFLOW_RUN or create one I said above. The later has above discussed advantages. To your question, the second one needs to be created manually.

Bala - May 20th, 2010

Hi Badri, I have a basic doubt, Where can we find the audit table, Is this a default table available or we create manually and Is this option available in Infa 7.1?? Please reply. Thanks, Bala

Badri Narayanan - November 24th, 2009

I appreciate your valuable time in bringing more clarity to this purpose, please consider cases where logical delete would be captured as an update in metadata,cases where single update key would result in multiple update to databases. Cases in where our area of interest is a match would chose that way thanks,

srinivas o - November 23rd, 2009

Hi Badri, Yes in prod preferable way is Fail on error, But not all implementaions do that way, I have seen 2 financial institutions who did't agree and wanted to go with process till they reach certain threshhold, But yes I strongly belive in failing workflows as soon as it hits rejection. But I was wondering the information that you stored can be retrived with Query on metadata, Yes like you said complex query, But worth working on it as you don't duplicate what informatica already does and buliding query is one time effort, also you have to capture this info for all Workflows and sessions that you run and maintain that code as well. Of all the benefits I agree with Security aspect and that could be real diffrenciator if you want to go with this approach or use Informatica metadata. Otherwise we have achived most of other benifits using Metedata. But yes Big organizations with very tight standards would't even let you touch the metedata, So this approach hepls in those cases and We have done similar way for one of the financial institutions.

Badri Narayanan - November 19th, 2009

Srinivas, you are right but not Correct. The "fail on error" is to be set as a best practice in production environment. In case of session failure it is outmoded idea to check audits.

srinivas o - November 17th, 2009

This is not a best way as you have not finished inserting data into target when you are incrementing counts. If data is rejected in database and fail on error is not set, this does't work.

Badri Narayanan - November 5th, 2009

Hi Neena, Process Control and Control tables are different. "Process control or Auditing of a workflow in an Informatica is capturing the job information like start time, end time, read count, insert count, update count and delete count". Control tables are used in change data capture in systems like siebel crm. Incremental data loading or change data capture is under progress. Check the blog in a week time to find out that. Thanks, Badri

neena - November 1st, 2009

Hi Badri, Thanks for your posting. I have small doubt, when I was interviewd by a person he was asking me abt Incremental data loading. I have explained to him,but he was asking abt the control tables concept while explaning incremental data loading.Is it the same way we have to follow what ever u have expalined in the "Process Control / Audit of Workflows in Informatica " or is there any other way to follow that? can you please explain me clearly. Thanks for your help in advance.

Comments are closed.