Pattern Matching in Informatica Power Center

Data & AI Solutions

December 20, 2013

We are having several functions in Informatica, and each function will be used depends upon the requirements. One of the most useful and less used functions is Regular Expressions. That is, ‘REG_’ functions. We can find these functions under Data Cleansing in Functions. There are 3 regular expressions available in Informatica: REG_EXTRACT, REG_MATCH and REG_REPLACE. All these regular expressions are pattern based functions rather than format based (example, date format) or data based. Means that, when the records are dynamic we cannot check the same char or string to all the incoming records. Those records can be handled by Regular Expressions.

REG_MATCH

This function will simply return TRUE if the data matches the pattern else FALSE.

Syntax

REG_MATCH (subject, pattern)

Scenario-1:

One of the client requirements is to filter out the records based upon the particular pattern

‘MM-YYYY’ from source files. Since the source files are having the manually entered data, the records are not consistent. If a normal scenario like, Sal >10000 or empid between 1000 and 5000 means, we can use the conditions directly in the filter transformation and filter out the records. But when the source records are dynamic and to use some specific pattern to filter out the records, we don’t have many options here. For the above requirement we have used REG_MATCH function.

REG_MATCH (Value,‘(\w+\d\d-\d\d\d\d)’)

\w+ à is any value that follows an alphanumeric character.

\d à matches one instance of any digit from 0-9.

The records whichever is not meeting the above patternwill be filtered out.Please find the below screenshots, for the source and target records details.

Source

Sample Date

Target

Sample Date Target

REG_REPLACE

It replaces characters in a string with another character pattern. By default, REG_REPLACE searches the input string for the character pattern you specify and replaces all occurrences with the replacement pattern. You can also indicate the number of occurrences of the pattern you want to replace in the string.

Syntax : REG_REPLACE (subject, pattern, replace, numReplacements)

Scenario-2:

One of the client requirements is to normalize the data by replacing the incoming data from inconsistent date format to normalized date format based upon the first occurrence. If the requirement is to replace a particular char or string, generally we will use Repalcechr or Replacestr functions. But, Repalcechr or Replacestr functions will replace all the occurrences of the particular char or string. Since the requirement is to replace a particular pattern with the first occurrence, we have used REG_REPLACE function to address the above requirement. o-by normal’>Scenario-2:
One of the client requirements is to normalize the data by replacing the incoming data from inconsistent date format to normalized date format based upon the first occurrence.
If the requirement is to replace a particular char or string, generally we will use Repalcechr or Replacestr functions. o-by normal’>Scenario-2:
One of the client requirements is to normalize the data by replacing the incoming data from inconsistent date format to normalized date format based upon the first occurrence.
If the requirement is to replace a particular char or string, generally we will use Repalcechr or Replacestr functions. But, Repalcechr or Replacestr functions will replace all the occurrences of the particular char or string. Since the requirement is to replace a particular pattern with the first occurrence, we have used REG_REPLACE function to address the above requirement.
REG_REPLACE (Value, ‘/’,’-’, 1)
1        à Specifies the number of occurrences you want to replace. If you omit this option,
REG_REPLACE will replace all occurrences of the character string.

Source

Sample Source

Target

Sample Target

Here the first occurrence of ‘/’ only has been replaced by ‘-‘.

The remaining occurrences were not replaced with ‘-‘. The similar kind of implementation is not possible through Repalcechr or Replacestr functions.

REG_EXTRACT Extracts sub patterns of a regular expression within an input value.

Syntax

REG_EXTRACT (subject, ‘pattern’, subPatternNum, match_from_start)

Scenario-3:

One of the client requirements is to extract the date in each record and to store the value in the target.

Since the requirement is to extract the particular string based upon the pattern ‘MM-YYYY’,

we can’t use the substr () function here. Because, we can’t determine the exact start and end positions of the substring. So, we have used REG_EXTRACT function to address the requirement. Both subPatternNum, match_from_start mentioned in the syntax are optional. REG_EXTRACT (Value,’ (\w+) (\d\d-\d\d\d\d)’, 2)

\w+ à  is any value that follows an alphanumeric character.

\d à  matches one instance of any digit from 0-9.

2 à Sub pattern number of the regular expression you want to match. Use the following

Guidelines to determine the subpattern number:

No value or 1. Extracts the first regular expression subpattern.
2. Extracts the second regular expression subpattern.
n. Extracts the nth regular expression subpattern.

 

Default is 1.

Source

Pattern Matching Source

Target

Pattern Matching Target

About the Author

Karthik Raja R

Karthik Raja R

Read more Read more image

Related Blogs

Every outcome starts with a conversation

Ready to Pursue Opportunity?

Connect Now

right arrow

ready_to_pursue
Ready to Pursue Opportunity?

Every outcome starts with a conversation