There are so many String manipulation tasks we do in our data warehouse environment. Some of them are easy and few of them are complex.
With the recent introduction of REGULAR EXPRESSION in Teradata, many complex tasks have now become a one-line code.
Below are few cases in which using REGULAR EXPRESSION can be useful:

1) Check if string is made of only 1 character: There may be a case when source is sending you some data with some filler values. If the complete String value consists of same character repeated throughout the string then it is a filler. Now traditionally you may do it by writing a stored procedure which may traverse through string and identify if it consists of different characters or just one. With REGULAR EXPRESSION you can do the check very easily. Let us look at below example:

SELECT col1,CASE WHEN col2=” THEN ‘Same’ ELSE ‘Different’ END AS Messg
FROM (
SELECT
‘11211’ AS COL1,
CASE WHEN REGEXP_REPLACE(COL1,SUBSTR(COL1,1,1)||'{1,100}’,”,1,0,’i’) =” THEN ” ELSE COL1 END AS COL2
)tb1;

The logic used here is we are selecting the first character and replacing it with blank in the string. If the complete string is blank then it is a filler value.

2) We often get the requirement that in a string remove all spaces or remove all blanks present in the string. TRIM function helps in removing spaces which are either leading or trailing. However to remove spaces present with in the string, we can use below regular expression:

SELECT
‘1 2 3 4 5 6′ AS COL1,
REGEXP_REPLACE(COL1,'[[:blank:]]+’,”,1,0,’i’) AS COL2;

The logic used here is we are identifying the space/blank and replacing it with nothing. So all the spaces are gone.

3) We may get the requirement that a string consists of multiple words. However there are uneven number of spaces separating each word. Now the requirement is we want to have only single space between words inside string. For this we can use below regular expression:

SELECT
‘1 2 3 4 5 6′ AS COL1,
REGEXP_REPLACE(COL1,'[[:blank:]]{1,100}’,’ ‘,1,0,’i’) AS COL2;

The logic we have used here is we are identifying blanks/spaces which may extend repeatedly upto 100 characters and replace it by single space.

4) A string may contain alphabets, numeric or symbols. We can use regular expression to keep only alphabets in String and remove rest.

SELECT
'ABC123DEF!@#' AS COL1,
REGEXP_REPLACE(COL1,'[^a-zA-Z]','',1,0,'i') AS COL2;

The logic used here is we are identifying anything but alphabets and replacing it with nothing.

5) A string may contain alphabets, numeric or symbols. We can use regular expression to keep only numeric part in String and remove rest.

SELECT
‘ABC123DEF!@#’ AS COL1,
REGEXP_REPLACE(COL1,'[^0-9]+’,”,1,0,’i’) AS COL2;

The logic used here is we are identifying anything but numbers and replacing it with nothing.

6) A string may contain alphabets, numeric or symbols. We can use regular expression to keep only symbols in String and remove rest.

SELECT
‘ABC123DEF!@#’ AS COL1,
REGEXP_REPLACE(COL1,'[a-zA-Z0-9]+’,”,1,0,’i’) AS COL2;

The logic used here is we are identifying alphabets and numbers and replacing it with nothing.

We will keep updating this post with more regular expression examples in teradata. However if you have any String manipulation requirement, feel free to leave a comment here and we will try to help you.

*Check for apostrophe as it might give some error while copying and running the query.

8 Comments

  1. hi,
    do we have something to remove only junk character from varchar column in Teradata
    ‘Tim,JOHN
    ’tim,JOHN

    1. Author

      Hi Sk
      You have to define junk characters. Is it symbols or whitespaces or both or something else ?
      Alternate way could be that you can keep alpha numeric and remove everything else.

      Thanks
      Nitin

  2. Hi

    What is use of + in below query?
    SELECT
    ‘ABC123DEF!@#’ AS COL1,
    REGEXP_REPLACE(COL1,'[A-ZA-Z0-9]+’,”,1,0,’I’) AS COL2;

    1. Author

      Hi
      ‘+’ signifies at least one occurrence of the elements.

      Cheers
      Raj

  3. Hi,

    If I want to identify the occurence of a curly braces, and pick up all the characters inside them, how can we achieve it.
    e.g Data contains {Name = X}{Age = Y}{Gender = M}, I should get output as
    Name = X, Age = Y, Gender = M. All of them in separate columns each.

    Can you help please

  4. In the above function why [A-ZA-Z]mentioned twice..is it for uppercase and lowercase?

    1. Author

      Hi Jayanthi
      Thanks for pointing it out It is actually A-Za-z however the CSS of webpage converted everything to UPPERCASE.
      I have fixed that.
      Thanks again
      Raj

  5. Hi how to handle null in regexp_replace

Leave a Reply

Your email address will not be published. Required fields are marked *