awk Index to get position matches pattern

Input data as below (filetest.txt):

1|22 JAN Minimum Bal 20.00 | SAT
2|09 FEB Extract bal 168.00BR | REM
3|MIN BAL | LEX

Output should be:
( If there is Date & Month in 2nd field of Input file, It should be seperated else blank. If There is Decimal OR Decimal & Currency in last of the 2nd field of Input file, It should be seperated Else blank )

1|22 JAN | Minimum Bal | 20.00 | SAT
2|09 FEB | Extract bal | 168.00BR | REM
3||MIN BAL || LEX

My Code below. i feel index function wrong, Any help plz..

awk '{if ($2 ~ /^[0-9][0-9] [A-Z][A-Z][A-Z]/) {print $1"|"substr($2, 1, 7)"|"substr($2, 7, 40)} else  print $1"|" "|" $2  }' FS="|" filetest.txt | awk '{if ($3 ~ /[0-9][0-9].*?$/) {s=index($3,/[0-9][0-9].*?$/); print $1"|"$2"|"substr($3, 1, s-1)"|"substr($3, s+1, 20)} else  print $1"|"$2"|"$3"|"  }' FS="|"

Do you really want to truncate the contents of the newly created 3rd field if input field 2 contains more than 47 characters?

Why is the 2nd awk script in your pipeline looking for the currency values in field 3 when the currency values, if present, will be in field 4 by then?

Please explain exactly what the format is for the "decimal" and "decimal & Currency" strings. Will that field always be empty if a monetary amount is not present? Do there really have to be at least two digits in a monetary amount? Should a decimal point be required in a monetary amount? If a decimal point is present in a monetary amount, should there be a required number of decimal digits after that decimal point? Are there minimum and maximum numbers of uppercase letters in a currency code?

Do you really want to truncate the contents of the newly created 3rd field if input field 2 contains more than 47 characters?

  • i dont want to truncate. i didn't get logic so i mentioned values explicitely.

Please explain exactly what the format is for the "decimal" and "decimal & Currency" strings. Will that field always be empty if a monetary amount is not present?

  • "decimal" have two decimal points. "decimal & Currency" have two decimal points and two uppercase alphabets.

Would

awk -F\| '
        {sub (/^[0-9][0-9] [A-Z][A-Z][A-Z]/, "&|", $2)
         sub (/[0-9.]*[0-9][A-Z]* *$/, "|&", $2)
        }
1
' OFS="|" file
1|22 JAN| Minimum Bal |20.00 | SAT
2|09 FEB| Extract bal |168.00BR | REM
3|MIN BAL | LEX

come close to what you need?

Your sample data also has a leading <space> and a trailing <space> around the monetary value. Are they supposed to be there or not?

records of kind 3rd row should be below: (Blank if No date & Month; Blank if no decimal at the end of 2nd field of input file)

3||MIN BAL|| LEX

---------- Post updated at 06:49 AM ---------- Previous update was at 05:57 AM ----------

Hi Don, Leading and trailing spaces are not necessary..

How about

awk -F\| '
        {if (!sub (/^[0-9][0-9] [A-Z][A-Z][A-Z]/, "&|", $2)) $2 = FS $2
         if (!sub (/[0-9.]*[0-9][A-Z]* *$/, "|&", $2)) $2 = $2 FS
        }
1
' OFS="|" file
1 Like

Hi RudiC,
When JSKOBS said ""decimal" have two decimal points", I think the requirement was a single radix character and two decimal digits following it. There was also a requirement for zero or two uppercase letters for the string following the monetary amount.

The following also will reject more false positives on dates and monetary strings and gets rid of leading and trailing spaces in all fields.

awk '
BEGIN {	FS = OFS = "|"
}
{	if(!sub(/(^| )[0-9]*[.][0-9]{2}([A-Z]{2})? *$/, FS "&", $2))
		$2 = $2 FS
	if(!sub(/^ *[0-3][0-9] (JAN|FEB|MAR|APR|MAY|JU[NL]|AUG|SEP|OCT|NOV|DEC)( |$)/, "&" FS, $2))
		$2 = FS $2
	gsub(/(^ *| *$)/, "")
	gsub(/ *\| */, FS)
}
1
' file

which, with the sample input given in post #1 in this thread in a file named file , produces the output:

1|22 JAN|Minimum Bal|20.00|SAT
2|09 FEB|Extract bal|168.00BR|REM
3||MIN BAL||LEX
1 Like