How to insert an array element within regex?

Ophiuchus · September 9, 2013, 10:18pm

Hello to all,

I'm trying to split the string "str" using a regex within the match function.

The substrings I want to extract begin with 22, 23, 24 or 25 and are followed by 12 or 14 characters. I also want to
replace 22 with MJS, 23 with UYT, 24 with WER and 25 with PIL.

For this string "str", the substrings I want to match are:

22030046075451230200460754516925090046075451

The code I've got so far works if I write one regex for each substring, like this:

if ( match(str, /22(.{10})(.{2})/, t) )
or
if ( match(str, /23(.{10})(.{2})/, t) )
or
if ( match(str, /24(.{10})(.{2})(.{2})/, t) )
or
if ( match(str, /25(.{10})(.{2})/, t) )

But I'm trying to use a "for loop" over each element of the array "m[i]" instead, and I don't know how to do it, or whether there is a better way.

awk 'BEGIN {
str=22030046075451230200460754516925090046075451
p[1]="MJS"
p[2]="UYT"
p[3]="WER" 
p[4]="PIL"

m[1]="22"
m[2]="23"
m[3]="24"
m[4]="25"

for(i=1;i<=4;i++)
if ( match(str, /m(.{10})(.{2})(.{2})?/, t) )
	printf("%s,%s,%s ",p,t[1],t[2],t[3]);
}'

the desired output is:

MJS,0300460754,51 UYT,0200460754,51,69 PIL,0900460754,51

PS: I want to do it in awk because the string "str" is obtained inside a larger awk script.

Thanks in advance for any help.

pravin27 · September 10, 2013, 9:39am

Try this,

echo "22030046075451230200460754516925090046075451" | awk -F "2[2345]" 'BEGIN{p[22]="MJS,"
p[23]="UYT,"
p[24]="WER,"
p[25]="PIL,"
}
{a=substr($0,1,2); printf p[a];for (i=2;i<=NF;i++) {
len=length($i);lntotal+=len+2; if (len >= 12 ) { printf substr($i,1,10)","substr($i,11,2);} if (len == 14 ){ printf ","substr($i,13);} b=substr($0,lntotal+1,2);if(p[b]) { printf ", "p[b]};
} printf "\n";
}'

drl · September 10, 2013, 9:40am

Hi.

Create a variable with the regular expression characters, then use that variable in the match function.
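
As a minimal sketch of that idea: in awk, a regex written between slashes is a constant, so /m(.{10})/ looks for the letter "m" rather than the contents of the variable m. A dynamic regex is just a string built by concatenation and passed to match() without slashes. The names find_prefixed and dots() below are made up for illustration. The repetition is spelled out as dots instead of {12} so that the sketch does not depend on interval-expression support, and it only covers the 12-character case.

```shell
find_prefixed() {
  awk -v str="$1" '
  # Hypothetical helper: return n dots, i.e. a regex for any n characters.
  function dots(n,    s) { while (n-- > 0) s = s "."; return s }
  BEGIN {
      m[1] = "22"; m[2] = "23"; m[3] = "24"; m[4] = "25"
      p[1] = "MJS"; p[2] = "UYT"; p[3] = "WER"; p[4] = "PIL"
      for (i = 1; i <= 4; i++) {
          re = m[i] dots(12)      # dynamic regex: a plain string, no slashes
          if (match(str, re))     # sets RSTART to where the match begins
              printf "%s,%s,%s\n", p[i], substr(str, RSTART + 2, 10), substr(str, RSTART + 12, 2)
      }
  }'
}

find_prefixed 22030046075451230200460754516925090046075451
```

With the sample string this should print one line each for MJS, UYT and PIL, and nothing for WER. Because the pattern is not anchored, a code that happens to occur inside the data of another record would also match, so treat this as an illustration of the variable-in-regex technique rather than a complete parser.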

Best wishes ... cheers, drl

Ophiuchus · September 10, 2013, 4:49pm

Hello pravin27,

Thank you for the help. The issue is that the main awk program uses another field separator,
and I couldn't include this new FS without affecting the rest.
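
One way to avoid touching FS at all is to walk the string with substr() and length(), which do not depend on field splitting. The following is only a sketch under assumptions read off the sample: every record is a 2-digit code plus 12 characters, and a following 2-character group belongs to the record when it is not itself one of the known codes. The function name split_records is hypothetical.

```shell
split_records() {
  awk -v str="$1" '
  BEGIN {
      p["22"] = "MJS"; p["23"] = "UYT"; p["24"] = "WER"; p["25"] = "PIL"
      out = ""
      while (length(str) >= 14) {
          code = substr(str, 1, 2)
          if (!(code in p)) break                 # unknown code: stop parsing
          rec = p[code] "," substr(str, 3, 10) "," substr(str, 13, 2)
          str = substr(str, 15)                   # drop the 14 characters just used
          nxt = substr(str, 1, 2)
          if (length(str) >= 2 && !(nxt in p)) {  # assumed: extra group of this record
              rec = rec "," nxt
              str = substr(str, 3)
          }
          out = out (out == "" ? "" : " ") rec
      }
      print out
  }'
}

split_records 22030046075451230200460754516925090046075451
```

For the sample string this should give MJS,0300460754,51 UYT,0200460754,51,69 PIL,0900460754,51. The same loop body can be dropped into a function inside a larger script, since it never reads $0 or FS. The weak point is the assumption about the extra group: if it could ever be 22, 23, 24 or 25, the input is ambiguous and this sketch would treat it as the start of the next record.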

Hello drl,

I've tried putting all the characters of the regex in a variable, but it doesn't work; it only works when
I write the regex literally within the match function.

Thanks for any help.

drl · September 10, 2013, 5:24pm

Hi.

Here's an example:

#!/usr/bin/env bash

# @(#) s2	Demonstrate awk built-in function match(target,re-string).

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk

FILE=${1-data2}

pl " Input data file $FILE:"
cat $FILE

pl " Results:"
awk --posix '
	BEGIN { re = "ab{2}" }
	{ if ( match($0,re) ) { print " Matched",re,"in",$0; next} }
	{ print " NO match for",re,"in",$0}
' $FILE

exit 0

producing:

$ ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Input data file data2:
a
ab
abb
aab
aabbbb

-----
 Results:
 NO match for ab{2} in a
 NO match for ab{2} in ab
 Matched ab{2} in abb
 NO match for ab{2} in aab
 Matched ab{2} in aabbbb

Note that for some special characters in REs, such as the interval braces in {2}, one must use --posix (or --re-interval) with this version of gawk.
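
A quick way to check whether a given awk treats braces as an interval, sketched here with the same "ab{2}" string (the function name probe_intervals is made up for illustration): if intervals are active, "abb" matches; if the braces are taken literally, it does not.

```shell
probe_intervals() {
  awk 'BEGIN {
      re = "ab{2}"        # dynamic regex held in a variable
      if ("abb" ~ re) print "intervals supported"
      else            print "intervals not supported"
  }'
}

probe_intervals
```

Which of the two lines is printed depends on the awk implementation and the options it is run with, so no output is shown here.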

Best wishes ... cheers, drl