Find all matching words in text according to pattern

Hello dear Unix shell professionals,
I am desperately trying to get some seemingly simple logic to work. I need to extract every ${...} word from a line of text and save them in an array. The text can look something like this:

aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}...

However, I am restricted in several ways:

  • Can't use Perl
  • Stuck on an ancient GNU bash, version 3.00.16(1)-release (powerpc-ibm-aix5.1)
  • grep -o is not installed

My attempt was this:

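line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'
# Single quotes stop the shell from expanding ${important} etc. itself.
# A quoted pattern after =~ is a regex in bash 3.0/3.1 only; from 3.2 on,
# quoted text is matched literally, so keep the regex in a variable there.
if [[ $line =~ '(\$\{[^}]*})' ]]; then
    echo "matching[1]: ${BASH_REMATCH[1]}"
    echo "matching[2]: ${BASH_REMATCH[2]}"
    echo "matching[3]: ${BASH_REMATCH[3]}"
fi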

Output:

matching[1]: ${important}
matching[2]:
matching[3]:

So it prints the first match correctly, but it ignores all the remaining matches. Can anyone help me with this? I have been stuck on it for 2 days now :(. An awk solution would be fine too, but I can't figure out the syntax. Keep in mind that I am on an old shell.
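Why only the first match shows up: =~ reports just the leftmost match. BASH_REMATCH[0] holds that whole match, and BASH_REMATCH[1], [2], ... hold the parenthesised capture groups of that same match, so with a single pair of parentheses [2] and [3] stay empty. One common workaround is to match in a loop and cut the string just past each hit. A minimal sketch, with illustrative variable names; it tries to avoid features newer than bash 3.0 (no "+=", regex kept in an unquoted variable) but has not been tested on 3.0 itself:

```shell
#!/usr/bin/env bash
# Sketch: collect every ${...} token by matching repeatedly and cutting
# the string just past each hit. Avoids "+=" (added in bash 3.1).
line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'
regex='\$\{[^}]+}'      # literal "${", one or more non-"}" chars, then "}"
rest=$line
matches=()
while [[ $rest =~ $regex ]]; do
    m=${BASH_REMATCH[0]}                 # the whole leftmost match
    matches[${#matches[@]}]=$m           # append without "+="
    rest=${rest#*"$m"}                   # drop text up to and including it
done
printf '%s\n' "${matches[@]}"
```

Because the regex demands "${" as two adjacent characters, a stray "$" or "{" elsewhere in the line is not picked up.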

Try this:

line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'
IFS=\$ read -a _a <<< "$line"   # split the line into fields at each "$"
_regex='(\{[^}]+})'
for _e in "${_a[@]}"; do
  [[ $_e =~ $_regex ]] &&
    _n+=( "\$${BASH_REMATCH[0]}" )   # re-add the "$"; += needs bash 3.1+
done
# your matches are in the _n array

For example:

$ line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'
$ IFS=\$ read -a _a <<< "$line"
$ _regex='(\{[^}]+})'
$ for _e in "${_a[@]}"; do
>   [[ $_e =~ $_regex ]] &&
>     _n+=( "\$${BASH_REMATCH[0]}" )
> done
$ # your matches are in the _n array:
$ declare -p _n
declare -a _n='([0]="\${important}" [1]="\${important2}" [2]="\${importantstring3}")'
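To see why the loop above works, it helps to look at the pieces read produces when "$" is the only field separator: every piece after the first begins where a "${" used to be, minus the "$", so the regex only has to find "{...}" inside each piece and the loop puts the "$" back. A small sketch; "parts" is an illustrative name, and -r is added so read leaves backslashes alone:

```shell
#!/usr/bin/env bash
line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'
# Split on "$" only; -r stops read from treating backslashes as escapes.
IFS='$' read -r -a parts <<< "$line"
# Print each piece in brackets so its exact boundaries are visible.
printf '[%s]\n' "${parts[@]}"
```

This should print four pieces: [aaaaaaa], [{important}xxxxxxxx], [{important2}ooooooo] and [{importantstring3}].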

Wow! Awesome solution! Many thanks!!!!!

I had to convert parts of it to make it compatible with my old shell, as I got a syntax error, but all in all it works perfectly! I even tried to trick it with random "$" signs and stray braces "{", but it still only outputs the correct ones!

line='aaaa$}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
IFS=\$ read -a words <<< "$line" 
regex='(\{[^}]+})'
for e in "${words[@]}"; do
    if [[ $e =~ $regex ]]; then    
        echo "\$${BASH_REMATCH[0]}";
    fi;
done

Thanks again, you made a very happy user :)

---------- Post updated at 08:25 AM ---------- Previous update was at 07:05 AM ----------

Although I am satisfied with the solution and assume it will not produce errors, I have found a way to trick it. If I use this line:

line='aaaa$aa{yyy}aaaaaa${important}xxxx'

It will print ${yyy} as a match. That is because it only uses "$" as a separator and so allows arbitrary characters between the "$" and the "{". I still wonder whether there is a regex that covers this (sorry, I am not the best with regular expressions and think in pseudocode, but it bugs me):

First, these two characters must always appear at the start:
[\$][\{]

Then comes a term in which everything is allowed except these:
[everything allowed except \$,\{]

That term is read until the closing brace appears:
[\}]

This is my naive thinking, but it seems the thought process is easier than the actual implementation.
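The pseudocode maps almost one-to-one onto an extended regular expression: \$\{ for the two leading characters, the bracket expression [^${}] for "anything except $ and {" (with "}" added so the run stops at the first closing brace), and a literal "}". A quick sketch that checks the pattern against a few whole strings; the sample strings are made up for illustration, and ^ and $ anchor the pattern to the full string:

```shell
#!/usr/bin/env bash
# [\$][\{]                     ->  \$\{     the two leading characters
# [everything except \$,\{]    ->  [^${}]+  also excluding "}", so the run
#                                           ends at the first closing brace
# [\}]                         ->  }        the closing brace
re='^\$\{[^${}]+}$'
for s in '${important}' '$aa{yyy}' '${a{b}' '${}'; do
    if [[ $s =~ $re ]]; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

Only ${important} should be reported as a match; $aa{yyy} fails because "$" and "{" are not adjacent.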

Something like this:

IFS=\$ read -a words <<< "$line" 
regex='^(\{[^}]+})'
for e in "${words[@]}"; do
    if [[ $e =~ $regex ]]; then    
        echo "\$${BASH_REMATCH[0]}";
    fi;
done
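Putting the pieces together for an old bash, here is a sketch of the anchored version wrapped in a function (the name extract_vars is hypothetical) that collects matches into an array without "+=" and prints one per line. It is run on the two tricky lines from this thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each ${...} token found in $1, one per line.
extract_vars() {
    local e regex
    local -a words found
    regex='^\{[^}]+}'                  # "{...}" at the very start of a piece
    IFS='$' read -r -a words <<< "$1"  # split on "$" only
    for e in "${words[@]}"; do
        if [[ $e =~ $regex ]]; then
            found[${#found[@]}]="\$${BASH_REMATCH[0]}"   # re-add the "$"
        fi
    done
    # only print when something matched
    if [ ${#found[@]} -gt 0 ]; then
        printf '%s\n' "${found[@]}"
    fi
}

extract_vars 'aaaa$}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
extract_vars 'aaaa$aa{yyy}aaaaaa${important}xxxx'
```

The first call should print ${important}, ${important2} and ${importantstring3}; the second only ${important}, since the piece "aa{yyy}aaaaaa" does not start with "{".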

You said that you can't use Perl :)

% perl -le'print join $/, shift =~ /\${.*?}/g' 'aaaa$}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
${important}
${important2}
${importantstring3}
% perl -le'print join $/, shift =~ /\${.*?}/g' 'aaaa$aa{yyy}aaaaaa${important}xxxx'
${important}

Damn, thanks again!
This works perfectly, although at first I wasn't sure why. Now I see it: you use the anchor character "^" to require that the expression in '(...)' appears at the very beginning. I was confused initially because the Grymoire docs describe the anchor as matching "at the beginning of a line", and I wasn't sure what the "line" was here: the original $line or the split parts of it? In this case each split part is its own "line", which is why it works. Eventually I understood :b:

Regarding Perl: yeah, there was a choice between Perl and bash scripts, and the thinking was "use something that is always available and more down-to-earth", so the decision fell on plain shell scripts.

It is an interesting learning experience, but I have used some Perl before and it was far more comfortable. I am not sure the pure shell scripting decision was right after all, especially since Perl is installed on most Unix machines anyway... sigh, but what can you do.

Correct, perhaps "the beginning of the string" would be more appropriate.
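A quick way to convince yourself: bash compiles the =~ pattern without newline-sensitive matching, so ^ matches only at the very start of the string, even when the string contains newlines. A small sketch with a made-up two-line string:

```shell
#!/usr/bin/env bash
s=$'first line\nsecond line'    # one string containing a newline
for re in '^first' '^second'; do
    if [[ $s =~ $re ]]; then
        echo "$re: match"
    else
        echo "$re: no match"    # ^ does not anchor after the newline
    fi
done
```

This should print "^first: match" and "^second: no match". In the split-based loop each piece is a separate string, so there ^ means "start of this piece".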

That's OK, actually. I almost always use only pure shell scripting too, but Perl makes the string manipulation really, really easy.
Moreover, Perl is often available even where bash is not (an old HP-UX springs to mind :)).