Hello dear Unix shell professionals,
I am desperately trying to get some seemingly simple logic to work. I need to extract words from a line of text and save them in an array. The text can look something like this:
So it prints the first match correctly, but it ignores all the remaining matches. Can anyone please help me with this? I have been stuck here for 2 days now :(. A solution with "awk" would be fine too, but I can't figure out the syntax. Beware that I use an old shell.
line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'
IFS=\$ read -a _a <<< "$line"
_regex='(\{[^}]+})'
for _e in "${_a[@]}"; do
    [[ $_e =~ $_regex ]] &&
        _n+=( "\$${BASH_REMATCH[0]}" )
done
# your matches are in the _n array
For example:
$ line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'
$ IFS=\$ read -a _a <<< "$line"
$ _regex='(\{[^}]+})'
$ for _e in "${_a[@]}"; do
> [[ $_e =~ $_regex ]] &&
> _n+=( "\$${BASH_REMATCH[0]}" )
> done
$ # your matches are in the _n array:
$ declare -p _n
declare -a _n='([0]="\${important}" [1]="\${important2}" [2]="\${importantstring3}")'
I had to convert parts of it to make it compatible with my old shell, because I got a syntax error, but all in all it works perfectly! I even tried to trick it with stray "$" signs and stray "{" braces, and it still outputs only the correct ones!
line='aaaa$}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
IFS=\$ read -a words <<< "$line"
regex='(\{[^}]+})'
for e in "${words[@]}"; do
    if [[ $e =~ $regex ]]; then
        echo "\$${BASH_REMATCH[0]}"
    fi
done
Thanks again, you made a very happy user
---------- Post updated at 08:25 AM ---------- Previous update was at 07:05 AM ----------
Though I am satisfied with the solution, and I assume it will not produce errors on my real data, I have found a case that tricks it. If I use this line:
line='aaaa$aa{yyy}aaaaaa${important}xxxx
It will print ${yyy} as a match. That is because "$" is used only as a separator, so any characters are allowed between the "$" and the "{". I still wonder whether there is a regex that covers this (sorry, I am not the best at expressions and think in pseudo code, but somehow it bugs me):
First one would need to determine that these 2 characters must always come first:
[\$][\{]
Then comes a term where everything is allowed, except these:
[everything allowed except \$,\{]
The previous term is read until the closing brace comes:
[\}].
This is my naive thinking, but it seems the thought process is easier than the actual implementation.
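For what it's worth, the three steps above translate fairly directly into a bash regex. Here is a minimal sketch, assuming bash 3.1 or later (for "=~" and the "+=" array append; an older shell would need those parts rewritten). The names "rest" and "matches" are made up for illustration, and the test line is a combination of the ones posted earlier:

```bash
#!/usr/bin/env bash
# Sketch: match "${...}" directly instead of splitting the line on "$".
line='aaaa$aa{yyy}aaaaaa${important}xxxx${important2}oo{o$}oo$oo${importantstring3}'

# \$\{      a literal "$" immediately followed by "{"
# [^${}]+   one or more characters other than "$", "{" and "}"
# }         the closing brace
regex='\$\{[^${}]+}'

rest=$line      # hypothetical name: the part of the line not yet scanned
matches=()      # hypothetical name: the collected results

while [[ $rest =~ $regex ]]; do
    matches+=( "${BASH_REMATCH[0]}" )
    # Drop everything up to and including this match, then search again.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${matches[@]}"
```

With that test line it should print ${important}, ${important2} and ${importantstring3}, one per line, and skip both "$aa{yyy}" and "{o$}". Because "=~" only reports the leftmost match, the loop cuts the matched text off the front of "rest" and tries again until nothing is left to match.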
Damn, thanks again!
This works perfectly, although in this case I initially wasn't sure why it worked. But now I realize: you put the anchor character "^" first to require that the expression in '(...)' appears at the very beginning. I was confused at first because the grymoire docs describe the anchor as matching "on the beginning of a line", and I wasn't sure where the "line" was in this case. Was it the original "$line" or the split parts of the line? Obviously in this case every split part is its own "line". That's why it works. Eventually I understood.
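To make the "every split part is its own line" point concrete, here is a rough sketch of the anchored variant as I understand it. It is a reconstruction under assumptions, not the exact code from the reply: the names "parts" and "found" and the test line are made up, and it needs bash 3.1 or later.

```bash
#!/usr/bin/env bash
# Sketch: split on "$", then require each part to START with "{...}".
line='{fake}aaaa$aa{yyy}aaaaaa${important}xxxx${important2}'

# Splitting on "$" means every part after the first directly followed a "$".
IFS=\$ read -r -a parts <<< "$line"

# "^" anchors the match to the start of the string being tested,
# which here is one split part, not the original $line.
regex='^\{[^}]+}'

found=()        # hypothetical name: the collected results

# parts[0] is whatever preceded the first "$", so it is skipped.
for part in "${parts[@]:1}"; do
    if [[ $part =~ $regex ]]; then
        found+=( "\$${BASH_REMATCH[0]}" )
    fi
done

printf '%s\n' "${found[@]}"
```

For this test line it should print only ${important} and ${important2}. The part "aa{yyy}aaaaaa" is rejected because it does not begin with "{", and "{fake}" is never tested because no "$" came before it.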
Regarding Perl: yeah, there was a choice between Perl and bash scripts, and then the thought came "use something which is always available and more down-to-earth", so we decided on plain shell scripts.
While it is an interesting learning experience, I have used some Perl before and it was way more comfortable. I am not sure the pure shell scripting decision was right after all, especially seeing that Perl is installed on most Unix machines anyway...sigh, but what can you do.
Correct, perhaps "the beginning of the string" would be more appropriate.
That's OK, actually. I almost always use only pure shell scripting too, but Perl makes the string manipulation really, really easy.
Moreover, Perl is often available even where bash is not (an old HP-UX springs to mind :)).