Extracting a substring between two tokens within a string

jcdole · January 3, 2020, 1:06pm

Hello.

First, best wishes to everybody.

Here are the contents of the input file ("$INPUT1"):

BASH_FUNC_message_begin_script%%=() {  local -a L_ARRAY;
BASH_FUNC_message_debug%%=() {  local -a L_ARRAY;
BASH_FUNC_message_end_script%%=() {  local -a L_ARRAY;
BASH_FUNC_message_error%%=() {  local -a L_ARRAY;

This simple sed command works well for some kinds of tokens.
These tokens work well:

TOKEN1="^BASH_FUNC_"
TOKEN2="\%\%\=\(\)"
#
# # between but excluding TOKEN1 and TOKEN2
#
sed -e 's/'$TOKEN1'\(.*\)'$TOKEN2'.*/\1/'  "$INPUT1"

which returns:

message_begin_script
message_debug
message_end_script
message_error

Another working example

#
# # between but including TOKEN1 and excluding TOKEN2
#
sed -e 's/\('$TOKEN1'.*\)'$TOKEN2'.*/\1/'  "$INPUT1"

which returns:

BASH_FUNC_message_begin_script
BASH_FUNC_message_debug
BASH_FUNC_message_end_script
BASH_FUNC_message_error

Now I have another TOKEN2 of this kind: TOKEN2="local -a L_ARRAY"
And I get an error:

TOKEN1="^BASH_FUNC_"
TOKEN2="local -a L_ARRAY"
#
# # between but excluding TOKEN1 and TOKEN2
#
sed -e 's/'$TOKEN1'\(.*\)'$TOKEN2'.*/\1/'  "$INPUT1"

sed -e 's/'^BASH_FUNC_'\(.*\)'local -a L_ARRAY'.*/\1/'  "/tmp/MY_INPUT1.txt" 
sed: -e expression #1, char 24: unterminated `s' command

I have tried escaping the space and the '-', without success.
Any help is welcome.

Scrutinizer · January 3, 2020, 1:21pm

Hi, try double quotes:

sed "s/${TOKEN1}\(.*\)${TOKEN2}.*/\1/" "$INPUT1"

The curly braces for variable expansions are good practice within strings, for reasons of readability and also to prevent variable expansion errors.
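As a small illustration of the point about curly braces, here is a sketch. The literal text "message" placed directly after the token, and the variable names without_braces and with_braces, are made up for this example and are not from the thread. It assumes the shell is not running with "set -u".

```shell
TOKEN1="^BASH_FUNC_"

# Hypothetical case: literal text follows the variable directly.
# Without braces the shell looks for a variable named TOKEN1message,
# which is unset, so it expands to nothing.
without_braces="s/$TOKEN1message/X/"

# With braces the variable name ends at the closing brace.
with_braces="s/${TOKEN1}message/X/"

printf '%s\n' "$without_braces" "$with_braces"
```

The first line printed is s//X/ and the second is s/^BASH_FUNC_message/X/.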

--
What happens with the single quote construct is that the variable expansions are unprotected, and therefore the shell splits the variable TOKEN2 into three fields: local, -a and L_ARRAY. These three fields are combined with the other characters and passed as separate parameters to the sed command, effectively like so:

sed -e 's/^BASH_FUNC_\(.*\)local' -a 'L_ARRAY.*/\1/'    # wrong

This is why sed complains of an unterminated s-command.

When you use double quotes this does not happen: the variables are still expanded, but the result is not field split by the shell, because it is protected by the double quotes. Double quotes also make the code simpler and easier to read.
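The field splitting described above can be made visible with a small sketch. The helper count_args is a hypothetical function written for this illustration; it only reports how many arguments it received. The sketch assumes bash or another POSIX-style shell with the default IFS (zsh does not split unquoted variables by default).

```shell
TOKEN1="^BASH_FUNC_"
TOKEN2="local -a L_ARRAY"

# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Variables outside any quotes: TOKEN2 is split at its two spaces,
# so the sed script arrives as three separate arguments.
unquoted=$(count_args 's/'$TOKEN1'\(.*\)'$TOKEN2'.*/\1/')

# Whole script in double quotes: expanded, but kept as one argument.
quoted=$(count_args "s/${TOKEN1}\(.*\)${TOKEN2}.*/\1/")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

This prints "unquoted: 3 argument(s), quoted: 1 argument(s)", which matches the three-field breakdown shown above.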

MadeInGermany · January 3, 2020, 1:43pm

Or

sed 's/'"$TOKEN1"'\(.*\)'"$TOKEN2"'.*/\1/'  "$INPUT1"

The point is that each $var must be within "quotes", so that the shell does not perform word splitting and filename generation on it.
The first TOKEN2 should then be

TOKEN2="%%=()"

because ( ) is not special in a BRE, but \( \) is.
The -e (code argument follows) is allowed but not needed: without it, sed takes the first non-option argument as the code.
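A minimal sketch of the note about parentheses, using one line of the input from the first post. The variable names line and name are introduced here for illustration only. Plain ( ) in TOKEN2 match the literal parentheses, while \( \) in the sed code form the capture group.

```shell
# One line taken from the input file shown in the first post.
line='BASH_FUNC_message_debug%%=() {  local -a L_ARRAY;'

TOKEN1="^BASH_FUNC_"
TOKEN2="%%=()"    # plain ( ) are literal characters in a BRE

# \( \) delimit the capture group; -e is omitted because there is
# only one code argument.
name=$(printf '%s\n' "$line" | sed "s/${TOKEN1}\(.*\)${TOKEN2}.*/\1/")

echo "$name"
```

This prints message_debug, the same result as the first example in the thread.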

jcdole · January 3, 2020, 2:14pm

scrutinizer:

Thank you very much

--- Post updated at 21:14 ---

madeingermany:

Thank you