sed multiline substitution if a condition holds

plsrn · October 14, 2011, 11:14am

Hi.

I have the following text file (more precisely a Matlab script):

dummy1 = 5; varx = 'false'; dummy2 = 6;
% ... some general commands
% ...
dummy3 = 7; vary = 'option1'; dummy4 = 8;

Using sed, I want to do the following:

IF varx = 'false' vary='NEW_VALUE_1'
ELSEIF varx='true' vary='NEW_VALUE_2'
ELSE nothing to do
END

varx and vary are not necessarily at the beginning or end of the line (as in the example above), and there may be any number of spaces before and after the '=' sign. An unknown number of lines separates the lines containing varx and vary (see "% ... some general commands"). For the example above, sed should return the following:

dummy1 = 5; varx = 'false'; dummy2 = 6;
% ... some general commands
% ...
dummy3 = 7; vary = 'NEW_VALUE_1'; dummy4 = 8;

Thanks
Paolo

CarloM · October 14, 2011, 11:35am

I'm not sure I understand your conditions. Is the ELSE ever going to be executed except at the start of the file (i.e. before we encounter the first varx assignment)?

Put another way, what output would you expect from:

dummy3 = 7; vary = 'option1'; dummy4 = 8;
dummy1 = 5; varx = 'false'; dummy2 = 6;
dummy3 = 7; vary = 'option1'; dummy4 = 8;
dummy3 = 7; vary = 'option1'; dummy4 = 8;
dummy1 = 5; varx = 'true'; dummy2 = 6;
dummy3 = 7; vary = 'option1'; dummy4 = 8;

(with whatever in-between the varx/vary lines)

EDIT: A possible (and rather messy :o) awk solution:

#  cat xx.awk
BEGIN {
        FS=";";
        xval=-1;
        if (NEWVAL1=="") NEWVAL1="'optionA'";
        if (NEWVAL2=="") NEWVAL2="'optionB'";
}
$2 ~ /varx *= *'false'/ {
        xval=0;
        print;
}
$2 ~ /varx *= *'true'/ {
        xval=1;
        print;
}
$2 ~ /vary *= */ {
        if (xval==1) {
                printf ("%s; vary = %s", $1, NEWVAL2);
                x=3;
        }
        else if (xval==0) {
                printf ("%s; vary = %s", $1, NEWVAL1);
                x=3;
        }
        else {
                printf ("%s", $1);
                x=2;
        }
        for (i=x;i<=NF;i++) {
                printf ("; %s", $i);
        }
        printf ("\n");
}
$2 !~ /var[xy] *= */ {
        print;
}

# cat xx.txt
dummy3 = 7; vary= 'option1'; dummy4 = 8;
% whatever
dummy1 = 5; varx = 'false'; dummy2 = 6;
% whatever
% whatever
dummy3 = 7; vary =  'option1'; dummy4 = 8;
dummy3 = 7; vary='option1'; dummy4 = 8;
% whatever
dummy1 = 5; varx  = 'true'; dummy2 = 6;
% whatever
% whatever
dummy3 = 7; vary ='option1'; dummy4 = 8;
% whatever

# awk -f xx.awk xx.txt
dummy3 = 7;  vary= 'option1';  dummy4 = 8;
% whatever
dummy1 = 5; varx = 'false'; dummy2 = 6;
% whatever
% whatever
dummy3 = 7; vary = 'optionA';  dummy4 = 8;
dummy3 = 7; vary = 'optionA';  dummy4 = 8;
% whatever
dummy1 = 5; varx  = 'true'; dummy2 = 6;
% whatever
% whatever
dummy3 = 7; vary = 'optionB';  dummy4 = 8;
% whatever
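For comparison, here is a minimal sketch of the same idea that does not split lines on ';'. It remembers the most recent varx value and uses sub() to rewrite only the vary assignment, so the rest of the line keeps its original spacing (the doubled spaces in the output above come from re-joining the fields). It assumes, as in the question, that both values are single-quoted. The function name set_vary and its default replacement values are hypothetical.

```shell
#!/bin/sh
# Sketch: set_vary FILE [VALUE_IF_FALSE] [VALUE_IF_TRUE]  (hypothetical name)
# A vary value is rewritten only when the latest varx above it is false/true;
# every other line is printed unchanged.
set_vary() {
  awk -v v_false="${2:-NEW_VALUE_1}" -v v_true="${3:-NEW_VALUE_2}" '
    BEGIN { q = "\047" }    # a single quote, kept out of the shell quoting

    # Remember the latest varx value, whatever the spacing around "=".
    match($0, "varx *= *" q "[^" q "]*" q) {
      x = substr($0, RSTART, RLENGTH)
      sub("^varx *= *", "", x)
      gsub(q, "", x)
    }

    # IF false / ELSEIF true: replace the vary assignment in place.
    # Any other varx value (or none seen yet) leaves the line alone.
    x == "false" || x == "true" {
      val = (x == "false") ? v_false : v_true
      sub("vary *= *" q "[^" q "]*" q, "vary = " q val q)
    }

    { print }
  ' "$1"
}
```

Usage would be something like `set_vary script.m optionA optionB` (values without quotes; the sketch adds them). A vary line that comes before any varx is left untouched. Note that an `&` in a replacement value would be expanded by sub().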

binlib · October 14, 2011, 1:04pm

If you insist on using sed:

sed "/varx *=/h
/vary *=/{
G
s/\(.*vary *= *'\)[^']*\('.*\)\n.*varx *= *'false'.*/\1NEW_VAL_1\2/
s/\(.*vary *= *'\)[^']*\('.*\)\n.*varx *= *'true'.*/\1NEW_VAL_2\2/
s/\n.*//
}" file

plsrn · October 16, 2011, 3:23pm

Thanks, your solution answered my question. However, you pointed out another problem that may arise, i.e., what happens if vary precedes varx. To explain this point better, let me state the problem in its original form, which is slightly different from the simplified one in my previous post.
I'm upgrading my personal software, and for compatibility reasons I have to change all my old scripts. Strictly speaking, I have the following code:

Please note that var.applyfield1 is a struct variable.

If var.applyfield1 = true, I want the following output:

Therefore, both assignments (to var.applyfield1 and to var.field1type) must be removed and replaced with the new variable var.FIELD1.APPLY (placed before or after the "% ... some code lines"; it doesn't matter).

Otherwise, if var.applyfield1 = false, I want:

The same as before, but with a different value for var.FIELD1.APPLY (an empty string).

The new problem I mentioned at the beginning of this reply is that awk or sed must also work when the assignments to var.applyfield1 and var.field1type are swapped. For instance, with:

I want the same kind of substitution, namely:

The assignments to var.field1type and var.applyfield1 can safely be assumed to appear only once in the script.

If possible, I would prefer a sed-based solution, because this replacement is just one of many, and they can easily be chained with the -e option.
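Since the two assignments can appear in either order, one sed-only option is to let sed see the whole file at once. The sketch below assumes GNU sed (its -z option reads NUL-separated records, so a normal text file becomes one record and the s commands can match across line breaks), an unquoted true/false for var.applyfield1, a single-quoted value for var.field1type, and exactly one assignment of each. The field names come from the post; the function name merge_field1 is hypothetical.

```shell
#!/bin/sh
# Sketch, GNU sed only (-z): a file without NUL bytes is read as a single
# record, so each s command can span line breaks and see both assignments.
# Assumed input: var.applyfield1 = true;  or  = false;  (unquoted) and
#                var.field1type = 'text';  (single-quoted), once each.
# Expressions 1-2 handle "apply before type", 3-4 "type before apply";
# the first assignment is deleted and the later one becomes var.FIELD1.APPLY.
merge_field1() {
  sed -z \
    -e "s/var\.applyfield1 *= *true *;\(.*\)var\.field1type *= *'\([^']*\)' *;/\1var.FIELD1.APPLY = '\2';/" \
    -e "s/var\.applyfield1 *= *false *;\(.*\)var\.field1type *= *'[^']*' *;/\1var.FIELD1.APPLY = '';/" \
    -e "s/var\.field1type *= *'\([^']*\)' *;\(.*\)var\.applyfield1 *= *true *;/\2var.FIELD1.APPLY = '\1';/" \
    -e "s/var\.field1type *= *'[^']*' *;\(.*\)var\.applyfield1 *= *false *;/\1var.FIELD1.APPLY = '';/" \
    "$1"
}
```

Once one expression succeeds, the var.applyfield1 assignment is gone, so the remaining ones cannot match again. Further pairs would each need their own four -e expressions, which fits the -e chaining but gets verbose. BSD/macOS sed has no -z.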

Thanks again,
Paolo

CarloM · October 17, 2011, 5:43am

Will all substitutions involve fieldNtype and applyfieldN? (And do all occurrences of those need to be replaced?)

Also, does it matter if you get:

% ...some code lines
var.FIELD1.APPLY = 'type1';

(i.e. with the new statement replacing the last of an apply/type pair) rather than the other way around?

plsrn · October 17, 2011, 5:55am

Yes. Please note that field1type and applyfield1 are just generic names. You can assume that all occurrences have to be replaced; if you find it easier to substitute just the first occurrence, that is fine for me as well (because each of them appears only once in my scripts).

CarloM · October 17, 2011, 6:10am

Obviously, but what I'm trying to determine is whether there's a common pattern in the names that we can use to perform the substitution. Do they all have the form varname.somethingtype / samevarname.applysamesomething (with "type" and "apply" as fixed literals), or anything similar?

plsrn · October 17, 2011, 7:07am

varname and samevarname are definitely the same. As for somethingtype and applysomething, in principle they could be different. For instance, I have exactly these names:

which I'd like to transform into:

Concerning your question "(i.e. with the new statement replacing the last of an apply/type pair) rather than the other way around?", the answer is yes: you can replace the last assignment.

CarloM · October 17, 2011, 12:34pm

Well, I spent a couple of hours playing with sed and couldn't find a way, but then I'm not very good with sed!

I did manage to create a solution of sorts with awk.

It uses a CSV file containing the variable name, the 'apply' field name, the 'type' field name, and the replacement field name to use. E.g.:

#  cat xf.txt
var,applypol,polmethod,pol.apply
var2,fldapply,fldtype,fld.apply
var4,apply,type,apply

Bash script: it creates an awk (actually gawk) program from the CSV data, and then calls it:

# cat xx.sh
#!/bin/sh

AWKFILE=`basename $0`_$$.awk

[[ -f $AWKFILE ]] && rm $AWKFILE

while IFS=',' read VN AN TN RN
do
        cat >> $AWKFILE <<EOFILE
/$VN\.$TN *= *'[^']*'/ {
   match (\$0, "(.*)$VN.$TN *= *'([^']*)';(.*)",statements);

   ${VN}_type=statements[2];

   if ( ${VN}_apply != "" ) {
      if ( ${VN}_apply == "true" ) {
         printf ("%s$VN.$RN = '%s'; %s\n", statements[1], ${VN}_type, statements[3]);
      }
      else {
         printf ("%s$VN.$RN = \"; %s\n", statements[1], statements[3]);
      }
   }
   else {
      printf ("%s%s\n", statements[1], statements[3]);
   }
}
/$VN\.$AN *= *'[^']*'/ {
   match (\$0, "(.*)$VN.$AN *= *'([^']*)';(.*)",statements);

   ${VN}_apply=statements[2];

   if ( ${VN}_type != "" ) {
      if ( ${VN}_apply == "true") {
         printf ("%s$VN.$RN = '%s'; %s\n", statements[1], ${VN}_type, statements[3]);
      }
      else {
         printf ("%s$VN.$RN = \"; %s\n", statements[1], statements[3]);
      }
   }
   else {
      printf ("%s%s\n", statements[1], statements[3]);
   }
}
EOFILE

        if [ -z "$exclstr" ]
        then
                exclstr="$VN.$AN|$VN.$TN"
        else
                exclstr=$exclstr"|$VN.$AN|$VN.$TN"
        fi

done < $2

cat >> $AWKFILE <<EOFILE
\$0 !~ /($exclstr) *= *'[^']*'/ {
   print;
}
EOFILE

gawk -f $AWKFILE $1

rm $AWKFILE

Which seems to work:

# cat xx.txt
% whatever 1
dummy3 = 7; var.polmethod= 'optionA'; dummy4 = 8;
% whatever 2
dummy1 = 5; var.applypol = 'false'; dummy2 = 6;
dummy1 = 5; var2.fldapply = 'false'; dummy2 = 6;
% whatever 3
dummy3 = 7; var2.fldtype= 'option1'; dummy4 = 8;
% whatever 4
dummy3 = 7; var3.fldtype =  'optionB'; dummy4 = 8;
% whatever 5
dummy1 = 5; var3.fldapply  = 'true'; dummy2 = 6;
% whatever 6
% whatever 7
% whatever 8
dummy1 = 5; var4.apply = 'true'; dummy2 = 6;
% whatever 9
% whatever 10
% whatever 11
% whatever 12
dummy3 = 7; var4.type ='option2'; dummy4 = 8;
% whatever 13

# ./xx.sh xx.txt xf.txt
% whatever 1
dummy3 = 7;  dummy4 = 8;
% whatever 2
dummy1 = 5; var.pol.apply = ";  dummy2 = 6;
dummy1 = 5;  dummy2 = 6;
% whatever 3
dummy3 = 7; var2.fld.apply = ";  dummy4 = 8;
% whatever 4
dummy3 = 7; var3.fldtype =  'optionB'; dummy4 = 8;
% whatever 5
dummy1 = 5; var3.fldapply  = 'true'; dummy2 = 6;
% whatever 6
% whatever 7
% whatever 8
dummy1 = 5;  dummy2 = 6;
% whatever 9
% whatever 10
% whatever 11
% whatever 12
dummy3 = 7; var4.apply = 'option2';  dummy4 = 8;
% whatever 13

I'm sure there must be an easier way though (and I suspect enough variables to replace will cause something in there to fail horribly).
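One possibly simpler route, avoiding a generated awk program and its temporary file, is a single awk program that reads the CSV first (the usual NR == FNR idiom) and keeps each pair's state in arrays. This is only a sketch under the same assumptions as xx.sh (each assignment ends with ';' and appears once). It sticks to POSIX awk (match() with RSTART/RLENGTH instead of gawk's three-argument match()) and strips single quotes from both values, so 'true' and true are treated alike. The function name convert_pairs is hypothetical. Like xx.sh, it deletes the first assignment of a pair and puts the combined one where the second was.

```shell
#!/bin/sh
# Sketch: convert_pairs MAP.csv SCRIPT.m  (hypothetical name)
# MAP.csv rows: varname,applyfield,typefield,newfield (same layout as xf.txt).
# Plain POSIX awk: no generated program, no temporary file.
convert_pairs() {
  awk '
    # "var.applypol" becomes "var[.]applypol", so the dot only matches a dot.
    function lit(s) { gsub(/[.]/, "[.]", s); return s }

    # Value of "name = value;" with spaces and single quotes removed.
    function value_of(stmt) {
      sub(/^[^=]*= */, "", stmt)
      sub(/ *;$/, "", stmt)
      gsub("\047", "", stmt)
      return stmt
    }

    # First file: load the mapping table, skipping malformed rows.
    NR == FNR {
      if (split($0, f, ",") == 4) {
        n++; vn[n] = f[1]; an[n] = f[2]; tn[n] = f[3]; rn[n] = f[4]
      }
      next
    }

    # Second file: k == 1 looks for the apply field, k == 2 for the type field.
    {
      for (i = 1; i <= n; i++) {
        for (k = 1; k <= 2; k++) {
          name = vn[i] "." (k == 1 ? an[i] : tn[i])
          if (!match($0, lit(name) " *= *[^;]*;"))
            continue
          v = value_of(substr($0, RSTART, RLENGTH))
          if (k == 1) { applyv[i] = v; seenA[i] = 1 }
          else        { typev[i] = v;  seenT[i] = 1 }
          # First half of a pair: just delete it. Second half: put the
          # combined assignment where it was.
          repl = ""
          if (seenA[i] && seenT[i])
            repl = vn[i] "." rn[i] " = \047" (applyv[i] == "true" ? typev[i] : "") "\047;"
          $0 = substr($0, 1, RSTART - 1) repl substr($0, RSTART + RLENGTH)
        }
      }
      print
    }
  ' "$1" "$2"
}
```

It would be called as `convert_pairs xf.txt xx.txt`. Note the argument order: the mapping file comes first here, the reverse of xx.sh. Names are matched as plain substrings, so a mapping for var would also fire on a name like myvar.applypol.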

plsrn · October 18, 2011, 2:45am

It works great!!! Only three minor problems:

1) I receive the following error message:

./xx.sh: 5: [[: not found

2) 'false' and 'true' are actually false and true, without quotes. For instance:
var.applypol = false;

3) Output like:
var.pol.apply = ";
should be:

var.pol.apply = '';
with two single quotes (') in place of one double quote (").

CarloM · October 19, 2011, 11:20am

Don't have access to a box right now to test it, but these changes should work:

Change the hashbang to #!/bin/bash (on my system /bin/sh is a link to bash), or just replace the [[ ... ]] test with a full if.
Change

/$VN\.$AN *= *'[^']*'/ {
   match (\$0, "(.*)$VN.$AN *= *'([^']*)';(.*)",statements);

to

/$VN\.$AN *= *[^ ]*/ {
   match (\$0, "(.*)$VN.$AN *= *([^ ]*) *;(.*)",statements);

Change

printf ("%s$VN.$RN = \"; %s\n", statements[1], statements[3]);

to

printf ("%s$VN.$RN = ''; %s\n", statements[1], statements[3]);

(two places)

plsrn · October 20, 2011, 7:45am

OK, now it works. I slightly modified your final suggestion, as in the following code. Thanks a lot for your (really impressive) effort in helping me!

#!/bin/sh

AWKFILE=`basename $0`_$$.awk

[ -f "$AWKFILE" ] && rm "$AWKFILE"

while IFS=',' read VN AN TN RN
do
        cat >> $AWKFILE <<EOFILE
/$VN\.$TN *= *'[^']*'/ {
   match (\$0, "(.*)$VN.$TN *= *'([^']*)';(.*)",statements);

   ${VN}_type=statements[2];

   if ( ${VN}_apply != "" ) {
      if ( ${VN}_apply == "true" ) {
         printf ("%s$VN.$RN = '%s'; %s\n", statements[1], ${VN}_type, statements[3]);
      }
      else {
         printf ("%s$VN.$RN = ''; %s\n", statements[1], statements[3]);
      }
   }
   else {
      printf ("%s%s\n", statements[1], statements[3]);
   }
}
/$VN\.$AN *= *[^;]*/ {
   match (\$0, "(.*)$VN.$AN *= *([^;]*) *;(.*)",statements);

   ${VN}_apply=statements[2];

   if ( ${VN}_type != "" ) {
      if ( ${VN}_apply == "true") {
         printf ("%s$VN.$RN = '%s'; %s\n", statements[1], ${VN}_type, statements[3]);
      }
      else {
         printf ("%s$VN.$RN = ''; %s\n", statements[1], statements[3]);
      }
   }
   else {
      printf ("%s%s\n", statements[1], statements[3]);
   }
}
EOFILE

        if [ -z "$exclstr" ]
        then
                exclstr="$VN.$AN|$VN.$TN"
        else
                exclstr=$exclstr"|$VN.$AN|$VN.$TN"
        fi

done < $2

cat >> $AWKFILE <<EOFILE
\$0 !~ /($exclstr) *= *[^;]*/ {
   print;
}
EOFILE

gawk -f $AWKFILE $1

rm $AWKFILE