Substitute one line of multiple files according to another file

I need to make ~96 config files from a template config file that has hundreds of rows and looks like:

template.config:

[LIB]
#average insert size
avg_ins=1000
......
other information omitted

The config files are named in sequence (S01.config, S02.config, etc.),
with different avg_ins values taken from a table:

table.file:

S01 600
S02 710
S03 520
S04 450
......

Only one row needs to change, with a different avg_ins value for each file. What I tried was to first use echo to create a temporary script:

for i in S{01..96}; do echo "sed 's/avg_ins=1000/avg_ins=600/' template.config > ${i}.config"; done > temp.sh

then manually edit the avg_ins values in temp.sh so that the final script looks like:

sed 's/avg_ins=1000/avg_ins=600/' template.config > S01.config
sed 's/avg_ins=1000/avg_ins=710/' template.config > S02.config
sed 's/avg_ins=1000/avg_ins=520/' template.config > S03.config
...

Then I run the resulting script to get the 96 config files with the corresponding avg_ins values.
Is there a better way to handle this with awk, for example by reading table.file into an array and then substituting into template.config? Thanks a lot!

I would not create a temp.sh file; I would just do...

for i in S{01..96}; do sed 's/avg_ins=1000/avg_ins=600/' template.config > ${i}.config ; done

The problem is that the replacement value (avg_ins=600) needs to change according to table.file.
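For comparison, a plain shell loop can read the values straight from the table, so nothing has to be edited by hand. This is only a minimal sketch: the sample file contents are placeholders created so the example runs on its own, and the variable names name and size are made up for illustration.

```shell
# Sample inputs so the sketch is self-contained (contents are placeholders).
cd "$(mktemp -d)" || exit 1
printf '%s\n' '[LIB]' '#average insert size' 'avg_ins=1000' > template.config
printf '%s\n' 'S01 600' 'S02 710' 'S03 520' > table.file

# Read one "name value" pair per table row and write one config per row.
while read -r name size; do
    sed "s/^avg_ins=.*/avg_ins=${size}/" template.config > "${name}.config"
done < table.file
```

This runs sed once per table row, which is fine for 96 files; an awk solution reads the template only once.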

How about

awk ' 
NR==FNR         {if (/avg_ins/) {ZW=NR
                                 $0="avg_ins="
                                }
                 TMP[NR]=$0
                 MAX=NR
                 next  
                }
                {SUP[ZW]=$2
                 for (i=1; i<=MAX; i++) print TMP[i] SUP[i]  > $1".config" 
                }
' template.config table.file 
S01.config:

[LIB]
#average insert size
avg_ins=600
......
other information omitted
S02.config:

[LIB]
#average insert size
avg_ins=710
......
other information omitted
S03.config:

[LIB]
#average insert size
avg_ins=520
......
other information omitted
S04.config:

[LIB]
#average insert size
avg_ins=450
......
other information omitted

Thanks! That's what I was looking for.
Can I ask how ZW is incremented in this line: SUP[ZW]=$2 ? Or is it not incremented?

No, it's not incremented; it always points to the same line of the template file, the one containing the avg_ins text. ZW is just a name.
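A quick way to see this is to print ZW for every row of the table. The sketch below assumes a four-line sample template (placeholder contents) in which avg_ins sits on line 3; the output file name zw.out is made up for the example.

```shell
# Sample inputs so the sketch is self-contained (contents are placeholders).
cd "$(mktemp -d)" || exit 1
printf '%s\n' '[LIB]' '#average insert size' 'avg_ins=1000' 'other=1' > template.config
printf '%s\n' 'S01 600' 'S02 710' > table.file

awk '
NR==FNR { if (/avg_ins/) ZW = NR; next }   # set once, while reading the template
        { print $1, "ZW=" ZW }             # same value for every table row
' template.config table.file > zw.out
cat zw.out
```

Every row reports the same line number, because ZW is assigned only while the first file is being read.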

Thanks RudiC!
There is a lot of useful information in your script; I get it now.

Depending on the version of awk being used and the number of config files being created, it might be safer to change the line:

                 for (i=1; i<=MAX; i++) print TMP[i] SUP[i]  > $1".config"

to:

                 for (i=1; i<=MAX; i++) print TMP[i] SUP[i]  > ($1".config")
                 close($1".config")
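With that change applied, the whole script might look like the sketch below. The sample template.config and table.file contents are placeholders created so the example runs on its own; the awk program itself follows the script earlier in this thread, with the array subscripts written out in full.

```shell
# Sample inputs so the sketch is self-contained (contents are placeholders).
cd "$(mktemp -d)" || exit 1
printf '%s\n' '[LIB]' '#average insert size' 'avg_ins=1000' 'other=1' > template.config
printf '%s\n' 'S01 600' 'S02 710' 'S03 520' > table.file

awk '
NR==FNR { if (/avg_ins/) { ZW = NR          # remember which template line to patch
                           $0 = "avg_ins="  # keep only the key, drop the old value
                         }
          TMP[NR] = $0                      # store every template line
          MAX = NR
          next
        }
        { SUP[ZW] = $2                      # value for this sample
          for (i = 1; i <= MAX; i++) print TMP[i] SUP[i] > ($1 ".config")
          close($1 ".config")               # release the descriptor before the next file
        }
' template.config table.file
```

Because each output file is closed as soon as it is complete, at most one config file is open at any time, however many rows the table has.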

Hi, Don!
I am using gawk 4.0.1. Could you please elaborate on why it might be safer with your modification?
Thanks!

There are two modifications:

  • the string concatenation might not work correctly for redirection file names on all systems unless parenthesized (it worked for me...)
  • ALL awks will run out of file descriptors eventually, some sooner (at about 10 open files), others later. So closing files that are no longer needed puts you on the safe side in either case, at the (small) cost of a file operation.

Thanks a lot!

Expanding a little bit on what RudiC said: The standards do not specify the precedence of string concatenation and input or output redirection. So the statement:

print TMP[i] SUP[i] > $1".config"

can be evaluated as if it were written as:

print TMP[i] SUP[i] > ($1".config")

(as it is with gawk and many other versions of awk ), or as:

(print TMP[i] SUP[i] > $1)".config"

(as it is with awk on BSD systems, OS X systems, and many others). Explicitly adding the parentheses makes it clear which you want and makes that part of your code work with any version of awk in case you ever try to move your script to a different operating system.
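Another way to sidestep the ambiguity is to build the file name in a variable first, so that the redirection target is a single token. This is a minimal sketch with a placeholder two-row table; the variable name out is made up for the example.

```shell
# Sample input so the sketch is self-contained (contents are placeholders).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'S01 600' 'S02 710' > table.file

# The file name is complete before the redirection is parsed,
# so no awk can split it into "$1" and ".config".
awk '{ out = $1 ".config"; print "avg_ins=" $2 > out; close(out) }' table.file
ls
```

Either form, the parenthesized expression or the variable, should give S01.config and S02.config rather than files named S01 and S02.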

And since I learned how to use awk on a system that only allowed ten open file descriptors (including the current input file and standard output), I always close file descriptors I no longer need unless I know (as part of the script specifications) that fewer than eight additional input and output files will be used during the lifetime of the script.
