Script using awk to find and replace a line, how to ignore comment lines

LMHmedchem · October 7, 2019, 12:21pm

Hello,

I have some code that works, more or less. It is called by a makefile to adjust some hard-coded definitions in the src code. The script generates some values by looking at some of the src files and then writes those values to specific locations in other files. The awk code finds the line in the src file that needs to be replaced and prints the modified file.

This example sets the size of a block of shared memory:

# replace multiplier of SHMEMSIZE in sizedefs.h with $num_pages*pagesize
# generated value to substitute for current line
numpages_replacement_line="#define SHMEMSIZE (4096 * $num_pages)"
# value of comment character for current src file, C code in this case
comment_ch='//'
# path to file we are working on
filename='./src/sizedefs.h'
# process file to switch out line to replace
awk -v replace="$numpages_replacement_line" \
    -v comment="$comment_ch" ' { if( substr($0,1,2) == comment )
                                    print $0;
                                 else if($0 ~ /\#define SHMEMSIZE \(4096 /)
                                    print replace;
                                 else
                                    print $0;
                                } ' $filename > tmp
# overwrite original file
mv tmp $filename

There are commented-out lines in the src that could match the regex, /\#define SHMEMSIZE \(4096 /, so the first step compares the first two characters of $0 to see if the line is commented out. If so, the line is printed unchanged. If not, the line is tested against the regex. Lines that do not match are printed unchanged; when a line matches, the substitute line is printed instead. Finally, the original file is overwritten by the modified temp file.

This works, but there are some issues.

The first issue is that I believe this will only work if the comment marker is the first two characters on the line. It doesn't look like it will work if the comment is indented, which is common in C and C++. This was originally written for old FORTRAN code where the comment was always a "C" in the first column of the punch card. Is there a way to ignore leading whitespace and check if "//" is the first two non-whitespace characters?
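One way to handle indented comments, as a minimal sketch: strip the leading whitespace from a copy of the line and compare the start of that copy to the comment marker. The function name replace_uncommented and the sample input are made up for illustration; the regex is the hard-coded one from the script above.

```shell
#!/bin/sh
# Hypothetical helper: replace lines matching a fixed regex unless the
# first non-whitespace characters on the line are the comment marker.
# usage: replace_uncommented REPLACEMENT_LINE COMMENT_MARKER < input
replace_uncommented() {
    awk -v replace="$1" -v comment="$2" '
    {
        line = $0
        sub(/^[ \t]+/, "", line)      # strip leading blanks on a copy only
        if (substr(line, 1, length(comment)) == comment)
            print $0                  # commented out: keep the line as is
        else if ($0 ~ /#define SHMEMSIZE \(4096 /)
            print replace
        else
            print $0
    }'
}

# Sample input: one indented, commented-out define and one live define.
printf '%s\n' \
    '   // #define SHMEMSIZE (4096 * 2)' \
    '#define SHMEMSIZE (4096 * 2)' |
    replace_uncommented '#define SHMEMSIZE (4096 * 8)' '//'
```

Because the whitespace is removed from a copy, an indented comment line is still printed with its original indentation.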

Second, I was not able to pass in the regex line, meaning the line I was looking for. When I tried something like,

numpages_find_line="\#define SHMEMSIZE \(4096"
-v find="$numpages_find_line"
$0 ~ find

I got an error for unmatched parenthesis and I wasn't able to figure out how to escape it. This means that the awk code has to be hard coded for each find and replace and can't be used as a function, etc.
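The unmatched parenthesis error most likely comes from awk applying backslash-escape processing to values assigned with -v, so the \( probably reaches the regex engine as a bare (. One possible workaround, sketched here under that assumption, is to pass the pattern through the environment and read it from ENVIRON, which is not escape-processed. The names replace_matching and find_re are hypothetical, and the comment check is left out to keep the sketch short.

```shell
#!/bin/sh
# Hypothetical helper: the pattern travels through the environment, so awk
# does not apply escape-sequence processing to it as it does for -v values.
# usage: replace_matching REGEX REPLACEMENT_LINE < input
replace_matching() {
    find_re="$1" awk -v replace="$2" '
        $0 ~ ENVIRON["find_re"] { print replace; next }   # dynamic regex match
        { print }                                         # everything else
    '
}

printf '%s\n' '#define SHMEMSIZE (4096 * 2)' 'int x;' |
    replace_matching '#define SHMEMSIZE \(4096 ' '#define SHMEMSIZE (4096 * 8)'
```

Doubling the backslashes in the -v value is another option, but it is easy to get wrong once shell quoting is also involved.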

This also doesn't account for multi-line comments. That isn't an issue in this case, but it would be nice to know how to address that. I guess I would look for the start /* and save lines in an array until the */ and then print the array.
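For multi-line comments, a state flag may be simpler than saving lines in an array: remember whether a /* has been opened and not yet closed, and print lines unchanged while the flag is set. This is only a rough sketch with a hypothetical function name. It assumes the comment delimiters never appear inside string literals, that no code follows a closing */ on the same line, and it ignores // comments entirely.

```shell
#!/bin/sh
# Hypothetical helper: replace lines containing FIND, except inside /* ... */.
# usage: skip_block_comments FIND_STRING REPLACEMENT_LINE < input
skip_block_comments() {
    awk -v find="$1" -v replace="$2" '
    {
        if (in_comment) {                    # inside an open block comment
            if (index($0, "*/")) in_comment = 0
            print; next
        }
        start = index($0, "/*")
        if (start && !index(substr($0, start + 2), "*/"))
            in_comment = 1                   # comment opens and stays open
        pos = index($0, find)
        if (pos && (!start || pos < start))
            print replace                    # match is outside any comment
        else
            print
    }'
}

printf '%s\n' \
    '/* old value' \
    '#define SHMEMSIZE (4096 * 1)' \
    '*/' \
    '#define SHMEMSIZE (4096 * 2)' |
    skip_block_comments '#define SHMEMSIZE (4096 ' '#define SHMEMSIZE (4096 * 8)'
```

Only the last line of the sample input is replaced; the define inside the block comment is left alone.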

Suggestions would be appreciated,

LMHmedchem

MadeInGermany · October 7, 2019, 5:05pm

RE searches require some quoting effort.
The index() function searches for a plain string and returns its position.
Also, only comments to the left of the search string matter.
Those are two reasons to use the index() function rather than an RE.

# replace multiplier of SHMEMSIZE in sizedefs.h with $num_pages*pagesize
# generated value to substitute for current line
numpages_find_line="#define SHMEMSIZE (4096 "
numpages_replacement_line="#define SHMEMSIZE (4096 * $num_pages)"
# value of comment character for current src file, C code in this case
comment_ch1='//'
comment_ch2='/*'
# path to file we are working on
filename='./src/sizedefs.h'
# process file to switch out line to replace
awk -v find="$numpages_find_line" \
    -v replace="$numpages_replacement_line" \
    -v comment1="$comment_ch1" \
    -v comment2="$comment_ch2" \
    '
    {
      comm1pos=index($0,comment1)
      comm2pos=index($0,comment2)
      commpos=((comm1pos > 0 && comm1pos < comm2pos || comm2pos == 0 ) ? comm1pos : comm2pos)
# commpos := position of the leftmost comment marker (0 if none)
      findpos=index($0,find)
# leave the line unchanged if a comment starts left of the match, or if there is no match
      if (commpos > 0 && commpos < findpos || findpos == 0)
        print $0;
      else
        print replace;
    }
    ' $filename > tmp
mv tmp $filename

Chubler_XL · October 7, 2019, 5:39pm

Surely there are better ways to do this than changing your source with awk. If this is a once off change have a programmer use an editor.

If it varies a lot why not have another define variable for NUMPAGES this can be passed as a compile time option using -D NUMPAGES=7 or something similar. In the code you can even default to some sane value if this has not been setup eg:

#ifdef NUMPAGES
   #define SHMEMSIZE (4096 * NUMPAGES)
#else
   #define SHMEMSIZE (32768)
#endif

LMHmedchem · October 9, 2019, 1:29pm

Thanks for this, it makes the code much more usable since I can call it in a function instead of having to hard code the find line for each instance.
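As an illustration of calling it from a function, here is a minimal sketch that wraps the index()-based awk code from MadeInGermany's post. The function name replace_line, its argument order, and the use of mktemp are assumptions for this example, not part of the original script. The comment markers are hard-coded for C and C++ here.

```shell
#!/bin/sh
# Hypothetical wrapper around the index()-based awk code.
# usage: replace_line FILE FIND_STRING REPLACEMENT_LINE
replace_line() {
    file=$1
    tmp=$(mktemp) || return 1
    awk -v find="$2" -v replace="$3" '
    {
        c1 = index($0, "//"); c2 = index($0, "/*")
        # commpos := position of the leftmost comment marker (0 if none)
        commpos = ((c1 > 0 && c1 < c2 || c2 == 0) ? c1 : c2)
        findpos = index($0, find)
        if (commpos > 0 && commpos < findpos || findpos == 0) print
        else print replace
    }' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps file permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Demo on a scratch file.
demo=$(mktemp)
printf '%s\n' '// #define SHMEMSIZE (4096 * 1)' '#define SHMEMSIZE (4096 * 2)' > "$demo"
replace_line "$demo" '#define SHMEMSIZE (4096 ' '#define SHMEMSIZE (4096 * 8)'
cat "$demo"
rm -f "$demo"
```

Each find and replace pair then becomes a single call, so the script can loop over however many definitions need to be adjusted.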

The number of shared memory pages could certainly be passed as a definition to the compiler, since that part of the code is in C++. The script also determines a value for NUMPAGES by reading the src that generates the output and calculating the number of bytes to allocate. There are also several related PARAMETER values in old FORTRAN code that can't really be changed from the makefile, since the f77 pre-processor is rather limited in that respect. It is possible to process FORTRAN src files with the C pre-processor (since I think that GNU compiles FORTRAN as C anyway). Doing that means that you can't use FORTRAN-style includes in those files, so that messes up all of the other includes, defined parameters, and such.

I need to run something to determine the size of the new output. There always seem to be issues in running a script from make and retrieving data if what you want is more than something simple. I could run something like

NUMPAGES=$(shell sizes.sh)

to get the number of pages, but that script determines several things that need to be modified in other sections of the code. I would either have to call the script many times, have many scripts, or write the data to a file, etc. It just seems easier to have the script make the changes since it needs to run anyway and the data is already in scope.

This is only an issue when the size of the output changes, so it doesn't come up as often as you might think. It is now automated by running make -f makefile resize all. The resize rule runs the script to determine the new sizes and makes the required changes in all the necessary sections of the code. The relevant objects and bin files are also deleted. The all rule then recompiles and rebuilds the applications.

None of this is a perfect solution.

LMHmedchem

system · May 18, 2020, 4:59pm

Moderator comments were removed during original forum migration.