You can find lots of examples in the UNIX & Linux Forums of awk scripts that work on two or more input files and that produce two or more output files (although we don't need to do the latter in this case).
In an awk program, each group of statements is of the general form:
condition { action }
Before any lines are read from any of the input files named as operands, the commands specified in the actions of all groups with the condition BEGIN (if there are any) are executed in the order in which they appear in the awk program. There aren't any BEGIN sections in my code for this thread.
After all of the input files named as operands have been processed, the commands specified in the actions of all groups with the condition END (if there are any) are executed in the order in which they appear in the awk program.
All other groups are processed, in the order in which they appear in the awk program, for every record that is read (with default options, each input line in each file is a record). If the condition for a group evaluates to a non-zero numeric value or to a non-empty string value (i.e., evaluates to TRUE), the statements in the action for that group are executed in order; otherwise the statements in that group are skipped for that input record. If there is no condition at the start of a group, the commands in the action in that group are always executed. If the condition evaluates to TRUE and the action and braces ( { and } ) are omitted, a default action of print (which prints the current state of the current input record) is performed.
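As a small illustration of these rules (this is not part of the script in this thread; the input lines and messages are made up), the following sketch has one BEGIN group, one group with only a condition, one group with a condition and an action, and one END group:

```shell
out=$(printf 'alpha\nbeta\ngamma\n' | awk '
BEGIN     { print "start" }         # runs once, before any input is read
/a$/                                # condition only: default action prints the record
NR == 2   { print "second: " $0 }   # condition and action
END       { print "records: " NR }  # runs once, after all input has been read
')
printf '%s\n' "$out"
```

This should print start, then alpha, beta, second: beta, gamma (the two middle groups are tried in order for each record), and finally records: 3.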
I will assume that you can read the manual page for awk (by giving the command man awk at a ksh primary prompt in your shell window) to see what the standard awk variables, functions, and statements do. I would hope that the comments I supplied in each group explain what that group is trying to do.
The first group:
{   # Get rid of <carriage-return> at end of line if there
    # is one.  Set cr to <carriage-return> if there was one; otherwise
    # set it to an empty string.
    cr = sub(/\r$/, "") ? "\r" : ""
}
(with no condition) is executed for every record read from both input files and does exactly what the comments say it does.
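If it helps to see that group in isolation, here is a minimal sketch using made-up input with one DOS-style line and one UNIX-style line. It relies on sub() returning the number of substitutions it made (1 or 0 here), which is what makes the ?: test work:

```shell
out=$(printf 'dos line\r\nunix line\n' | awk '
{
    # Strip a trailing <carriage-return>, remembering whether one was there.
    cr = sub(/\r$/, "") ? "\r" : ""
    # Show the length of the cleaned record and what was found.
    print length($0), (cr == "" ? "no CR" : "had CR")
}')
printf '%s\n' "$out"
```

The first line should be reported as 8 had CR and the second as 9 no CR.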
The second group:
FNR == NR {
    # For all lines in the first input file...
    # Set a search string as an index in the add_com[] array corresponding
    # to this input line.
    add_com["test \"analog/" $0 "\""]
    next
}
is executed when the condition FNR == NR evaluates to TRUE. It evaluates to TRUE when the Number of Records read from the current File (FNR) is equal to the Number of Records read from all files (NR), which happens when any line from the 1st input file is being processed. The next statement in this action causes all remaining statements in the current action (if there are any) and in any following groups to be skipped for this input record, causes the next available input record to be read, and starts processing groups in order for that new input record. The combination of the condition and the next statement guarantees that the following group will not be performed for records read from the 1st input file.
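To watch FNR and NR change, here is a sketch that uses two throwaway files (the names first and second and their contents are just for illustration):

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\nd\ne\n' > "$tmp/second"
out=$(cd "$tmp" && awk '
FNR == NR { print FILENAME, "NR=" NR, "FNR=" FNR, "(first file)"; next }
          { print FILENAME, "NR=" NR, "FNR=" FNR, "(later file)" }
' first second)
printf '%s\n' "$out"
rm -rf "$tmp"
```

NR keeps counting across files (1 through 5 here) while FNR starts over at 1 for the second file, so FNR == NR is only TRUE for the two lines of the first file. One caveat with this idiom: if the first file is empty, FNR == NR is TRUE for the lines of the second file instead.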
The third group:
{   # For all lines in the second input file, look for a match in add_com[].
    for(i in add_com)
        if(index($0, i)) {
            # Match found.
            # Set this line in the output buffer to include a
            # comment and put back the <carriage-return> if there
            # was one.
            o[FNR] = $0 " ! comment" cr
            # Note that a modification was made.
            mod = 1
            next
        }
    # No match found.
    # Copy this line to output buffer unchanged (restoring the
    # <carriage-return> if there was one).
    o[FNR] = $0 cr
}
even though it has no condition, is only executed for input files after the 1st input file (and in this code there are only two input files). This group copies the input records, as they are read, into an output buffer array (o[]) indexed by the record number within the current input file, after searching for and updating any lines that contain the key strings created from lines found in the 1st input file.
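Here is a cut-down sketch of how the second and third groups work together. The file names (names, config) and the sample lines are invented for this example, and it prints to standard output instead of buffering:

```shell
tmp=$(mktemp -d)
printf 'pump1\nvalve7\n' > "$tmp/names"
printf 'test "analog/pump1" 10\ntest "digital/pump1" 11\ntest "analog/valve7" 12\n' > "$tmp/config"
out=$(awk '
FNR == NR { key["test \"analog/" $0 "\""]; next }   # build the search strings
{
    # index() returns a non-zero position when the key is a substring of $0.
    for (k in key)
        if (index($0, k)) { print $0 " ! comment"; next }
    print                                            # no key matched
}' "$tmp/names" "$tmp/config")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Only the two analog lines should get ! comment appended; the digital/pump1 line is printed unchanged. Note that the closing double quote in each key keeps a name such as pump1 from also matching a longer name such as pump10.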
The fourth group:
END {   # If any changes were made, copy the new contents of the second file
    # back into that file.
    if(mod)
        for(i = 1; i <= FNR; i++)
            print o[i] > FILENAME
}
with the condition END (as described before) is skipped for every line read from the two input files and is only processed after end-of-file is reached on both input files. As noted in the comments, this group copies the accumulated output buffer back into the last input file. The number of lines found in the last input file (FNR) and the pathname of the last input file (FILENAME) remain valid during any END actions.
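A minimal sketch of that buffer-then-rewrite pattern, using a throwaway file named data (the name and the toupper() change are just for illustration):

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\n' > "$tmp/data"
awk '
{ o[FNR] = toupper($0) }            # buffer a modified copy of every line
END {
    # The whole file has been read by now, so it can be overwritten.
    # Count from 1 to FNR to keep the lines in their original order.
    for (i = 1; i <= FNR; i++)
        print o[i] > FILENAME
}' "$tmp/data"
out=$(cat "$tmp/data")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Afterwards the file should contain ONE and TWO. The loop counts from 1 to FNR rather than using for(i in o) because awk does not specify the order in which for(... in ...) visits array indexes.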