Replacing matched patterns in multiple files with awk

karlmalowned · January 12, 2016, 5:40pm

Hello all,

I have since given up trying to figure this out and used sed instead, but I am trying to understand awk and was wondering how someone might do this in awk.

I am trying to match the first field of one specific file against the first field of several other files and, where they match, append the second field from the first file to the matching line in those files (preferably in place).

Example:
file1

1000/3333,20150101-96
1000/4444,20150102-02
1000/5555,20150103-29
1000/6666,20150104-67

file2

1000/3333
9999/9999

file3

1000/6666
8888/8888

Preferred output:
file2

1000/3333,20150101-96
9999/9999

file3

1000/6666,20150104-67
8888/8888

I have seen plenty of examples of how to work with two files, or of storing one field in an array and acting only on that one field.

I stole the code below and modified it for my purpose, but I still don't understand how I could perform this on multiple files or if it's possible to edit in-place (gawk -i inplace):

#!/bin/bash
INPUTFILE="/home/username/blah/file1"
DATAFILE="/home/username/blah/file2"
OUTFILE="/home/username/pleasework.out"

awk 'BEGIN {
while (getline < "'"$INPUTFILE"'")
{
split($0,a,",");
name=a[1];
date=a[2];

key=date
data=name
nameofarray[data]=key;
}
close("'"$INPUTFILE"'");

while (getline < "'"$DATAFILE"'")
{
var=nameofarray[$0];
print $0","var > "'"$OUTFILE"'"; 
}
}'

Don_Cragun · January 12, 2016, 6:37pm

Maybe you wanted something more like:

#!/bin/bash
# Move to the directory where the input files are located.
cd "/home/username/blah"

awk '
BEGIN {	# Set input and output field separators.
	FS = OFS = ","
}
FNR == NR {
	# Get keys from 1st input file.
	key[$1] = $2
	next
}
function copyback() {
	# After we have read the last line of input from a file, rewrite it
	# with updated contents if anything was changed.
	if(nc) {
		for(i = 1; i <= lines; i++)
			print d[i] > filename
		close(filename)
	}
}
FNR == 1 {
	# When we see the first line of subsequent files, update the previous
	# input file...
	copyback()
	# and get ready for the current input file...
	filename = FILENAME
	lines = nc = 0
}
{	# Gather data from current input file...
	if($1 in key) {
		# Add data gathered from first input file
		d[++lines] = $1 OFS key[$1]
		# Increment the number of changes made.
		nc++
	} else {	# Just save the input line unchanged.
		d[++lines] = $0
	}
}
END {	# When we hit EOF on the last input file, update the last input file...
	copyback()
}' file1 file2 file3

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
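
On the in-place part of the question: GNU awk 4.1 and later ship an inplace extension (gawk -i inplace), but a temporary file plus mv gives much the same effect with any POSIX awk. The following is only a sketch under stated assumptions, not a drop-in replacement for the script above. The sample files are recreated in a scratch directory made by mktemp, the variable names dir, f, and tmp are made up for the example, and, unlike the script above, it rewrites every file whether or not anything in it matched.

```shell
#!/bin/sh
# Assumption: a scratch directory with sample data mirroring the example
# in the question. Point the loop at real files once the result looks right.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf '%s\n' '1000/3333,20150101-96' '1000/4444,20150102-02' \
              '1000/5555,20150103-29' '1000/6666,20150104-67' > file1
printf '%s\n' '1000/3333' '9999/9999' > file2
printf '%s\n' '1000/6666' '8888/8888' > file3

for f in file2 file3; do
    # Write the updated copy next to the original, then swap it in.
    tmp=$(mktemp "$f.XXXXXX") || exit 1
    awk '
    BEGIN { FS = OFS = "," }
    FNR == NR { key[$1] = $2; next }    # first operand: load the lookup table
    $1 in key { $0 = $1 OFS key[$1] }   # match: append field 2 from file1
    { print }                           # write every line, changed or not
    ' file1 "$f" > "$tmp" && mv "$tmp" "$f" || rm -f "$tmp"
done

cat file2 file3
```

With the sample data this should leave file2 and file3 looking like the preferred output in the question. Because mv replaces each file with a newly created one, the original file's permissions and hard links are not carried over; if that matters, copy the temporary file's contents back over the original instead of renaming it.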

karlmalowned · January 13, 2016, 7:55am

This is amazing. Thank you very much Don, and I really do appreciate the details in the comments!!