I have a somewhat complex problem that I would like to solve with awk. It is essentially a 2-part problem.
I have a large directory of files with the same format, each with 266 lines.
The first 206 lines of each file are filled with attribute information.
Then the following 60 lines consist of 202 values separated by commas.
The first position in each of these sixty lines is a word (string value), and the last position in each of these sixty lines is a number (1 or 0).
Is it possible to change the numeric value in the last field ($202) of lines that contain certain strings listed in a separate file?
To visualize the problem, my data file looks like this:
@RELATION relationData
@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC
@ATTRIBUTE class {1,0}
@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,1
dog,1,2,3,...,201,1
feather,1,2,3,...,201,1
I have a second file with a list of words (1 per line):
cat
feather
I want to change the final numeric value on those lines that contain a word in the second file to 0, so that my file result is:
@RELATION relationData
@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC
@ATTRIBUTE class {1,0}
@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,0
dog,1,2,3,...,201,1
feather,1,2,3,...,201,0
Is this possible to do with awk? Any suggestions on how to tackle this problem?
Perhaps something like this, which I found HERE, can be modified?
awk -v ip1="$INPUT1" -v ip2="$INPUT2" '{gsub( /String1/, ip1);gsub( /String2/, ip2);print}' file
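To make the goal concrete, here is a minimal sketch of one way this might work, using the common two-file awk idiom (read the word list into an array first, then test the first field of each data line against it). The file names words.txt and data.arff are hypothetical, and the sample data is shortened to a handful of fields. It assumes the words must match the first field exactly and that neither file has trailing whitespace or Windows line endings.

```shell
#!/bin/sh
# Build small sample files in a temporary directory (hypothetical names).
tmp=$(mktemp -d)
printf 'cat\nfeather\n' > "$tmp/words.txt"
cat > "$tmp/data.arff" <<'EOF'
@RELATION relationData
@ATTRIBUTE att0 STRING
@ATTRIBUTE class {1,0}
@DATA
hall,1,2,3,0
cat,1,2,3,1
dog,1,2,3,1
feather,1,2,3,1
EOF

# -F, splits fields on commas; OFS=, keeps commas when a line is rebuilt.
result=$(awk -F, -v OFS=, '
    NR == FNR { words[$1]; next }   # first file: remember each listed word
    ($1 in words) { $NF = 0 }       # data file: listed first field -> last field set to 0
    { print }                       # print every line, modified or not
' "$tmp/words.txt" "$tmp/data.arff")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Using $NF rather than $202 means the last field is changed whatever the field count is. The header lines pass through unchanged because their first field never appears in the word list. For a whole directory, the same awk program could presumably be run once per file in a shell loop, writing to a temporary file and then moving it over the original.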