Search replace with awk using 2 files

owwow14 · October 16, 2014, 5:59am

I have a bit of a complex problem that I would like to solve with awk. It is essentially a 2-part problem.

I have a large directory of files with the same format, each with 266 lines.
The first 206 lines of each file are filled with attribute information.
Then the following 60 lines consist of 202 values separated by commas.
The first position in each of these sixty lines is a word (string value), and the last position in each of these sixty lines is a number (1 or 0).
Is it possible to change the numeric value in the last field ($202) of lines that contain certain strings listed in a separate file?

To visualize the problem, my data file looks like this:

@RELATION relationData

@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC

@ATTRIBUTE class {1,0}

@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,1
dog,1,2,3,...,201,1
feather,1,2,3,...,201,1

I have a second file with a list of words (1 per line):

cat
feather

I want to change the final numeric value on those lines that contain a word in the second file to 0, so that my file result is:

@RELATION relationData

@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC

@ATTRIBUTE class {1,0}

@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,0
dog,1,2,3,...,201,1
feather,1,2,3,...,201,0

Is this possible with awk? Any suggestions on how to tackle this problem?

Perhaps something like this, which I found HERE, can be modified:

awk -v ip1="$INPUT1" -v ip2="$INPUT2" '{gsub( /String1/, ip1);gsub( /String2/, ip2);print}' file

Akshay_Hegde · October 16, 2014, 7:19am

Words to search

[akshay@nio tmp]$ cat word 
cat
feather

Data file

[akshay@nio tmp]$ cat myfile 
@RELATION relationData

@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC

@ATTRIBUTE class {1,0}

@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,1
dog,1,2,3,...,201,1
feather,1,2,3,...,201,1

Code executed

[akshay@nio tmp]$ awk 'FNR==NR{A[$1];next}($1 in A){$NF=0}1' word FS=',' OFS=',' myfile

Output

@RELATION relationData

@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC

@ATTRIBUTE class {1,0}

@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,0
dog,1,2,3,...,201,1
feather,1,2,3,...,201,0
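
For readers who want to see what the one-liner is doing, here is a sketch of the same command spread over several lines with comments. The scratch directory, the shortened five-column sample rows, and the array name `words` are assumptions made for illustration; the logic is meant to match the command above.

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d)
printf '%s\n' cat feather > "$tmp/word"
printf '%s\n' '@DATA' 'hall,1,2,3,0' 'cat,1,2,3,1' 'dog,1,2,3,1' 'feather,1,2,3,1' > "$tmp/myfile"

result=$(awk '
    # First file (word list): FNR equals NR only while reading it.
    # Remember each word as an array key, then skip to the next line.
    FNR == NR { words[$1]; next }
    # Second file: if field 1 is a listed word, set the last field to 0.
    ($1 in words) { $NF = 0 }
    # Print every line, changed or not.
    { print }
' "$tmp/word" FS=',' OFS=',' "$tmp/myfile")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The `FS=','` and `OFS=','` assignments sit between the two file names, so they only take effect once awk starts on the data file; the word list is still split on whitespace. One thing to watch: if the word list contains an empty line, blank lines in the data file would match as well, so it may be worth making sure the list has none.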

owwow14 · October 16, 2014, 9:18am

Thank you, Akshay. However, there are 2 input files.
Is there a specific order in which I should put the list file and the data file?

Akshay_Hegde · October 16, 2014, 9:46am

Yes.

RavinderSingh13 · October 16, 2014, 9:49am

Hello owwow14,

The following may help. It uses Akshay's approach, with the file names changed to be easier to follow.

awk 'FNR==NR{A[$1];next}($1 in A){$NF=0}1' list FS=',' OFS=',' data

EDIT: Yes, the order of the files matters here, because the condition FNR==NR is true only while the first file is being read.
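
A small sketch of what happens when the order is swapped, using made-up three-column sample rows and throwaway file names (both are assumptions for illustration). With the data file first, its lines are the ones stored in the array and skipped, so nothing in it is changed or printed.

```shell
tmp=$(mktemp -d)
printf '%s\n' cat feather > "$tmp/list"
printf '%s\n' 'hall,1,0' 'cat,1,1' > "$tmp/data"

prog='FNR==NR{A[$1];next}($1 in A){$NF=0}1'

# Intended order: list first, data second.
right=$(awk "$prog" "$tmp/list" FS=',' OFS=',' "$tmp/data")

# Swapped order: the data lines become array keys and are skipped,
# then the list is printed unchanged.
wrong=$(awk "$prog" "$tmp/data" FS=',' OFS=',' "$tmp/list")

printf 'list first:\n%s\n' "$right"
printf 'data first:\n%s\n' "$wrong"
rm -rf "$tmp"
```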

Thanks,
R. Singh
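
Since the original question mentions a whole directory of files, here is one possible way to apply the same command to each of them. This is only a sketch: the `arff` directory, the `.arff` extension, and the sample contents are hypothetical, and it assumes it is acceptable to overwrite each file in place.

```shell
# Hypothetical layout: data files in arff/*.arff, word list in list.
tmp=$(mktemp -d)
mkdir "$tmp/arff"
printf '%s\n' cat feather > "$tmp/list"
printf '%s\n' '@ATTRIBUTE class {1,0}' '@DATA' 'cat,1,1' 'dog,1,1' > "$tmp/arff/a.arff"
printf '%s\n' '@DATA' 'feather,2,1' 'hall,2,0' > "$tmp/arff/b.arff"

for f in "$tmp"/arff/*.arff; do
    # Write to a new file first; replace the original only if awk succeeded.
    awk 'FNR==NR{A[$1];next}($1 in A){$NF=0}1' "$tmp/list" FS=',' OFS=',' "$f" > "$f.new" &&
        mv "$f.new" "$f"
done

after=$(cat "$tmp"/arff/*.arff)
printf '%s\n' "$after"
rm -rf "$tmp"
```

Writing to `$f.new` and renaming afterwards avoids reading and truncating the same file at once, which redirecting straight back to `$f` would do.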