How to change certain values in a file

Hi all,

I need help replacing certain values in a file. The script should check and match the ID and exNum 1; where they match, the value in $3 (file2.txt) needs to replace the value for 'START' (file1.txt).

The sample structure is like this:

File1.txt

ID   P_6
START    235411
END    18763
//
ID    P_10
START    631012
END    32814
//
ID    P_133
START    389417
END    314124
//

File2.txt

ex    21204    22151    P_6    
S     21204    22151    P_6     exNum 2
ex    22217    22765    P_6    
S     22217    22765    P_6     exNum 1
ex    37193    37735    P_10    
S     37193    37735    P_10     exNum 5
ex    37862    38019    P_10    
S     37862    38019    P_10     exNum 4
ex    38076    38835    P_10    
S     38076    38835    P_10     exNum 3
ex    38880    39050    P_10    
S     38880    39050    P_10     exNum 2
ex    39093    39644    P_10    
S     39093    39644    P_10     exNum 1
ex    42305    42440    P_133    
S     42305    42440    P_133     exNum 3
ex    42496    42656    P_133    
S     42496    42656    P_133     exNum 2
ex    42657    42674    P_133    
S     42657    42674    P_133     exNum 1

Output (file1.txt) should be updated like this:

ID    P_6
START    22765
END    18763
//
ID    P_10
START    39644
END    32814
//
ID    P_133
START    42674
END    314124
//

I parsed the structure of file1.txt using Perl before, but I don't know how to change a certain value for a field in it. Any help would be appreciated. Thanks

You might want to try this one:

awk  'NR==FNR{if ($0~/exNum 1/) {rp[++cnt]=$3};next};
      /START/    {$2 = rp[++xtr]}
      1
    ' file2 file1

The first line collects all occurrences of field 3 in lines with exNum 1 into an array. The second replaces every START record's field 2 with the previously collected values, in the order they were collected. The 1 in the third line is an always-TRUE pattern that prints every line of file1, possibly with a replaced field 2.
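Note that awk only prints the result to stdout; it does not edit file1 in place. To actually update the file, you could redirect to a temporary file and move it back, roughly like this (file1.new is just a placeholder name):

awk  'NR==FNR{if ($0~/exNum 1/) {rp[++cnt]=$3};next};
      /START/    {$2 = rp[++xtr]}
      1
    ' file2 file1 > file1.new && mv file1.new file1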


Hi RudiC,

It worked when I used the sample above, but unfortunately it did not work when I used my real data. I think that is because my real file has more IDs that have no match in file2. I have tried changing your script, but the output that I got is ridiculous.

When file1.txt is like this:

ID    P_200 START    12412 END    12444 // ID    P_6  START   235411   END    18763  //  ID    P_10  START    631012  END    32814  //  ID    P_60 START    3112 END    3281 // ID    P_9 START    5812 END    6112 // ID    P_133 START    389417  END    314124  //

The correct output is supposed to be like this:

ID    P_200
START    12412
END    12444
//
ID    P_6
START    22765
END    18763
//
ID    P_10
START    39644
END    32814
//
ID    P_60
START    3112
END    3281
//
ID    P_9
START    5812
END    6112
//
ID    P_133
START    42674
END    314124
//

The values of START for IDs P_200, P_60 and P_9 should remain unchanged, as there is no match in file2 for any of them. Appreciate your help on this. Thanks

OK, try this (assuming you have a file with line breaks):

awk     'NR==FNR{if ($0~/exNum 1/) {rp[$4]=$3};next};   # file2: remember $3 for each ID ($4) on exNum 1 lines
         /ID/           {tmp = $2}                       # file1: remember the current ID
         /START/        {if (tmp in rp) $2 = rp[tmp]}    # replace START only if that ID has an entry
         1
        ' file2 file1
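This version keys the replacement values by ID ($4 in file2) instead of relying on their order, so any ID without an exNum 1 line in file2 simply keeps its original START value.

If your real file1 is really one long line like the one you pasted, you could split it into the multi-line layout first. A rough, untested sketch, assuming every record is strictly ID/START/END followed by // and an awk that accepts a multi-character RS (gawk or mawk do); file1_oneline and file1_split are just placeholder names:

awk     'BEGIN  {RS = "//"}             # one ID/START/END group per record
         NF     {printf "ID    %s\nSTART    %s\nEND    %s\n//\n", $2, $4, $6}
        ' file1_oneline > file1_split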

BTW, why didn't you give the correct sample data in the first place?


Hi RudiC,

Yes, your new code worked great! Thanks..

Yeah, I did not give the right sample. Sorry about that. :o I should be more careful next time.