Using tr, sed or awk to delete text from nth column only

Hi everyone, this is my first post here, I hope someone can help me.

I have a file in which I need to delete the characters '_F3' from the end of the text in the first column. The problem is that these characters may also occur elsewhere in the file (i.e. in the second column onwards). I tried sed (thinking I was a Linux genius) and then realised that there are rows where '_F3' is not in the first column but appears in a later column, so I was deleting that later occurrence when I only want to delete it from the first column.

The command I was using was:

sed 's/_F3//' filename > newfilename

I need to retain everything in the file except the trailing _F3 in the first column, and write to a new file.

I have spent ~3h trying to find a solution to this, and think it's probably an awk command I need, but I am afraid my awk skills are 0.

Please can someone help me out!

Many thanks
Helen

Welcome to the forum.

This might help:

awk '{gsub("_F3","",$1)}1' filename > newfilename  

Yes, that has worked, but unfortunately I seem to have lost the formatting of my file now. I think it was tab-delimited before; now it just has a single space between columns, and my next process won't accept this.

Helen

Before:

1329_105_1480_F3        355     chr1    13484   1       50M     =       13572   123     CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG      ,,@NPYG423BC553AC.2BPB;:7OH0.-=><1,I3!5=D<4)-OD=44      NM:i:0  NH:i:4  CC:Z:chr12      CP:i:92080      XS:A:+  HI:i:0
1863_1224_411_F3        99      chr1    13484   3       50M     =       13572   123     CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG      UUZ_[][VXY[XRNOYJFURZZULJZ_ZOQ[SRRTW@CPBJHJWU\FAMM      NM:i:0  NH:i:2  CC:Z:chr2       CP:i:114357483  XS:A:+  HI:i:0

After:

1329_105_1480 355 chr1 13484 1 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG ,,@NPYG423BC553AC.2BPB;:7OH0.-=><1,I3!5=D<4)-OD=44 NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92080 XS:A:+ HI:i:0
1863_1224_411 99 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG UUZ_[][VXY[XRNOYJFURZZULJZ_ZOQ[SRRTW@CPBJHJWU\FAMM NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0

Try this

$ awk '{gsub("_F3","",$1)}1' FS='\t' OFS='\t' file
1329_105_1480        355     chr1    13484   1       50M     =       13572   123     CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG      ,,@NPYG423BC553AC.2BPB;:7OH0.-=><1,I3!5=D<4)-OD=44      NM:i:0  NH:i:4  CC:Z:chr12      CP:i:92080      XS:A:+  HI:i:0
1863_1224_411        99      chr1    13484   3       50M     =       13572   123     CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG      UUZ_[][VXY[XRNOYJFURZZULJZ_ZOQ[SRRTW@CPBJHJWU\FAMM      NM:i:0  NH:i:2  CC:Z:chr2       CP:i:114357483  XS:A:+  HI:i:0
$ 
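For anyone following along, here is a minimal sketch of the same idea with the regex anchored, so that '_F3' is removed only when it ends the first column. It assumes the file really is tab-delimited; the sample rows and the file names sample.txt and fixed.txt are made up for illustration and are not from Helen's data.

```shell
# Build a small tab-delimited sample (hypothetical data, not the real file).
tmpdir=$(mktemp -d)
printf 'read1_F3\t355\tchr1\tXX_F3\n'  > "$tmpdir/sample.txt"
printf 'read2_F3_x\t99\tchr1\tYY\n'   >> "$tmpdir/sample.txt"

# FS and OFS are both a tab, so a rebuilt record keeps its tabs.
# sub() with the $ anchor removes _F3 only when it ends the first field.
awk 'BEGIN { FS = OFS = "\t" } { sub(/_F3$/, "", $1) } 1' \
    "$tmpdir/sample.txt" > "$tmpdir/fixed.txt"

cat "$tmpdir/fixed.txt"
```

The first row loses its trailing '_F3' in column one only; the '_F3' in the last column and the one in the middle of 'read2_F3_x' are left alone.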

Helen

awk '{a=x;l=split($1,_,"_");if(_[l]=="F3"){for(i=0;++i<l;){a=(a?a"_":x)_[i]};sub($1,a)}}1' file

Danmero

Thank you very much. The script is taking a long time to run but looks like it is going to work perfectly (works on a small extract of the file).

Thank you so much for your help.

Helen

perl -plne 's/^(\w+)_F3/$1/' datafile
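Since the original attempt used sed, here is a rough sed sketch that limits the substitution to the first column. It assumes tab-delimited input with at least two columns per line (a line with no tab at all would be left untouched); the sample rows and the names in.txt and out.txt are hypothetical.

```shell
# Build a small tab-delimited sample (hypothetical data).
tmpdir2=$(mktemp -d)
printf 'read1_F3\t355\tXX_F3\n'  > "$tmpdir2/in.txt"
printf 'read2\t99\tYY_F3\n'     >> "$tmpdir2/in.txt"

# Hold a literal tab in a variable, because \t inside a sed regex
# is a GNU extension and not portable.
tab=$(printf '\t')

# Match from the start of the line up to the first tab, and drop _F3
# only when it sits immediately before that tab.
sed "s/^\([^$tab]*\)_F3$tab/\1$tab/" "$tmpdir2/in.txt" > "$tmpdir2/out.txt"

cat "$tmpdir2/out.txt"
```

Because the line is never split into fields and rejoined, the tabs between the remaining columns are not touched.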