Hi,
I have a file like this (about 8 columns in total; this is the 2nd column):
gi_49482297_ref_YP_039521.1_
gi_49482297_ref_YP_039521.1_
gi_49482315_ref_YP_039539.1_
gi_49482315_ref_YP_039539.1_
I want to remove the _ at the end of each of these values.
At a later stage I may also want to replace the _ with another character.
How can I do this using awk or sed?
Any help would be highly appreciated.
Hello Syeda,
The following may help you. You haven't shown us the complete input or told us the field separator, so, as a test, I am assuming that the field separator is a space and that the file has 7 columns. Let's say you have an Input_file as follows.
Input_file:
cat Input_file
Ravinder gi_49482297_ref_YP_039521.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST121 gi_49482297_ref_YP_039521.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST1211 gi_49482315_ref_YP_039539.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST12134 gi_49482315_ref_YP_039539.1_ TESTing test123 sixth_column_ seventh eight_column_test
The following code may help:
awk '{for(i=1;i<=NF;i++){if(i==2){sub(/_$/,"",$i)} else {sub(/_$/,"_new character",$i)}}} 1' Input_file
The output will be as follows.
Ravinder gi_49482297_ref_YP_039521.1 TESTing test123 sixth_column_new character seventh eight_column_test
TEST121 gi_49482297_ref_YP_039521.1 TESTing test123 sixth_column_new character seventh eight_column_test
TEST1211 gi_49482315_ref_YP_039539.1 TESTing test123 sixth_column_new character seventh eight_column_test
TEST12134 gi_49482315_ref_YP_039539.1 TESTing test123 sixth_column_new character seventh eight_column_test
Here I am replacing the trailing _ of the 2nd column with nothing (an empty string), and the trailing _ of every other column (only the 5th column in my example file) with the string _new character, which you can change in the code as per your requirement. The 1 at the end is an always-true condition, so awk prints every (modified) line. Let us know if this helps you.
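The example above assumes space-separated fields. If the real 8-column file is tab-separated (an assumption; the question does not say), note that awk rebuilds the whole line with OFS, a single space by default, as soon as a field is modified, so the tabs would turn into spaces. A minimal sketch of the same rule that keeps the tabs by setting both FS and OFS to a tab (fix_tabbed is a hypothetical helper name, not part of the original answer):

```shell
# fix_tabbed REP [file...]   (hypothetical helper name)
# Assumes tab-separated input. Field 2 loses its trailing "_";
# every other field's trailing "_" is replaced by REP.
# FS and OFS are both a tab, so the rebuilt line keeps its tabs.
fix_tabbed() {
  rep=$1; shift
  awk -v rep="$rep" 'BEGIN { FS = OFS = "\t" }
  {
    for (i = 1; i <= NF; i++) {
      if (i == 2) sub(/_$/, "", $i)   # strip trailing "_" in column 2
      else        sub(/_$/, rep, $i)  # replace trailing "_" elsewhere
    }
  } 1' "$@"
}
```

Usage: fix_tabbed '_new character' Input_file. Spaces inside a field (such as the space in "_new character") are left alone because only tabs separate fields here.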
Thanks,
R. Singh
Thanks R. Singh, but I am not really getting it, possibly because I have very limited knowledge of awk.
What do I have to do if I only want to remove the _ from the 2nd column? I have tried using the first part of your code, but it's not working:
awk '{for(i=1;i<=NF;i++){if(i==2){sub(/\_$/,X,$i)}
What am I doing wrong?
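Hello Syeda,
The command you tried was cut off part-way: it is missing two closing braces, the closing single quote, something that prints each line (such as the 1 at the end of my code), and the input file name, so it cannot run as written.
If you only want to substitute the _ at the end of $2, the following may help you. Since you mentioned in your first post that you also need to substitute the _ in other columns, I have used the example from post #2. Please try the following and let me know if this helps you.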
Input_file:
cat Input_file
Ravinder gi_49482297_ref_YP_039521.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST121 gi_49482297_ref_YP_039521.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST1211 gi_49482315_ref_YP_039539.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST12134 gi_49482315_ref_YP_039539.1_ TESTing test123 sixth_column_ seventh eight_column_test
awk '{sub(/_$/,"",$2);print}' Input_file
The output will be as follows.
Ravinder gi_49482297_ref_YP_039521.1 TESTing test123 sixth_column_ seventh eight_column_test
TEST121 gi_49482297_ref_YP_039521.1 TESTing test123 sixth_column_ seventh eight_column_test
TEST1211 gi_49482315_ref_YP_039539.1 TESTing test123 sixth_column_ seventh eight_column_test
TEST12134 gi_49482315_ref_YP_039539.1 TESTing test123 sixth_column_ seventh eight_column_test
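Since the original question asked for awk or sed, here is a sketch of the same $2-only change in sed. It assumes, like the example, that fields are separated by one or more spaces and that the line has no leading whitespace; sed -E (extended regular expressions) is supported by both GNU and BSD sed. The function name is hypothetical.

```shell
# strip_field2_underscore_sed [file...]   (hypothetical helper name)
# ^([^ ]+ +[^ ]*)  field 1, the spaces after it, and field 2 minus its last "_"
# _( |$)           the trailing "_", followed by a space or the end of the line
strip_field2_underscore_sed() {
  sed -E 's/^([^ ]+ +[^ ]*)_( |$)/\1\2/' "$@"
}
```

Usage: strip_field2_underscore_sed Input_file. One difference from the awk version: sed edits the text in place and does not rebuild the line, so any runs of spaces or tabs elsewhere on the line are preserved, whereas awk rewrites the separators with OFS once $2 is modified.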
Thanks,
R. Singh
Oh yes, I got it, thanks.
Now I can change the code into
awk '{sub(/_$/,"anything",$2);print}' Input_file
to put anything I want in place of the _ at the end of column 2.
Thanks a lot.
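If the replacement text comes from a shell variable rather than being typed into the awk program, it can be passed in with awk's -v option. A minimal sketch (set_field2_suffix is a hypothetical helper name); keep in mind that awk processes backslash escapes in -v values, and that in sub()'s replacement string an & stands for the matched text, so those two characters need escaping if you want them literally.

```shell
# set_field2_suffix REP [file...]   (hypothetical helper name)
# Replaces a trailing "_" in field 2 with REP; other fields are untouched.
set_field2_suffix() {
  rep=$1; shift
  # -v copies the shell value into the awk variable rep
  awk -v rep="$rep" '{ sub(/_$/, rep, $2); print }' "$@"
}
```

Usage: set_field2_suffix '|' Input_file, or set_field2_suffix '' Input_file to simply delete the _.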
One more thing: how can I specify the exact position at which I want to make the change? I mean, what if I want to change something that is not at the end of the column?
Hello Syeda,
Here is an example: suppose you want to remove the 2nd occurrence of _ in $2; then the following may help you.
Input_file:
Ravinder gi_49482297_ref_YP_039521.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST121 gi_49482297_ref_YP_039521.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST1211 gi_49482315_ref_YP_039539.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST12134 gi_49482315_ref_YP_039539.1_ TESTing test123 sixth_column_ seventh eight_column_test
The following is the code for this.
awk -v var=2 '{n=split($2,A,"_"); q=A[1]; for(i=2;i<=n;i++){k=((i-1)==var)?"":"_"; q=q k A[i]}; $2=q} 1' Input_file
The output will be as follows.
Ravinder gi_49482297ref_YP_039521.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST121 gi_49482297ref_YP_039521.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST1211 gi_49482315ref_YP_039539.1_ TESTing test123 sixth_column_ seventh eight_column_test
TEST12134 gi_49482315ref_YP_039539.1_ TESTing test123 sixth_column_ seventh eight_column_test
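Here I have set a variable var=2 in my code because I wanted to change only the 2nd occurrence of _ in $2. split() breaks $2 into pieces at every _, and the loop joins the pieces back together with _, except that the var-th _ (the one before piece var+1) is left out. You can change var as per your requirement. Hope this helps.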
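The split-and-rejoin approach above can also be written without arrays, which makes it easy to put another character in place of the chosen _ instead of deleting it, and to pick a column other than $2. A minimal sketch, assuming space-separated fields as in the example (replace_nth_underscore is a hypothetical helper name):

```shell
# replace_nth_underscore COL N REP [file...]   (hypothetical helper name)
# Replaces the N-th "_" in field COL with REP (use '' to delete it).
# Uses only index() and substr(), so it runs in any POSIX awk.
replace_nth_underscore() {
  col=$1; n=$2; rep=$3; shift 3
  awk -v col="$col" -v n="$n" -v rep="$rep" '
  NF >= col {
    s = $col; out = ""; count = 0
    # Copy the field up to each "_" in turn, counting occurrences
    while ((p = index(s, "_")) > 0) {
      count++
      out = out substr(s, 1, p - 1) (count == n ? rep : "_")
      s = substr(s, p + 1)
    }
    $col = out s    # append whatever follows the last "_"
  }
  { print }' "$@"
}
```

Usage: replace_nth_underscore 2 2 '' Input_file gives the same result as the var=2 command above, and replace_nth_underscore 2 2 '-' Input_file puts a - there instead. Because REP is joined by concatenation rather than passed to sub(), an & in it is not treated specially.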
Thanks,
R. Singh