ATOM 5181 N AMET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA AMET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C AMET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5191 C BMET K 406 14.044 5.177 26.910 0.50 5.15 C
ATOM 5192 O BMET K 406 14.589 4.078 27.004 0.50 7.09 O
ATOM 5197 N ALA K 407 13.718 5.884 27.972 1.00 5.30 N
ATOM 5198 CA ALA K 407 14.077 5.408 29.309 1.00 6.16 C
ATOM 5202 N AARG K 408 12.186 3.982 29.147 0.50 6.55 N
ATOM 5203 CA AARG K 408 11.407 2.745 29.387 0.50 7.31 C
I would like to remove the first character of the fourth column, but only when that column is four characters long. The files need to be edited in place.
Desired output:
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5191 C MET K 406 14.044 5.177 26.910 0.50 5.15 C
ATOM 5192 O MET K 406 14.589 4.078 27.004 0.50 7.09 O
ATOM 5197 N ALA K 407 13.718 5.884 27.972 1.00 5.30 N
ATOM 5198 CA ALA K 407 14.077 5.408 29.309 1.00 6.16 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
ATOM 5203 CA ARG K 408 11.407 2.745 29.387 0.50 7.31 C
Could you please try the following and let me know if it helps.
The 1st code is as follows:
sed 's/[A-Z]MET/MET/g; s/[A-Z]ARG/ARG/g' remove_char_4th_column
The output will be as follows:
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5191 C MET K 406 14.044 5.177 26.910 0.50 5.15 C
ATOM 5192 O MET K 406 14.589 4.078 27.004 0.50 7.09 O
ATOM 5197 N ALA K 407 13.718 5.884 27.972 1.00 5.30 N
ATOM 5198 CA ALA K 407 14.077 5.408 29.309 1.00 6.16 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
ATOM 5203 CA ARG K 408 11.407 2.745 29.387 0.50 7.31 C
The 2nd code is as follows:
sed 's/AMET/MET/g; s/BMET/MET/g; s/AARG/ARG/g' remove_char_4th_column
The output will be as follows:
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5191 C MET K 406 14.044 5.177 26.910 0.50 5.15 C
ATOM 5192 O MET K 406 14.589 4.078 27.004 0.50 7.09 O
ATOM 5197 N ALA K 407 13.718 5.884 27.972 1.00 5.30 N
ATOM 5198 CA ALA K 407 14.077 5.408 29.309 1.00 6.16 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
ATOM 5203 CA ARG K 408 11.407 2.745 29.387 0.50 7.31 C
Here, I have saved the input you provided in a file named remove_char_4th_column.
Thank you for your answer. I need in-place editing because I have a lot of files like this. In the given example, the strings are AMET, BMET, AARG and ALA, but in other files the strings are different, so I think your code would be difficult to use on multiple files.
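A minimal sketch of a name-independent, in-place version, assuming GNU sed (for `-i` and `-E`) and whitespace-separated fields as in the sample above. `strip_altloc_inplace` is a hypothetical helper name. Instead of spelling out AMET, BMET or AARG, it removes the first character of field 4 whenever that field is exactly four non-space characters long, and leaves shorter fields such as ALA alone.

```shell
#!/usr/bin/env bash
# strip_altloc_inplace (hypothetical helper, not an existing tool).
# Edits every file given in place; GNU sed -i treats each file separately.
# Group 1 = fields 1-3 plus their separators (field boundaries are forced
# because non-space and space runs alternate). Then one character is dropped
# and group 2 keeps the remaining 3 characters of field 4 plus the following
# whitespace, so only a field of exactly 4 characters is changed.
strip_altloc_inplace() {
  sed -i -E 's/^([^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)[^[:space:]]([^[:space:]]{3}[[:space:]])/\1\2/' "$@"
}
```

Usage would be something like `strip_altloc_inplace *.pdb` (the `.pdb` glob is only an example). With BSD/macOS sed, `-i` needs an explicit, possibly empty, suffix argument, i.e. `sed -i '' -E …`.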
In-place change: try this. It works for the given pattern:
sed -i 's/\(.* \).*\(... [A-Z] [0-9].*\)/\1\2/g' infile
-bash-3.2$ sed 's/\(.* \).*\(... [A-Z] [0-9].*\)/\1\2/g' infile
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5191 C MET K 406 14.044 5.177 26.910 0.50 5.15 C
ATOM 5192 O MET K 406 14.589 4.078 27.004 0.50 7.09 O
ATOM 5197 N ALA K 407 13.718 5.884 27.972 1.00 5.30 N
ATOM 5198 CA ALA K 407 14.077 5.408 29.309 1.00 6.16 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
ATOM 5203 CA ARG K 408 11.407 2.745 29.387 0.50 7.31 C
The pattern assumes there will be a single capital letter (here it is K) followed by a space and a number.
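Because `sed -i` overwrites the file, it may help to list first the lines that lack the anchor this pattern relies on, since the substitution would leave those lines untouched. A minimal sketch; `check_pattern` is a hypothetical name, and its ERE is meant to mirror the `... [A-Z] [0-9]` part of the BRE above (it ignores the `.* ` prefix, so treat it as an approximate check).

```shell
#!/usr/bin/env bash
# check_pattern (hypothetical helper): print, with line numbers, every line
# that does NOT contain "three characters, space, one capital letter,
# space, digit", i.e. lines the sed command above would not change.
check_pattern() {
  grep -n -v -E '... [A-Z] [0-9]' "$@"
}
```

An empty result suggests every line fits the assumption; any printed line is worth inspecting before running `sed -i`.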
@ahamed
You remove too much with this. Column 4 should be at least 3 characters long, but you have only 2 in rows 4, 5, 8 and 9.
@hasanabdulla
Is the format always the same in all the files? Then use Chubler's solution.
Are there other combinations of letters than MET, ALA and ARG? Then do not use a sed solution that just replaces text.
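If every file really is a standard fixed-column PDB file, another option is to work by column rather than by text. In the PDB coordinate format, column 17 of an ATOM/HETATM record holds the alternate location indicator (the A or B in AMET, BMET, AARG) and columns 18-20 hold the residue name, so blanking column 17 removes the prefix without shifting any other column. A minimal sketch, assuming strict PDB column layout; `blank_altloc` is a hypothetical name, and the whitespace-collapsed sample pasted in this thread does not preserve those columns, so it would not work on the text exactly as shown here.

```shell
#!/usr/bin/env bash
# blank_altloc (hypothetical helper): replace column 17 (the alternate
# location indicator) of ATOM/HETATM records with a space. Every other
# column keeps its position; all other records pass through unchanged.
blank_altloc() {
  awk '/^(ATOM  |HETATM)/ && length($0) >= 17 {
         $0 = substr($0, 1, 16) " " substr($0, 18)
       }
       { print }' "$@"
}
```

Because the line width never changes, the coordinates, occupancy and B-factor columns stay where PDB readers expect them.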
This tests whether field 4 has more than 3 characters; if it does, it removes the first character.
I'm not sure why it messes up the format or how to fix it.
awk '{$4=(length($4)>3)?substr($4,2):$4}1' file
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5191 C MET K 406 14.044 5.177 26.910 0.50 5.15 C
ATOM 5192 O MET K 406 14.589 4.078 27.004 0.50 7.09 O
ATOM 5197 N ALA K 407 13.718 5.884 27.972 1.00 5.30 N
ATOM 5198 CA ALA K 407 14.077 5.408 29.309 1.00 6.16 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
ATOM 5203 CA ARG K 408 11.407 2.745 29.387 0.50 7.31 C
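The format changes because assigning to a field (here `$4`) makes awk rebuild the whole record from its fields joined by `OFS`, a single space by default, so the original runs of spaces are lost. One way around that, sketched below, is to leave the fields alone and cut the character out of `$0` directly: `match()` sets `RLENGTH` to the length of the text before field 4, and `substr()` splices around the one character. The helper name is hypothetical, and this version uses `== 4` to follow the question's wording rather than the `> 3` above.

```shell
#!/usr/bin/env bash
# strip_field4_keep_spacing (hypothetical helper): drop the first character
# of field 4 when that field is exactly four characters long, editing $0
# directly so the spacing between the other fields is left as it was.
strip_field4_keep_spacing() {
  awk 'length($4) == 4 {
         # RLENGTH = length of fields 1-3 plus the separators after them
         match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/)
         # keep that prefix, skip one character, keep the rest of the line
         $0 = substr($0, 1, RLENGTH) substr($0, RLENGTH + 2)
       }
       { print }' "$@"
}
```

Note that removing a character still shifts everything after it left by one, which matters for strict fixed-column files. With GNU awk 4.1 or later, `gawk -i inplace '…' file` applies an awk program in place; with other awks, write to a temporary file and `mv` it over the original.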