How to remove sequences with the same amino acid? What command line should I type?

patrick_chia · January 20, 2009, 7:17am

My input is listed as:
giNumber RefAminoAcid VarAminoAcid
10190711 P P
10190711 D D
109255248 I A
110349771 A D

My desired output is:
giNumber RefAminoAcid VarAminoAcid
109255248 I A
110349771 A D

*I want to delete the lines where the two amino acids are the same and keep only the lines where they differ.
What command line should I type?
Thank you, I appreciate your advice.

Franklin52 · January 20, 2009, 8:01am

Try this:

awk 'NR==FNR{a[$1]++;next}a[$1]==1' file file
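
In case the one-liner is hard to read, here is a rough annotated sketch of the same idea, run against the sample data from the first post. The temporary file and the names tmp, count and result are only for illustration. Note that it filters on the first field (giNumber) and never looks at the two amino acid columns, so it assumes the rows to drop are exactly the ones whose giNumber is repeated.

```shell
# Sample data from the first post, written to a temporary file (illustrative path).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
giNumber RefAminoAcid VarAminoAcid
10190711 P P
10190711 D D
109255248 I A
110349771 A D
EOF

# The file is given twice so awk reads it in two passes.
result=$(awk '
    NR == FNR { count[$1]++; next }   # first pass: count each value of field 1
    count[$1] == 1                    # second pass: print lines whose field 1 is unique
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

On the sample this prints the header and the two lines for 109255248 and 110349771.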

Regards

koti_rama · January 20, 2009, 8:07am

Try it like this:
s1 -> contains your source data
out.log -> output file

while read -r line
do
    # first field of the current line
    acid=$(echo "$line" | awk '{print $1}')
    # number of lines in s1 whose first field equals it
    count=$(awk -v id="$acid" '$1 == id' s1 | wc -l)
    if [ "$count" -gt 1 ]; then
        echo "Acid $acid occurs more than once in s1, skipping"
        continue
    else
        echo "$line" >> out.log
    fi
done < s1

Perderabo · January 20, 2009, 8:35am

Actually that first field is not an amino acid. Field 2 and field 3 must be different from each other. If this understanding is right, then

$
$ cat file
giNumber RefAminoAcid VarAminoAcid
10190711 P P
10190711 D D
109255248 I A
110349771 A D
$
$
$  awk '$2 != $3' file
giNumber RefAminoAcid VarAminoAcid
109255248 I A
110349771 A D
$
$
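
One thing to be aware of: the header line survives above only because "RefAminoAcid" and "VarAminoAcid" happen to be different strings. A minimal sketch that keeps the header explicitly is shown below. The toupper() calls are an assumption on my part, in case a file mixes upper and lower case letters, which the sample does not show. The temporary file and the names tmp and result are just for illustration.

```shell
# Sample data from the post, written to a temporary file (illustrative path).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
giNumber RefAminoAcid VarAminoAcid
10190711 P P
10190711 D D
109255248 I A
110349771 A D
EOF

# NR == 1 always prints the header; every other line is printed only when
# fields 2 and 3 differ, compared case-insensitively.
result=$(awk 'NR == 1 || toupper($2) != toupper($3)' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

To save the filtered lines instead of printing them, redirect awk's output to a new file rather than back onto the input file.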

The Single Letter Amino Acid Code

patrick_chia · January 20, 2009, 8:50pm

Perderabo, thanks a lot for your help. Your method is faster and more effective. It really helped me a lot.