Hi,
I have a file with 2 columns:
ENSG00000003137,ENST00000001146
ENSG00000003137,ENST00000412253
ENSG00000003402,ENST00000309955
ENSG00000003402,ENST00000443227
ENSG00000003402,ENST00000341222
and I want to keep only the first entry for each ID in column 1 and drop the rest. The output should look like this:
ENSG00000003137,ENST00000001146
ENSG00000003402,ENST00000309955
I tried this awk command:
awk '!a[$1$2]++'
but it does not work.
Kindly help.
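I think you need to specify the comma as the field separator with -F, and use only $1 as the key. With the default whitespace separator the whole line ends up in $1, so every line counts as unique and nothing is filtered out.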
Owner@Owner-PC ~
$ awk -F, '!a[$1]++' filename
ENSG00000003137,ENST00000001146
ENSG00000003402,ENST00000309955
Owner@Owner-PC ~
$ awk '!a[$1]++' filename
ENSG00000003137,ENST00000001146
ENSG00000003137,ENST00000412253
ENSG00000003402,ENST00000309955
ENSG00000003402,ENST00000443227
ENSG00000003402,ENST00000341222
I used the sample data from your post.
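To see why the separator matters, here is a small sketch that counts the fields awk finds on the first line with and without -F, and then runs the dedup. The temporary file and the variable names (tmp, default_nf, comma_nf, first_per_gene) are just placeholders for illustration; the data is the sample from the question.

```shell
# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ENSG00000003137,ENST00000001146
ENSG00000003137,ENST00000412253
ENSG00000003402,ENST00000309955
ENSG00000003402,ENST00000443227
ENSG00000003402,ENST00000341222
EOF

# Default separator (whitespace): the line has no blanks, so it is one field.
default_nf=$(awk 'NR == 1 { print NF }' "$tmp")

# Comma separator: the same line splits into two fields.
comma_nf=$(awk -F, 'NR == 1 { print NF }' "$tmp")

echo "fields with default separator: $default_nf"
echo "fields with -F, : $comma_nf"

# Keep the first line seen for each value of column 1, in input order.
first_per_gene=$(awk -F, '!seen[$1]++' "$tmp")
echo "$first_per_gene"

rm -f "$tmp"
```

With one field per line, $1 is the whole line and $2 is empty, which is why '!a[$1$2]++' printed everything.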
anbu23
$ sort -t"," -k1,1 -u file
ENSG00000003137,ENST00000001146
ENSG00000003402,ENST00000309955
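One difference worth keeping in mind: sort reorders the output by key, while the awk approach keeps the input order. The sketch below uses a made-up, deliberately unsorted three-line input (not the original file) and prints only the keys. As far as I know GNU sort keeps the first line of each run of equal keys with -u, but POSIX only promises one line per key, so I would not rely on which line survives on other systems.

```shell
# Hypothetical unsorted input: the ...3402 gene appears before ...3137.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ENSG00000003402,ENST00000309955
ENSG00000003137,ENST00000001146
ENSG00000003402,ENST00000443227
EOF

# awk keeps the first occurrence of each key in input order.
awk_keys=$(awk -F, '!seen[$1]++ { print $1 }' "$tmp")

# sort -u outputs one line per key, ordered by the key.
# LC_ALL=C makes the ordering independent of the locale.
sort_keys=$(LC_ALL=C sort -t"," -k1,1 -u "$tmp" | cut -d, -f1)

echo "awk order:"
echo "$awk_keys"
echo "sort order:"
echo "$sort_keys"

rm -f "$tmp"
```

If the input is already grouped and sorted by column 1, as in the sample, both give the same key order.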
awk -F, 'a[$1]++==0' filename
is quick and dirty because it stores an unnecessary integer count for every key.
The full and efficient code is
awk -F, '!($1 in a) { a[$1]; print }' filename
You can condense that again to use the implicit print:
awk -F, '!(($1 in a) || a[$1])' filename
or
awk -F, '!($1 in a) && !a[$1]' filename
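The condensed forms work because merely referencing a[$1] creates that element with an empty value, which is false in a condition; on the next line with the same key, ($1 in a) is already true and short-circuits the rest. A quick sketch to check that all four variants agree on the sample data (the temp file and the names v1 to v4 are placeholders of mine):

```shell
# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ENSG00000003137,ENST00000001146
ENSG00000003137,ENST00000412253
ENSG00000003402,ENST00000309955
ENSG00000003402,ENST00000443227
ENSG00000003402,ENST00000341222
EOF

# Counter version: stores a running count per key.
v1=$(awk -F, 'a[$1]++==0' "$tmp")

# Explicit version: create the element once, print once.
v2=$(awk -F, '!($1 in a) { a[$1]; print }' "$tmp")

# Condensed versions: the reference to a[$1] creates the element
# as a side effect and evaluates to an empty (false) value.
v3=$(awk -F, '!(($1 in a) || a[$1])' "$tmp")
v4=$(awk -F, '!($1 in a) && !a[$1]' "$tmp")

# Report whether every variant matches the first one.
for v in "$v2" "$v3" "$v4"; do
    if [ "$v" = "$v1" ]; then echo "same output"; else echo "DIFFERENT"; fi
done
echo "$v1"

rm -f "$tmp"
```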