Remove duplicates based on a single column

Hello,

I am new to shell scripting. I have a huge file with multiple columns. For example, the sample below has 6 columns:

HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG
HWUSI-EAS000_29:1:106 + chr5 76654650 AATTGGAA B@HYL
HWUSI-EAS000_29:1:108 + chr5 76654650 AATTGGAA C:)ADH
HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY

I want to remove duplicates based on column 4 (76654650). In the above case it should list only rows 1 and 4.

Any help on this is greatly appreciated.

Thanks,

Diya

awk '{a[$4]++}!(a[$4]-1)' file

Thank you. It worked exactly as I needed.

awk '!a[$4]++' infile
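
For anyone wondering why this works, here is a small sketch of the same idiom. The expression a[$4]++ yields the count stored for the column 4 value before it is incremented, so it is 0 the first time a value is seen; negating it gives true, and awk's default action prints the line. The earlier one-liner does the same thing in two steps (increment, then test that count minus 1 is zero). The names seen, dedup_by_col4 and sample are just illustrative choices, and the rows are the sample rows from this thread.

```shell
# Sample rows from this thread, held in a variable for the demo.
sample='HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG
HWUSI-EAS000_29:1:106 + chr5 76654650 AATTGGAA B@HYL
HWUSI-EAS000_29:1:108 + chr5 76654650 AATTGGAA CADH
HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY'

# Hypothetical helper: keep only the first line for each column 4 value.
# seen[$4]++ is 0 (false) on the first occurrence, so !seen[$4]++ is true
# and awk prints the line; on later occurrences it is false.
dedup_by_col4() {
    awk '!seen[$4]++' "$@"
}

# Reads stdin when no file name is given; also accepts file names.
result=$(printf '%s\n' "$sample" | dedup_by_col4)
printf '%s\n' "$result"
```

The array holds one entry per distinct column 4 value, so memory use grows with the number of unique keys rather than with the number of lines, which may matter for a huge file.
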
$ echo "HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG
HWUSI-EAS000_29:1:106 + chr5 76654650 AATTGGAA B@HYL
HWUSI-EAS000_29:1:108 + chr5 76654650 AATTGGAA CADH
HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY" | sort -k4,4 -u

HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG
HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY
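
One difference worth keeping in mind between the two approaches: the awk idiom leaves the surviving lines in their original input order, while sort -u orders the output by the key. With the sample above the two outputs happen to match because column 4 is already in order. Below is a rough sketch with made-up rows (r1, r2, r3 are hypothetical, not from the original file) where the results differ. LC_ALL=C is set only to make the ordering predictable for the demo.

```shell
# Hypothetical input whose column 4 is NOT already sorted.
input='r1 + chrA 20 AAAA x
r2 + chrB 10 CCCC y
r3 + chrA 20 GGGG z'

# awk keeps the first line seen for each key, in input order.
awk_out=$(printf '%s\n' "$input" | awk '!seen[$4]++')

# sort -u keeps one line per key, but the output is ordered by that key.
sort_out=$(printf '%s\n' "$input" | LC_ALL=C sort -k4,4 -u)

printf 'awk:\n%s\n\nsort:\n%s\n' "$awk_out" "$sort_out"
```

If the original line order matters, the awk version is probably the safer choice; if sorted output is wanted anyway, sort -k4,4 -u does both jobs in one pass.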