Hi experts,
I have a very long file that looks roughly like this:
aaad_1577 64000
aaad_1577 72000
aaad_1577 72000
aaad_1577 65000
aaad_1577 65000
(...aaad about a thousand times...)
bbbd_2002 56000
bbbd_2002 57000
bbbd_3045 57000
cccd_3452 150000
dddd_6014 150000
dddd_6014 175000
dddd_6014 150000
(...dddd about a thousand times...)
I want to ignore the rows whose first-column value occurs fewer than a handful of times, say fewer than 5 times.
It would help to see how many times each value occurs before I drop anything, so I'd first like to add the count as a third column, like this:
aaad_1577 64000 1005
aaad_1577 72000 1005
aaad_1577 72000 1005
aaad_1577 65000 1005
aaad_1577 65000 1005
(...aaad about a thousand times...)
bbbd_2002 56000 2
bbbd_2002 57000 2
bbbd_3045 57000 1
cccd_3452 150000 1
dddd_6014 150000 1003
dddd_6014 175000 1003
dddd_6014 150000 1003
(...dddd about a thousand times...)
and then filter on that count with something like this:
awk '$3 >= 5' [file]
and get this:
aaad_1577 64000 1005
aaad_1577 72000 1005
aaad_1577 72000 1005
aaad_1577 65000 1005
aaad_1577 65000 1005
(...aaad about a thousand times...)
dddd_6014 150000 1003
dddd_6014 175000 1003
dddd_6014 150000 1003
(...dddd about a thousand times...)
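For what it's worth, here is a rough sketch of the kind of thing I imagine, in case it helps frame the question. It assumes the input is a regular file that awk can read twice (not a pipe), that columns are whitespace-separated, and that the threshold is 5; the sample file and the variable names (tmp, annotated, filtered, min) are made up for illustration.

```shell
# Hypothetical sample file standing in for the real (much longer) input.
tmp=$(mktemp)
printf '%s\n' \
  'aaad_1577 64000' 'aaad_1577 72000' 'aaad_1577 72000' \
  'aaad_1577 65000' 'aaad_1577 65000' \
  'bbbd_2002 56000' 'bbbd_2002 57000' \
  'bbbd_3045 57000' 'cccd_3452 150000' > "$tmp"

# The file is passed twice. During the first read (NR==FNR) awk only counts
# each first-column value; during the second read it appends that count.
annotated=$(awk 'NR==FNR { n[$1]++; next } { print $0, n[$1] }' "$tmp" "$tmp")

# Keep only the rows whose first-column value occurs at least `min` times.
filtered=$(printf '%s\n' "$annotated" | awk -v min=5 '$3 >= min')

printf '%s\n' "$filtered"
rm -f "$tmp"
```

Keeping the count step and the filter step separate would let me look at the counts first and adjust the threshold afterwards.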
Thank you!