Help with awk for selecting lines in a file avoiding repetition

Hello,

I am using awk on Ubuntu 12.04.

I have a file with 48,432,354 lines and 4 fields.
The file has the following structure (values in the first column are repeated across several lines):

AB_14 S54 A G
AB_14 S55 A A
AB_14 S56 G G
GO_15 S45 T A
GO_15 S46 A A
PT_16  S33 C C
PT_16  S34 G A
PT_16  S35 T T
PT_16  S36 T A

The outcome I want is this:

AB_14 S54 A G
GO_15 S45 T A
PT_16  S33 C C

That is, I want a file containing only the first line for each name in the first column. I should mention that I also have a file containing just the list of names from the first column, like this, in case that is useful:

AB_14 
GO_15 
PT_16  

Thank you very much in advance.

awk '!a[$1]++' infile

PERFECT! It worked, thank you very very much.

Hi Mr complex,

Could you please let me know how it works?

awk '!a[$1]++' infile

The only difference is that, in this case, it only looks at column #1.
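
To spell out what the one-liner does, here is a rough long-form sketch of it. The file name `sample.txt`, the function name `first_per_key`, and the array name `seen` are made up for illustration; the data is a shortened copy of the sample from the first post. An array element that has never been set counts as 0 in awk, so `!a[$1]++` is true only the first time a given column-1 value is seen (the `++` then bumps the counter), and a true pattern with no action prints the line.

```shell
# Hypothetical sample file, shortened from the data in the first post.
printf '%s\n' \
    'AB_14 S54 A G' \
    'AB_14 S55 A A' \
    'GO_15 S45 T A' \
    'PT_16  S33 C C' \
    'PT_16  S34 G A' > sample.txt

# Long form of: awk '!a[$1]++' FILE
# first_per_key is an illustrative name, not a standard command.
first_per_key() {
    awk '{
        if (seen[$1] == 0)   # unset element compares equal to 0: first occurrence
            print            # print the whole line
        seen[$1]++           # count this key so later repeats are skipped
    }' "$1"
}

first_per_key sample.txt
```

Note that the array keeps one entry per distinct value of column 1, so memory use depends on the number of different names, not on the total number of lines.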
