Print row if value in column 1 is the first occurrence

Hi All,

I would like a script that does the following.
Print the whole row if the value in column 1 ("0001" in the example below) appears for the first time. Later rows with the same value (e.g. the second "0001") should not be printed, and the same applies to every other value.

Can any expert help?

Input:

0001 k= 40
0001 k= 2
0002 k= 1
0003 k= 1
0004 k= 77
0004 k= 1
0005 k= 88
0005 k= 6

Output:

0001 k= 40
0002 k= 1
0003 k= 1
0004 k= 77
0005 k= 88

$ cat buf
0001 k= 40
0001 k= 2
0002 k= 1
0003 k= 1
0004 k= 77
0004 k= 1
0005 k= 88
0005 k= 6
$ perl -n -e '($num) = split; next if $found{$num}; print; $found{$num} = 1' buf
0001 k= 40
0002 k= 1
0003 k= 1
0004 k= 77
0005 k= 88

$ cat test
0001 k= 40
0001 k= 2
0002 k= 1
0003 k= 1
0004 k= 77
0004 k= 1
0005 k= 88
0005 k= 6

# Anchor the match to column 1 ("^$i ") so a key cannot match elsewhere in a line
for i in `cut -d" " -f1 test | sort -u`
do
grep "^$i " test | head -1
done

0001 k= 40
0002 k= 1
0003 k= 1
0004 k= 77
0005 k= 88

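And the Python approach (Python 3):

#!/usr/bin/env python3

seen = set()  # first-column values already printed

with open('test') as infile:
    for line in infile:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key = fields[0]
        if key not in seen:
            seen.add(key)
            print(line, end='')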
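The same loop can be wrapped in a reusable generator so it works on any iterable of lines (an open file, sys.stdin, a list) and on any column. This is only a sketch: the name first_occurrences and the column parameter are hypothetical, and dropping lines that are too short to have the requested column is an assumption, not something the original script does.

```python
from typing import Iterable, Iterator


def first_occurrences(lines: Iterable[str], column: int = 0) -> Iterator[str]:
    """Yield each line whose value in `column` has not been seen before.

    Fields are split on runs of whitespace, like awk's default $1, $2, ...
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) <= column:
            continue  # assumption: lines without that column (e.g. blank lines) are dropped
        key = fields[column]
        if key not in seen:
            seen.add(key)
            yield line


if __name__ == "__main__":
    sample = [
        "0001 k= 40\n",
        "0001 k= 2\n",
        "0002 k= 1\n",
        "0003 k= 1\n",
        "0004 k= 77\n",
        "0004 k= 1\n",
        "0005 k= 88\n",
        "0005 k= 6\n",
    ]
    for line in first_occurrences(sample):
        print(line, end="")
```

To use it on the file from the question: with open('test') as f, loop over first_occurrences(f). Passing column=2 would instead keep the first row for each distinct value in the third field.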

Like a FAQ :)
Awk:

awk '!x[$1]++' file

Use nawk or /usr/xpg4/bin/awk on Solaris.

Perl:

perl -ane'print unless $x{$F[0]}++' file
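Both one-liners rely on the same trick: the array (awk) or hash (Perl) counts how often each first-column value has been seen, and the post-increment evaluates to the count from before the increment, so the negated test is true only on the first sighting. The sketch below spells that out in Python. The function name first_by_counter is hypothetical, and using an empty string as the key for a blank line is meant to mirror awk's empty $1.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def first_by_counter(lines: Iterable[str]) -> Tuple[List[str], Dict[str, int]]:
    """Python rendering of awk '!x[$1]++' / perl 'print unless $x{$F[0]}++'.

    Returns the kept lines plus the final per-key counts held in x.
    """
    x: Dict[str, int] = defaultdict(int)  # missing keys start at 0, like an awk array
    kept: List[str] = []
    for line in lines:
        fields = line.split()
        key = fields[0] if fields else ""  # $1 is "" on a blank line
        previous = x[key]  # x[$1]++ evaluates to the value *before* the increment
        x[key] += 1
        if not previous:  # ! is true only when the old count was 0
            kept.append(line)
    return kept, dict(x)


if __name__ == "__main__":
    sample = ["0001 k= 40\n", "0001 k= 2\n", "0002 k= 1\n", "0004 k= 77\n", "0004 k= 1\n"]
    kept, counts = first_by_counter(sample)
    print("".join(kept), end="")
    print(counts)  # {'0001': 2, '0002': 1, '0004': 2}
```

Unlike a plain set of seen keys, the counter also leaves behind how many times each key occurred, which is why the idiom is handy when you later want duplicate counts too.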

Brilliant, radoulov! :)

Hi radoulov,

The Perl code seems to work, but the awk code doesn't.
Can you help? I am using Solaris, by the way.

Also, can you explain your Perl code so that I can understand it better? What does "-ane" do?

$ nawk '!x[$1]++' file
x[$1]++': Event not found
$ awk '!x[$1]++' file
x[$1]++': Event not found
$ /usr/xpg4/bin/awk '!x[$1]++' file
x[$1]++': Event not found

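It's your shell that is breaking the awk script: in csh and tcsh, ! triggers history substitution even inside single quotes, which is where the "Event not found" error comes from. Put the awk program in a file and run it with awk -f, or switch to a shell that doesn't treat the ! character specially (such as sh or ksh).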

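I'm sure you will be able to peruse the "perlrun" manual page to figure out what the -a, -n and -e switches do. In short: -n wraps your code in a loop over the input lines, like awk; -a splits each line on whitespace into the array @F, like awk's fields ($F[0] is awk's $1); and -e passes the script directly on the command line, as with sed. So print unless $x{$F[0]}++ prints a line only if the count for its first field was still zero before being incremented, i.e. on that value's first occurrence.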
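To make the -n / -a description concrete, here is a rough Python analogue of what perl -ane does around the code you give it. It is only an illustration under simplifying assumptions: the helper name perl_ane is made up, str.split() with no argument stands in for Perl's split(' ', $_) (both split on runs of whitespace and ignore leading whitespace), and the per-line code returns a value instead of printing.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def perl_ane(lines: Iterable[str], body: Callable[[str, List[str]], T]) -> Iterator[T]:
    """Rough analogue of `perl -ane 'BODY'`.

    -n: run BODY once for every input line (nothing is printed automatically).
    -a: split each line on whitespace into F, like Perl's @F.
    -e: BODY is supplied inline rather than from a script file.
    """
    for line in lines:
        F = line.split()  # behaves much like Perl's split(' ', $_)
        yield body(line, F)


if __name__ == "__main__":
    sample = ["0001 k= 40\n", "0002 k= 1\n"]
    # Roughly: perl -ane 'print "$F[0] $F[2]\n"'
    for result in perl_ane(sample, lambda line, F: F[0] + " " + F[2]):
        print(result)
```

Run as a script, this prints "0001 40" and "0002 1", the first and third whitespace-separated fields of each line.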