Hello,
I need to remove duplicate lines from a file, section by section.
File:
ABC1 012345 header
ABC2 7890-000
ABC3 012345 Header Table
ABC4
ABC5 593.0000 587.4800
ABC5 593.5000 587.6580 <= duplicate, needs to be removed
ABC5 593.5000 587.6580
ABC5 594.0000 588.0971
ABC5 594.5000 588.5361
ABC1 67890 header
ABC2 1234-0001
ABC3 67890 Header Table
ABC4
ABC5 594.5000 588.5361 <= keep in this section
ABC5 601.0000 594.1603
ABC5 601.5000 594.6121
ABC5 602.0000 595.0642
ABC5 602.0000 595.0642 <= duplicate, needs to be removed
ABC1 345678 header
I need to remove the duplicates from each section. Each section starts with ABC1, and any duplicates found within a section need to be pushed into another file.
From my research on the forum, I have been working with the following command:
awk '/ABC1/ ( ABC1 = $2 ) !x[ABC1,$0]++' File
However, I must be doing something wrong, because it is not removing the duplicates. What am I doing wrong, or is there a better way?
I am currently using the Bourne shell (/bin/sh).
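For reference, here is a sketch of one way this could be done in POSIX awk. One likely problem with the command above is that `( ABC1 = $2 )` is not a braced action, and the `x` array is never scoped to a section in a way that lets a line reappear in a later section. The sketch below instead bumps a section counter on every ABC1 line and keys the seen-array on (section, line); the filenames `dups.txt` and `File.clean` are just example names, and the sample input is a shortened version of the file in the question.

```shell
# Sample input: two sections, each introduced by an ABC1 line.
cat > File <<'EOF'
ABC1 012345 header
ABC5 593.5000 587.6580
ABC5 593.5000 587.6580
ABC1 67890 header
ABC5 594.5000 588.5361
EOF

# Keep the first occurrence of each line within its section;
# route later repeats to a side file ("dups.txt" here).
awk '
    /^ABC1/ { sec++ }                             # new section: advance the counter
    seen[sec, $0]++ { print > "dups.txt"; next }  # already seen in this section
    { print }                                     # first occurrence: keep it
' File > File.clean
```

Because the array key includes the section counter, a line that already appeared in an earlier section (like the 594.5000 row above) is still kept the first time it shows up in a new section. This uses only POSIX awk features, so it should run fine from /bin/sh.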
Thank you!