Remove sections based on duplicate first line

Hi,

I have a file with many sections in it. Each section is separated by a blank line.
The first line of each section determines whether the section is a duplicate.
If a section is a duplicate, the entire section should be removed from the file.

Below is an example of the input and the desired output. The lines starting with *& are the first lines of the sections, and there are 2 sections with the same first line. I need to delete one of them.

Input:
*& abc def
1
2
3
4
5

*& cde efg
1
2
3

*& abc def
1
2
3
4
5

Output:
*& cde efg
1
2
3

*& abc def
1
2
3
4
5

Thanks for your help!!

Hello,
If the output order of the sections is not important, with (GNU) awk:

awk 'BEGIN{RS="\n\n"};{A[$0]=1};END{for (h in A) print h,"\n"}' file
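
If the order does matter, a variant along the following lines might help. It is only a sketch under a few assumptions: the sample data is made up, it uses awk's paragraph mode (RS="") instead of RS="\n\n", and it compares only the first line of each section ($1 with FS="\n") rather than the whole section, which is closer to what the question asks for. Note that for (h in A) visits the array in an unspecified order, which is why the output of the one-liner above can look shuffled.

```shell
# Made-up sample: three sections, the first and last share the header "*& abc def".
out=$(printf '%s\n' '*& abc def' 1 2 '' '*& cde efg' 1 '' '*& abc def' 1 2 |
  awk 'BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }   # paragraph mode, one field per line
       !seen[$1]++                                   # print a section only the first time its header appears
      ')
printf '%s\n' "$out"
```

Here seen[$1]++ is 0 (false) on the first occurrence of a header and non-zero afterwards, so sections are printed in their original order and later duplicates are dropped. In paragraph mode, blank lines at the start or end of the file do not produce empty records.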

Regards.

That works if DOS <CR> line terminators are removed from the input file. Also try:

awk '/^\*\&/ {STOP=($0 in T); T[$0]} /^ *$/ {STOP=0} !STOP' file
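
If the file does have DOS line endings, a "blank" line really contains a <CR>, so neither RS="\n\n" nor /^ *$/ matches it. One possible way around that, sketched here on made-up sample data, is to strip the trailing <CR> inside awk before the other rules run:

```shell
# Made-up sample with CR LF line endings; the first and last sections share a header.
out=$(printf '%s\r\n' '*& abc def' 1 '' '*& cde efg' 2 '' '*& abc def' 1 |
  awk '{ sub(/\r$/, "") }                     # drop a trailing CR, if there is one
       /^\*&/ { STOP = ($0 in T); T[$0] }     # header already known: stop printing
       /^ *$/ { STOP = 0 }                    # blank line: resume printing
       !STOP                                  # default action: print unless stopped
      ')
printf '%s\n' "$out"
```

The output has plain LF line endings. Running the file through tr -d '\r' first would be another option.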

Thanks for your help. Your code worked fine. I had already tried similar code, but the difference was that I didn't set RS, and instead of A[$0]=1 I assigned A[$0]=$0, and the array was getting jumbled up. Do you know the reason?

Rudic - I don't quite understand this code. Can you please help me understand it?

awk '/^\*\&/ {STOP=($0 in T); T[$0]} /^ *$/ {STOP=0} !STOP' file4

Thank you both for your help!!

awk '/^\*\&/ {STOP=($0 in T)            # if header (identified by *&) is known, stop the printing
              T[$0]                     # remember the header line for next time
             } 
     /^ *$/  {STOP=0}                   # empty line: reenable printing
     !STOP                              # use default action: print, if NOT STOPped
    ' file

By default, the record separator (RS) is a single '\n', which marks the end of a line. If RS is set to '\n\n', awk treats everything up to the next '\n\n' as one record.
This way, one record is one whole section instead of one line.
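
A quick way to see the difference is to count records (NR) with and without changing RS. This is only a sketch on made-up data, and it uses RS="" (awk's standard paragraph mode) because a multi-character RS such as "\n\n" is not guaranteed to work outside GNU awk:

```shell
# Made-up sample: two sections separated by one blank line (6 lines in total).
data=$(printf '%s\n' '*& abc def' 1 2 '' '*& cde efg' 1)

# Default RS: every line is a record, so NR counts lines.
lines=$(printf '%s\n' "$data" | awk 'END { print NR }')

# RS="": every blank-line-separated section is one record.
sections=$(printf '%s\n' "$data" | awk 'BEGIN { RS = "" } END { print NR }')

echo "records with default RS: $lines"
echo "records in paragraph mode: $sections"
```

With the default RS, $0 holds a single line, so A[$0] stores lines rather than sections.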

Regards.