I have a file with many sections in it. Each section is separated by a blank line.
The first line of each section determines whether the section is a duplicate.
If a section is a duplicate, the entire section should be removed from the file.
Below is an example of input and output. The lines starting with *& are the first lines of sections, and there are 2 sections with the same first line. I need to delete one of them.
Thanks for your help. Your code worked fine. I had already tried similar code, but the difference was that I didn't set RS, and instead of A[$0]=1 I assigned A[$0]=$0, and the array was getting jumbled up. Do you know the reason?
Rudic - I don't quite understand this code. Can you please help me understand it?
awk '/^\*\&/ {STOP=($0 in T) # if header (identified by *&) is known, stop the printing
T[$0] # remember the header line for next time
}
/^ *$/ {STOP=0} # empty line: reenable printing
!STOP # use default action: print, if NOT STOPped
' file
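To see what the script does line by line, here is a small run on made-up data. The file name sections.txt and its contents are assumptions for illustration only, and the header pattern is written as ^[*]& (a bracket expression instead of backslash escapes), which should match the same lines.

```shell
# Hypothetical sample data: three sections, the first and third share a header.
cat > sections.txt <<'EOF'
*& alpha
line a1
line a2

*& beta
line b1

*& alpha
line a3
EOF

result=$(awk '
/^[*]&/ { STOP = ($0 in T)   # header line: stop printing if it was seen before
          T[$0] }            # referencing T[$0] creates the entry, so it is remembered
/^ *$/  { STOP = 0 }         # blank line: a section ended, re-enable printing
!STOP                        # no action given, so awk prints the line when STOP is 0
' sections.txt)

printf '%s\n' "$result"
```

The second "*& alpha" section is dropped and the other two are printed unchanged. Note that the blank line following a dropped section is still printed, so a duplicate in the middle of the file leaves two consecutive blank lines behind.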
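By default, the record separator (RS) is a single '\n', which marks the end of a line. If RS is set to '\n\n', awk treats '\n\n' as the record terminator instead (this relies on an awk such as gawk or mawk that accepts a multi-character RS).
This way, one record is one whole section rather than one line.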
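A minimal sketch of the same idea using awk's paragraph mode, which is not necessarily the code posted earlier in the thread: setting RS to the empty string makes awk split records on blank lines, and this works in any POSIX awk. The file name paragraphs.txt, its contents, and the array name seen are assumptions for illustration.

```shell
# Hypothetical sample data: the first and third sections share a header.
printf '*& alpha\nline a1\n\n*& beta\nline b1\n\n*& alpha\nline a2\n' > paragraphs.txt

deduped=$(awk '
BEGIN { RS = ""          # paragraph mode: records are separated by blank lines
        FS = "\n"        # one field per line, so $1 is the section header
        ORS = "\n\n" }   # put a blank line back after each printed section
!seen[$1]++              # print a section only the first time its header appears
' paragraphs.txt)

printf '%s\n' "$deduped"
```

Because each section is a single record, the whole section is kept or dropped in one step, and only the header ($1) is used as the array key.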