From this, I want to remove duplicates: not all of them, but only those that sit directly above the repeated value. In other words, keep the second occurrence, but only if it immediately follows the first. The comparison should ignore the value before the "-", but that value should be retained in the results.
Thanks for responding. Here is a simplified case. Say I have this as input:
1-num1
2-num2
3-num2
4-num3
5-num3
2-num2
Now what I want to do is not just find repetitions and remove them, but find repetitions that occur on the very next line and remove the first occurrence of that value. Repetitions are checked on $2 with FS as "-". So the output should be
Ask sed to set a branch target, quit at EOF, and read the next line into the buffer. If the two lines are identical, reduce them to one and go to the target; otherwise print the first line, remove it, and branch to the target.
sed '
:loop
$q
N
s/^\(.*\)\n\1$/\1/
t loop
P
s/.*\n//
t loop
'
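Note that the sed script above compares whole lines, so it will not treat 2-num2 and 3-num2 as duplicates. Here is a minimal awk sketch for one reading of the request: within each run of consecutive lines that share the same value after the first "-", keep only the last line. The function name keep_last_of_run is made up for illustration, and the sketch assumes the value itself contains no further hyphens (otherwise $2 is only part of it).

```shell
# Hypothetical helper: for each run of consecutive lines sharing the same
# value after the first "-", print only the last line of the run.
keep_last_of_run() {
  awk -F- '
    NR > 1 && $2 != prev_key { print prev_line }  # key changed: emit last line of the finished run
    { prev_key = $2; prev_line = $0 }             # remember the current line and its key
    END { if (NR > 0) print prev_line }           # emit the last line of the final run
  ' "$@"
}

# Sample input from the question.
printf '%s\n' 1-num1 2-num2 3-num2 4-num3 5-num3 2-num2 | keep_last_of_run
```

Under that reading, the sample input comes out as 1-num1, 3-num2, 5-num3, 2-num2 (one per line). The trailing 2-num2 survives because it is not adjacent to the earlier num2 lines.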
I think I have a simpler awk script that does what you said you want, but I don't understand why 4-num3 appears in what you say the output should be. That is the first line in the input file that has num3 after the hyphen, and a later line also contains num3, so I thought you wanted that line to be dropped from the output.
Requirements creep is everywhere! So, for each line, you search the rest of the file for a duplicate and, if there is one, drop the line? That is N*(N-1)/2 reads? We could:
number the lines,
sort in descending line-number order
sort unique on the line part (keeps only the first of each key)
sort on line number
remove the line numbers.
sed '=' infile | sed '
N
s/\n/ /
' | sort -nr | sort -u -k2,2 | sort -n | sed '
s/^[1-9][0-9]* //
' >out_file
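If the goal really is "drop a line whenever a later line repeats its value", a single awk pass can stand in for the sort pipeline. This is only a sketch of that alternative, not the pipeline itself: it keys on the text after the first "-" rather than on the whole line, it holds the entire file in memory, and the function name keep_last_occurrence is made up for illustration.

```shell
# Hypothetical helper: keep a line only if no later line has the same
# value after the first "-"; surviving lines stay in input order.
keep_last_occurrence() {
  awk -F- '
    { line[NR] = $0; key[NR] = $2; last[$2] = NR }  # store each line; track last position of each key
    END {
      for (i = 1; i <= NR; i++)
        if (last[key[i]] == i) print line[i]        # print only the final occurrence of each key
    }
  ' "$@"
}

# Sample input from the question.
printf '%s\n' 1-num1 2-num2 3-num2 4-num3 5-num3 2-num2 | keep_last_occurrence
```

For the sample input this prints 1-num1, 5-num3, 2-num2 (one per line), reading the file once instead of rescanning it for every line.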