Remove subsequent duplicate only

Hi,

I've been trying to dig myself out of this, but nothing has worked out yet.

I have an input like this:

1-Num1
1-Num2
2-Num3
3-Num4
1-Num5
3-Num11
2-Num11
1-Num13
1-Num16
3-Num18
4-Num19
2-Num20
1-Num22
3-Num23
1-Num24

From this, I want to remove duplicates, not all, but only those that are just above the repeated value. In other words, retain the second repetition, but only if it follows the first occurrence. I want to run this comparison ignoring the values before -, but retaining them in the results.

Someone please help me out with this.

Thanks!

Not sure I understand. Please post the desired output and the logic by which it is derived.

Hi,

Thanks for responding, here is a simplified case. Say I have this as input

1-num1
2-num2
3-num2
4-num3
5-num3
2-num2

Now what I want to do is not just find repetitions and remove them, but to find repetitions that are only in the next line and remove the first occurrence of that value. Repetitions are checked on $2 with FS as "-". So the output should be

1-num1
3-num2
4-num3
5-num3
2-num2


Here is a not so elegant approach:

awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' filename | awk '!a[$1]++' | awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }'

Ask sed to set a branch target, quit at EOF, and read the next line into the buffer. If the two lines are identical, reduce them to one and go back to the target. Otherwise print the first line, remove it, and branch to the target.

sed '
:loop
$q
N
s/^\(.*\)\n\1$/\1/
t loop
P
s/.*\n//
t loop
'

Hi Jamie,
check this out:

awk '!d[$0]++' file
1-num1
2-num2
3-num2
4-num3
5-num3

jamie_123 wants to remove the first occurrence of the duplicate, not the second. Your code will remove the second and subsequent occurrences.

This is why we have to reverse the lines of the file first, then remove the duplicate and finally reverse the lines back.
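
A single awk pass can give the same result without reversing the file twice. This is only a sketch, assuming whole lines are compared and the input fits in memory; keep_last_occurrence is a made-up name:

```shell
# Print each distinct line once, at the position of its LAST occurrence.
# keep_last_occurrence is a hypothetical helper name, not a standard tool.
keep_last_occurrence() {
    awk '
        { line[NR] = $0; last[$0] = NR }   # store every line and where it was last seen
        END {
            for (i = 1; i <= NR; i++)
                if (last[line[i]] == i)    # only the final occurrence survives
                    print line[i]
        }
    ' "$@"
}
```

For the simplified sample it prints 1-num1, 3-num2, 4-num3, 5-num3, 2-num2.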

Less is more

uniq
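
Note that uniq compares whole lines and keeps the first line of each run. If the comparison should ignore the part before the hyphen and keep the last line of each run instead, something like this might do. It is a rough sketch, assuming exactly one hyphen per line; keep_last_of_run is a made-up name:

```shell
# Keep only the last line of each run of consecutive lines that share
# the same text after the "-". keep_last_of_run is a hypothetical name.
keep_last_of_run() {
    awk -F- '
        NR > 1 && $2 != prev_key { print prev_line }   # run ended: emit its last line
        { prev_key = $2; prev_line = $0 }              # remember the current line
        END { if (NR) print prev_line }                # emit the last line of the final run
    ' "$@"
}
```

On the simplified sample this drops the first 2-num2 and also 4-num3, so it differs from the output posted above, which keeps 4-num3.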

I think I have a simpler awk script that does what you said you want, but I don't understand why 4-num3 appears in what you say the output should be. That is the 1st line in the input file that has num3 after the hyphen and there is another line later that contains num3 so I thought you wanted that line to be dropped from the output.

Try:

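awk 'BEGIN{FS = OFS = "-"}
{       f1[NR] = $1             # save the part before the hyphen
        c[f2[NR] = $2]++        # save the part after it and count its occurrences
}
END {   for(i = 1; i <= NR; i++)
                if(c[f2[i]] > 1)
                        c[f2[i]] = 1    # first line of a repeated value: drop it
                else    print f1[i], f2[i]
}' input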

produces the output:

1-num1
3-num2
5-num3
2-num2

when given the input:

1-num1
2-num2
3-num2
4-num3
5-num3
2-num2

Requirements creep is everywhere! So, when you get to each line, search the rest of the file for a duplicate and if so drop it? N*(N-1)/2 reads? We could:

  1. number the lines,
  2. sort in descending line number
  3. sort unique on line part (keeps only the first of the key)
  4. sort on line number
  5. remove the line numbers.

sed '=' infile | sed '
  N
  s/\n/ /
 ' | sort -nr | sort -u -k2,2 | sort -n | sed '
  s/^[1-9][0-9]* //
 ' >out_file

Sort kept EDP alive in the bad old days.

Got it, bipinajith, thanks.
> This is why we have to reverse the lines of the file first, then remove the duplicate and finally reverse the lines back.

Here is how to remove the first occurrence of the duplicate entries:

tac file|awk '!d[$0]++'|tac
1-num1
3-num2
4-num3
5-num3
2-num2

Whoa... Thank you! You guys are great. I will start digging through these solutions.