Remove subsequent duplicate only

jamie_123 · February 22, 2013, 10:05am

Hi,

I've been trying to dig myself out of this, but nothing has worked out yet.

I have an input like this:

1-Num1
1-Num2
2-Num3
3-Num4
1-Num5
3-Num11
2-Num11
1-Num13
1-Num16
3-Num18
4-Num19
2-Num20
1-Num22
3-Num23
1-Num24

From this, I want to remove duplicates, not all, but only those that are just above the repeated value. In other words, retain the second repetition, but only if it follows the first occurrence. I want to run this comparison ignoring the values before -, but retaining them in the results.

Someone please help me out with this.

Thanks!

RudiC · February 22, 2013, 10:46am

Not sure I understand. Please post the desired output and the logic by which it is derived.

jamie_123 · February 22, 2013, 11:20am

Hi,

Thanks for responding, here is a simplified case. Say I have this as input

1-num1
2-num2
3-num2
4-num3
5-num3
2-num2

Now what I want to do is not just find repetitions and remove them, but to find repetitions that are only in the next line and remove the first occurrence of that value. Repetitions are checked on $2 with FS as "-". So the output should be

1-num1
3-num2
4-num3
5-num3
2-num2

.

Yoda · February 22, 2013, 11:36am

Here is a not so elegant approach:

awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' filename | awk '!a[$1]++' | awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }'

DGPickett · February 22, 2013, 11:58am

Ask sed to set a branch target, quit at EOF, and read the next line into the buffer. If the two lines are identical, reduce them to one and go back to the target; otherwise print the first line, remove it, and branch to the target.

sed '
:loop
$q
N
s/^\(.*\)\n\1$/\1/
t loop
P
s/.*\n//
t loop
'
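
For comparison, here is a minimal awk sketch of the literal reading of the request: compare only the text after the hyphen, and drop a line when the very next line carries the same value. The function name drop_before_adjacent_dup is made up for illustration, and the sketch assumes the value itself contains no further hyphens. Note that on the simplified sample this reading also drops 4-num3, which the posted desired output keeps, so it may not be what jamie_123 actually wants.

```shell
# Hypothetical helper: drop a line when the NEXT line has the same key
# (the text after the hyphen); every other line passes through unchanged.
drop_before_adjacent_dup() {
    awk -F- '
        # The previous line is printed only if the current key differs from its key.
        NR > 1 && $2 != prevkey { print prevline }
        { prevline = $0; prevkey = $2 }
        # The last line has no successor, so it is always kept.
        END { if (NR) print prevline }
    ' "$@"
}
```

It reads standard input or a file, e.g. drop_before_adjacent_dup infile.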

rveri · February 22, 2013, 3:19pm

Hi Jamie,
check this out:

awk '!d[$0]++' file

1-num1
2-num2
3-num2
4-num3
5-num3

Yoda · February 22, 2013, 3:31pm

jamie_123 wants to remove the first occurrence of the duplicate, not the second. Your code will remove the second and subsequent occurrences.

This is why we have to reverse the lines of the file first, then remove the duplicate and finally reverse the lines back.
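
The reverse, de-duplicate, reverse idea can probably also be done in one awk process by remembering where each line was last seen. This is only a sketch, assuming whole lines are compared and the file fits in memory; the function name keep_last_occurrence is made up for illustration.

```shell
# Hypothetical helper: keep only the LAST occurrence of each whole line,
# preserving the original order of the lines that survive.
keep_last_occurrence() {
    awk '
        # Remember every line and the line number where it was last seen.
        { line[NR] = $0; last[$0] = NR }
        END {
            for (i = 1; i <= NR; i++)
                if (last[line[i]] == i)   # print a line only at its final position
                    print line[i]
        }
    ' "$@"
}
```

On the simplified sample this keeps 3-num2, 4-num3 and 5-num3 and drops only the first 2-num2.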

DGPickett · February 22, 2013, 3:59pm

Less is more

uniq

Don_Cragun · February 22, 2013, 4:13pm

jamie_123:

> Hi,
>
> Thanks for responding, here is a simplified case. Say I have this as input
>
> 1-num1
> 2-num2
> 3-num2
> 4-num3
> 5-num3
> 2-num2
>
> Now what I want to do is not just find repetitions and remove them, but to find repetitions that are only in the next line and remove the first occurrence of that value. Repetitions are checked on $2 with FS as "-". So the output should be
>
> 1-num1
> 3-num2
> 4-num3
> 5-num3
> 2-num2
>
> .

I think I have a simpler awk script that does what you said you want, but I don't understand why 4-num3 appears in what you say the output should be. That is the 1st line in the input file that has num3 after the hyphen and there is another line later that contains num3 so I thought you wanted that line to be dropped from the output.

Try:

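awk 'BEGIN{FS = OFS = "-"}
{       f1[NR] = $1             # text before the hyphen
        c[f2[NR] = $2]++        # count occurrences of the text after the hyphen
}
END {   for(i = 1; i <= NR; i++)
                if(c[f2[i]] > 1)
                        c[f2[i]] = 1    # skip the first line of a repeated value
                else    print f1[i], f2[i]
}' input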

produces the output:

1-num1
3-num2
5-num3
2-num2

when given the input:

1-num1
2-num2
3-num2
4-num3
5-num3
2-num2

DGPickett · February 22, 2013, 4:29pm

Requirements creep is everywhere! So, when you get to each line, search the rest of the file for a duplicate and if so drop it? N*(N-1)/2 reads? We could:

number the lines,
sort in descending line number
sort unique on line part (keeps only the first of the key)
sort on line number
remove the line numbers.

sed '=' infile | sed '
  N
  s/\n/ /
 ' | sort -nr | sort -u -k2,2 | sort -n | sed '
  s/^[1-9][0-9]* //
 ' >out_file

Sort kept EDP alive in the bad old days.

rveri · February 22, 2013, 11:04pm

got it bipinajith thanks,
> This is why we have to reverse the lines of the file first, then remove the duplicate and finally reverse the lines back.

Here is how to remove the first occurrence of the duplicate entries:

tac file|awk '!d[$0]++'|tac

1-num1
3-num2
4-num3
5-num3
2-num2

jamie_123 · February 23, 2013, 7:06am

Whoa...Thank you! U guys r gr8. I will start digging through these solutions..