Thanks for the assistance. That works wonderfully. May I ask for some further guidance breaking down the command so I may understand it?
Are we printing the current line only if it differs from both of the two immediately preceding lines (LAST1 and LAST2), and then shifting those two variables along for the next iteration?
By extension, this also deals with the type 1 duplicates, yes?
Here is my attempt to trace it using sample input.
The input lines are numbered for easier reading; the numbers are not in the actual file / input the program is processing.
#
# The condition is met on line 1.
# LAST2 is empty; LAST1 is set to the line currently being processed ($0).
#
1 a
#
# The condition is met on line 2.
# LAST2 is set to LAST1 (the previous line); LAST1 is set to the line currently being processed ($0).
# This continues through line 6, since the condition is met each time, replacing the values of LAST1 / LAST2 accordingly.
#
2 b
3 c
4 d
5 e
6 f
#
# At this point, on line 7, the value of LAST1 is "f", while LAST2 is "e".
# The condition is not met for lines 7 to 10.
# LAST1 / LAST2 do not change, and those lines do not appear in the output.
#
7 e
8 f
9 e
10 f
#
# On line 11 the condition is met again.
# LAST2 is set to "f", and LAST1 to "a" ($0, the line currently being processed).
# The program continues to operate as above.
#
11 a
12 b
13 c
14 d
15 e
16 f
The solution is a lookup buffer of two, implemented by the two variables LAST1 and LAST2.
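For reference, a two-variable version along the lines traced above might look like the following. This is only a sketch: the original command is not repeated in this post, so the exact condition and the placement of the assignments (here, inside the printing block, matching the trace) are assumptions, and the function name dedup_last2 is made up for illustration.

```shell
# Hypothetical helper: print a line only if it differs from both remembered lines.
dedup_last2() {
    awk '
    # Assumption: LAST1 / LAST2 are updated only when a line is printed
    $0 != LAST1 && $0 != LAST2 {
        print
        LAST2 = LAST1   # shift the older remembered line
        LAST1 = $0      # remember the line just printed
    }
    ' "$@"
}
```

Called as `dedup_last2 file`, or with input piped in, this prints lines 1 to 6 and 11 to 16 of the sample input and drops lines 7 to 10.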
The following version has a configurable buffer depth d:
awk '
{
  # preset: print
  prt=1
  # do not print if found in buf
  for (i=1; i<=d; i++) if (buf[i%d]==$0) {
    prt=0
    break
  }
  if (prt==1) print $0
  # remember the current line, overwriting the oldest entry
  buf[NR%d]=$0
}
' d=2 file
With d=1 it detects the repetition d d, but not e f e f.
With d=3 it also detects a repetition such as g h i g h i ...
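To try the different depths quickly, the buffer version can be wrapped in a small shell function. This is a sketch only; the name dedup_window and the choice to pass the depth as the first argument (handed to awk with -v) are assumptions for illustration, not part of the command above.

```shell
# Hypothetical wrapper: first argument is the buffer depth, the rest are input files.
dedup_window() {
    depth=$1
    shift
    awk -v d="$depth" '
    {
        prt = 1
        # suppress the line if it matches any of the last d input lines
        for (i = 1; i <= d; i++) if (buf[i % d] == $0) { prt = 0; break }
        if (prt) print
        # store the current line, overwriting the oldest entry
        buf[NR % d] = $0
    }
    ' "$@"
}

printf '%s\n' d d e f e f | dedup_window 1   # prints d e f e f, one per line
printf '%s\n' d d e f e f | dedup_window 2   # prints d e f
printf '%s\n' g h i g h i | dedup_window 3   # prints g h i
```

Note that the buffer here records every input line, printed or not, so it always holds the last d lines read.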