I have a file with a single column. Some entries in this column are duplicated, that is, some lines are exactly identical. I want to find the lines that occur twice.
awk arrays are associative: they hash array indexes, so any string can serve as an index.
The expression arr[ $0 ]++ says: add one to the array element indexed by $0, that is, by the entire current line.
But since the ++ comes after arr[], the value of arr[] is evaluated before one is added.
So if arr[ $0 ] is one, meaning the line has been seen once before, print $0 because it is a duplicate, then add one to arr[ $0 ]. Now arr[ $0 ] == 2, so the line is never printed again no matter how many more times it appears.
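As a minimal sketch of the explanation above, the following feeds a made-up six-line sample (not the asker's real file) into awk. The variable name dups is only for illustration.

```shell
# Sample input is hypothetical: "one" once, "two" twice, "three" three times.
# arr[$0]++ yields the count *before* the increment, so the comparison
# is true only when a line is seen for the second time; with no action
# given, awk's default action prints the line.
dups=$(printf '%s\n' one two two three three three | awk 'arr[$0]++ == 1')

# Each duplicated line is reported once: "two", then "three".
printf '%s\n' "$dups"
```

With a real file, the same pattern would be run as awk 'arr[$0]++ == 1' followed by the file name.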
In awk, uninitialized variables have the value zero (or the null string, depending on the context).
The ++ operator adds one; it can increment a variable either before (++x) or after (x++) its value is taken.
Consider this:
$ print 'one
two
two
three
three
three'|awk '{printf "$0 is %s, first x[$0] is %s ,",$0,x[$0]++}{print "then x[$0] is",x[$0]}'
$0 is one, first x[$0] is 0 ,then x[$0] is 1
$0 is two, first x[$0] is 0 ,then x[$0] is 1
$0 is two, first x[$0] is 1 ,then x[$0] is 2
$0 is three, first x[$0] is 0 ,then x[$0] is 1
$0 is three, first x[$0] is 1 ,then x[$0] is 2
$0 is three, first x[$0] is 2 ,then x[$0] is 3
So now it should be easier to understand this:
$ print 'one
two
two
three
three
three'|awk 'x[$0]++==1'
two
three
... and this:
$ print 'one
two
two
three
three
three'|awk '++x[$0]==2'
two
three
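Both one-liners above print every line that occurs at least twice. If "occur twice" is meant strictly, i.e. exactly two occurrences, one possible variation (an assumption about the intent, not something stated in the question) is to count every line first and report only at the end. The names count, line and exactly_twice are illustrative.

```shell
# Count all lines first, then print only those whose final count is 2.
# Sample input is hypothetical; "three" occurs three times and is skipped.
exactly_twice=$(printf '%s\n' one two two three three three |
  awk '{ count[$0]++ }
       END { for (line in count) if (count[line] == 2) print line }')

printf '%s\n' "$exactly_twice"
```

Note that the order in which for (line in count) visits the indexes is unspecified, so with several matches the output order may differ from the input order.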