In each row there could be repetition of a word. I want to delete all repetitions and keep unique occurrences.
Example:
a+b+c ab+c ab+c
abbb+c ab+bbc a+bbbc
aaa aaa aaa
Output:
a+b+c ab+c
abbb+c ab+bbc a+bbbc
aaa
One way:
awk 'function cleanarray ( i) {for(i in a) delete a[i]}
function printarray ( i) {for(i in a) printf("%s ",i);print ""}
NF{cleanarray()
for(i=1;i<=NF;i++) a[$i]
printarray()}' infile
Perfect solution if you don't care about the order of fields. If you do, it may disappoint you, as the order in which elements are returned is not guaranteed in a for (i in a) construct. Then you may want to try
awk '{delete a
for (i=1; i<=NF; i++)
{for (j=1; j<=i; j++) {if ($i == a[j]) break}
if (j>i) a[++ix]=$i
}
for (i=1; i<=ix; i++) printf("%s ", a[i]); print ""; ix=0
}
' file
The delete array statement is NOT supported in all awks; if yours lacks it, fall back to the previous proposal. This proposal is clumsier than the previous solution, but it keeps the order of fields, suppressing their later occurrences.
If delete array is not supported, the workaround is split("",array)
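For illustration, here is a minimal sketch that combines the two ideas above: it empties the array with split("", seen) instead of delete, and keeps the first occurrence of each word in its original position. It is an alternative to the posted scripts, not a rewrite of them; the names dedup_words, seen, out and sep are made up for this example.

```shell
# dedup_words is a hypothetical helper name: it reads lines on stdin and
# prints each line with later repetitions of a word removed.
dedup_words() {
  awk '{
    split("", seen)                  # portable way to empty the array
    out = ""
    for (i = 1; i <= NF; i++)
      if (!($i in seen)) {           # first time this word appears on the line
        seen[$i] = 1
        sep = (out == "" ? "" : " ")
        out = out sep $i             # append in original field order
      }
    print out
  }'
}

printf '%s\n' 'a+b+c ab+c ab+c' 'abbb+c ab+bbc a+bbbc' 'aaa aaa aaa' | dedup_words
```

Run against the sample from the question, this should print the three expected output lines, without a trailing blank.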
Or just:
awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $j=x; $0=$0; $1=$1 }1' file
Brilliant!
But - wouldn't an assignment to $j suffice to rebuild $0? So - why the $0 and $1 assignment? I tried this:
$ awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $j=x}1' file
and it's working, too.
Hi. That would work, yes, but it would leave excess whitespace. An extra recalculation of the fields ( $0=$0 ) first reduces the number of fields (if duplicates were removed), so that afterwards, recalculating the record ( $1=$1 ) removes the excess whitespace.
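A small sketch to make the difference visible, assuming the default FS and OFS. The sample line and the variable names line, short and full are made up for this example; the brackets only mark where each result starts and ends.

```shell
line='aaa bbb aaa ccc'

# Without the extra assignments: the blanked field leaves a doubled separator.
short=$(printf '%s\n' "$line" |
  awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $j=x}1')

# With $0=$0 (re-split the record) and $1=$1 (rebuild it): separators are normalized.
full=$(printf '%s\n' "$line" |
  awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $j=x; $0=$0; $1=$1}1')

printf '[%s]\n' "$short" "$full"
# prints:
# [aaa bbb  ccc]
# [aaa bbb ccc]
```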