Unique words in each line

Viernes · January 27, 2013, 6:46am

A word may be repeated within a row. I want to delete the repetitions and keep only one occurrence of each word.

Example:

a+b+c ab+c ab+c
abbb+c ab+bbc a+bbbc
aaa aaa aaa

Output:

a+b+c ab+c
abbb+c ab+bbc a+bbbc
aaa

elixir_sinari · January 27, 2013, 7:00am

One way:

awk 'function cleanarray (   i) {for(i in a) delete a[i]}
function printarray (   i) {for(i in a) printf("%s ",i);print ""}
# skip empty lines; every word of the line becomes an array key, so duplicates collapse
NF{cleanarray()
for(i=1;i<=NF;i++) a[$i]
printarray()}' infile

RudiC · January 27, 2013, 11:00am

Perfect solution if you don't care about the order of fields. If you do, it may disappoint you, as the order in which a for (i in a) loop returns the elements is not guaranteed. In that case you may want to try

awk '{delete a
      for (i=1; i<=NF; i++)
        {for (j=1; j<=i; j++) {if ($i == a[j]) break}
         if (j>i) a[++ix]=$i
        }
      for (i=1; i<=ix; i++) printf("%s ", a[i]); print ""; ix=0
     }
    ' file

The delete array statement is NOT supported in all awks; if yours lacks it, fall back to the previous proposal. This proposal is clumsier than the previous solution, but it keeps the order of fields and suppresses their later occurrences.

vgersh99 · January 27, 2013, 11:06am

If delete array is not supported, the workaround is split("",array).
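
For illustration, here is a minimal sketch of how the split("",array) workaround could be combined with an order-preserving pass. It is not one of the solutions posted above; the function name uniq_fields and the variables seen and out are made up for this example, and it assumes default whitespace field splitting.

```shell
# uniq_fields is a hypothetical helper name, not something from the thread.
uniq_fields() {
  awk '{
    split("", seen)              # portable way to empty the array
    out = ""
    for (i = 1; i <= NF; i++)
      if (!($i in seen)) {       # first time this word appears on the line
        seen[$i] = 1
        out = (out == "" ? $i : out " " $i)
      }
    print out                    # words in their original order, no trailing blank
  }' "$@"
}

printf '%s\n' 'a+b+c ab+c ab+c' 'abbb+c ab+bbc a+bbbc' 'aaa aaa aaa' | uniq_fields
# a+b+c ab+c
# abbb+c ab+bbc a+bbbc
# aaa
```

Testing membership with ($i in seen) compares the words as strings, and building the output in a separate variable avoids the trailing blank that printf("%s ", ...) leaves behind.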

Scrutinizer · January 27, 2013, 12:10pm

Or just:

awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $j=x; $0=$0; $1=$1 }1' file

RudiC · January 27, 2013, 1:03pm

Brilliant!
But - wouldn't an assignment to $j suffice to rebuild $0? So - why the $0 and $1 assignment? I tried this:

$ awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $j=x}1' file

and it's working, too.

Scrutinizer · January 27, 2013, 1:17pm

Hi. That would work, yes, but it would introduce excess whitespace. An extra recalculation of the fields ( $0=$0 ) first reduces the number of fields (if duplicates were removed), so that afterwards recalculating the record ( $1=$1 ) removes the excess whitespace.
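
A small side-by-side run makes the difference visible. The sample line and the shell variable names (line, blanked, squeezed) are made up for this illustration; the two awk programs are the ones quoted in this thread.

```shell
line='aaa bbb aaa ccc aaa'

# Duplicates are only blanked: the emptied fields still occupy their slots.
blanked=$(printf '%s\n' "$line" |
  awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $j=x}1')

# Blank, re-split the record ($0=$0), then rebuild it ($1=$1).
squeezed=$(printf '%s\n' "$line" |
  awk '{for(i=1; i<NF; i++) for(j=i+1; j<=NF; j++) if($i==$j) $j=x; $0=$0; $1=$1}1')

# Brackets make the leftover blanks visible.
printf '[%s]\n' "$blanked" "$squeezed"
# [aaa bbb  ccc ]
# [aaa bbb ccc]
```

In the first result the two emptied fields leave a double space in the middle and a trailing space at the end; in the second, the re-split drops the empty fields and the rebuild joins the remaining three with single spaces.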