Duplicates to be removed

prvnrk · July 10, 2008, 10:45am

Hi,

I have a text file with 2000 rows and 2000 columns (number of columns might vary from row to row) and "comma" is the delimiter.

In every row there may be a few duplicates; we need to remove them and shift the subsequent values left.

ex:

111 222 111 555
444 999 666 999 777

The output must look like this:
111 222 555
444 999 666 777

TIA
Prvn

radoulov · July 10, 2008, 11:07am

Use nawk or /usr/xpg4/bin/awk on Solaris:

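awk -F, '{
  sep = ""
  for (f = 1; f <= NF; f++)
    if (!_[$f]++) {           # first occurrence of this value in the row
      printf "%s%s", sep, $f  # explicit format, so a % in the data is safe
      sep = FS
    }
  print ""                    # end the row, even if the last field was a duplicate
  split("", _)                # empty the array before the next row
}' input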

With GNU Awk you can use delete _ instead of split.
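
For what it's worth, here is a rough sketch of that delete variant, run against the sample rows written with commas. The wrapper name dedup_rows and the array name seen (in place of _) are made up for illustration, and the printf input is just the example from the first post. It assumes an awk that accepts delete on a whole array.

```shell
# dedup_rows: hypothetical wrapper; reads comma-separated rows on stdin
dedup_rows() {
  awk -F, '{
    sep = ""
    for (f = 1; f <= NF; f++)
      if (!seen[$f]++) {        # first time this value appears in the row
        printf "%s%s", sep, $f
        sep = FS
      }
    print ""                    # terminate the row
    delete seen                 # reset the array for the next row
  }'
}

printf '111,222,111,555\n444,999,666,999,777\n' | dedup_rows
```

This should print 111,222,555 and 444,999,666,777, which matches the requested output with commas in place of spaces.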

vgersh99 · July 10, 2008, 11:11am

nawk -f prv.awk myFile.txt

prv.awk:

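{
  sep = ""
  for (i = 1; i <= NF; i++) {
    if ($i in arr) continue   # already printed for this row
    printf("%s%s", sep, $i)   # separator goes before the value, so none trails
    sep = OFS
    arr[$i]                   # referencing the element marks it as seen
  }
  printf "%s", ORS
  split("", arr)              # empty the array before the next row
}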

prvnrk · July 10, 2008, 11:19am

Thanks for your replies.

Vgersh - Your solution worked (as space is the delimiter in my example). I'm sorry that I did not use commas in the example. The actual delimiter is a comma, as mentioned in the post.

Please advise.

Prvn

radoulov · July 10, 2008, 11:31am

With Perl:

perl -F, -lane '$, = ",";
  print grep !$_{$_}++, @F;
  undef %_' input

vgersh99 · July 10, 2008, 11:36am

Use radoulov's code.
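
If you would rather stay with the prv.awk approach, setting both the input and output separators to a comma on the command line should be enough. A sketch only: the script body is inlined here instead of being read from prv.awk, and the wrapper name dedup_csv is made up for the example.

```shell
# dedup_csv: hypothetical wrapper; takes file names as arguments, or reads stdin
dedup_csv() {
  awk -F, -v OFS=, '{
    sep = ""
    for (i = 1; i <= NF; i++) {
      if ($i in arr) continue   # value already printed for this row
      printf "%s%s", sep, $i
      sep = OFS
      arr[$i]                   # referencing the element creates it
    }
    printf "%s", ORS
    split("", arr)              # portable way to empty the array
  }' "$@"
}

printf '444,999,666,999,777\n' | dedup_csv
```

Using split("", arr) rather than delete arr keeps this usable on awks that lack whole-array delete.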

prvnrk · July 10, 2008, 1:06pm

Thanks radoulov,

Your awk solution worked great!

Prvn