Help with File processing

karumudi7 · October 10, 2011, 10:04am

I have a file with three columns (ID1, ID2, Text):

ID1; ID2; Text
1; X; aa
1; X; bb
1; Y; cc
1; Y; dd
2; X; ee
2; X; ff
2; Y; gg
2; Y; hh
2; Z; ii
2; Z; jj

I need the output as:

ID; X; Y; Z
1; aa, bb; cc, dd; 
2; ee, ff; gg, hh; ii, jj


In other words: group by ID1, make one column per ID2 value, and concatenate the matching Text values in each cell.

radoulov · October 10, 2011, 10:32am

awk -F\; 'END {
  print idc, id2c
  for (i = 0; ++i <= i1;) {
    printf "%s", id1[i] OFS
    for (j = 0; ++j <= i2;)
      printf "%s", (key[id1[i], id2[j]] (j < i2 ? OFS : RS))
      }
  }
NR == 1 { idc = $1; next }  
{
  key[$1, $2] = key[$1, $2] ? key[$1, $2] sep $3 : $3
  _id1[$1]++ || id1[++i1] = $1 
  if (!_id2[$2]++) {
    id2[++i2] = $2
    id2c = id2c ? id2c OFS $2 : $2
    }    
  }' sep=, OFS=\; infile

If you have GNU awk, the code could be shorter.

karumudi7 · October 10, 2011, 10:42am

Error:

$  awk -F\; 'END {
>   print idc, id2c
>   for (i = 0; ++i <= i1;) {
>     printf "%s", id1[i] OFS
>     for (j = 0; ++j <= i2;)
>       printf "%s", (key[id1[i], id2[j]] (j < i2 ? OFS : RS))
>       }
>   }
> NR == 1 { idc = $1; next }
> {
>   key[$1, $2] = key[$1, $2] ? key[$1, $2] sep $3 : $3
>   _id1[$1]++ || id1[++i1] = $1
>   if (!_id2[$2]++) {
>     id2[++i2] = $2
>     id2c = id2c ? id2c OFS $2 : $2
>     }
>   }' sep=, OFS=\; test.file

awk: syntax error near line 6
awk: illegal statement near line 6
awk: syntax error near line 9
awk: bailing out near line 9

 $

radoulov · October 10, 2011, 10:47am

Try nawk instead of awk:

nawk -F\; 'END { ...

karumudi7 · October 10, 2011, 11:00am

It worked, but the output is not exactly right.
Output:

$  nawk -F\; 'END {
>   print idc, id2c
>   for (i = 0; ++i <= i1;) {
>     printf "%s", id1[i] OFS
>     for (j = 0; ++j <= i2;)
>       printf "%s", (key[id1[i], id2[j]] (j < i2 ? OFS : RS))
>       }
>   }
> NR == 1 { idc = $1; next }
> {
>   key[$1, $2] = key[$1, $2] ? key[$1, $2] sep $3 : $3
>   _id1[$1]++ || id1[++i1] = $1
>   if (!_id2[$2]++) {
>     id2[++i2] = $2
>     id2c = id2c ? id2c OFS $2 : $2
>     }
>   }' sep=, OFS=\; test.file
ID1; X; Y; Z;
1; aa, bb; cc, dd;;
2; ee, ff; gg, hh; ii, jj;
;;;;
 
$

The last line contains only ;;;;.
How do I get rid of it?

radoulov · October 10, 2011, 11:05am

Do you have an empty line at the end of the file? If so, try this version, which adds an NF pattern so that empty lines (NF is 0) are skipped:

awk -F\; 'END {
  print idc, id2c
  for (i = 0; ++i <= i1;) {
    printf "%s", id1[i] OFS
    for (j = 0; ++j <= i2;)
      printf "%s", (key[id1[i], id2[j]] (j < i2 ? OFS : RS))
      }
  }
NR == 1 { idc = $1; next }  
NF {
  key[$1, $2] = key[$1, $2] ? key[$1, $2] sep $3 : $3
  _id1[$1]++ || id1[++i1] = $1 
  if (!_id2[$2]++) {
    id2[++i2] = $2
    id2c = id2c ? id2c OFS $2 : $2
    }    
  }' sep=, OFS=\; infile
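To see why the NF pattern helps: with -F';', a blank line is a record whose NF is 0, so a rule guarded by NF skips it, while an unguarded rule still runs and registers an empty ID1 and an empty ID2 (hence the extra ;;;;). A small sketch, using a hypothetical helper name count_rows and a three-line sample input:

```shell
# count_rows is a hypothetical helper used only for this demonstration.
# It feeds a header, one data line and a trailing blank line to awk and
# counts how many records reach an unguarded rule versus an NF-guarded one.
count_rows() {
  printf 'ID1; ID2; Text\n1; X; aa\n\n' | awk -F';' '
    NR == 1 { next }        # skip the header line, as the script in this thread does
    { all++ }               # unguarded rule: also runs for the blank line
    NF { guarded++ }        # guarded rule: the blank line has NF == 0
    END { printf "unguarded=%d guarded=%d\n", all, guarded }'
}

count_rows    # prints: unguarded=2 guarded=1
```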

karumudi7 · October 10, 2011, 1:00pm

Many thanks.
Yes, it had an empty row at the end.

If possible, could you explain the code?
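One way to read the script is the commented sketch below, written as a shell function so it can be reused. It is not the exact code posted above: the function name pivot_ids is hypothetical, -v replaces the trailing sep=, OFS=\; operands, a per-cell counter n replaces the key[$1, $2] ? ... : ... test (so a Text value such as 0 is not mistaken for an empty cell), and the header is built in the END block instead of in id2c. On Solaris, use nawk instead of awk.

```shell
# pivot_ids is a hypothetical name. It reads "ID1; ID2; Text" lines from the
# files given as arguments (or stdin) and prints one row per ID1 value with
# one column per distinct ID2 value, in order of first appearance.
pivot_ids() {
  awk -F';' -v sep=, -v OFS=';' '
    # First line is the header: keep the name of the first column, skip the rest.
    NR == 1 { idc = $1; next }

    # Data lines; NF is 0 on a blank line, so those are ignored.
    NF {
      # Append Text ($3) to the cell for this (ID1, ID2) pair,
      # putting sep between values after the first one.
      key[$1, $2] = n[$1, $2]++ ? key[$1, $2] sep $3 : $3

      # Remember each ID1 and each ID2 the first time it is seen,
      # so rows and columns come out in input order.
      if (!_id1[$1]++) id1[++i1] = $1
      if (!_id2[$2]++) id2[++i2] = $2
    }

    END {
      # Header: first column name, then one column per ID2.
      printf "%s", idc
      for (j = 1; j <= i2; j++) printf "%s%s", OFS, id2[j]
      printf "\n"

      # One row per ID1; a pair that never occurred prints as an empty field.
      for (i = 1; i <= i1; i++) {
        printf "%s", id1[i]
        for (j = 1; j <= i2; j++) printf "%s%s", OFS, key[id1[i], id2[j]]
        printf "\n"
      }
    }' "$@"
}
```

Usage: pivot_ids infile (or pivot_ids < infile). With the sample input from the first post, including a trailing empty line, it prints ID1; X; Y; Z, then 1; aa, bb; cc, dd; and 2; ee, ff; gg, hh; ii, jj.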

---------- Post updated at 10:00 PM ---------- Previous update was at 09:43 PM ----------

But if I add more input rows, the same empty semicolons appear:

ID1; ID2; Text
1; X; aa
1; X; bb
1; Y; cc
1; Y; dd
2; X; ee
2; X; ff
2; Y; gg
2; Y; hh
2; Z; ii
2; Z; jj
3; w; ll
3; w; mm
3; v; nn
3; u; oo

Output:

ID1; X; Y; Z; w; v; u
1; aa, bb; cc, dd;;;;
2; ee, ff; gg, hh; ii, jj;;;
3;;;; ll, mm; nn; oo

radoulov · October 10, 2011, 3:34pm

What should the output be for this input?
X, Y and Z are not present for ID1 3, so they are reported as empty fields.
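The empty fields come from how awk treats array elements that were never assigned: referencing one yields the empty string. A tiny sketch (empty_cell_demo is a hypothetical name; the values mimic ID1 3 from the input above):

```shell
# empty_cell_demo is a hypothetical helper for illustration only.
# Only the (3, w) cell is assigned; reading the unassigned (3, X) cell
# yields an empty string, which shows up as an empty field between the separators.
empty_cell_demo() {
  awk 'BEGIN {
    OFS = ";"
    key["3", " w"] = " ll, mm"
    print "3", key["3", " X"], key["3", " w"]
  }'
}

empty_cell_demo    # prints: 3;; ll, mm
```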