I have a file with 3 columns, ID1, ID2, Text:
ID1; ID2; Text
1; X; aa
1; X; bb
1; Y; cc
1; Y; dd
2; X; ee
2; X; ff
2; Y; gg
2; Y; hh
2; Z; ii
2; Z; jj
I need the output as:
ID; X; Y; Z
1; aa, bb; cc, dd;
2; ee, ff; gg, hh; ii, jj
That is, concatenate the Text column values, grouped by ID1, with one output column per ID2 value.
awk -F\; 'END {
  print idc, id2c
  for (i = 0; ++i <= i1;) {
    printf "%s", id1[i] OFS
    for (j = 0; ++j <= i2;)
      printf "%s", (key[id1[i], id2[j]] (j < i2 ? OFS : RS))
  }
}
NR == 1 { idc = $1; next }
{
  key[$1, $2] = key[$1, $2] ? key[$1, $2] sep $3 : $3
  _id1[$1]++ || id1[++i1] = $1
  if (!_id2[$2]++) {
    id2[++i2] = $2
    id2c = id2c ? id2c OFS $2 : $2
  }
}' sep=, OFS=\; infile
If you have GNU awk, the code could be shorter.
Error:
$ awk -F\; 'END {
> print idc, id2c
> for (i = 0; ++i <= i1;) {
> printf "%s", id1[i] OFS
> for (j = 0; ++j <= i2;)
> printf "%s", (key[id1[i], id2[j]] (j < i2 ? OFS : RS))
> }
> }
> NR == 1 { idc = $1; next }
> {
> key[$1, $2] = key[$1, $2] ? key[$1, $2] sep $3 : $3
> _id1[$1]++ || id1[++i1] = $1
> if (!_id2[$2]++) {
> id2[++i2] = $2
> id2c = id2c ? id2c OFS $2 : $2
> }
> }' sep=, OFS=\; test.file
awk: syntax error near line 6
awk: illegal statement near line 6
awk: syntax error near line 9
awk: bailing out near line 9
$
Try nawk instead of awk:
nawk -F\; 'END { ...
It worked, but the output is not exactly right.
Output:
$ nawk -F\; 'END {
> print idc, id2c
> for (i = 0; ++i <= i1;) {
> printf "%s", id1[i] OFS
> for (j = 0; ++j <= i2;)
> printf "%s", (key[id1[i], id2[j]] (j < i2 ? OFS : RS))
> }
> }
> NR == 1 { idc = $1; next }
> {
> key[$1, $2] = key[$1, $2] ? key[$1, $2] sep $3 : $3
> _id1[$1]++ || id1[++i1] = $1
> if (!_id2[$2]++) {
> id2[++i2] = $2
> id2c = id2c ? id2c OFS $2 : $2
> }
> }' sep=, OFS=\; test.file
ID1; X; Y; Z;
1; aa, bb; cc, dd;;
2; ee, ff; gg, hh; ii, jj;
;;;;
$
The last line contains ;;;;
How do I get rid of it?
Do you have an empty line at the end of the file? Try this:
awk -F\; 'END {
  print idc, id2c
  for (i = 0; ++i <= i1;) {
    printf "%s", id1[i] OFS
    for (j = 0; ++j <= i2;)
      printf "%s", (key[id1[i], id2[j]] (j < i2 ? OFS : RS))
  }
}
NR == 1 { idc = $1; next }
NF {
  key[$1, $2] = key[$1, $2] ? key[$1, $2] sep $3 : $3
  _id1[$1]++ || id1[++i1] = $1
  if (!_id2[$2]++) {
    id2[++i2] = $2
    id2c = id2c ? id2c OFS $2 : $2
  }
}' sep=, OFS=\; infile
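The only change in this version is the NF pattern in front of the main block. A minimal sketch of what that guard does, using a made-up three-line input and a hypothetical helper name (count_data_lines is not part of the thread):

```shell
# count_data_lines is a hypothetical helper used only for this illustration.
count_data_lines() {
  # NF is the number of fields on the current line.
  # With -F\; it is 0 only for a completely empty line, so the block is skipped there.
  awk -F\; 'NF { n++ } END { print n + 0 }'
}

# Two data lines followed by one empty line.
printf '1; X; aa\n1; X; bb\n\n' | count_data_lines   # prints 2
```

Note that with ; as the field separator, a line holding only spaces still counts as one field, so NF skips truly empty lines but not whitespace-only ones.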
Many thanks.
Yes, it has an empty row at the end.
If possible, can you explain the code to me?
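As a reading aid, here is a commented sketch of the same logic. It is one interpretation of the script, not the author's own explanation. Assumptions made for the illustration: the wrapper name pivot, the helper arrays renamed to seen1 and seen2, plain counting loops, input read from standard input, and -v assignments in place of the trailing sep=, OFS=\; operands.

```shell
# pivot is a hypothetical wrapper name; it reads the ;-separated file on stdin.
pivot() {
  awk -F\; -v sep=, -v OFS=\; '
  NR == 1 { idc = $1; next }      # header line: keep the name of the first column
  NF {                            # every non-empty data line
    # append this Text value to the cell for the (ID1, ID2) pair
    key[$1, $2] = key[$1, $2] ? key[$1, $2] sep $3 : $3
    # record each ID1 once, in order of first appearance
    if (!seen1[$1]++) id1[++i1] = $1
    # record each ID2 once, and extend the header row with it
    if (!seen2[$2]++) {
      id2[++i2] = $2
      id2c = id2c ? id2c OFS $2 : $2
    }
  }
  END {
    print idc, id2c               # header row: ID1 name, then every ID2 seen
    for (i = 1; i <= i1; i++) {   # one output row per ID1
      printf "%s", id1[i] OFS
      for (j = 1; j <= i2; j++)   # one cell per ID2; OFS between cells, RS after the last
        printf "%s", (key[id1[i], id2[j]] (j < i2 ? OFS : RS))
    }
  }'
}

pivot <<'EOF'
ID1; ID2; Text
1; X; aa
1; X; bb
2; Y; cc
EOF
```

For this small input the function prints a header row "ID1; X; Y", then "1; aa, bb;" and "2;; cc". The leading spaces come from the input itself, because the fields are split on ; only.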
---------- Post updated at 10:00 PM ---------- Previous update was at 09:43 PM ----------
But if I add more input rows, the same empty semicolons appear:
ID1; ID2; Text
1; X; aa
1; X; bb
1; Y; cc
1; Y; dd
2; X; ee
2; X; ff
2; Y; gg
2; Y; hh
2; Z; ii
2; Z; jj
3; w; ll
3; w; mm
3; v; nn
3; u; oo
Output:
ID1; X; Y; Z; w; v; u
1; aa, bb; cc, dd;;;;
2; ee, ff; gg, hh; ii, jj;;;
3;;;; ll, mm; nn; oo
What should be the output, based on this input?
X, Y and Z are not present for id1 3, so they are reported as empty fields.
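The empty fields follow from how awk arrays behave: reading an element that was never assigned yields an empty string. A small sketch of that behaviour (show_cells and the sample values are made up for the illustration):

```shell
# show_cells is a hypothetical helper; it fills one cell and reads two.
show_cells() {
  awk 'BEGIN {
    key["1", "X"] = "aa, bb"      # only this cell was ever filled
    # key["3", "X"] was never assigned, so it prints as an empty string
    printf "[%s][%s]\n", key["1", "X"], key["3", "X"]
  }'
}

show_cells   # prints [aa, bb][]
```

In the script, every row is printed with one cell per ID2 value seen anywhere in the file, so an ID1 that lacks some ID2 values gets empty cells in those columns.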