Splitting data from one row as multiple columns

annazpereira · November 18, 2010, 5:21pm

Hi

I have a file containing some data as follows:

11-17-2010:13:26     64    4  516414 1392258  
11-17-2010:13:26     128   4  586868  695603  
11-17-2010:13:26     256   4  474937 1642294  

11-17-2010:13:32     64    4  378715 1357066  
11-17-2010:13:32     128   4  597981 1684006  
11-17-2010:13:32     256   4  468922 1673001  

11-17-2010:13:36     64    4  367817 1553394  
11-17-2010:13:36     128   4  579270 1526043  
11-17-2010:13:36     256   4  474098 1629830  

11-17-2010:13:37     64    4  383585 1492919  
11-17-2010:13:37     256   4  599585 1036624

I need to use awk to print out the values in column 4 as separate columns for each distict value in column 1 as follows:

516414  378715 367817 383585
586868  597981 579270 599585
474937  468922 474098

I tried to store the distinct values of column1 in a variable and use that in a loop to store the values in an array to be printed later, but I seem to be doing something majorly wrong.

a=`awk '{print $1}' 4.txt | sort -u`
j=0
for i in ${a[@]}
do 
j=`expr $j + 1`
val[j]=`awk -v dat=\`echo $i\` '$1=i {print $4}' data.txt`; echo $j
done

Can someone help me?
Thanks

ctsgnb · November 18, 2010, 5:31pm

awk '{print$4}' infile | egrep -ve '^[:blank:]*$' | xargs -n4 echo

annazpereira · November 18, 2010, 6:05pm

Thanks for the quick reply Cstgnb. I guess I was not too clear in what I needed. The script you provided converts the desired output column to a row. So instead of getting:
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
I get:

516414 586868 474937 378715
597981 468922 367817 579270
474098 383585 402510 599585
385237 601330 484801 374488
574312 481109 392561 584313

Also I cannot use 'xargs -n4' as I need to print out all the values in column 4 (that have a matching column 1 value) as separate columns.

Thanks

---------- Post updated at 06:05 PM ---------- Previous update was at 05:46 PM ----------

Another way to look at it is to be able to print sets of values from the 4th column as a new columns if there exists a row of blank spaces.

cabrao · November 18, 2010, 6:19pm

Lame solution :o

$ for i in 1 5 9 13 2 6 10 14 3 7 11; do awk -v var=$i 'NR==var {print $4}' FILE; done | xargs -n4 echo
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098

annazpereira · November 18, 2010, 6:30pm

Thanks Cabrao, but I cannot use NR to separate the values since the number of records will vary from time to time!!

danmero · November 18, 2010, 6:36pm

# cat file
11-17-2010:13:26     64    4  516414 1392258
11-17-2010:13:26     128   4  586868  695603
11-17-2010:13:26     256   4  474937 1642294

11-17-2010:13:32     64    4  378715 1357066
11-17-2010:13:32     128   4  597981 1684006
11-17-2010:13:32     256   4  468922 1673001

11-17-2010:13:36     64    4  367817 1553394
11-17-2010:13:36     128   4  579270 1526043
11-17-2010:13:36     256   4  474098 1629830

11-17-2010:13:37     64    4  383585 1492919
11-17-2010:13:37     256   4  599585 1036624
# awk '{_=NR%4}_{a[_]=((a[_])?a[_]FS:x)$4;l=(_>l)?_:l}END{for(i=0;++i<=l;)print a}'  file
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098

annazpereira · November 18, 2010, 6:46pm

Wow Danmero, you are very very close to the solution, against my data file I get :

516414   474098 402510 385237
586868 378715   599585 601330
474937 597981 367817 484801

So I get values from other rows in column 4 mixed in too. I don't quite understand your solution.
{_=NR%4}_{a[_]=((a[_])?a[_]FS:x)$4;l=(_>l)?_:l} What exactly is this doing?

I don't know if NR%4 will work since there can be a row of spaces after 3/4/5/6 rows of data

danmero · November 18, 2010, 6:52pm

Try this one and will see

awk 'END{for(i=0;++i<=l;)print a}!NF{l=(_>l)?_:l;_=0}NF{_++;a[_]=((a[_])?a[_]FS:x)$4}'  file

rdcwayx · November 18, 2010, 7:01pm

Copy the testawk.awk from the solution: Row to column transpose

$ cat testawk.awk

{
for (i=1;i<=NF;i++)
{
 arr[NR,i]=$i;
 if(big <= NF)
  big=NF;
 }
}
 
END {
  for(i=1;i<=big;i++)
   {
    for(j=1;j<=NR;j++)
    {
     printf("%s\t",arr[j,i]);
    }
    printf("\n");
   }
}

awk 'BEGIN{RS=""}{for (i=4;i<=NF;i=i+5) printf $i FS;printf ORS}' infile  |awk -f testawk.awk

516414  378715  367817  383585
586868  597981  579270  599585
474937  468922  474098

---------- Post updated at 11:01 AM ---------- Previous update was at 10:54 AM ----------

@

Because the number of records will vary from time to time!!

$ cat infile
11-17-2010:13:26     64    4  516414 1392258
11-17-2010:13:26     128   4  586868  695603
11-17-2010:13:26     256   4  474937 1642294

11-17-2010:13:32     64    4  378715 1357066
11-17-2010:13:32     128   4  597981 1684006
11-17-2010:13:32     128   4  597981 1684006
11-17-2010:13:32     256   4  468922 1673001

11-17-2010:13:32     256   4  468922 1673001

11-17-2010:13:36     64    4  367817 1553394
11-17-2010:13:36     128   4  579270 1526043
11-17-2010:13:36     256   4  474098 1629830

11-17-2010:13:37     64    4  383585 1492919
11-17-2010:13:37     256   4  599585 1036624

My code:

awk 'BEGIN{RS=""}{for (i=4;i<=NF;i=i+5) printf $i FS;printf ORS}' infile  |awk -f testawk.awk

516414  378715  468922  367817  383585
586868  597981          579270  599585
474937  597981          474098
        468922

Your code:

awk '!NF{l=(_>l)?_:l;_=0}NF{_++;a[_]=((a[_])?a[_]FS:x)$4}END{for(i=0;++i<=l;)print a}' infile
516414 378715 468922 367817 383585
586868 597981 579270 599585
474937 597981 474098
468922

You know the difference now.

danmero · November 18, 2010, 7:15pm

Yes .. but I don't get the point ... late in the night.

Another one ,

awk '{a[$2]=((a[$2])?a[$2] FS:x)$4;c=(c>$2)?c:$2}END{for(i=0;i++<=c;) if(a)print a}' file

annazpereira · November 18, 2010, 7:29pm

Thanks a ton Danmero and rdcwayx for looking at this. Danmero, the columns are still getting messed up when I use your solution Thanks for the try though!!
Rdcwayx your solution is perfect!!!! Thank you!! I had 16 columns in my data file so I updated the loop accordingly. There is still a slight modification I need to do. Is there a way to add zeros for rows that have no value?

rdcwayx · November 18, 2010, 8:26pm

awk '
BEGIN{RS=""} 
NR==FNR{max=(NF>max)?NF:max;next} 
{for (i=4;i<=max;i=i+5) printf ($i=="")?"000000" FS:$i FS;printf ORS}' infile infile |awk -f testawk.awk

516414  378715  468922  367817  383585
586868  597981  000000  579270  599585
474937  597981  000000  474098  000000
000000  468922  000000  000000  000000

annazpereira · November 18, 2010, 8:32pm

Wohoooooooooo!!! It works it works!!! Thankyou rdcwayx. I have been struggling with this all day. I needed this data to be formatted this way so that it can be used to draw a wallchart using gnuplot.
Thankyou very very much everyone for your inputs!!!

ctsgnb · November 18, 2010, 8:47pm

n=$(awk '!/^$/{print$4}' infile | wc -l) ; awk '!/^$/{print$4}' infile | eval split -l $((1+$n/4)) - myF ; paste myF* | tee output

[ctsgnb@shell ~]$ n=$(awk '!/^$/{print$4}' infile | wc -l) ; awk '!/^$/{print$4}' infile | eval split -l $((1+$n/4)) - myF ; paste myF* | tee output
516414  378715  367817  383585
586868  597981  579270  599585
474937  468922  474098
[ctsgnb@shell ~]$ cat output
516414  378715  367817  383585
586868  597981  579270  599585
474937  468922  474098
[ctsgnb@shell ~]$ rm myF*
[ctsgnb@shell ~]$

danmero · November 18, 2010, 9:41pm

All in one call

# awk 'END{Y=(Y<y)?y:Y;m=1;while(m<x){for(n=0;n++<Y;){printf((a[m,n])?a[m,n]:z)FS}print((a[m,n])?a[m,n]:z);m++}}!NF{Y=(Y<y)?y:Y;x++;y=0}NF{a[++y,x]=$4}' x=1 z=000000 file
516414 378715 468922 367817 383585
586868 597981 000000 579270 599585
474937 597981 000000 474098 000000
000000 468922 000000 000000 000000

@rdcwayx you surprise me today

durden_tyler · November 19, 2010, 10:19pm

Or you could use Perl -

$
$
$ cat f41
11-17-2010:13:26     64    4  516414 1392258
11-17-2010:13:26     128   4  586868  695603
11-17-2010:13:26     256   4  474937 1642294
 
11-17-2010:13:32     64    4  378715 1357066
11-17-2010:13:32     128   4  597981 1684006
11-17-2010:13:32     256   4  468922 1673001
 
11-17-2010:13:36     64    4  367817 1553394
11-17-2010:13:36     128   4  579270 1526043
11-17-2010:13:36     256   4  474098 1629830
 
11-17-2010:13:37     64    4  383585 1492919
11-17-2010:13:37     256   4  599585 1036624
$
$
$
$ perl -ane 'BEGIN {@x=([])}
 if (!/^\s*$/){
   if ($. == 1 or $F[0] eq $prev) {push @{ $x[$#x] }, $F[3]}
   else {push @x, [$F[3]]}
   $prev = $F[0];
 }
 END {
   for $i (0..$#x) {$max = $#{$x[$i]} > $max ? $#{$x[$i]} : $max}
   for $i (0..$max) {
     for $j (0..$#x) {print ${$x[$j]}[$i],"\t"}
     print "\n";
   }
 }' f41
516414  378715  367817  383585
586868  597981  579270  599585
474937  468922  474098
$
$
$

tyler_durden

---------- Post updated at 10:19 PM ---------- Previous update was at 05:20 PM ----------

And a variant of the script for the same data file -

$
$
$ perl -ane '
 if (!/^\s*$/){
   if ($F[0] eq $prev) {push @{$x[$#x]},$F[3]}
   else {push @x,[$F[3]]}
   $max = $#{$x[$#x]} // $max;
   $prev = $F[0];
 }
 END {
   for $i (0..$max+1) {
     for $j (0..$#x) {print ${$x[$j]}[$i],"\t"}
     print "\n";
   }
 }' f4
516414  378715  367817  383585
586868  597981  579270  599585
474937  468922  474098
$
$
$

tyler_durden

ctsgnb · November 26, 2010, 5:38pm

awk '{print$4}' infile | egrep -ve '^[:blank:]*$' | pr -t -4 -

summer_cherry · November 30, 2010, 4:51am

my $cnt;
my $max;
while(<DATA>){
  my @tmp  =split;
  if(exists $hash{$tmp[0]}){
     $hash{$tmp[0]}->{++$cnt}=$tmp[3];
  }
  else{
    $max=($cnt>$max)?$cnt:$max;
    $cnt=0;
    $hash{$tmp[0]}->{0}=$tmp[3];
    $hash{$tmp[0]}->{-1}=$.;
  }
}
$max=($cnt>$max)?$cnt:$max;
for(my $i=0;$i<$max;$i++){
foreach my $key(sort {$hash{$a}->{-1} <=> $hash{$b}->{-1}} keys %hash){
 print $hash{$key}->{$i}," ";
}
print "\n";
}
__DATA__
11-17-2010:13:26     64    4  516414 1392258
11-17-2010:13:26     128   4  586868  695603
11-17-2010:13:26     256   4  474937 1642294

11-17-2010:13:32     64    4  378715 1357066
11-17-2010:13:32     128   4  597981 1684006
11-17-2010:13:32     256   4  468922 1673001

11-17-2010:13:36     64    4  367817 1553394
11-17-2010:13:36     128   4  579270 1526043
11-17-2010:13:36     256   4  474098 1629830

11-17-2010:13:37     64    4  383585 1492919
11-17-2010:13:37     64    4  383585 1492919