Hi
I have a file containing some data as follows:
11-17-2010:13:26 64 4 516414 1392258
11-17-2010:13:26 128 4 586868 695603
11-17-2010:13:26 256 4 474937 1642294
11-17-2010:13:32 64 4 378715 1357066
11-17-2010:13:32 128 4 597981 1684006
11-17-2010:13:32 256 4 468922 1673001
11-17-2010:13:36 64 4 367817 1553394
11-17-2010:13:36 128 4 579270 1526043
11-17-2010:13:36 256 4 474098 1629830
11-17-2010:13:37 64 4 383585 1492919
11-17-2010:13:37 256 4 599585 1036624
I need to use awk to print out the values in column 4 as separate columns for each distict value in column 1 as follows:
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
I tried to store the distinct values of column1 in a variable and use that in a loop to store the values in an array to be printed later, but I seem to be doing something majorly wrong.
a=`awk '{print $1}' 4.txt | sort -u`
j=0
for i in ${a[@]}
do
j=`expr $j + 1`
val[j]=`awk -v dat=\`echo $i\` '$1=i {print $4}' data.txt`; echo $j
done
Can someone help me?
Thanks
ctsgnb
November 18, 2010, 5:31pm
2
awk '{print$4}' infile | egrep -ve '^[:blank:]*$' | xargs -n4 echo
Thanks for the quick reply Cstgnb. I guess I was not too clear in what I needed. The script you provided converts the desired output column to a row. So instead of getting:
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
I get:
516414 586868 474937 378715
597981 468922 367817 579270
474098 383585 402510 599585
385237 601330 484801 374488
574312 481109 392561 584313
Also I cannot use 'xargs -n4' as I need to print out all the values in column 4 (that have a matching column 1 value) as separate columns.
Thanks
---------- Post updated at 06:05 PM ---------- Previous update was at 05:46 PM ----------
Another way to look at it is to be able to print sets of values from the 4th column as a new columns if there exists a row of blank spaces.
cabrao
November 18, 2010, 6:19pm
4
Lame solution :o
$ for i in 1 5 9 13 2 6 10 14 3 7 11; do awk -v var=$i 'NR==var {print $4}' FILE; done | xargs -n4 echo
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
Thanks Cabrao, but I cannot use NR to separate the values since the number of records will vary from time to time!!
danmero
November 18, 2010, 6:36pm
6
# cat file
11-17-2010:13:26 64 4 516414 1392258
11-17-2010:13:26 128 4 586868 695603
11-17-2010:13:26 256 4 474937 1642294
11-17-2010:13:32 64 4 378715 1357066
11-17-2010:13:32 128 4 597981 1684006
11-17-2010:13:32 256 4 468922 1673001
11-17-2010:13:36 64 4 367817 1553394
11-17-2010:13:36 128 4 579270 1526043
11-17-2010:13:36 256 4 474098 1629830
11-17-2010:13:37 64 4 383585 1492919
11-17-2010:13:37 256 4 599585 1036624
# awk '{_=NR%4}_{a[_]=((a[_])?a[_]FS:x)$4;l=(_>l)?_:l}END{for(i=0;++i<=l;)print a}' file
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
Wow Danmero, you are very very close to the solution, against my data file I get :
516414 474098 402510 385237
586868 378715 599585 601330
474937 597981 367817 484801
So I get values from other rows in column 4 mixed in too. I don't quite understand your solution.
{_=NR%4}_{a[_]=((a[_])?a[_]FS:x)$4;l=(_>l)?_:l} What exactly is this doing?
I don't know if NR%4 will work since there can be a row of spaces after 3/4/5/6 rows of data
danmero
November 18, 2010, 6:52pm
8
Try this one and will see
awk 'END{for(i=0;++i<=l;)print a}!NF{l=(_>l)?_:l;_=0}NF{_++;a[_]=((a[_])?a[_]FS:x)$4}' file
rdcwayx
November 18, 2010, 7:01pm
9
Copy the testawk.awk from the solution: Row to column transpose
$ cat testawk.awk
{
for (i=1;i<=NF;i++)
{
arr[NR,i]=$i;
if(big <= NF)
big=NF;
}
}
END {
for(i=1;i<=big;i++)
{
for(j=1;j<=NR;j++)
{
printf("%s\t",arr[j,i]);
}
printf("\n");
}
}
awk 'BEGIN{RS=""}{for (i=4;i<=NF;i=i+5) printf $i FS;printf ORS}' infile |awk -f testawk.awk
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
---------- Post updated at 11:01 AM ---------- Previous update was at 10:54 AM ----------
@
Because the number of records will vary from time to time!!
$ cat infile
11-17-2010:13:26 64 4 516414 1392258
11-17-2010:13:26 128 4 586868 695603
11-17-2010:13:26 256 4 474937 1642294
11-17-2010:13:32 64 4 378715 1357066
11-17-2010:13:32 128 4 597981 1684006
11-17-2010:13:32 128 4 597981 1684006
11-17-2010:13:32 256 4 468922 1673001
11-17-2010:13:32 256 4 468922 1673001
11-17-2010:13:36 64 4 367817 1553394
11-17-2010:13:36 128 4 579270 1526043
11-17-2010:13:36 256 4 474098 1629830
11-17-2010:13:37 64 4 383585 1492919
11-17-2010:13:37 256 4 599585 1036624
My code:
awk 'BEGIN{RS=""}{for (i=4;i<=NF;i=i+5) printf $i FS;printf ORS}' infile |awk -f testawk.awk
516414 378715 468922 367817 383585
586868 597981 579270 599585
474937 597981 474098
468922
Your code:
awk '!NF{l=(_>l)?_:l;_=0}NF{_++;a[_]=((a[_])?a[_]FS:x)$4}END{for(i=0;++i<=l;)print a}' infile
516414 378715 468922 367817 383585
586868 597981 579270 599585
474937 597981 474098
468922
You know the difference now.
danmero
November 18, 2010, 7:15pm
10
Yes .. but I don't get the point ... late in the night.
Another one ,
awk '{a[$2]=((a[$2])?a[$2] FS:x)$4;c=(c>$2)?c:$2}END{for(i=0;i++<=c;) if(a)print a}' file
Thanks a ton Danmero and rdcwayx for looking at this. Danmero, the columns are still getting messed up when I use your solution Thanks for the try though!!
Rdcwayx your solution is perfect!!!! Thank you!! I had 16 columns in my data file so I updated the loop accordingly. There is still a slight modification I need to do. Is there a way to add zeros for rows that have no value?
rdcwayx
November 18, 2010, 8:26pm
12
awk '
BEGIN{RS=""}
NR==FNR{max=(NF>max)?NF:max;next}
{for (i=4;i<=max;i=i+5) printf ($i=="")?"000000" FS:$i FS;printf ORS}' infile infile |awk -f testawk.awk
516414 378715 468922 367817 383585
586868 597981 000000 579270 599585
474937 597981 000000 474098 000000
000000 468922 000000 000000 000000
1 Like
Wohoooooooooo!!! It works it works!!! Thankyou rdcwayx. I have been struggling with this all day. I needed this data to be formatted this way so that it can be used to draw a wallchart using gnuplot.
Thankyou very very much everyone for your inputs!!!
ctsgnb
November 18, 2010, 8:47pm
14
n=$(awk '!/^$/{print$4}' infile | wc -l) ; awk '!/^$/{print$4}' infile | eval split -l $((1+$n/4)) - myF ; paste myF* | tee output
[ctsgnb@shell ~]$ n=$(awk '!/^$/{print$4}' infile | wc -l) ; awk '!/^$/{print$4}' infile | eval split -l $((1+$n/4)) - myF ; paste myF* | tee output
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
[ctsgnb@shell ~]$ cat output
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
[ctsgnb@shell ~]$ rm myF*
[ctsgnb@shell ~]$
danmero
November 18, 2010, 9:41pm
15
All in one call
# awk 'END{Y=(Y<y)?y:Y;m=1;while(m<x){for(n=0;n++<Y;){printf((a[m,n])?a[m,n]:z)FS}print((a[m,n])?a[m,n]:z);m++}}!NF{Y=(Y<y)?y:Y;x++;y=0}NF{a[++y,x]=$4}' x=1 z=000000 file
516414 378715 468922 367817 383585
586868 597981 000000 579270 599585
474937 597981 000000 474098 000000
000000 468922 000000 000000 000000
@rdcwayx you surprise me today
Or you could use Perl -
$
$
$ cat f41
11-17-2010:13:26 64 4 516414 1392258
11-17-2010:13:26 128 4 586868 695603
11-17-2010:13:26 256 4 474937 1642294
11-17-2010:13:32 64 4 378715 1357066
11-17-2010:13:32 128 4 597981 1684006
11-17-2010:13:32 256 4 468922 1673001
11-17-2010:13:36 64 4 367817 1553394
11-17-2010:13:36 128 4 579270 1526043
11-17-2010:13:36 256 4 474098 1629830
11-17-2010:13:37 64 4 383585 1492919
11-17-2010:13:37 256 4 599585 1036624
$
$
$
$ perl -ane 'BEGIN {@x=([])}
if (!/^\s*$/){
if ($. == 1 or $F[0] eq $prev) {push @{ $x[$#x] }, $F[3]}
else {push @x, [$F[3]]}
$prev = $F[0];
}
END {
for $i (0..$#x) {$max = $#{$x[$i]} > $max ? $#{$x[$i]} : $max}
for $i (0..$max) {
for $j (0..$#x) {print ${$x[$j]}[$i],"\t"}
print "\n";
}
}' f41
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
$
$
$
tyler_durden
---------- Post updated at 10:19 PM ---------- Previous update was at 05:20 PM ----------
And a variant of the script for the same data file -
$
$
$ perl -ane '
if (!/^\s*$/){
if ($F[0] eq $prev) {push @{$x[$#x]},$F[3]}
else {push @x,[$F[3]]}
$max = $#{$x[$#x]} // $max;
$prev = $F[0];
}
END {
for $i (0..$max+1) {
for $j (0..$#x) {print ${$x[$j]}[$i],"\t"}
print "\n";
}
}' f4
516414 378715 367817 383585
586868 597981 579270 599585
474937 468922 474098
$
$
$
tyler_durden
ctsgnb
November 26, 2010, 5:38pm
17
awk '{print$4}' infile | egrep -ve '^[:blank:]*$' | pr -t -4 -
my $cnt;
my $max;
while(<DATA>){
my @tmp =split;
if(exists $hash{$tmp[0]}){
$hash{$tmp[0]}->{++$cnt}=$tmp[3];
}
else{
$max=($cnt>$max)?$cnt:$max;
$cnt=0;
$hash{$tmp[0]}->{0}=$tmp[3];
$hash{$tmp[0]}->{-1}=$.;
}
}
$max=($cnt>$max)?$cnt:$max;
for(my $i=0;$i<$max;$i++){
foreach my $key(sort {$hash{$a}->{-1} <=> $hash{$b}->{-1}} keys %hash){
print $hash{$key}->{$i}," ";
}
print "\n";
}
__DATA__
11-17-2010:13:26 64 4 516414 1392258
11-17-2010:13:26 128 4 586868 695603
11-17-2010:13:26 256 4 474937 1642294
11-17-2010:13:32 64 4 378715 1357066
11-17-2010:13:32 128 4 597981 1684006
11-17-2010:13:32 256 4 468922 1673001
11-17-2010:13:36 64 4 367817 1553394
11-17-2010:13:36 128 4 579270 1526043
11-17-2010:13:36 256 4 474098 1629830
11-17-2010:13:37 64 4 383585 1492919
11-17-2010:13:37 64 4 383585 1492919