Diya123
December 11, 2013, 2:50pm
1
Hi,
I want to write a script which will add the entries in all columns for the same column id. I can do it in excel, but I need to do this for 384 columns which will come down to 96 (384/4). How can I do this iteratively
A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5 5 3
output
A B C
2 12 18
4 208 190
26 8 16
Thanks,
Yoda
December 11, 2013, 3:33pm
2
An awk approach:
awk '
NR == 1 {
for ( i = 1; i <= NF; i++ )
{
A = $i
H[$i]
}
for ( k in H )
printf "%s ", k
printf "\n"
}
NR > 1 {
for ( i = 1; i <= NF; i++ )
R[A] += $i
for ( k in R )
{
printf "%s ", R[k]
R[k] = 0
}
printf "\n"
}
' file
1 Like
Here is a solution in awk:
awk 'NR==1 {
for(i=1;i<=NF;i++) C[$i]=C[$i]" "i
for(col in C) printf col" "
printf "\n"
next
}
{for(col in C) {
T=0
for(i=split(C[col],v);i;i--) T+=$v
printf T" "
}
printf "\n"
} '
Diya123
December 11, 2013, 3:51pm
4
Thanks Yoda. It worked but the output is not tab delimited and also it changed the columns order. Can I get it in tab delimited format?
---------- Post updated at 12:51 PM ---------- Previous update was at 12:50 PM ----------
Thanks Chubler_XL. I tried running it using sh script.sh file.txt >out.txt. Its taking forever to complete.
I'm a little concerned with your solution Yoda.
Is it guaranteed that the totals printed are going to match the headings, i.e. will for (k in H)
be in the same order as for (k in R)
?
Also consider the following input:
A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5 5 3
5 6 7 8
---------- Post updated at 07:01 AM ---------- Previous update was at 06:53 AM ----------
Reason my original solution took so long was I forgot to specify the input file!
This adjustment keeps same column order as original file
awk 'NR==1 {
for(i=1;i<=NF;i++) {
G=$i
if (!($i in C)) {
C[$i];
H[++cols]=$i
}
}
for(i=1;i<=cols;i++) printf "%s\t", H
printf "\n"
next
}
{
for(i=1;i<=NF;i++) T[G]+=$i
for(i=1;i<=cols;i++) {
printf "%s\t",0+T[H]
T=0
}
printf "\n"
} ' infile
1 Like
Diya123
December 11, 2013, 4:08pm
6
Thanks Chubler_XL. It worked as I wanted.(did not change the order and the output is tab delimited). thanks again.
With Perl:
perl -lane'
$" = "\t";
do {
@f = @F;
$t{$_}++ or push @o, $_ for @F;
print "@o";
next
} if $. == 1;
$s{$f[$_]} += $F[$_] for 0..$#F;
print "@{[map $s{$_}, @o]}";
%s = ();
' infile
diya123:
Hi,
I want to write a script which will add the entries in all columns for the same column id. I can do it in excel, but I need to do this for 384 columns which will come down to 96 (384/4). How can I do this iteratively
A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5 5 3
output
A B C
2 12 18
4 208 190
26 8 16
Thanks,
@ Chubler_XL : your code in
#5
I am getting following output
[akshay@aix tmp]$ cat 1
A A A A B B B B C C C C
1 0 1 0 2 1 4 5 3 4 5 6
2 0 0 2 3 5 70 100 1 0 90 99
5 6 7 8 1 2 3 2 3 5 5 3
[akshay@aix tmp]$ awk 'NR==1 {
for(i=1;i<=NF;i++) {
G=$i
if (!($i in C)) {
C[$i];
H[++cols]=$i
}
}
for(i=1;i<=cols;i++) printf "%s\t", H
printf "\n"
next
}
{
for(i=1;i<=NF;i++) T[G]+=$i
for(i=1;i<=cols;i++) {
printf "%s\t",0+T[H]
T=0
}
printf "\n"
} ' 1
A B C
2 12 18
6 190 208
32 198 224
Will try to reduce number of loop here....
@Diya123 : Try below code
awk 'NR==1{
for(i=1;i<=NF;i++){
A[$i]++
if(A[$i]==1){
B=$i
header = header ? header OFS $i : $i
}
}
print header
}
NR>1{
for(i=1;i<=NF;i++){
tmp += $i
if(B[i+1]){
printf tmp OFS;tmp=""
}
}
printf tmp RS
tmp = 0
}
' OFS=\\t file
A B C
2 12 18
4 178 190
26 8 16