Question about File Processing

inditopgun · May 3, 2012, 2:29am

I have a file in the following format:

CKF,23G
ckf,234M
CKF,2356K
DFK,4589M
DFK,343K
dfk,3434M
DFK,34G
DFK,34343M,
DFK,3476G
FGK,34k
KLK,43G
KLK,3G

I would like to group the rows by the 3-letter code at the beginning of each line (treating upper- and lower-case codes such as ckf and CKF as the same), sum up the second column in gigabytes, and output a file like the following:

CKF,<sum_total_of_all_rows_in_gigabytes>G
DFK,<sum_total_of_all_rows_in_gigabytes>G
FGK,<sum_total_of_all_rows_in_gigabytes>G
KLK,<sum_total_of_all_rows_in_gigabytes>G

To calculate the total in gigabytes, the size on each row needs to be handled as follows:
if the suffix is G, do nothing; simply add it to running_total
if the suffix is M or m, divide by 1024 and add to running_total
if the suffix is K or k, divide by (1024*1024) and add to running_total
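A minimal sketch of the rules above in the same awk style as the answers in this thread. It assumes a POSIX awk and sort are available; sum_gb is a hypothetical helper name, and treating any suffix other than M/m or K/k as gigabytes is an assumption, since the question only lists G, M and K. The suffix is read from the second field rather than the end of the line, so a stray trailing comma (as on the DFK,34343M, row) does not change the result.

```shell
#!/bin/sh
# Minimal sketch: total the sizes for each 3-letter code, in gigabytes.
# sum_gb is a hypothetical helper name; it reads "CODE,SIZE<suffix>" lines on stdin.
sum_gb() {
  awk -F, '
    NF >= 2 {
      code = toupper($1)                 # ckf and CKF are grouped together
      size = $2                          # e.g. "234M"; a trailing comma only adds an empty $3
      unit = toupper(substr(size, length(size), 1))
      n = size + 0                       # numeric prefix only, so "234M" becomes 234
      if (unit == "M")      n /= 1024
      else if (unit == "K") n /= 1024 * 1024
      # assumption: any other suffix, including G or g, is already gigabytes
      total[code] += n
    }
    END {
      for (c in total) printf "%s,%.4fG\n", c, total[c]
    }' | sort                            # "for (c in total)" order is unspecified
}

# Example run on the sample data from the question
sum_gb <<'EOF'
CKF,23G
ckf,234M
CKF,2356K
DFK,4589M
DFK,343K
dfk,3434M
DFK,34G
DFK,34343M,
DFK,3476G
FGK,34k
KLK,43G
KLK,3G
EOF
```

On the sample data this should print CKF,23.2308G, DFK,3551.3734G, FGK,0.0000G and KLK,46.0000G, one per line. The sort at the end is only there because awk does not guarantee any order when looping over an array.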

itkamaraj · May 3, 2012, 3:04am

$ nawk -F, '/[mM]$/{size=$2+0;size=$2/1024}/[kK]$/{size=$2+0;size=$2/(1024*1024)}/[gG]$/{size=$2+0} {a[toupper($1)]+=size;next}END{for(i in a){print i,a[i]}}' test.txt
KLK 46
CKF 0.230762
DFK 3551.84
KF 23
FGK 3.24249e-05
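The DFK total here (3551.84) is a little higher than the 3551.3734 in the Perl output further down. One likely cause, assuming the same input as in the question: the patterns in this command test the whole line, so the row DFK,34343M, (which ends in a comma) matches none of them, and size keeps the value 34 left over from the preceding DFK,34G row. Under that assumption the arithmetic lines up with the printed figure:

```latex
\begin{aligned}
\text{DFK (all suffixes applied)} &= \frac{4589}{1024} + \frac{343}{1024^2} + \frac{3434}{1024} + 34 + \frac{34343}{1024} + 3476 \approx 3551.3734\\
\text{DFK (34 counted for the 34343M row)} &= 3551.3734 - \frac{34343}{1024} + 34 \approx 3551.8353
\end{aligned}
```

awk prints numbers with six significant digits by default, so 3551.8353 shows up as 3551.84.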

Scrutinizer · May 3, 2012, 5:02am

awk -F, '{m=1; $0=toupper($0)} $2~/M$/{m=2^10} $2~/K$/{m=2^20} {S[$1]+=$2/m} END{for(i in S){print i,S[i]"G"}}' OFS=, infile

balajesuri · May 3, 2012, 6:28am

perl -F, -lane '
if ($F[1] =~ /M$/i) { chop($F[1]); $F[1]/=1024; $x{uc($F[0])} += $F[1]; }
elsif ($F[1] =~ /K$/i) { chop($F[1]); $F[1]/=(1024*1024); $x{uc($F[0])} += $F[1]; }
else { chop($F[1]); $x{uc($F[0])} += $F[1]; }
END { printf "%s,%.4fG\n", $_, $x{$_} for (keys %x) }' input

KLK,46.0000G
CKF,23.2308G
DFK,3551.3734G
FGK,0.0000G
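Working the conversion rules from the question through by hand for two of the codes gives the same numbers as this output, and shows why FGK prints as zero at four decimal places:

```latex
\begin{aligned}
\text{CKF} &= 23 + \frac{234}{1024} + \frac{2356}{1024^2} = 23 + 0.228515625 + 0.00224685\ldots \approx 23.2308\\
\text{FGK} &= \frac{34}{1024^2} \approx 3.24 \times 10^{-5}, \text{ which rounds to } 0.0000 \text{ at four decimals}
\end{aligned}
```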