awk: processing a data file with a variable number of fields

radudownload · November 15, 2013, 10:08am

Hi!

I need to post-process some data files which have a variable (and periodic) number of fields. For example, I need to square (data -> data*data) the following data file:

 -5.34281E-28 -3.69822E-29  8.19128E-29  9.55444E-29  8.16494E-29  6.23125E-29
  4.42106E-29  2.94592E-29  1.84841E-29  1.09271E-29  6.08599E-30  3.19287E-30
  1.57732E-30  7.33449E-31  3.20866E-31  1.31982E-31  5.10059E-32  1.85021E-32
  6.29190E-33  2.00262E-33  5.95280E-34  1.64748E-34
 -5.34281E-28 -3.69822E-29  8.19128E-29  9.55444E-29  8.16494E-29  6.23125E-29
  4.42106E-29  2.94592E-29  1.84841E-29  1.09271E-29  6.08599E-30  3.19287E-30
  1.57732E-30  7.33449E-31  3.20866E-31  1.31982E-31  5.10059E-32  1.85021E-32
  6.29190E-33  2.00262E-33  5.95280E-34  1.64748E-34
 -5.34281E-28 -3.69822E-29  8.19128E-29  9.55444E-29  8.16494E-29  6.23125E-29
  4.42106E-29  2.94592E-29  1.84841E-29  1.09271E-29  6.08599E-30  3.19287E-30
  1.57732E-30  7.33449E-31  3.20866E-31  1.31982E-31  5.10059E-32  1.85021E-32
  6.29190E-33  2.00262E-33  5.95280E-34  1.64748E-34

The data is arranged in 6 columns, but depending on the requested precision, some lines may have fewer columns.

For the processing, some beginner-level knowledge of awk suffices. In my case I use something like

awk '{printf("%12.8G %12.8G %12.8G %12.8G %12.8G %12.8G\n", $1*$1, $2*$2, $3*$3, $4*$4, $5*$5, $6*$6)}' initial.data > final.data

which produces something like this

2.8545619E-55 1.3676831E-57 6.7097068E-57 9.1287324E-57 6.6666245E-57 3.8828477E-57
1.9545772E-57 8.6784446E-58 3.4166195E-58 1.1940151E-58 3.7039274E-59 1.0194419E-59
2.4879384E-60 5.3794744E-61 1.0295499E-61 1.7419248E-62 2.6016018E-63 3.423277E-64
3.9588006E-65 4.0104869E-66 3.5435828E-67 2.7141904E-68            0            0

Question #1: how can one eliminate the "0"s which awk produces? I've tried sed

sed 's/        0    /             /g' <final.data >almost.final.data

but I can't remove the last 0 from each shorter line (i.e. one with fewer columns of real data); in this case I obtain something like this:

2.8545619E-55 1.3676831E-57 6.7097068E-57 9.1287324E-57 6.6666245E-57 3.8828477E-57
1.9545772E-57 8.6784446E-58 3.4166195E-58 1.1940151E-58 3.7039274E-59 1.0194419E-59
2.4879384E-60 5.3794744E-61 1.0295499E-61 1.7419248E-62 2.6016018E-63 3.423277E-64
3.9588006E-65 4.0104869E-66 3.5435828E-67 2.7141904E-68                         0

Question #2: How can I stop awk from processing non-existent data in a column? (In my case, the 5th and 6th "fields" of every line that has only 4 columns.)

I thank you for your help!

CarloM · November 15, 2013, 10:20am

You could explicitly check for every 4th line:

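awk '
NR%4==0 { printf("%12.8G %12.8G %12.8G %12.8G\n", $1*$1, $2*$2, $3*$3, $4*$4) }
NR%4!=0 { printf("%12.8G %12.8G %12.8G %12.8G %12.8G %12.8G\n", $1*$1, $2*$2, $3*$3, $4*$4, $5*$5, $6*$6) }'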

(EDIT: Or check NF==4/NF==6, as was suggested in a short-lived post :))
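
A minimal sketch of the NF-based check, for illustration only. The function name square_by_nf and the sample input are made up here; the printf formats are the ones from the original question.

```shell
# Hypothetical helper: choose the printf format from the field count (NF)
# instead of the line number, so the layout need not repeat every 4 lines.
square_by_nf() {
  awk '
  NF==6 { printf("%12.8G %12.8G %12.8G %12.8G %12.8G %12.8G\n", $1*$1, $2*$2, $3*$3, $4*$4, $5*$5, $6*$6) }
  NF==4 { printf("%12.8G %12.8G %12.8G %12.8G\n", $1*$1, $2*$2, $3*$3, $4*$4) }
  ' "$@"
}

# Made-up sample: one 6-field line followed by one 4-field line.
printf '1 2 3 4 5 6\n7 8 9 10\n' | square_by_nf
```

Lines with any other field count are silently skipped by this sketch, so it only fits data that really consists of 4-field and 6-field lines.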

Or just loop around the number of fields you actually have in each line:

awk '
{
   for (i=1;i<=NF;i++) {
      printf ("%12.8G ", $i*$i)
   }
   printf "\n"
}'
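
A possible variation of the loop, sketched under the assumption that the blank left at the end of each line is unwanted: it prints the separator together with each value and ends the line after the last field. The name square_loop and the sample input are hypothetical.

```shell
# Hypothetical helper: square every field that is actually present.
# The separator is a blank between values and a newline after the last one,
# so no trailing blank is written.
square_loop() {
  awk '{
    for (i = 1; i <= NF; i++)
      printf("%12.8G%s", $i * $i, (i < NF ? " " : "\n"))
  }' "$@"
}

# Made-up sample with a different number of fields on each line.
printf '2 3 4\n5 6\n' | square_loop
```

Unlike the loop above, this variant writes nothing at all for an empty input line, because the newline is only emitted together with the last field.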

RudiC · November 15, 2013, 12:10pm

Try also

awk '{print  $1*$1, $2*$2, $3*$3, $4*$4, $5?$5*$5:"", $6?$6*$6:""}' OFMT="%14.8G" file
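
One caveat with testing the field value: the condition on $5 is also false when the field is present but numerically zero (for example 0.00000E+00), so a genuine zero would be printed as blank. A hedged alternative is to test the field count instead. The sketch below assumes that behaviour is wanted; square_nf_guard is a made-up name, and OFMT is passed with -v rather than as a trailing operand.

```shell
# Hypothetical helper: decide by NF whether fields 5 and 6 exist,
# so that a real value of zero is still squared and printed.
square_nf_guard() {
  awk -v OFMT='%14.8G' '{
    print $1*$1, $2*$2, $3*$3, $4*$4, (NF >= 5 ? $5*$5 : ""), (NF >= 6 ? $6*$6 : "")
  }' "$@"
}

# Made-up sample: the first line contains a genuine zero in field 5.
printf '1.5 2 3 4 0.00000E+00 6\n7 8 9 10\n' | square_nf_guard
```

Because print still writes the output separator before each empty value, short lines end in two blanks. Also, OFMT only affects non-integer values; integral results are printed as plain integers.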

Scrutinizer · November 15, 2013, 12:57pm

Yet another possibility:

 awk '{for(i=1; i<=NF; i++) $i*=$i}1' CONVFMT="%14.8G" file
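
For readers unfamiliar with the idiom, here is the same approach spelled out, as a sketch only. Assigning to a field makes awk rebuild the line from its fields, non-integer results are turned into text using CONVFMT, and the bare 1 in the one-liner is an always-true pattern with the default print action. The name square_in_place is hypothetical, and CONVFMT is set with -v here instead of as a trailing operand.

```shell
# Hypothetical helper: overwrite each field with its square, then print the line.
square_in_place() {
  awk -v CONVFMT='%14.8G' '
  {
    for (i = 1; i <= NF; i++)
      $i *= $i    # assigning to a field rebuilds $0 with OFS between fields
    print         # same effect as the bare pattern 1 in the one-liner
  }' "$@"
}

# Made-up sample with a different number of fields on each line.
printf '1.5 2 3\n4 5\n' | square_in_place
```

The rebuilt line uses single blanks between fields, so any leading indentation of the input is not preserved.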

radudownload · November 15, 2013, 6:49pm

Thank you guys!

I've only implemented Scrutinizer's suggestion, and it works well for me.

For CarloM: (I'm not an awk expert, but ...) can you please tell me where in your second code snippet I should specify which file is to be processed? Thank you!

CarloM · November 16, 2013, 4:56am

You can redirect stdin as you did with your sed command, or just specify the filename as in the other suggestions.
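
Both ways of supplying the input can be sketched like this, using the loop from the earlier reply. The temporary directory, the sample data and the output file names are made up for the demonstration; only the name initial.data comes from the original question.

```shell
# Throwaway working directory with a small made-up sample file.
tmp=$(mktemp -d)
printf '1 2 3 4 5 6\n7 8 9 10\n' > "$tmp/initial.data"

# The loop from the earlier reply, kept in a variable so both calls share it.
prog='{ for (i = 1; i <= NF; i++) printf("%12.8G ", $i * $i); printf "\n" }'

# Variant 1: redirect the file to standard input.
awk "$prog" < "$tmp/initial.data" > "$tmp/via_stdin.data"

# Variant 2: name the file as an argument after the program text.
awk "$prog" "$tmp/initial.data" > "$tmp/via_arg.data"

# Both variants should produce identical output.
cmp "$tmp/via_stdin.data" "$tmp/via_arg.data" && echo "same output"
```

With the filename as an argument, several files can be listed after the program text and awk processes them one after another.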