Hi, I am converting a comma-separated file to fixed field length and I am using this:
COLUMNS="25 24 67 26 39 63 20 34 35 14 397"
(
cat $indir/input_file.dat | \
$AWK -v columns="$COLUMNS" '
BEGIN {
FS=",";
OFS="";
split(columns, arr, " ");
}
{
for(i=1; i<=NF; i++)
printf("%-*s%c", arr[i], $i, (i==NF) ? RS : OFS)
}
') >> $outdir/output_file.dat
It is working fine, but most of the files are big and the performance is very slow. Any ideas how I can make it faster?
Thanks!
Step 1: Cut the I/O in half by eliminating the unnecessary cat.
Step 2: Eliminate the for-loop and replace it with a single call to printf.
Step 3: Don't use gawk unless you must. It's the slowest awk implementation.
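Steps 1 and 2 applied to the script above might look like this: awk reads the file directly (no cat), and the format string is built once in BEGIN so each record needs only a single printf. This is a sketch, not the poster's actual rewrite; the widths and file name are shortened for illustration, and the explicit field list in the printf would have to match the real column count (11 fields in the original).

```shell
COLUMNS="5 3 4"                     # shortened widths for illustration
printf 'aa,b,cc\n' > /tmp/in.dat    # hypothetical sample input

# Step 1: pass the file name to awk instead of piping through cat.
# Step 2: precompute one format string ("%-5s%-3s%-4s\n") in BEGIN,
# then emit each record with a single printf instead of a per-field loop.
awk -v columns="$COLUMNS" '
BEGIN {
    FS = ","
    n = split(columns, w, " ")
    for (i = 1; i <= n; i++)
        fmt = fmt "%-" w[i] "s"
    fmt = fmt "\n"
}
{ printf(fmt, $1, $2, $3) }         # field list must match the column count
' /tmp/in.dat
```

For the real 11-column file the record action would list $1 through $11; the per-record cost drops because the format string is no longer rebuilt and no loop runs per field.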
Regards,
Alister
Actually it is:
AWK="/usr/xpg4/bin/awk" #extended awk for solaris
I wasn't asserting that you are using gawk. Your post did not specify the implementation, so I mentioned gawk's slow execution speed in case it was relevant.
Regards,
Alister
Assuming ASCII data, try
perl -F, -lape 'BEGIN { ($tmpl = shift) =~ s/(\d+)/A$1/g }
$_ = pack($tmpl, @F)' "$COLUMNS" "$indir/input_file.dat" > "$outdir/output_file.dat"
Hi, thanks. I tried it, but unfortunately the performance is the same.