Using Awk to efficiently substitute values using 3 vectors

LaTortuga · February 11, 2010, 3:13pm

I'm trying to efficiently combine the fields of two vectors (vectors b and c) into a new vector (vector d) as defined by instructions from a third vector (vector a). Vector a has either a 1 or a 2 in each field, specifying which vector (b or c, respectively) supplies that field. Vector a is space-separated, while vectors b, c, and d are semicolon-separated. I made a bash/awk script that does this, but it takes a while:

z=0
d_vect=''
for y in $a_vect
do
  z=$(( z + 1 ))
  if [ "$y" -eq 1 ]; then
    # pass the field number to awk with -v instead of splicing it into the program
    temp_var=$(echo "$b_vect" | awk -F';' -v n="$z" '{print $n}')
  else
    temp_var=$(echo "$c_vect" | awk -F';' -v n="$z" '{print $n}')
  fi
  # add ';' only between fields, so d_vect does not start with a separator
  d_vect="${d_vect:+$d_vect;}$temp_var"
done
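For comparison, here is a minimal sketch that does the whole combination in a single awk process, which avoids starting an echo/awk pipeline for every field (the main cost of the loop above). It assumes the selectors in a_vect are only 1 or 2 and that all three vectors have the same number of fields; it has not been benchmarked here.

```shell
#!/bin/bash
# Sample vectors from the thread: a = selectors, b/c = ';'-separated values
a_vect='1 2 1 2 1'
b_vect='0;1;2;3;4'
c_vect='5;6;7;8;9'

# A single awk process does all the work in its BEGIN block:
# split each vector into an array, then take field i from b when
# selector i is 1 and from c otherwise, joining the picks with ';'.
d_vect=$(awk -v a="$a_vect" -v b="$b_vect" -v c="$c_vect" 'BEGIN {
  n = split(a, sel, " ")   # " " means split on runs of whitespace
  split(b, B, ";")
  split(c, C, ";")
  out = ""
  for (i = 1; i <= n; i++) {
    if (i > 1) out = out ";"
    out = out ((sel[i] == 1) ? B[i] : C[i])
  }
  print out
}')

echo "$d_vect"
```

Because the vectors are passed with -v and no input file is read, everything happens in BEGIN, so the cost is one process start regardless of how many fields there are.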

joeyg · February 11, 2010, 3:16pm

Can you provide a sample of the input and desired output? Seeing 5-10 lines of example makes coding much easier.

LaTortuga · February 11, 2010, 3:31pm

a_vect='1 2 1 2 1'
b_vect='0;1;2;3;4'
c_vect='5;6;7;8;9'

desired output:

d_vect='0;6;2;8;4'

Hope that helps :), let me know if you need any further clarification
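With the sample values above, the same result can also be sketched in pure bash, with no external processes at all, by reading each vector into an array. The function name combine is hypothetical, and the sketch assumes bash (for read -a and arrays) and vectors of equal length.

```shell
#!/bin/bash
# Hypothetical helper: combine SELECTORS B_VECT C_VECT
combine() {
  local -a sel b c d
  read -ra sel <<< "$1"            # selectors, split on whitespace
  IFS=';' read -ra b <<< "$2"      # values picked when selector is 1
  IFS=';' read -ra c <<< "$3"      # values picked otherwise
  local i
  for (( i = 0; i < ${#sel[@]}; i++ )); do
    if [[ ${sel[i]} == 1 ]]; then
      d[i]=${b[i]}
    else
      d[i]=${c[i]}
    fi
  done
  # "${d[*]}" joins the elements with the first character of IFS
  local IFS=';'
  printf '%s\n' "${d[*]}"
}

combine '1 2 1 2 1' '0;1;2;3;4' '5;6;7;8;9'
```

Setting IFS with local keeps the change confined to the function, so the caller's word splitting is unaffected.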

linuxpenguin · February 11, 2010, 5:49pm

OK, I've got something for you. It's not awk, but it works, and I believe it's faster than yours because it doesn't start a new awk process for every field. I'm sure somebody will come up with a better solution, but here are my 2 cents.

#!/bin/bash

## JUST a lil FANCY STUFF for debug
function debug()  {
  if [[ $DEBUG ]]; then
    echo "$@"
  fi
}

if [[ "$1" == "d" ]]; then
  DEBUG=true
fi

## REAL USEFUL CODE STARTS HERE
inputfile='/tmp/1.ip' ## SET IT TO WHEREVER YOUR INPUT FILE IS
tmpfile='/tmp/2.ip'

## FORMAT THE INPUT A LITTLE SO IT'S EASY TO HANDLE LATER.
sed "s/._vect='\(.*\)'/\1/g" ${inputfile}| sed "s/\([^ ;]*\)[ ;\$]/\1 /g" > $tmpfile

refarr=(`head -1 $tmpfile`)       ## THE FIRST  ROW OF INPUT
arr1=(`head -2 $tmpfile|tail -1`) ## THE SECOND ROW OF INPUT
arr2=(`tail -1 $tmpfile`)         ## THE THIRD  ROW OF INPUT
length=${#arr2[@]}                 ## NUMBER OF FIELDS IN EACH ROW

debug "REFARR = ${refarr[@]}"
debug "arr1 =   ${arr1[@]}"
debug "arr2 =   ${arr2[@]}"

typeset -a arr        ## FINAL RESULT ARRAY
i=0

## LOOP ONLY ONCE, ASSUME ALL ROWS HAVE SAME NUMBER OF FIELDS.
while [[ $i -lt $length ]]; do
  if [[ "${refarr[i]}" == "1" ]]; then
    arr[i]=${arr1[i]}   ## 1 => TAKE THE FIELD FROM THE SECOND ROW
  else
    arr[i]=${arr2[i]}   ## 2 => TAKE THE FIELD FROM THE THIRD ROW
  fi
  debug "$i => Ref[${i}]: ${refarr[$i]}, arr1[$i]: ${arr1[$i]}, arr2[$i]: ${arr2[$i]}, arr[$i]: ${arr[$i]}"
  (( i = i + 1 ))
done
done
echo "${arr[@]}"|sed "s/ /;/g"

I timed this script on my Ubuntu (Karmic Koala) machine, and for the given input it performed as follows:

time ./ux.sh 
0;6;2;8;4
./ux.sh  0.01s user 0.02s system 83% cpu 0.033 total

Note that the input file path is set in a variable in the script; you can parametrize it if you want, or just hard-code it there.
And if you run this script with the argument "d", it prints what it is doing at each step. For a complete trace, run it with bash -x (or add set -x).
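Following the suggestion to parametrize the input file, here is a minimal sketch that takes the file path as an argument instead of hard-coding /tmp/1.ip. It assumes the same file format as the thread (three lines of the form a_vect='...', b_vect='...', c_vect='...'); the function name combine_file is hypothetical, and the selection step reuses a single awk call rather than the array loop above.

```shell
#!/bin/bash
# Hypothetical helper: combine_file PATH
# Reads a_vect/b_vect/c_vect from PATH and prints the combined vector.
combine_file() {
  local file=$1 line name value a='' b='' c=''
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    name=${line%%=*}            # text before the first '='
    value=${line#*=}            # text after the first '='
    value=${value#\'}           # strip the leading quote
    value=${value%\'}           # strip the trailing quote
    case $name in
      a_vect) a=$value ;;
      b_vect) b=$value ;;
      c_vect) c=$value ;;
    esac
  done < "$file"

  # Selector 1 picks from b, anything else from c; fields joined with ';'
  awk -v a="$a" -v b="$b" -v c="$c" 'BEGIN {
    n = split(a, sel, " "); split(b, B, ";"); split(c, C, ";")
    for (i = 1; i <= n; i++) {
      if (i > 1) printf ";"
      printf "%s", ((sel[i] == 1) ? B[i] : C[i])
    }
    print ""
  }'
}

# When run as a script with an argument, e.g. ./combine.sh /tmp/1.ip
if [[ -n ${1-} ]]; then
  combine_file "$1"
fi
```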

LaTortuga · February 11, 2010, 6:46pm

MUCH faster, THANK YOU!!!