newbie needs help batching awk, tabitha

Hi guys,

I need a little help learning to batch an awk script. All the examples I found online are too complicated for me. Here's the awk command that I want to run on lots of files.

awk 'NR==FNR{a[$1]=$0;next$3 in a{print $0 a[$1] " " a[$3]}' inputfile_A_1.out  inputfile_B_1.out  >  outputfile1.txt

then on
inputfile_A_2.out inputfile_B_2.out > outputfile2.txt
and
inputfile_A_3.out inputfile_B_3.out > outputfile3.txt
like that

thanks sooo much, tabitha

first of all, you have an issue with your awk - missing '}':

awk 'NR==FNR{a[$1]=$0;next} $3 in a{print $0 a[$1] " " a[$3]}' inputfile_A_1.out  inputfile_B_1.out  >  outputfile1.txt

Then... something to start with:

#!/bin/ksh

for i in 1 2 3 4 5
do
  awk 'NR==FNR{a[$1]=$0;next} $3 in a{print $0,a[$1], a[$3]}' "inputfile_A_${i}.out"  "inputfile_B_${i}.out" > "outputfile${i}.txt"
done

thanks! you are correct, I forgot the curly brace in my post - ooops

will it work using bash, or just ksh? I have bash.

the file names are more complex than I first showed, sorry. here are some examples of how the files are named:

inputfile_A_1 --> CFD-2012-10-7_RTC_P09_L000058.txt
inputfile_B_1 --> CFD-2012-10-7_RTC_P09_L000102.txt

inputfile_A_2 --> CFD-2012-10-4_RTC_P11_L000058.txt
inputfile_B_2 --> CFD-2012-10-4_RTC_P11_L000102.txt

inputfile_A_3 --> CFD-2012-11-4_RTC_P02_L000058.txt
inputfile_B_3 --> CFD-2012-11-4_RTC_P02_L000102.txt

the _L000058.txt and _L000102.txt endings are always the same, but the rest of the file name changes as shown above, and there are many, many more files

thanks tabitha

He's not using any ksh-specific features.

thank you!

what should I do about the complex filenames to batch over them?

try this


i=1
for j in *_L000058.txt
do
  # build the matching _L000102 file name from the first three "_"-separated parts
  b_file=$(echo "$j" | awk -F"_" 'BEGIN{OFS=FS}{print $1,$2,$3"_L000102.txt"}')

  awk 'NR==FNR{a[$1]=$0;next} $3 in a{print $0,a[$1], a[$3]}' "$j" "$b_file" > "outputfile${i}.txt"

  i=$(expr $i + 1)
done

Something shorter and more elegant:

for i in *_L000058.txt
do
  awk 'NR==FNR{a[$1]=$0;next} $3 in a{print $0,a[$1], a[$3]}' "$i" "${i%_*}_L000102.txt" > "outputfile$((++j)).txt"
done
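For anyone wondering how the second file name is derived in the loop above, here is a minimal sketch of the ${i%_*} expansion on its own. The file name is one of the examples posted earlier in the thread; the variable names prefix and partner are only for illustration.

```shell
#!/bin/bash
# File name taken from the examples earlier in the thread.
i="CFD-2012-10-7_RTC_P09_L000058.txt"

# ${i%_*} removes the shortest trailing part that matches "_*",
# which here is "_L000058.txt".
prefix=${i%_*}

# Append the fixed ending of the second file of the pair.
partner="${prefix}_L000102.txt"

echo "$prefix"
echo "$partner"
```

This should work the same way in bash and ksh, since the % expansion is standard in both.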

raj, your code works well, although it does create an extraneous column of data (column 8) that doesn't appear to correlate to anything in the two input files, and it also leaves out one column of data, but I can live with these. Also, I had to remove the $3 in the print statement.

raj, the output files from your code get named outputfile1.txt, outputfile1+1.txt, outputfile2+1.txt, etc. is there a simple way to make it add the values together?

elixir, your code also ran on the files, but it only gave one output file, containing the data from the last set of input files in the directory. I think all the previous outputs are getting overwritten. Your code also produced two extraneous columns of data (columns 6 and 9) that don't correlate to either of the input files.

I really, really thank you guys sooo much for helping me. hopefully these little quirks are easy to fix???
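One possible source of the unexpected columns, offered as a guess since the real data was never posted: a[$1] and a[$3] each hold a whole line of the first file, so printing them appends every field of that line, not a single value. The sketch below uses two made-up files (first.txt and second.txt, invented for illustration) to show how the column count grows.

```shell
#!/bin/bash
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data, not from the thread.
printf 'k1 alpha\nk2 beta\n' > first.txt
printf 'k1 y k2\n'           > second.txt

# Same awk program as in the thread: store whole lines of the first file,
# then print each matching line of the second file plus two stored lines.
awk 'NR==FNR{a[$1]=$0;next} $3 in a{print $0, a[$1], a[$3]}' \
    first.txt second.txt > joined.txt

cat joined.txt

# Count the columns: 3 from second.txt + 2 from a[$1] + 2 from a[$3].
ncols=$(awk '{print NF}' joined.txt)
echo "columns: $ncols"
```

If the first field of a line in the second file is not a key in the first file, a[$1] is empty, which shifts the later columns instead. Printing only the fields you need from the stored line would avoid both effects.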

atjurhs, I did not make any changes to the column output in your awk statement.
As far as the file name is concerned, check the value of $i that the code produces on your system. For me it gives the correct value. Also try replacing ${i} with $i in the output file name.

If it is still not working, get back to me with the value of $i.

raj, I'm sorry, I wasn't very clear. In part of the code you had written

{print $1,$2,$3"_L000102.txt"}

and in order for it to run without a syntax error, I re-wrote it as

{print $1,$2"_L000102.txt"}

also by trial and error I fixed the outputfile1+1.txt format by commenting out

i=$(expr $i + 1)

and just using

(( i = i+1))

I still need to figure out where the extraneous columns are coming from but as it is it meets my needs
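A guess at how a name like outputfile1+1.txt can come about: expr only does arithmetic when the operator is a separate argument, so if the spaces around + get lost it hands the text back unchanged. This is an assumption about what happened, not something confirmed in the thread. The sketch compares the two forms with the shell's own arithmetic, which does not care about spaces. The variable names are made up for illustration.

```shell
#!/bin/bash
i=1

# Three separate arguments: expr adds them and prints 2.
spaced=$(expr $i + 1)

# One argument "1+1": expr treats it as a string and prints it as is.
unspaced=$(expr $i+1)

# Shell arithmetic expansion, no external command needed.
arith=$((i + 1))

echo "spaced=$spaced unspaced=$unspaced arith=$arith"
```

The $(( )) form works in both bash and ksh and is probably the least error-prone way to bump the counter.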

is your problem solved now, or is the output file still not what you expected?
have you printed the value of $i with my code?

it's mostly solved. I say "mostly" only because I still get one extraneous column and one column is left out.

as for printing $i, can I just throw in print $i statements wherever I like?

thanxs Tabitha
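On printing $i: inside the shell loop, in bash, the usual way is echo rather than print (print inside the awk program refers to awk's own fields, not the shell variable). A minimal sketch, assuming two empty placeholder files named like the examples in the thread and created in a scratch directory:

```shell
#!/bin/bash
# Scratch directory with two placeholder files, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch CFD-2012-10-7_RTC_P09_L000058.txt CFD-2012-10-4_RTC_P11_L000058.txt

i=1
for j in *_L000058.txt
do
  # Show the counter, the current file, and the partner name that would be used.
  echo "i=$i  j=$j  partner=${j%_*}_L000102.txt"
  i=$((i + 1))
done
```

An echo line like this can go anywhere between do and done. Once the values look right, swap it for the real awk command.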