Use while loop to read file and use ${file} for both filename input into awk and as string to print

pathunkathunk · May 16, 2017, 5:17pm

I have files named with different prefixes. From each I want to extract the first line containing a specific string, and then print that line along with the prefix.

I've tried to do this with a while loop, but instead of printing the prefix followed by the matching line, the output contains the matching line twice.

Files:

cat prefixes.txt 
N2_A
N2_O

cat N2_A_ko.txt 
K02234  6.24
K02588  14.971
K02588|unclassified     14.971

cat N2_O_ko.txt 
K02588  2.647
K02588|unclassified     2.647
K02233 3.45

Here's what I've tried:

while read file
do 
var=${file}
awk -OFS'\t' '/K02588/ {print $var,$1,$2; exit}' ${file}_ko.txt > ${file}_nifh.txt
done < prefixes.txt

But, again, instead of printing the prefix as the first column, it prints the whole matching line there.

cat N2_A_nifh.txt 
K02588  14.971 K02588   14.971 

cat N2_O_nifh.txt 
K02588  2.647 K02588    2.647

Don_Cragun · May 16, 2017, 5:44pm

You're close, but shell variables aren't visible inside an awk script unless you explicitly pass them in. Also, in awk, $var refers to the field whose number is stored in the awk variable var. Since var was never defined in your awk script, it evaluates to 0, so $var is $0, which is the entire current line. Try:

while read file
do	awk -F'\t' -v var="$file" '/K02588/ {print var,$1,$2; exit}' "${file}_ko.txt" > "${file}_nifh.txt"
done < prefixes.txt
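
To make the point about $var concrete, and to get the tab-separated output the question seems to want, here is a minimal sketch. It assumes the input columns are tab-separated and recreates the sample files in a scratch directory; the variable name prefix is only an illustrative choice. Passing -v OFS='\t' is one common way to set awk's output field separator.

```shell
# An undefined awk variable is 0, so $var is $0 (the whole line).
echo 'a b c' | awk '{ print $var }'            # prints: a b c
# With var set to 2, $var is the second field.
echo 'a b c' | awk -v var=2 '{ print $var }'   # prints: b

# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question (tab-separated columns are an assumption).
printf 'N2_A\nN2_O\n' > prefixes.txt
printf 'K02234\t6.24\nK02588\t14.971\nK02588|unclassified\t14.971\n' > N2_A_ko.txt
printf 'K02588\t2.647\nK02588|unclassified\t2.647\nK02233\t3.45\n' > N2_O_ko.txt

while IFS= read -r file
do
    # -v prefix=... hands the shell value to awk as an awk variable;
    # -v OFS='\t' makes print join its arguments with tabs.
    awk -F'\t' -v OFS='\t' -v prefix="$file" \
        '/K02588/ { print prefix, $1, $2; exit }' \
        "${file}_ko.txt" > "${file}_nifh.txt"
done < prefixes.txt

cat N2_A_nifh.txt N2_O_nifh.txt
```

Each *_nifh.txt file then holds a single line: the prefix, the ID, and the value, separated by tabs.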

MadeInGermany · May 17, 2017, 1:37am

This can also be done elegantly with shell builtins:

while read file
do 
  while read id w wr
  do
    case $id in
    K02588 )
      printf "%s\t%s\t%s\n" "$file" "$id" "$w"
      break
    ;;
    esac
  done <"$file"_ko.txt >"$file"_nifh.txt
done < prefixes.txt

--
EDIT: I just noticed that the match is not always on the first line, so I added an inner while loop (the awk solution above performs this loop implicitly).

rovf · May 17, 2017, 3:18am

You didn't say which awk you are using. If your awk is linked to nawk or gawk, there is an alternative to the solution posted by Don Cragun: export your shell variable (so that it becomes an environment variable) and use awk's built-in ENVIRON array to access it. If you follow this route, I strongly recommend writing the variable name in all uppercase.
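
A minimal sketch of the ENVIRON route, assuming tab-separated input and using PREFIX as a hypothetical variable name; the sample file is recreated in a scratch directory so the snippet runs on its own.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Reduced sample data from the question (tab-separated columns are an assumption).
printf 'N2_A\n' > prefixes.txt
printf 'K02234\t6.24\nK02588\t14.971\nK02588|unclassified\t14.971\n' > N2_A_ko.txt

while IFS= read -r file
do
    # PREFIX is a hypothetical name; exporting it places it in awk's environment.
    PREFIX=$file
    export PREFIX
    # ENVIRON["PREFIX"] reads the exported value inside awk.
    awk -F'\t' -v OFS='\t' \
        '/K02588/ { print ENVIRON["PREFIX"], $1, $2; exit }' \
        "${file}_ko.txt" > "${file}_nifh.txt"
done < prefixes.txt

cat N2_A_nifh.txt
```

Unlike -v, no awk variable is created here: the value travels through the process environment, which is why an all-uppercase name fits the usual convention for environment variables.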