Bash to calculate average of all files in directory and output by part of filename

I am trying to use awk to calculate the average of column 2 ($2) over all lines of every file in a directory. The bash below seems to do that, but I cannot figure out how to use the string before the first _ as the output file name and make the output tab-delimited. Thank you :).

Filenames in /home/cmccabe/Desktop/20x/idp

NA00449_base_counts_allidp.bed_IDP20x.txt
NA02782_base_counts_allidp.bed_IDP20x.txt

Bash

for f in /home/cmccabe/Desktop/20x/idp/*.txt ; do
     bname=$(basename $f)
     pref=${bname%%.txt}
     awk '{ sum += $2 } END { if (NR > 0) print sum / NR }' $f /home/cmccabe/Desktop/NGS/bed/bedtools/IDP_total_target_length_by_panel/IDP_unix_trim_total_target_length.bed > /home/cmccabe/Desktop/20x/idp/${pref}_average.txt
done

The data files are too large to attach, but the average is currently being calculated as shown below:

current output

NA00449_base_counts_allidp.bed_IDP_average.txt 98.5648

desired output (same data, only the filename differs)

NA00449_average.txt     98.5648

Hi,

Can you please consider this as a starting point to get what you need?

A=NA00449_base_counts_allidp.bed_IDP_average.txt
echo ${A%%_*}_${A##*_}

gives output:

NA00449_average.txt

or in awk:

echo "NA00449_base_counts_allidp.bed_IDP_average.txt" | awk -F_ '{print $1FS$NF}'
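
To see what each expansion does on its own, here is a minimal sketch that applies them one at a time to one of the file names listed in the question. The variable names sample and last are only illustrative and are not part of the original script.

```shell
#!/usr/bin/env bash
# Path assembled from the directory and file name given in the question.
f=/home/cmccabe/Desktop/20x/idp/NA00449_base_counts_allidp.bed_IDP20x.txt

bname=${f##*/}        # remove longest leading match of */  -> file name only
sample=${bname%%_*}   # remove longest trailing match of _* -> text before the first _
last=${bname##*_}     # remove longest leading match of *_  -> text after the last _

printf '%s\n' "$bname" "$sample" "$last"
```

A single % or # removes the shortest match instead of the longest, so ${bname%_*} would strip only the final _IDP20x.txt.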

Thank you for the suggestion; it led me to the code below, which produces the desired result:

for f in /home/cmccabe/Desktop/20x/percent/*.txt ; do
     bname=$(basename $f)
     pref=${bname%%_base_*.txt}
     awk '{ sum += $2 } END { if (NR > 0) print sum / NR }' $f /home/cmccabe/Desktop/NGS/bed/bedtools/IDP_total_target_length_by_panel/IDP_unix_trim_total_target_length.bed > /home/cmccabe/Desktop/20x/coverage/${pref}_average.txt
done

Thank you for your help :).

Note that instead of starting a subshell for the command substitution and invoking the basename utility for every file you process, you can change:

     bname=$(basename $f)

to:

     bname=${f##*/}

to make it more efficient and a little bit faster while getting exactly the same results.
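
Putting the pieces from this thread together, a rough sketch of the loop might look like the following. It rests on several assumptions: the temporary directory and the numbers in it are made up purely so the example can run anywhere, each line of output is assumed to be the output file name, a tab, and the average, and awk reads only the data file itself (the commands above also pass the .bed file, which this sketch leaves out).

```shell
#!/usr/bin/env bash
# Sketch only: throwaway directory with made-up data standing in for the real files.
tmp=$(mktemp -d)
printf 'chr1\t10\nchr1\t20\nchr1\t30\n' > "$tmp/NA00449_base_counts_allidp.bed_IDP20x.txt"
printf 'chr1\t5\nchr1\t15\n' > "$tmp/NA02782_base_counts_allidp.bed_IDP20x.txt"

for f in "$tmp"/*_base_*.txt; do
    bname=${f##*/}            # file name without the directory, no basename call
    pref=${bname%%_base_*}    # text before _base_, e.g. NA00449
    # Pass the name into awk and let OFS put a tab between name and average.
    awk -v name="${pref}_average.txt" -v OFS='\t' '
        { sum += $2 }
        END { if (NR > 0) print name, sum / NR }
    ' "$f" > "$tmp/${pref}_average.txt"
done

result=$(cat "$tmp"/*_average.txt)
printf '%s\n' "$result"
rm -rf "$tmp"                 # clean up the throwaway directory
```

Quoting "$f" and the output path keeps the loop working if a directory or file name ever contains spaces.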