Hello,
I have a script that checks every file with a specific extension in a specific directory. The file names contain some numerical output and I am recording the file names with the best n outcomes.
The script finds all files in the directory with the extension .out.txt and uses awk to split each file name on underscores. In this case, I am reading the first field and looking for the three smallest values across the set of files. In other cases, I may be reading the third field. I understand that in this simple case all I would have to do is take the first three files, since the names already sort by the first field, but there will be other cases where that would not work.
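As a minimal sketch of the field selection described above, the field number can be passed in as a parameter so the same code reads the first or the third field. The helper name get_field is hypothetical and is not part of the script that follows.

```shell
#!/bin/bash
# get_field is a hypothetical helper, shown for illustration only.
# Usage: get_field FILENAME FIELD_NUMBER
get_field() {
    # split the name on underscores and print the requested field
    awk -v n="$2" 'BEGIN {FS="_"} {print $n}' <<< "$1"
}

NAME="48.93_E3200_55.94_E1900_34_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt"
get_field "$NAME" 1   # prints 48.93
get_field "$NAME" 3   # prints 55.94
```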
The full script as it currently stands is below, after the sample file names, and there is sample input in the attached zip. The input file names look like this:
48.93_E3200_55.94_E1900_34_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
49.15_E2700_51.98_E1200_32_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
49.16_E1600_52.54_E1600_44_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
50.36_E3400_55.09_E3000_35_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
50.62_E1700_51.92_E300_8_ri_OA_f0_S1A_v17_52.26.1_4_ON_0.25lr.out.txt
#!/bin/bash
# loop through all files and save the top 3 filenames
# initialize
FILENAME=""
CURRENT_MAE_VALUE=0
# these are initialized to an arbitrarily large value
EV_MAE_0=1000.0
EV_MAE_1=1000.0
EV_MAE_2=1000.0
EV_FILES=(NULL0 NULL1 NULL2)
# set fold value
FOLD=f0
# get directory list
FILES='./'$FOLD'/'*'out.txt'
for INFILE in $FILES
do
# remove directory from path (basename works at any directory depth)
FILENAME=$(basename "$INFILE")
# find ev mae value (first underscore-delimited field of the name)
CURRENT_MAE_VALUE=$(echo "$FILENAME" | awk 'BEGIN {FS="_"} {print $1}')
# save the names of the top 3 EV files and EV values
if (( $(bc <<< "$CURRENT_MAE_VALUE < $EV_MAE_0") == 1 ))
then
#bump down current list items
EV_FILES[2]=${EV_FILES[1]}; EV_MAE_2=$EV_MAE_1
EV_FILES[1]=${EV_FILES[0]}; EV_MAE_1=$EV_MAE_0
EV_FILES[0]=$FILENAME
# assign EV_MAE_VALUE to top value
EV_MAE_0=$CURRENT_MAE_VALUE
elif (( $(bc <<< "$CURRENT_MAE_VALUE < $EV_MAE_1") == 1 ))
then
#bump down current list items
EV_FILES[2]=${EV_FILES[1]}; EV_MAE_2=$EV_MAE_1
EV_FILES[1]=$FILENAME
# assign EV_MAE_VALUE to second value
EV_MAE_1=$CURRENT_MAE_VALUE
elif (( $(bc <<< "$CURRENT_MAE_VALUE < $EV_MAE_2") == 1 ))
then
# replace the third item; nothing below it to bump down
EV_FILES[2]=$FILENAME
# assign EV_MAE_VALUE to third value
EV_MAE_2=$CURRENT_MAE_VALUE
fi
done
# print results
echo "1st EV file"
echo ${EV_FILES[0]}
echo "EV MAE 0" $EV_MAE_0
echo ""
echo "2nd EV file"
echo ${EV_FILES[1]}
echo "EV MAE 1" $EV_MAE_1
echo ""
echo "3rd EV file"
echo ${EV_FILES[2]}
echo "EV MAE 2" $EV_MAE_2
echo ""
My main question is about how to keep a running record of the file names of the best three values as I loop through the file names. This script does it by brute force and works alright, but I may need to save the top 20 or 50, and I don't look forward to coding that up with the method I used above.
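One possible direction, offered as a sketch under assumptions and not tested against the real data, is to let sort do the ranking so that no per-slot variables are needed. The function top_n and its arguments are hypothetical names. It assumes the chosen field is a plain decimal number, that smaller is better, and that file names contain no tab or newline characters.

```shell
#!/bin/bash
# top_n is a hypothetical helper: print the COUNT file names with the
# smallest value in an underscore-delimited field, one per line, as
# "value<TAB>filename".
# Usage: top_n DIRECTORY FIELD_NUMBER COUNT
top_n() {
    local dir=$1 field=$2 count=$3 path
    for path in "$dir"/*.out.txt
    do
        # skip the literal pattern when no files match
        [ -e "$path" ] || continue
        basename "$path"
    done |
    # prefix each name with the value of the requested field
    awk -v n="$field" 'BEGIN {FS="_"; OFS="\t"} {print $n, $0}' |
    # numeric sort on the value column, smallest first
    sort -t $'\t' -k1,1n |
    # keep only the best COUNT entries
    head -n "$count"
}

# example call, assuming the layout used in the script above:
# top_n "./f0" 1 20
```

Changing the count from 3 to 20 or 50 is then a single argument, and the third field can be ranked by passing 3 as the field number. Unlike the loop above, ties are ordered by sort and not by the order in which the files were read.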
Any suggestions?
LMHmedchem