awk to lookup stored variable in file and print matching line

The bash below extracts the oldest folder from a directory and stores it in filename .
That result will match a line (in bold) in input . In the matching line there is a _xxx number (in italics) that
(once the leading zero is removed) will match a line in link . That is the line to print in output .
There will always be only one exact match. Thank you :).

# oldest folder used analysis and version log created
dir=/home/cmccabe/Desktop/NGS/test
{
  read -r -d $'\t' time && read -r -d '' filename
} < <(find "$dir" -maxdepth 1 -mindepth 1 -printf '%T+\t%P\0' | sort -z -r )
printf 'The oldest folder is %s, created on %s and analysis done using v1.3 by %s at %s\n' "$filename" "$time" "$USER" "$(date '+%D %r')" >> /home/cmccabe/Desktop/NGS/test/log

Result of bash:
R_2017_01_13_14_46_04_user_S5-00580-25-Medexome

input

http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-25-Medexome_135_080/plugin_out/FileExporter_out.194/R_2017_01_13_14_46_04_user_S5-00580-25-Medexome.tar.bz2
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-24-Medexome_136_078/plugin_out/FileExporter_out.191/R_2017_01_13_12_11_56_user_S5-00580-24-Medexome.tar.bz2

link

http://xxx.xx.xxx.xxx/output/report/latex/80.pdf
http://xxx.xx.xxx.xxx/output/report/latex/78.pdf

awk attempt with explanation

awk '{match(VAL=substr($0,RSTART,RLENGTH);match($0,/R*_[0-9]+\//);VAL1=substr($0,RSTART,RLENGTH);gsub(/.*_0|.*_|\/);print' $filename < inputlink > output

explanation

R_2017_01_13_14_46_04_user_S5-00580-25-Medexome extracted from $filename and matched to line 1 in input (section in bold)

that line has _080 in it (in italics)

the 80 (leading zero always removed) matches line 1 in link , so that is the output

desired output (this line matches the result from the bash, so it is printed)

http://xxx.xx.xxx.xxx/output/report/latex/80.pdf

It's not clear whether you need this to run as a single script, but here is one solution:

# oldest folder used analysis and version log created
dir=/home/cmccabe/Desktop/NGS/test

find "$dir" -maxdepth 1 -mindepth 1 -type d -printf '%T+\t%P\0' | sort -rz |
while read -r -d $'\t' time && read -r -d '' filename
do
    printf 'The oldest folder is %s, created on %s and analysis done using v1.3 by %s at %s\n' "$filename" "$time" "$USER" "$(date '+%D %r')"
    awk -v FL="$filename" '
         FNR == 1 {filenum++}
         filenum==1 && index($0, FL) { 
              match($0, "_0*([0-9]+)/")
              FNUM=substr($0,RSTART+1,RLENGTH-2)
              gsub(/^0+/,"", FNUM)
          }
          filenum==2 && FNUM != "" && $0 ~ ("/" FNUM "\\.pdf$")' input link > output
    break
done
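
If you want to try the lookup on its own before wiring it into the find loop, something like the sketch below may help. It is only an illustration under a few assumptions: lookup_report is a made-up helper name, the sample files are written to a temporary directory rather than your real input and link files, and the pattern is anchored on "/" and a literal ".pdf" so that 80 cannot also match a line ending in 180.pdf.

```bash
#!/usr/bin/env bash
# Sketch only: lookup_report is a hypothetical helper, not part of the original script.
# Usage: lookup_report FOLDER_NAME INPUT_FILE LINK_FILE
lookup_report() {
    awk -v FL="$1" '
        FNR == 1 { filenum++ }                 # count which file we are reading
        filenum == 1 && index($0, FL) {        # first file: line containing the folder name
            if (match($0, /_[0-9]+\//)) {      # first _NNN/ on the line, e.g. _080/
                num = substr($0, RSTART + 1, RLENGTH - 2)
                sub(/^0+/, "", num)            # drop leading zeros: 080 -> 80
            }
            next
        }
        # second file: print the line whose last path component is exactly NUM.pdf
        filenum == 2 && num != "" && $0 ~ ("/" num "\\.pdf$")
    ' "$2" "$3"
}

# Throwaway sample data copied from the post
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/input" <<'EOF'
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-25-Medexome_135_080/plugin_out/FileExporter_out.194/R_2017_01_13_14_46_04_user_S5-00580-25-Medexome.tar.bz2
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-24-Medexome_136_078/plugin_out/FileExporter_out.191/R_2017_01_13_12_11_56_user_S5-00580-24-Medexome.tar.bz2
EOF

cat > "$tmp/link" <<'EOF'
http://xxx.xx.xxx.xxx/output/report/latex/80.pdf
http://xxx.xx.xxx.xxx/output/report/latex/78.pdf
EOF

result=$(lookup_report "R_2017_01_13_14_46_04_user_S5-00580-25-Medexome" "$tmp/input" "$tmp/link")
printf '%s\n' "$result"
```

With the sample data from the post this should print the 80.pdf line. Passing the folder name in with -v and using index() means the name is compared as a plain string, so characters such as "-" or "." in it are not treated as regex syntax.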

Thank you very much, works perfect :).