In the code below I am trying to create a parent directory from the R_2019 line in f1, provided the line above it is not empty.
I then create sub-directories under each parent if there is a match between $2 of f1 and $2 of f2. Inside each sub-folder the matching paths in $3 and $4 of f2 are printed. If there is no match in f2 then no parent directory is created. So even though f1 contains an R_2019 line, a directory is only created if $2 matches between f1 and f2.
There may be multiple matches for each $2 but only 1 match for each parent directory. So there may be multiple sub-folders under each parent, but the parent will always be unique. If the R_2019 line has an empty value or a newline above it, it is skipped and nothing is done.
awk '
# create an associative array (key/value pairs) based on the f1
NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next } ## store the 7 digits in $2 in a
# retrieve the first 7-char of each line in f2 as the key to test against the above hash
{ b = substr($2, 1, 7) } ## store the 7 digits in $2 in b
# if find k, then print
b in a { print a "\t" $3 && $4 "\t" p }
# save prev line to 'p'(these are the paths)
{ p = $0 }
' RS= f1 RS='\n' f2 | while IFS=$'\t' read -r base_dir sub_dir path; do ## loop through each key/pair and matching p
echo "adding path to data [$path] in each '$base_dir/$sub_dir'" ## display message
base_dir=${/R_2019/%%-x0.0*} ## define parent directory
sub_dir=${a} ## define sub-directory
path=${p} ## define path
mkdir -p "$base_dir/$sub_dir/$path" ## make folder and sub-folder
Your awk script defines the array named a[] and the variable named b . That array and that variable do not exist outside of that awk script. So the later shell code that relies on them (for example sub_dir=${a} ) cannot work, because neither a[] nor b has been defined in your shell script.
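A minimal sketch of that point, using a made-up value for b (the value is purely illustrative and does not come from f1 or f2): the assignment happens inside the awk process, so the shell never sees it.

```shell
# Make sure the shell has no variable named b to begin with.
unset b

# awk sets its own variable b; it disappears when awk exits.
awk 'BEGIN { b = "1234567" }'

# The shell still has no b, so the fallback text is printed instead.
printf 'b in the shell is [%s]\n' "${b-unset}"
```

The only way data leaves awk is through what it prints (or files it writes), which the shell then has to read.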
In shell code, every parameter expansion (such as ${/R_2019/%%-x0.0*} in your script) requires the name of a variable immediately following the opening ${ . That expansion cannot work because /R_2019/ is not a valid shell variable name, and no name even slightly resembling that string has been defined in your shell code.
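As a rough illustration of a valid expansion, the value has to be stored in a named variable first and the trimming applied to that name. The directory name below is hypothetical, chosen only to show the form.

```shell
# Hypothetical directory name, used only for illustration.
rdir='R_2019_01_01-x0.0-run'

# ${name%%pattern} removes the longest suffix of $name matching the pattern.
base_dir=${rdir%%-x0.0*}

printf '%s\n' "$base_dir"
```

Here rdir is a valid variable name, so the shell can expand it and strip everything from -x0.0 onward.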
If you want to create a set of directories from data gathered in an awk script, you either need to have the awk script print the names of those directories and have your shell code read those names and use mkdir to create them, or you need to have the awk script generate mkdir statements and pipe its output through your shell so that the commands created by the awk script are executed by your shell.
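Both options can be sketched roughly as follows. The input file pairs.txt, its two tab-separated columns, and the gen_ prefix are assumptions made up for this example; they are not the real layout of f1 and f2, and the awk programs here only pass names through rather than doing any matching.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: one "parent<TAB>sub" pair per line.
printf 'R_2019_a\t1111111\nR_2019_a\t2222222\nR_2019_b\t3333333\n' > pairs.txt

# A literal tab character to use as the field separator for read.
tab=$(printf '\t')

# Option 1: awk prints the names; the shell reads them and runs mkdir.
awk -F'\t' 'NF == 2 { print $1 "\t" $2 }' pairs.txt |
while IFS="$tab" read -r base_dir sub_dir; do
    mkdir -p "$base_dir/$sub_dir"
done

# Option 2: awk writes mkdir commands and a shell executes them.
# Only safe when the names contain no quotes or other shell metacharacters.
awk -F'\t' 'NF == 2 { printf "mkdir -p \"%s/%s\"\n", "gen_" $1, $2 }' pairs.txt | sh
```

The first form is usually easier to keep safe, since the names are passed to mkdir as quoted shell variables instead of being re-parsed as shell code.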
I'm not sure if I completely follow, but I modified the script to utilize awk better. Is the below closer or what do you suggest? I included comments in the code as well. Thank you very much :).
DIR=/home/cmccabe/pre ## define data directory
awk '
# create an associative array (key/value pairs) based on the f1
NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next } ## store the 7 digits in $2 in a
# retrieve the first 7-char of each line in f2 as the key to test against the above hash
{ b = substr($2, 1, 7) } ## store the 7 digits in $2 in b
# if find b, then print
b in a { print a "\t" $3 && $4 "\t" p }
# save prev line to 'p'(these are the paths)
{ p = $0 }
' RS= f1 RS='\n' f2 | for RDIR in "$DIR"/R_2019* ; do ## # matching "R_2019*" to operate on desired directory and expand
TRIMSTR=${RDIR%%-x0.0*} ## trim folder match in RDIR from -x0.0 and store in TRIMSTR
mv "$RDIR" "${TRIMSTR}" ## trim folder
mkdir -p "$TRIMSTR/$a/$p" ## make folder/sub-folder/path
done ## end loop