Create directory and sub-directory with awk and bash

In the script below, I am trying to create a parent directory from each R_2019 line in f1, but only if the line above it is not empty.
I then create a sub-directory under that parent for each line where $2 of f1 matches $2 of f2. Inside each sub-folder, the matching paths in $3 and $4 of f2 are printed. If there is no match in f2, no parent directory is created. So even though f1 contains an R_2019 line, a directory is only created if there is a match in $2 between f1 and f2.

There may be multiple $2 matches, but each parent directory is matched only once. So there may be multiple sub-folders under each parent, but the parent itself is always unique. If the line above an R_2019 line is empty (or there is no line above it), that R_2019 line is skipped and nothing is created for it.

f1

xxxx_0000 190326-Control
xxxx_0004 19-0000-L,F
R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-101-x0.0_xxx_xxxx_xxxx_xx_x_xxx

R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-100-x0.0_xxx_xxxx_xxxx_xx_x_xxx

xxxx_0002 190326-Control
R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-99-x0.0_xxx_xxxx_xxxx_xx_x_xxx

xxxx_0008 190302-Control
R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-93-x0.0_xxx_xxxx_xxxx_xx_x_xxx

f2

xxxx_0000 190326-Control /path/to/file1 /path/to/file2 /path/to/file3
xxxx_0004 19-0000-L,F /path/to/file1 /path/to/file2 /path/to/file3

xxxx_0002 190302-Control /path/to/file1 /path/to/file2 /path/to/file3

desired

R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-101
    190326-Control
       /path/to/file1 /path/to/file2
    19-0000 L,F
       /path/to/file1 /path/to/file2

R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-99
    190326-Control
      /path/to/file1 /path/to/file2
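For reference, the blank-line-separated blocks in f1 map naturally onto awk's paragraph mode (RS=""). With FS set to a newline, each line of a block becomes one field, so a block that holds only an R_2019 line has NF == 1 and can be skipped. The following is a minimal sketch, assuming exactly the f1 layout shown above; the function name list_blocks and the scratch file are hypothetical. It lists the trimmed parent name and the id/name pairs above it for each remaining block.

```shell
#!/usr/bin/env bash
# list_blocks FILE: for every block of FILE that has at least one line above its
# R_2019 line, print "parent<TAB>id<TAB>name" once per line above it.
list_blocks() {
    awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per block, one field per line
    NF > 1 {                            # NF == 1: nothing above the R_2019 line, so skip it
        parent = $NF                    # the R_2019 line is the last line of the block
        sub(/-x0\.0.*/, "", parent)     # drop everything from "-x0.0" onward
        for (i = 1; i < NF; i++) {
            split($i, f, " ")           # f[1] = id (xxxx_NNNN), f[2] = name
            print parent "\t" f[1] "\t" f[2]
        }
    }' "$1"
}

# Two blocks in the layout of f1; the second has nothing above its R_2019 line.
work=$(mktemp -d)
printf '%s\n' 'xxxx_0000 190326-Control' 'xxxx_0004 19-0000-L,F' \
    'R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-101-x0.0_xxx_xxxx_xxxx_xx_x_xxx' '' \
    'R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-100-x0.0_xxx_xxxx_xxxx_xx_x_xxx' > "$work/f1"
list_blocks "$work/f1"
```

Only the -101 block produces output here, one line per id/name pair above its R_2019 line.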

awk

awk '
    # create an associative array (key/value pairs) based on the f1
    NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next } ## store the 7 digits in $2 in a

    # retrieve the first 7-char of each line in f2 as the key to test against the above hash
    { b = substr($2, 1, 7) } ## store the 7 digits in $2 in b

    # if find k, then print
    b in a { print a "\t" $3 && $4 "\t" p }

    # save prev line to 'p'(these are the paths)
    { p = $0  } 

' RS= f1 RS='\n' f2 | while IFS=$'\t' read -r base_dir sub_dir path; do  ## loop through each key/pair and matching p 
    echo "adding path to data [$path] in each '$base_dir/$sub_dir'"  ## display message
    base_dir=${/R_2019/%%-x0.0*}  ## define parent directory
    sub_dir=${a}  ## define sub-directory
    path=${p}  ## define path
    mkdir -p "$base_dir/$sub_dir/$path"  ## make folder and sub-folder 

Your awk script (the part between the single quotes) defines an array named a[] and variables named b and p. That array and those variables do not exist outside of the awk script. So in the shell loop, the assignments sub_dir=${a} and path=${p} cannot work, because no shell variables named a or p have been defined in your shell script.
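A quick way to see this is the minimal bash sketch below (the value from-awk is just an illustration). An assignment made inside awk is gone once awk exits. The only way to hand a value back is for awk to print it and for the shell to capture it.

```shell
#!/usr/bin/env bash
# An assignment made inside awk lives only in the awk process.
unset a
awk 'BEGIN { a = "from-awk" }'
echo "after awk: a='${a-<unset>}'"      # prints: after awk: a='<unset>'

# To use an awk result in the shell, awk prints it and the shell captures it.
a=$(awk 'BEGIN { a = "from-awk"; print a }')
echo "after capture: a='$a'"            # prints: after capture: a='from-awk'
```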

In shell code, every parameter expansion (such as base_dir=${/R_2019/%%-x0.0*} in your loop) requires the name of a variable immediately after the opening ${. That expansion cannot work because /R_2019/ is not a valid shell variable name, and no variable with a name even slightly resembling that string has been defined in your shell code.
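For comparison, a valid expansion names a variable first and puts the pattern after %%. In this sketch the variable name line is hypothetical and holds one R_2019 line from f1:

```shell
#!/usr/bin/env bash
# ${name%%pattern}: the variable name comes right after "${", the pattern after "%%".
# (Writing ${/R_2019/%%-x0.0*} instead is rejected by bash as a "bad substitution".)
line='R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-101-x0.0_xxx_xxxx_xxxx_xx_x_xxx'
base_dir=${line%%-x0.0*}    # remove the longest suffix matching the glob -x0.0*
echo "$base_dir"            # prints: R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-101
```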

If you want to create a set of directories from data gathered in an awk script, you either need to have the awk script print the names of those directories and have your shell code read those names and create them with mkdir, or you need to have the awk script write mkdir commands and pipe its output into your shell so that the shell executes them.
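Here is a minimal sketch of the first approach, assuming bash (for $'\t') and mktemp. awk reads f2 first and keeps $3 and $4 per key. It then reads f1 in paragraph mode and prints one tab-separated parent / sub-directory / paths line per match, so the shell loop only has to run mkdir. The function name make_tree, its KEY argument, and the paths.txt file that receives the printed paths are assumptions for illustration, not part of the original scripts.

One observation on the sample data: matching on $1 (the xxxx_NNNN ids) reproduces the parent/sub-directory layout of the desired output. Matching on $2 as described would also create an R_2019_…-93 parent, because 190302-Control appears in both files. Sub-directory names are taken verbatim from f1.

```shell
#!/usr/bin/env bash
# make_tree F1 F2 KEY DEST
#   F1   blank-line-separated blocks ("id name" lines, then one R_2019 line)
#   F2   "id name path path path" lines
#   KEY  field to match on: 1 (id) or 2 (name); an assumption, see text
#   DEST directory in which the parent directories are created
make_tree() {
    awk -v k="$3" '
    NR == FNR { if (NF >= 4) paths[$k] = $3 " " $4; next }   # f2: key -> "$3 $4"
    NF > 1 {                                # f1 block; a lone R_2019 line is skipped
        parent = $NF                        # the R_2019 line closes the block
        sub(/-x0\.0.*/, "", parent)         # trim from "-x0.0" onward
        for (i = 1; i < NF; i++) {
            split($i, f, " ")               # f[1] = id, f[2] = name
            if (f[k] in paths)              # a parent is only printed when a key matches
                print parent "\t" f[2] "\t" paths[f[k]]
        }
    }' "$2" RS= FS='\n' "$1" |
    while IFS=$'\t' read -r base subdir paths; do
        echo "adding [$paths] to '$base/$subdir'"
        mkdir -p -- "$4/$base/$subdir"
        printf '%s\n' "$paths" > "$4/$base/$subdir/paths.txt"   # hypothetical file name
    done
}

# Demo with the sample data from the question, in a scratch directory.
work=$(mktemp -d)
printf '%s\n' 'xxxx_0000 190326-Control' 'xxxx_0004 19-0000-L,F' \
    'R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-101-x0.0_xxx_xxxx_xxxx_xx_x_xxx' '' \
    'R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-100-x0.0_xxx_xxxx_xxxx_xx_x_xxx' '' \
    'xxxx_0002 190326-Control' \
    'R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-99-x0.0_xxx_xxxx_xxxx_xx_x_xxx' '' \
    'xxxx_0008 190302-Control' \
    'R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-93-x0.0_xxx_xxxx_xxxx_xx_x_xxx' > "$work/f1"
printf '%s\n' 'xxxx_0000 190326-Control /path/to/file1 /path/to/file2 /path/to/file3' \
    'xxxx_0004 19-0000-L,F /path/to/file1 /path/to/file2 /path/to/file3' '' \
    'xxxx_0002 190302-Control /path/to/file1 /path/to/file2 /path/to/file3' > "$work/f2"
make_tree "$work/f1" "$work/f2" 1 "$work/pre"
```

Because f2 is read first, a name that appears under several R_2019 lines (such as 190326-Control) is looked up independently for each parent. It does not overwrite a single array entry.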


I'm not sure I completely follow, but I modified the script to make better use of awk. Is the version below closer, or what do you suggest? I included comments in the code as well. Thank you very much :).

awk

DIR=/home/cmccabe/pre  ## define data directory
awk '
    # create an associative array (key/value pairs) based on the f1
    NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next } ## store the 7 digits in $2 in a

    # retrieve the first 7-char of each line in f2 as the key to test against the above hash
    { b = substr($2, 1, 7) } ## store the 7 digits in $2 in b

    # if find b, then print
    b in a { print a "\t" $3 && $4 "\t" p }

    # save prev line to 'p'(these are the paths)
    { p = $0  } 

' RS= f1 RS='\n' f2 | for RDIR in "$DIR"/R_2019* ; do  ## matching "R_2019*" to operate on desired directory and expand
    TRIMSTR=${RDIR%%-x0.0*}  ## trim folder match in RDIR from -x0.0 and store in TRIMSTR
    mv "$RDIR" "${TRIMSTR}"  ## trim folder
    mkdir -p "$TRIMSTR/$a/$p"  ## make folder/sub-folder/path
done ## end loop
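Two things stop this version from working. First, a for loop never reads its standard input, so everything awk prints into the pipe is discarded. Second, "$DIR"/R_2019* only matches directories that already exist. Below is a sketch of the other approach from the reply, in which awk writes the mkdir commands and the pipe feeds them to sh. It is an illustration under assumptions, not a drop-in fix. The key field is passed as k (1 here, matching on the xxxx_NNNN ids). The paths are written to a hypothetical paths.txt file. No name may contain a single quote, because each path is wrapped in '...' for the shell.

```shell
#!/usr/bin/env bash
# awk prints shell commands; sh executes them.
# Assumes no directory or path name contains a single quote.
work=$(mktemp -d)
printf '%s\n' 'xxxx_0002 190326-Control' \
    'R_2019_xx_yy_xx_yy_xx_xxxx_x0-0000-99-x0.0_xxx_xxxx_xxxx_xx_x_xxx' > "$work/f1"
printf '%s\n' 'xxxx_0002 190302-Control /path/to/file1 /path/to/file2 /path/to/file3' > "$work/f2"

awk -v k=1 -v dest="$work/pre" -v q="'" '
NR == FNR { if (NF >= 4) paths[$k] = $3 " " $4; next }   # f2: key -> "$3 $4"
NF > 1 {                                 # f1 block; a lone R_2019 line is skipped
    parent = $NF
    sub(/-x0\.0.*/, "", parent)          # trim from "-x0.0" onward
    for (i = 1; i < NF; i++) {
        split($i, f, " ")                # f[1] = id, f[2] = name
        if (!(f[k] in paths)) continue   # no match in f2: create nothing
        d = q dest "/" parent "/" f[2] q # single-quoted target directory
        print "mkdir -p " d
        print "printf \"%s\\n\" " q paths[f[k]] q " > " d "/paths.txt"
    }
}' "$work/f2" RS= FS='\n' "$work/f1" | sh
```

Running the awk part without "| sh" prints the generated commands, which is a convenient way to check them before anything is created.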