awk to create link, download, and extract in sub-directory

The awk below will create sub-directories in a directory (which is always the last line of file1 , each block separated by an empty line), if the number in line 2 (always the first 6 digits in the format xx-xxxx) of file2 is found in $2 of file1 . This is the current awk output.

If there is a match and a sub-directory is created in a directory then the corresponding line1 https in file2 will always be a link to a zip file for download. I can not seem to create that link in the sub-folder, download and extract the .zip . Thank you :slight_smile:

I updated the awk with the lines in bold to grab the download link and put it in the each sub-directory. If I manually enter the download in the terminal it does work. Thank you :).

file1

xxx_006 19-0000_xxx-yyy-aaa
xxx_007 19-0001_zzz-bbb-ccc
R_2019_02_28_00_xx_yy_user_S5-0271-00-Medexome

yyyy_0287 19-0v02-xxx
yyyy_0289 19-0v31-xxxx
yyyy_0293 19-0v05-xxxx
R_2019_02_15_11_56_40_user_S5-0271-00-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

file2

https://xx.yy.zz/path/to/file.zip
19-0v05-xxx_000_001
cc112233
https://xx.yy.zz/path/to/download/file.zip
19-0v31-xxx-001-000
bb4456784
https://xx.yy.zz/path/to/file.zip
19-0v02-xxx_000_001
aaa331232

awk

awk 'NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next }  ## start loop and iterate over each first 6 digits in $2 of line 2 in file1
     { k = substr($0, 1, 7) } ## store value extracted in k
             { for(i=1; i<NF; i+=1) a[substr($i,1,7)] = $NF; next } ## start loop and iterate over each previous matching line of file1
     { l = ($0) } ## store value extracted in l (grab the link in line 1)
     k in a { cmd = sprintf("mkdir -p %s/%s", a[k], $0); system(cmd); }  ## for each k in file2 make a directory with sub-directory as k
     l in a { cmd = sprintf("curl -O -v -k -X GET "https://xxx/path/to/download/.zip" -H "Content-Type:application/x-www-form-urlencoded" -H "Authorization:xxx"", a[k], $0); system(cmd); && unzip }  ## for each l in file2 make a directory with sub-directory as k and download l in it
' RS= file1 RS='\n' file2  ## files to use

current awk output

R_2019_02_15_11_56_40_user_S5-0271-00-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions --- directory
   19-0v02-xxx_000_001  --- sub folder
   19-0v05-xxx_000_001  --- sub-folder
   19-0v31-xxx-001-000  --- sub-folder

desired awk output

R_2019_02_15_11_56_40_user_S5-0271-00-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions --- directory
   19-0v02-xxx_000_001  --- sub folder
       https://xx.yy.zz/path/to/file.zip  --- zip and extracted downloaded to sub-folder
   19-0v05-xxx_000_001  --- sub-folder
       https://xx.yy.zz/path/to/file.zip  --- zip and extracted downloaded to sub-folder
   19-0v31-xxx-001-000  --- sub-folder
       https://xx.yy.zz/path/to/file.zip  --- zip and extracted downloaded to sub-folder