The awk below creates sub-directories in a directory (the directory name is always the last line of each block in file1, and blocks are separated by an empty line) if the number in line 2 of each file2 entry (always the first 6 digits, in the format xx-xxxx) is found in $2 of file1. The result is shown under current awk output.
If there is a match and a sub-directory is created in a directory, then the corresponding https link in line 1 of that file2 entry will always be a link to a zip file for download. I cannot seem to get that link into the sub-folder, download it, and extract the .zip. Thank you.
I updated the awk with extra lines (the ones that set l and call curl) to grab the download link and put it in each sub-directory. If I manually enter the download command in the terminal it does work. Thank you :).
file1
xxx_006 19-0000_xxx-yyy-aaa
xxx_007 19-0001_zzz-bbb-ccc
R_2019_02_28_00_xx_yy_user_S5-0271-00-Medexome

yyyy_0287 19-0v02-xxx
yyyy_0289 19-0v31-xxxx
yyyy_0293 19-0v05-xxxx
R_2019_02_15_11_56_40_user_S5-0271-00-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
file2
https://xx.yy.zz/path/to/file.zip
19-0v05-xxx_000_001
cc112233
https://xx.yy.zz/path/to/download/file.zip
19-0v31-xxx-001-000
bb4456784
https://xx.yy.zz/path/to/file.zip
19-0v02-xxx_000_001
aaa331232
awk
awk 'NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next } ## start loop and iterate over each first 6 digits in $2 of line 2 in file1
{ k = substr($0, 1, 7) } ## store value extracted in k
{ for(i=1; i<NF; i+=1) a[substr($i,1,7)] = $NF; next } ## start loop and iterate over each previous matching line of file1
{ l = ($0) } ## store value extracted in l (grab the link in line 1)
k in a { cmd = sprintf("mkdir -p %s/%s", a[k], $0); system(cmd); } ## for each k in file2 make a directory with sub-directory as k
l in a { cmd = sprintf("curl -O -v -k -X GET "https://xxx/path/to/download/.zip" -H "Content-Type:application/x-www-form-urlencoded" -H "Authorization:xxx"", a[k], $0); system(cmd); && unzip } ## for each l in file2 make a directory with sub-directory as k and download l in it
' RS= file1 RS='\n' file2 ## files to use
current awk output
R_2019_02_15_11_56_40_user_S5-0271-00-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions --- directory
19-0v02-xxx_000_001 --- sub-folder
19-0v05-xxx_000_001 --- sub-folder
19-0v31-xxx-001-000 --- sub-folder
desired awk output
R_2019_02_15_11_56_40_user_S5-0271-00-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions --- directory
19-0v02-xxx_000_001 --- sub-folder
https://xx.yy.zz/path/to/file.zip --- zip downloaded to sub-folder and extracted
19-0v05-xxx_000_001 --- sub-folder
https://xx.yy.zz/path/to/file.zip --- zip downloaded to sub-folder and extracted
19-0v31-xxx-001-000 --- sub-folder
https://xx.yy.zz/path/to/download/file.zip --- zip downloaded to sub-folder and extracted
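For reference, here is a minimal sketch of one way the desired layout might be produced. It is not a tested solution against the real server. In the attempt above, the nested double quotes inside sprintf and the `system(cmd); && unzip` fragment are awk syntax errors, and the `next` in the third rule stops the later rules from ever running. The sketch instead remembers the link line and uses it when the following id line matches. The function name `build_cmds` is hypothetical. It assumes every link line starts with http, that the zip name is the last component of the URL, that `Authorization:xxx` is the same placeholder used in the post, and that no name contains double quotes or other shell metacharacters. It only prints the commands, so they can be inspected before anything is run.

```shell
# build_cmds FILE1 FILE2: print one shell command line per matching file2 entry.
# Nothing is executed here; pipe the output to sh once it looks right.
build_cmds() {
  awk '
    # file1, paragraph mode: map the 7-character id (xx-xxxx) of every
    # second field to the directory name on the last line of the block
    NR == FNR {
      for (i = 2; i < NF; i += 2) dir[substr($i, 1, 7)] = $NF
      next
    }
    # file2: remember the link (assumed to start with http) for the id line that follows
    /^https?:/ { url = $0; next }
    {
      k = substr($0, 1, 7)
      if (k in dir && url != "") {
        d = dir[k] "/" $0              # directory/sub-directory
        n = split(url, part, "/")      # assumption: last URL component is the zip name
        printf "( mkdir -p \"%s\" && cd \"%s\" && curl -O -k -H \"Authorization:xxx\" \"%s\" && unzip -o \"%s\" )\n", d, d, url, part[n]
      }
      url = ""                         # a link only applies to the next line
    }
  ' RS= "$1" RS='\n' "$2"
}
```

Running `build_cmds file1 file2` prints one parenthesised command per match; `build_cmds file1 file2 | sh` would then execute them. Each command runs in its own subshell so the `cd` does not affect the next line.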