awk to create variables to pass into a bash loop to create a download link

I have created one file that contains all the information needed to create a download link. In each of the lines

/results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67
/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52

the _user_S5-00580-6-Medexome part (the digit between the hyphens before Medexome varies) will match a line in the file that begins with R_, for example R_2016_09_20_10_12_41_user_S5-00580-6-Medexome. I need to read each of those two strings into variables and then use them to create a download link.

download link

http://xxx.xx.xxx.xxx  --- hardcoded
/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52  --- from the path line, keeping the leading / and with /results/analysis removed
/   --- hardcoded
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome  --- from the matching line that begins with R_
.tar.bz  --- hardcoded

The --- annotations are not part of the link; they are only there for clarification.

file

/results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67
/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52
IonXpress_007 MEV21
IonXpress_008 MEV22
IonXpress_009 MEV23
R_2016_09_21_14_01_15_user_S5-00580-9-Medexome
IonXpress_001 MEC1
IonXpress_002 MEC32
IonXpress_003 MEC33
R_2016_09_21_11_26_19_user_S5-00580-8-Medexome
IonXpress_007 MEV37
IonXpress_008 MEV38
IonXpress_009 MEV39
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome
IonXpress_004 MEV34
IonXpress_005 MEV35
IonXpress_006 MEV36
R_2016_09_20_10_12_41_user_S5-00580-6-Medexome
IonXpress_007 MEV45
IonXpress_008 MEV46
IonXpress_009 MEV47
R_2016_09_01_13_20_02_user_S5-00580-5-Medexome
IonXpress_004 MEV42
IonXpress_005 MEV43
IonXpress_006 MEV44
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome
IonXpress_001 MEC1
IonXpress_002 MEV40
IonXpress_003 MEV41
R_2016_08_03_10_42_57_user_S5-00580-2-Medical_Exome

awk

awk {
function pA(arg, string) {
    string = arg
    sub(/_R_.*/, "", string)
    return string
}
print (u = $(i+1))
          sub(/.*_user_/, "", u)
          sub(/_.*/, "", u)
          sub(/^/, "user_", u)
          i += 2
            continue
}' file

The above was an attempt to create the two variables to read in a bash loop, but it had many errors. Is this possible, or is there a better way? Thank you :).

From what you have shown us, I have absolutely no idea what two variables you are trying to create, what values you want to assign to either of those variables, nor how many pairs of values you expect to extract from your sample input file. (Note that the use of underlined text in your specification of the download link you want to create when there are other underscores in that specification is really confusing.)

Please clearly describe what you are trying to do and show us the output you want your script to produce.

The two values are:

varA= /results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67

The digit between the hyphens before Medexome varies, but that line will match a line in the file that begins with R_:

varB= R_2016_09_20_10_12_41_user_S5-00580-6-Medexome

The two lines match because both contain _user_S5-00580-6-Medexome.

desired output

http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz
The (...) and [...] notes are not part of the download link; they are only there to show where each piece comes from:
http://xxx.xx.xxx.xxx   (hardcoded)
varA, i.e. /output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67  [/results/analysis is removed]
/    (hardcoded)
varB, i.e. R_2016_09_20_10_12_41_user_S5-00580-6-Medexome
.tar.bz  (hardcoded)

each download link is in the format above and consists of varA and varB
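
For one pair of values, the format described above could be assembled with plain shell parameter expansion. This is only a sketch: base_url is the placeholder host from this post, not a real address, and the variable names are just the ones used in the description.

```shell
# Sketch: assemble one download link from varA and varB.
# base_url is the placeholder host from the post (an assumption, not a real address).
base_url="http://xxx.xx.xxx.xxx"
varA="/results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67"
varB="R_2016_09_20_10_12_41_user_S5-00580-6-Medexome"

# ${varA#/results/analysis} removes that prefix and keeps the rest, starting at /output
link="${base_url}${varA#/results/analysis}/${varB}.tar.bz"
printf '%s\n' "$link"
```

The prefix removal only applies when varA really starts with /results/analysis; otherwise varA is used unchanged.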

If each download link can be written to the same file, then after the files are downloaded and moved I could add a check to see whether each download exists in the directory. I hope this helps, and thank you very much :).
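
The existence check mentioned above might look roughly like the following. It is a sketch under assumptions: links_file and download_dir are hypothetical names, and the demo builds both in a temporary directory so that it can be run on its own.

```shell
# Sketch: report which links in a list already have a downloaded file.
# links_file and download_dir are hypothetical; the demo uses a temp directory.
download_dir=$(mktemp -d)
links_file="$download_dir/links.txt"

cat > "$links_file" <<'EOF'
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz
EOF

# Pretend the first archive has already been downloaded and moved here.
touch "$download_dir/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz"

missing=""
while IFS= read -r link; do
    name=${link##*/}    # file name is everything after the last slash
    if [ -e "$download_dir/$name" ]; then
        echo "present: $name"
    else
        echo "missing: $name"
        missing="$missing $name"
    fi
done < "$links_file"

rm -rf "$download_dir"    # clean up the demo directory
```

In real use the two names would point at the actual link list and the directory the downloads are moved to, and the touch and rm lines would be dropped.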

Hello cmccabe,

I am not completely sure about your requirements, but could you please try the following and let me know if it helps?

awk -v var1="/results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67" -v var2="R_2016_09_20_10_12_41_user_S5-00580-6-Medexome" '{match(var1,/user_.*Medexome/);Q=substr(var1,RSTART,RLENGTH);match(var2,/user_.*Medexome/);W=substr(var2,RSTART,RLENGTH);if(Q==W){match(var1,/output.*/);print "http://xxx.xx.xxx.xxx/" substr(var1,RSTART,RLENGTH) "/" var2".tar.bz"}}' Input_file

Output will be as follows.

http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz
http://xxx.xx.xxx.xxx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz

If the above does not meet your expectations, kindly provide a larger sample Input_file together with the expected output and all of your conditions.
EDIT: Adding a non-one-liner form of the same solution too.

awk -v var1="/results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67" -v var2="R_2016_09_20_10_12_41_user_S5-00580-6-Medexome" '{
    # extract the user_...Medexome part of each variable
    match(var1, /user_.*Medexome/)
    Q = substr(var1, RSTART, RLENGTH)
    match(var2, /user_.*Medexome/)
    W = substr(var2, RSTART, RLENGTH)
    # when both parts agree, print the link built from output/... and var2
    if (Q == W) {
        match(var1, /output.*/)
        print "http://xxx.xx.xxx.xxx/" substr(var1, RSTART, RLENGTH) "/" var2 ".tar.bz"
    }
}' Input_file
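
A note on the repeated lines in the output above: the main action of an awk program runs once for every input record, and here nothing in it depends on the record, so the same link is printed once per line of Input_file. If only a single hardcoded pair is being compared, one possible variant (a sketch, not a replacement for the code above) is to move the logic into a BEGIN block, which runs exactly once and needs no input file. The host is still the placeholder used in this thread.

```shell
# Sketch: same comparison, but in a BEGIN block so it runs once and reads no input.
link=$(awk -v var1="/results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67" \
           -v var2="R_2016_09_20_10_12_41_user_S5-00580-6-Medexome" '
BEGIN {
    match(var1, /user_.*Medexome/); Q = substr(var1, RSTART, RLENGTH)
    match(var2, /user_.*Medexome/); W = substr(var2, RSTART, RLENGTH)
    if (Q == W) {
        match(var1, /output.*/)
        print "http://xxx.xx.xxx.xxx/" substr(var1, RSTART, RLENGTH) "/" var2 ".tar.bz"
    }
}')
printf '%s\n' "$link"
```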

Thanks,
R. Singh

The attached output.txt is the result of running the awk on the attached combine.txt file, which will be the input. It is very close, but only the two matching lines need to be output. I hope this helps, and thank you very much :).

desired output (just the two matching lines from combine.txt in the same order)

http://172.24.188.111/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz
http://172.24.188.111/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz

Hello cmccabe,

It is still not at all clear, even with the attachments. Please rephrase your requirements and state the problem precisely. I would request that you spend some time on your question and review it before posting. Once your post has enough information, with a proper sample Input_file, the expected output file and (most importantly) all conditions, then please post it.

Thanks,
R. Singh

I apologize for the confusion and hope I explain this better.

In combine.txt the first two lines are

/results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67
/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52

In each of those lines (in my real data there are several more, but all follow the same format), the _user_S5-00580-6-Medexome or _user_S5-00580-4-Medexome section will match a line further down in the file that looks like R_2016_09_20_10_12_41_user_S5-00580-6-Medexome

When a match is found then a download link is created from these variables along with some additional data.

Using the first line as an example:

varA= /results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67

and _user_S5-00580-6-Medexome from varA matches varB= R_2016_09_20_10_12_41_user_S5-00580-6-Medexome

Using these two variables, varA and varB, a download link is made in the format below:

http://172.24.188.111/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz

where:

The (...) and [...] notes are not part of the download link; they are only there to show where each piece comes from:
http://xxx.xx.xxx.xxx   (hardcoded)
varA, i.e. /output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67  [/results/analysis is removed]
/    (hardcoded)
varB, i.e. R_2016_09_20_10_12_41_user_S5-00580-6-Medexome
.tar.bz  (hardcoded)

The awk outputs one line multiple times, but it only needs to output each match once.

Since the two lines in combine.txt with the / have matches in the file their links are below:

http://172.24.188.111/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz
http://172.24.188.111/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz

The digit between the hyphens before Medexome varies, but that line should match a line in combine.txt; if it does not, it is skipped. There could be, and often will be, more than one line beginning with / at the top of combine.txt. Does this help? Thank you very much :).

Hello cmccabe,

Could you please try the following and let us know if it helps.

awk 'FNR==NR && $0 ~ /^\/results/{match($0,/user.*Medexome/);Q=substr($0,RSTART,RLENGTH);match($0,/output.*/);A[Q]=substr($0,RSTART,RLENGTH);next} {if($0 ~ /^R_/){match($0,/user.*Medexome/);if(A[substr($0,RSTART,RLENGTH)]){print "http://test/"A[substr($0,RSTART,RLENGTH)]"/"$0".tar.bz";delete A[substr($0,RSTART,RLENGTH)]}}}'   Input_file  Input_file

Note that the above code will print each match in Input_file only once; let us know how it goes.
EDIT: Adding a non-one-liner form of the same solution.

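awk 'FNR==NR && $0 ~ /^\/results/ {
    # while reading the first copy of Input_file: index each /results path
    # (from output/ onward) by its user...Medexome part
    match($0, /user.*Medexome/)
    Q = substr($0, RSTART, RLENGTH)
    match($0, /output.*/)
    A[Q] = substr($0, RSTART, RLENGTH)
    next
}
{
    if ($0 ~ /^R_/) {
        match($0, /user.*Medexome/)
        if (A[substr($0, RSTART, RLENGTH)]) {
            # print the link once, then delete the entry so it is not printed again
            print "http://test/" A[substr($0, RSTART, RLENGTH)] "/" $0 ".tar.bz"
            delete A[substr($0, RSTART, RLENGTH)]
        }
    }
}' Input_file Input_file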
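
Since the original question asked for two variables to read in a shell loop, here is one possible way to do that. It is a sketch under assumptions: awk prints each matching path and R_ line as a tab-separated pair, and a while loop reads them back as varA and varB. combine_demo is a hypothetical temporary file holding a few lines from the sample input, and the host is the placeholder from the thread. It assumes the /results lines come before the R_ lines, as in the sample.

```shell
# Sketch: emit varA and varB as tab-separated pairs, then read them in a shell loop.
# combine_demo is a hypothetical temp file with a few lines from the sample input.
combine_demo=$(mktemp)
cat > "$combine_demo" <<'EOF'
/results/analysis/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67
/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52
IonXpress_007 MEV21
R_2016_09_21_14_01_15_user_S5-00580-9-Medexome
R_2016_09_20_10_12_41_user_S5-00580-6-Medexome
IonXpress_004 MEV42
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome
EOF

# Single pass: remember each /results path by its user...Medexome key,
# then print "path<TAB>R_line" for every R_ line whose key was seen.
pairs=$(awk '
/^\/results/ { match($0, /user.*Medexome/); path[substr($0, RSTART, RLENGTH)] = $0; next }
/^R_/        { match($0, /user.*Medexome/); key = substr($0, RSTART, RLENGTH)
               if (key in path) print path[key] "\t" $0 }
' "$combine_demo")

links=""
while IFS="$(printf '\t')" read -r varA varB; do
    [ -n "$varA" ] || continue    # skip the empty line produced when there are no pairs
    links="${links}http://xxx.xx.xxx.xxx${varA#/results/analysis}/${varB}.tar.bz
"
done <<EOF
$pairs
EOF

printf '%s' "$links"
rm -f "$combine_demo"
```

Inside the loop, varA and varB are ordinary shell variables, so the same place could run a download command instead of only building the link text.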
 

Thanks,
R. Singh

Amazing, works great. Thank you very much for all your great help and patience as I continue to learn :)