awk to extract digit in line of text and create link

I am trying to extract the number that follows Medexome_xx_ (with any leading zero removed) from each line of file and build an output URL from that number. In the output, the only thing that changes is the number; the other text is static and will be the same each time. Thank you :).

file

http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2

desired output

http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf

awk

awk {
    A[Q]=substr($0,RSTART,RLENGTH);
    next
}
    print "http://xxx.xx.xxx.xx/report/latex/"A[substr($0,RSTART,RLENGTH)]"$0".pdf";
delete A[substr($0,RSTART,RLENGTH)]
}' file

Hello cmccabe,

If your Input_file always has exactly the same layout, then the following may help you.

awk '{match($0,/.*\/output/);VAL=substr($0,RSTART,RLENGTH-7);match($0,/Auto.*_[0-9]+\//);VAL1=substr($0,RSTART,RLENGTH);gsub(/.*_0|.*_|\//,X,VAL1);print VAL"/report/latex/" VAL1".pdf"}'   Input_file

Output will be as follows.

http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf

Thanks,
R. Singh

awk -F'/' '
{
  n=split($6,a,"_")
  pdf=a[n]+0
  print $1"//"$3 "/report/latex/" pdf ".pdf"
}' myFile


A very short script :

$ awk -F'[/_]' -v OFS=/ '{$10=$10+0; print "http:","",$3,"report/latex",$10 ".pdf"}' urls.txt
http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf
$
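
Several of the answers in this thread rely on the number being field 10 once the line is split on both "/" and "_". As a rough way to check that for your own input, the sketch below prints the first ten fields of one sample line. The helper name show_fields is made up for illustration and is not part of any answer here.

```shell
# Sample line taken from the question.
url='http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2'

# show_fields (hypothetical helper): split stdin on "/" or "_" and
# print fields 1..10, one per line, so the field numbers are visible.
show_fields() {
	awk -F'[/_]' '{ for (i = 1; i <= 10; i++) printf "$%d=[%s]\n", i, $i }'
}

printf '%s\n' "$url" | show_fields
```

For the sample line this shows $3=[xxx.xx.xxx.xx] and $10=[032]; adding 0 to $10 in awk turns "032" into the number 32. If the path before Medexome ever gains or loses a "/" or "_", the field number shifts and the script needs adjusting.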

With any POSIX-conforming shell, you can do this just using shell variable expansions without needing to invoke awk :

while IFS= read -r url
do	head=${url%%/output/*}/report/latex/
	number=${url%%/plugin*}
	number=${number##*_}
	number=${number#0}
	number=${number#0}
	printf '%s%s.pdf\n' "$head" "$number"
done < file

which, if file contains:

http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_728/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_008/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_000/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2

produces the output:

http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf
http://xxx.xx.xxx.xx/report/latex/728.pdf
http://xxx.xx.xxx.xx/report/latex/8.pdf
http://xxx.xx.xxx.xx/report/latex/0.pdf
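
The loop above strips at most two leading zeros, which is enough for three-digit numbers. If the number could be longer, one possible variation (only a sketch; the function name strip_zeros is made up here) is to keep stripping while a leading zero remains and something would still be left afterwards:

```shell
# strip_zeros (hypothetical helper): remove every leading zero from $1,
# but leave a lone "0" intact so that "000" becomes "0", not "".
strip_zeros() {
	n=$1
	# ${n#0} differs from $n only when $n starts with a zero;
	# the -n test stops the loop before the last digit is removed.
	while [ "${n#0}" != "$n" ] && [ -n "${n#0}" ]; do
		n=${n#0}
	done
	printf '%s\n' "$n"
}

strip_zeros 0032
strip_zeros 000
```

This prints 32 and 0. Inside the while read loop, the two number=${number#0} lines could then be replaced with number=$(strip_zeros "$number"), at the cost of one subshell per line.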

Hi,
Just for fun, with sed (works with the example input).
If the output URL uses the same host as the source URL:

sed -e 's/^\(\([^/]*\/\)\{3\}\).*_0*\([0-9]\+\)\/.*/\1report\/latex\/\3.pdf/' file

If the output URL does not depend on the source URL (fixed host):

sed -e 's/^.*_0*\([0-9]\+\)\/.*/http:\/\/xxx.xx.xxx.xx\/report\/latex\/\1.pdf/' file
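
Note that \+ in a basic regular expression is a GNU extension, so the commands above may not behave the same with other sed implementations. A possible portable variant of the first command, using the interval \{1,\} instead and "|" as the delimiter to avoid escaping every slash, could look like the sketch below. The wrapper name to_report_url is made up for illustration.

```shell
# to_report_url (hypothetical helper): read URLs on stdin and rewrite each
# one to <scheme>://<host>/report/latex/<number>.pdf.
#   \(\([^/]*/\)\{3\}\)  captures "http://host/" (three slash-terminated parts)
#   _0*\([0-9]\{1,\}\)/  captures the digits after the last "_" before a "/",
#                        skipping leading zeros but keeping at least one digit
to_report_url() {
	sed -e 's|^\(\([^/]*/\)\{3\}\).*_0*\([0-9]\{1,\}\)/.*|\1report/latex/\3.pdf|'
}

printf '%s\n' 'http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2' | to_report_url
```

For the sample line this prints http://xxx.xx.xxx.xx/report/latex/32.pdf. To process the whole file, run to_report_url < file.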

Regards.


In case the format isn't quite so fixed, you could try something like this:

awk -F'/plug.*|/outp|_' '{print $1 "/report/latex/" $(NF-1)+0 ".pdf"}' file

--

Yet it could be reduced a little further still:

awk -F'[/_]' '{print "http://" $3 "/report/latex", $10+0 ".pdf"}' file 

Thank you for the help and explanations :)

That last one doesn't quite work: the comma in the print statement puts a space before the number, so it has to be removed (with the "/" moved into the string) :)
Otherwise, it could be reduced a little further:

awk -F'[/_]' '$0="http://"$3"/report/latex/"$10+0".pdf"' file

Regards.
