awk to extract digit in line of text and create link

cmccabe · October 28, 2016, 1:23pm

I am trying to extract the number that follows Medexome_xx_ in each line of file (leading zero removed; 032 and 028 in the example below) and create an output using that extracted number. The only thing that changes in the output is the number; the rest of the text is static and will be the same each time. Thank you :).

file

http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2

desired output

http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf

awk

awk {
    A[Q]=substr($0,RSTART,RLENGTH);
    next
}
    print "http://xxx.xx.xxx.xx/report/latex/"A[substr($0,RSTART,RLENGTH)]"$0".pdf";
delete A[substr($0,RSTART,RLENGTH)]
}' file
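
For reference, RSTART and RLENGTH are only set by a call to match(), which the attempt above never makes. A minimal sketch of the match()/substr() idea, assuming the number always sits directly before /plugin_out; extract_number is a hypothetical helper name and reads URLs on standard input:

```shell
# extract_number is a hypothetical helper, not part of the original post.
# Assumption: the wanted number is always followed by "/plugin_out".
extract_number() {
    awk '{
        # match() sets RSTART and RLENGTH to the start and length of the match
        if (match($0, /_[0-9]+\/plugin_out/)) {
            s = substr($0, RSTART + 1, RLENGTH - 1)  # skip the leading "_"
            sub(/\/.*/, "", s)                        # drop "/plugin_out"
            print s + 0                               # adding 0 removes leading zeros
        }
    }'
}

extract_number < file
```

Lines that do not match the pattern are skipped rather than printed.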

RavinderSingh13 · October 28, 2016, 1:46pm

Hello cmccabe,

If your Input_file always has exactly this format, then the following may help you.

awk '{match($0,/.*\/output/);VAL=substr($0,RSTART,RLENGTH);match($0,/Auto.*_[0-9]+\//);VAL1=substr($0,RSTART,RLENGTH);gsub(/.*_0|.*_|\//,X,VAL1);print VAL"/report/latex/" VAL1".pdf"}'   Input_file

Output will be as follows (note that /output is kept in the path, unlike in the desired output).

http://xxx.xx.xxx.xx/output/report/latex/32.pdf
http://xxx.xx.xxx.xx/output/report/latex/28.pdf

Thanks,
R. Singh

vgersh99 · October 28, 2016, 1:48pm

awk -F'/' '
{
  n=split($6,a,"_")
  pdf=a[n]+0
  print $1"//"$3 "/report/latex/" pdf ".pdf"
}' myFile
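
The a[n]+0 step is what removes the leading zero: adding 0 makes awk treat the string as a number. A small sketch of just that step, where strip_zeros is a made-up helper name rather than anything from the thread:

```shell
# strip_zeros is a hypothetical helper used only to illustrate the "+0" idiom.
strip_zeros() {
    # Adding 0 forces a numeric conversion, so "032" is printed as 32.
    printf '%s\n' "$1" | awk '{ print $1 + 0 }'
}

strip_zeros 032   # prints 32
strip_zeros 000   # prints 0
```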

blastit.fr · October 28, 2016, 3:53pm

A very short script:

$ awk -F'[/_]' -v OFS=/ '{$10=$10+0 ;print "http:","",$3,"report/latex",$10 ".pdf"  }' urls.txt
http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf
$
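
To see why the number ends up in field 10 when both / and _ are separators, it can help to list the fields with their indexes. A quick sketch, where list_fields is a hypothetical helper and the URL is cut off after /plugin_out only to keep the listing short:

```shell
# list_fields is a hypothetical helper: it prints index=value for every field.
list_fields() {
    printf '%s\n' "$1" |
    awk -F'[/_]' '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

list_fields 'http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out'
```

Field 2 is empty because of the double slash after http:, field 3 is the host, and field 10 is 032. This relies on the path layout staying exactly the same in every line.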

Don_Cragun · October 28, 2016, 4:34pm

With any POSIX-conforming shell, you can do this using only shell variable expansions, without needing to invoke awk:

while IFS= read -r url
do	head=${url%%/output/*}/report/latex/
	number=${url%%/plugin*}
	number=${number##*_}
	number=${number#0}
	number=${number#0}
	printf '%s%s.pdf\n' "$head" "$number"
done < file

which, if file contains:

http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_728/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_008/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_000/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2

produces the output:

http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf
http://xxx.xx.xxx.xx/report/latex/728.pdf
http://xxx.xx.xxx.xx/report/latex/8.pdf
http://xxx.xx.xxx.xx/report/latex/0.pdf
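
The two ${number#0} expansions above remove at most two leading zeros, which is enough for the three-digit numbers shown. If the width of the number might vary (an assumption, not something stated in the thread), one possible variant strips any number of leading zeros with the same kind of expansions; strip_leading_zeros is a hypothetical name:

```shell
# strip_leading_zeros is a hypothetical helper, not from the thread.
strip_leading_zeros() {
    n=$1
    zeros=${n%%[!0]*}        # the run of leading zeros (all of $n if it is only zeros)
    n=${n#"$zeros"}          # remove that run from the front
    printf '%s\n' "${n:-0}"  # an all-zero input becomes a single 0
}

strip_leading_zeros 0000123   # prints 123
strip_leading_zeros 000       # prints 0
```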

disedorgue · October 28, 2016, 6:15pm

Hi,
Just for fun, with sed (works with the example input).
If the destination URL uses the same host as the source URL:

sed -e 's/^\(\([^/]*\/\)\{3\}\).*_0*\([0-9]\+\)\/.*/\1report\/latex\/\3.pdf/' file

If the destination URL does not use the same host as the source URL:

sed -e 's/^.*_0*\([0-9]\+\)\/.*/http:\/\/xxx.xx.xxx.xx\/report\/latex\/\1.pdf/' file
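
Both commands use \+, which began as a GNU extension to basic regular expressions and may not work in every sed. A sketch of the first command using only [0-9][0-9]* instead, with | as the delimiter to avoid escaping the slashes; to_report_url is a hypothetical wrapper name that reads URLs on standard input:

```shell
# to_report_url is a hypothetical wrapper, not from the thread.
# It keeps the first three slash-terminated parts (scheme and host) and
# appends report/latex/<number>.pdf, where <number> is the digit run before
# the last "_<digits>/" in the line, without its leading zeros.
to_report_url() {
    sed 's|^\(\([^/]*/\)\{3\}\).*_0*\([0-9][0-9]*\)/.*|\1report/latex/\3.pdf|'
}

to_report_url < file
```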

Regards.

Scrutinizer · October 28, 2016, 8:46pm

In case the format isn't quite so fixed, you could try something like this:

awk -F'/plug.*|/outp|_' '{print $1 "/report/latex/" $(NF-1)+0 ".pdf"}' file

--

blastit.fr wrote:

A very short script:

$ awk -F'[/_]' -v OFS=/ '{$10=$10+0 ;print "http:","",$3,"report/latex",$10 ".pdf"  }' urls.txt
http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf
$

Yet it could be reduced a little further still:

awk -F'[/_]' '{print "http://" $3 "/report/latex", $10+0 ".pdf"}' file

cmccabe · October 29, 2016, 12:00pm

Thank you for the help and explanations.

disedorgue · October 29, 2016, 2:21pm

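That last one does not work correctly: the comma in the print statement puts a space (the default OFS) between "/report/latex" and the number. To fix it, delete the comma and add the missing slash.
Otherwise, it could be reduced a little further: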

awk -F'[/_]' '$0="http://"$3"/report/latex/"$10+0".pdf"' file
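
To make the difference visible, here is a small side-by-side sketch; with_comma and without_comma are hypothetical names used only for this comparison, and both read URLs on standard input:

```shell
# with_comma / without_comma are hypothetical names for this demo only.
# A comma in print separates arguments, which are joined with OFS (a space by default).
with_comma()    { awk -F'[/_]' '{ print "http://" $3 "/report/latex", $10+0 ".pdf" }'; }
# Without the comma the strings are simply concatenated.
without_comma() { awk -F'[/_]' '{ print "http://" $3 "/report/latex/" $10+0 ".pdf" }'; }

url='http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2'
printf '%s\n' "$url" | with_comma      # prints http://xxx.xx.xxx.xx/report/latex 32.pdf
printf '%s\n' "$url" | without_comma   # prints http://xxx.xx.xxx.xx/report/latex/32.pdf
```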

Regards.