I want to create a script that performs a mass download using formatted URLs:

Neo · January 16, 2023, 8:53am

 https://example.com/macro?rev=11
 https://example.com/script?rev=18

Assuming all software is delivered as a zip file, I want to turn this into:

 wget 'https://example.com/macro?rev=11' -O macro-11.zip
 wget 'https://example.com/script?rev=18' -O script-18.zip

If all URLs are given in a plain text file url.data, how would I use sed to add the output file specification, built from the file name and revision number in each URL? So far I have:

sed -e 's/^/wget /' -e 's/$/ -O /' url.data

OpenAI_ChatBot · January 16, 2023, 8:53am

Your two expressions add 'wget ' at the start of each line and ' -O ' at the end, but they have no way to fill in the output name, which has to be built from parts of each URL. One way to build it is a loop that derives the file name from each URL with sed -E (the -E flag enables extended regular expressions):

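while IFS= read -r url
 do
  # last path component (e.g. macro) + "-" + digits after "rev=" + ".zip"
  file=$(printf '%s\n' "$url" | sed -E 's|.*/([^/?]+)\?rev=([0-9]+)$|\1-\2.zip|')
  echo "wget '$url' -O $file"
 done < url.data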

The sed command in the loop creates the desired output file name by extracting the file name and revision number from the URL.
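A loop is not strictly required, though. sed can build the whole command in one pass, because & in the replacement stands for the entire matched URL and the back-references supply the name and revision. A minimal sketch, assuming every line has the form .../NAME?rev=DIGITS; the function name make_wget_cmds is hypothetical, and the printf only stands in for url.data so the example runs on its own:

```shell
#!/bin/sh
# Turn each URL on stdin into a wget command with a single sed pass:
#   &  = the whole matched URL
#   \1 = the last path component (e.g. macro)
#   \2 = the digits after "rev="
# Inside double quotes, \$ reaches sed as the end-of-line anchor $.
make_wget_cmds() {
  sed -E "s|.*/([^/?]+)\?rev=([0-9]+)\$|wget '&' -O \1-\2.zip|"
}

# Sample input standing in for url.data (the two URLs from the question).
printf '%s\n' 'https://example.com/macro?rev=11' 'https://example.com/script?rev=18' |
  make_wget_cmds
```

Against the real file this would be make_wget_cmds < url.data, with the output redirected to a script file (any name) that is then run with sh.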


technossomy · January 16, 2023, 9:33am

OK, I understand that this solution works in two steps: it keeps the URL in one variable ($url), builds a file name in another ($file), and then recombines them.
I also understand that -E enables extended regular expressions, but which part of this regular expression actually needs the extended syntax?

chatgpt · January 16, 2023, 10:10am

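The parts that need extended syntax are the unescaped parentheses ( ), which create the capture groups \1 and \2, and the + quantifier ("one or more"). In a basic regular expression (BRE, sed without -E) the groups have to be written \( \) and "one or more" as \{1,\} (GNU sed also accepts \+ as an extension).

The question mark works the other way round: in an ERE an unescaped ? is a quantifier ("zero or one"), so a literal ? must be written \?, while in a BRE a bare ? is already literal. This is why the form 's/(.*)?rev=([0-9]+)/\1-\2.zip/' does not give macro-11.zip: the ? only makes the group optional, .* also swallows the scheme, the host and the literal ?, and the result is https://example.com/macro?-11.zip.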
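A side-by-side sketch makes the ERE/BRE difference concrete: the same substitution written with sed -E and with plain POSIX BRE, plus the unescaped-? form for comparison. It assumes a sed that supports -E (GNU sed and BSD/macOS sed both do); the variable names are only for illustration.

```shell
#!/bin/sh
url='https://example.com/macro?rev=11'

# ERE (sed -E): unescaped ( ) groups, + means "one or more", \? is a literal "?".
ere=$(printf '%s\n' "$url" | sed -E 's|.*/([^/?]+)\?rev=([0-9]+)$|\1-\2.zip|')

# POSIX BRE (no -E): groups need \( \), "one or more" is written \{1,\},
# and a bare ? is already an ordinary character.
bre=$(printf '%s\n' "$url" | sed 's|.*/\([^/?]\{1,\}\)?rev=\([0-9]\{1,\}\)$|\1-\2.zip|')

# Pitfall: in ERE an unescaped ? after a group is a quantifier ("optional"),
# so it never matches the literal ? and .* keeps the whole URL prefix.
naive=$(printf '%s\n' "$url" | sed -E 's/(.*)?rev=([0-9]+)/\1-\2.zip/')

printf 'ERE:   %s\nBRE:   %s\nnaive: %s\n' "$ere" "$bre" "$naive"
```

The first two lines of output are both macro-11.zip; only the spelling of the pattern differs.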

A pattern that turns https://example.com/macro?rev=11 into macro-11.zip is 's|.*/([^/?]+)\?rev=([0-9]+)$|\1-\2.zip|'. It matches the whole URL and replaces it with the new file name. Its parts are:

'.*/': matches everything up to and including the last slash (scheme, host and any directories); this part is discarded.
'([^/?]+)': matches one or more characters other than / and ?, i.e. the last path component such as "macro". This is captured as the first group (\1).
'\?rev=': matches the literal string "?rev=". The ? is escaped because an unescaped ? is a quantifier in an ERE.
'([0-9]+)$': matches one or more digits at the end of the line. This is captured as the second group (\2).
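To check what the two groups hold, a small sketch can print them for a few inputs. The function name show_groups and the deeper second URL and the third URL are made up for illustration; the last call shows that a line without ?rev= does not match, so sed passes it through unchanged.

```shell
#!/bin/sh
# Print the two capture groups of the file-name pattern for one URL.
show_groups() {
  printf '%s\n' "$1" | sed -E 's|.*/([^/?]+)\?rev=([0-9]+)$|group1=\1 group2=\2|'
}

show_groups 'https://example.com/script?rev=18'         # URL from the question
show_groups 'https://example.com/tools/v2/macro?rev=7'  # .*/ skips to the last slash
show_groups 'https://example.com/readme'                # no match: printed as-is
```

The unchanged pass-through matters in the loop: for a non-matching line, $file would end up as the whole URL.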

The replacement string '\1-\2.zip' builds the new file name from the first group, a dash, the second group and the extension .zip.

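The 's' at the beginning marks a substitution. The character right after it is the delimiter: the text between the first and second delimiter is the pattern to match, and the text between the second and third is the replacement string, which is plain text with back-references rather than a regular expression. / is the usual delimiter, but sed accepts any character, so a pattern that itself contains slashes can use | instead and avoid escaping every /.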

So the script reads url.data line by line and, for each URL, builds the file name from the last path component, a dash, the number after "rev=" and .zip (for example macro-11.zip). It then prints the matching wget command rather than running it.