If all URLs are given in a plain text file url.txt, how would I add the output file specification by repeating file name and revision number using sed?
The script below puts 'wget' in front of each URL and appends the output file parameter (-O) at the end. To build the output name, it loops over the URLs and uses sed to repeat the file name and revision number; the -E flag enables extended regular expressions.
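# Read url.txt one line at a time; IFS= and -r keep each URL intact
while IFS= read -r url
do
  file=$(printf '%s\n' "$url" | sed -E 's/(.*)?rev=([0-9]+)/\1-\2.zip/')
  # Print the command; the quotes protect characters such as ? and & in the URL
  echo "wget \"$url\" -O \"$file\""
done < url.txt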
The sed command in the loop builds the output file name from the text before "rev=" in the URL and the revision number after it.
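To see what the substitution yields, here is a minimal sketch run on a made-up URL. The example.com address and both function names are assumptions for illustration, not part of the original question. The first function applies the expression from the loop unchanged; the second is a variant, assuming URLs of the form .../name?rev=N, that keeps only the last path component:

```shell
# Expression from the loop, unchanged: keeps everything before "rev=".
name_as_given() {
  printf '%s\n' "$1" | sed -E 's/(.*)?rev=([0-9]+)/\1-\2.zip/'
}

# Variant (assumes .../name?rev=N): drop the path and the literal "?".
name_basename() {
  printf '%s\n' "$1" | sed -E 's|^.*/([^/?]+)\?rev=([0-9]+)$|\1-\2.zip|'
}

url='http://example.com/files/report?rev=42'   # hypothetical URL
name_as_given "$url"    # http://example.com/files/report?-42.zip
name_basename "$url"    # report-42.zip
```

The variant uses | as the delimiter of the s command so the slashes in the URL need no escaping, and \? to match the question mark literally.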
Ok, I understand that this solution does it in 2 steps: it keeps the url in one variable ($url) and creates a filename in another variable ($file) and recombines them.
I also understand that -E is the extended regular expression invocation, but what part of the regular expression is extended?
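The -E option tells sed to read the pattern as an extended regular expression (ERE). The extended parts here are the unescaped parentheses ( ) for grouping, the + quantifier (one or more), and the ? quantifier (zero or one). In a basic regular expression these characters are literal: groups have to be written \( \), and + has no direct equivalent, so "one or more digits" is spelled [0-9][0-9]* or [0-9]\{1,\}.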
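As a rough illustration of the difference, the sketch below writes the same substitution once in extended and once in basic syntax and checks that both give the same result. The URL is a made-up example, and the optional ? after the first group is left out because it does not change what matches here:

```shell
url='http://example.com/files/report?rev=42'   # hypothetical URL

# Extended syntax (-E): ( ) group and + repeats without backslashes.
ere=$(printf '%s\n' "$url" | sed -E 's/(.*)rev=([0-9]+)/\1-\2.zip/')

# Basic syntax (no -E): groups need \( \), and "one or more digits"
# is spelled [0-9][0-9]* because + is an ordinary character here.
bre=$(printf '%s\n' "$url" | sed 's/\(.*\)rev=\([0-9][0-9]*\)/\1-\2.zip/')

[ "$ere" = "$bre" ] && echo "same result: $ere"
```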
The command 's/(.*)?rev=([0-9]+)/\1-\2.zip/' rewrites the URL into the new file name. Its pattern has the following parts:
'(.*)?': matches any characters that come before "rev=" in the URL. This is captured as the first group (\1) and reused in the replacement string. The trailing ? is a quantifier that makes the group optional, not a literal question mark, so a "?" that sits just before "rev=" in the URL ends up inside \1 along with everything else before it.
'rev=': matches the exact string "rev=" in the URL.
'([0-9]+)': matches one or more digits that come after "rev=" in the URL. This is captured as the second group (\2) and reused in the replacement string.
The replacement string '\1-\2.zip' uses the first and second groups to create the new file name: the first group, a dash, the second group, and the extension .zip.
The 's' at the beginning marks a substitution command. The text between the first two slashes '/' is the pattern to match, and the text between the second and third slashes is the replacement string.
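One way to check what each group captures is to keep the pattern and change only the replacement so that it prints the groups in brackets. This is a small sketch on a made-up URL, not part of the original script:

```shell
url='http://example.com/files/report?rev=42'   # hypothetical URL

# Same pattern, but the replacement shows each captured group.
groups=$(printf '%s\n' "$url" | sed -E 's/(.*)?rev=([0-9]+)/group1=[\1] group2=[\2]/')
echo "$groups"   # group1=[http://example.com/files/report?] group2=[42]
```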
So the script reads the URL list line by line. For each URL it takes the part before "rev=" and appends a suffix (-N.zip), where N is the number after "rev=" in the original URL.
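Since the question asked about sed specifically, the same result can also be sketched without a shell loop, in a single sed pass over the file. This assumes every URL has the form .../name?rev=N; the function name and the sample URLs are hypothetical:

```shell
# Turn each line of a URL file into a wget command in one sed pass.
# -n with the p flag prints only lines that matched the pattern;
# & in the replacement stands for the whole matched line.
gen_wget_commands() {
  sed -nE 's|^.*/([^/?]+)\?rev=([0-9]+)$|wget "&" -O "\1-\2.zip"|p' "$1"
}

# Usage with a throwaway file holding two hypothetical URLs.
tmp=$(mktemp)
printf '%s\n' 'http://example.com/files/report?rev=42' \
              'http://example.com/files/notes?rev=7' > "$tmp"
gen_wget_commands "$tmp"
rm -f "$tmp"
```

The function only prints the commands, which makes it easy to inspect them before running anything.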