Text substitution & getting file name from url

texttoolong · March 17, 2011, 12:22pm

hi, sorry if this seems trivial.

i have a file url.txt which consists of a list of urls (it was supposed to be my wget -i file). however, since the server i am trying to download from uses redirects, wget does not remember the file name from the original url and saves to a file name which is unreadable. my solution is to use a script that reads the urls line by line and writes a wget command for each one into another text file, todo.txt, which i could then run as a script.

url.txt looks like

http://serverdoingstrangethings.com/file1.ext
http://serverdoingstrangethings.com/file2.ext

todo.txt should look like

wget http://serverdoingstrangethings.com/file1.ext -Ofile1.ext
wget http://serverdoingstrangethings.com/file2.ext -Ofile2.ext

(simplified, the actual wget-line will have a few more arguments)

after writing to the todo.txt the line in url.txt should be deleted.

i am a script newbie (i tried a few dos-batch and vba thingies years ago) and am much too shy to post what i have written so far. my problem lies especially with getting the file name from the url. could you help - i know i should be reading up on regular expressions and i certainly will, but could you point me in the right direction?

thank you so much,

thomas

bartus11 · March 17, 2011, 12:29pm

You don't need todo.txt. This will execute wget "on the fly":

while read -r line; do wget "$line" -O "$(echo "$line" | cut -d'/' -f4)"; done < url.txt

texttoolong · March 17, 2011, 12:49pm

thank you so much, bartus11. i implemented it and had to change the -f4 to -f6 since there were a few more directories in the url. works like a charm. how wonderful. thank you!
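
The field number given to cut depends on how many directories the URL contains, which is why -f4 had to become -f6 here. A minimal sketch that avoids the counting by taking whatever follows the last slash, using plain shell parameter expansion. The names filename_from_url and fetch_all are hypothetical, and the sketch assumes the wanted file name really is the last path component of the URL:

```shell
#!/bin/sh
# Hypothetical helper: print the last path component of a URL.
filename_from_url() {
    path=${1%%[?#]*}     # drop any query string or fragment
    path=${path%/}       # drop a single trailing slash
    printf '%s\n' "${path##*/}"
}

# Hypothetical driver: download every URL listed in the file given as $1.
fetch_all() {
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue            # skip blank lines
        wget "$url" -O "$(filename_from_url "$url")"
    done < "$1"
}

# Usage: fetch_all url.txt
```

Because `${path##*/}` strips everything up to the last slash, the same line should work whether the file sits directly under the host or several directories deep.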

irrevocabile · March 17, 2011, 3:11pm

Simply insert the stuff with sed?

sed 's|^.*/\([^/]*\)$|wget & -O\1|' <yourdata.txt >result
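
For the original request (a todo.txt that can be run as a script, with url.txt cleared afterwards), a rough sketch in plain shell might look like the following. make_todo is a hypothetical name; the sketch assumes the URLs contain no characters that are special to the shell (such as & or spaces), and it empties url.txt in one step at the end rather than deleting line by line:

```shell
#!/bin/sh
# Hypothetical helper: turn a list of URLs into a script of wget commands.
# $1 = input list (e.g. url.txt), $2 = output script (e.g. todo.txt)
make_todo() {
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue                  # skip blank lines
        # ${url##*/} is the text after the last slash, i.e. the file name
        printf 'wget %s -O%s\n' "$url" "${url##*/}"
    done < "$1" >> "$2" || return 1
    : > "$1"                                       # empty the list once all lines are written
}

# Usage: make_todo url.txt todo.txt && sh todo.txt
```

Any extra wget arguments from the real command line could be added to the printf format string.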