Hi everybody, I would greatly appreciate some expertise in this matter. I am trying to find an efficient way to batch download files from a website and rename each file with the URL it originated from (from the CLI). (i.e., instead of xyz.zip, the output file would be http://www.abc.com/xyz.zip) A method using wget is preferable but not absolutely necessary. I'm just starting to get comfortable with the command line, in part because of some of the great help I've gotten here before, so once again I'm back looking for some help from this excellent community.
Okay, so far so good. Thanks for the tip. All I need to do now is figure out how I can automatically replace the filenames with their respective URLs; is this somehow possible using grep or sed?
You cannot name a file "http://www.abc.com/xyz.zip". The forward slash is one of the two characters that are illegal in Unix filenames (the null byte being the other). The best you can do without some kind of translation is to mirror the hierarchy, with each slash-delimited component of the URL, except the last, becoming a directory. Even so, the "//" cannot be handled without special treatment, as "//" in a pathname is treated identically to "/".
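One simple translation, if a single flat filename is the goal, is to substitute every "/" with some legal character such as "_". A minimal sketch (the choice of "_" is just an assumption; any character legal in filenames would do):

```shell
# Replace every "/" in the URL with "_" so the result is a legal
# single-component Unix filename.
url="http://www.abc.com/xyz.zip"
fname=$(printf '%s' "$url" | tr '/' '_')
echo "$fname"   # http:__www.abc.com_xyz.zip
```

Note the translation is lossy in the sense that a "_" already present in the URL becomes indistinguishable from a converted "/".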
Thanks for your feedback, alistar. Maybe you can answer this one for me: how would I go about printing the URL of a particular photo onto that photo (i.e., a watermark) as soon as it's downloaded? I have already had limited success with the "convert" command using predefined text; however, I still haven't figured out the auto-URL-watermark capability that I'm after. Thanks again!
It would help if you shared the code that you're using, along with a description of how it fails and the desired result (which I assume is to have the URL watermarked on an image). Don't assume that we are familiar with the tools you are using. However, even without specific knowledge of the tools involved, if there is a shortcoming in your shell script, we may be able to assist.
$ man wget
WGET(1) GNU Wget WGET(1)
NAME
Wget - The non-interactive network downloader.
SYNOPSIS
wget [option]... ...
DESCRIPTION
GNU Wget is a free utility for non-interactive download of files from
the Web. It supports HTTP, HTTPS, and FTP protocols, as well as
retrieval through HTTP proxies.
...
--force-directories
The opposite of -nd---create a hierarchy of directories, even if
one would not have been created otherwise. E.g. wget -x
http://fly.srk.fer.hr/robots.txt will save the downloaded file to
fly.srk.fer.hr/robots.txt.
...
If you want to watermark an image every time wget fetches one, you have to call wget once per URL in a shell-script loop. Then, every time wget successfully downloads an image, the script calls another tool to add the watermark. The remaining problem is how you save the images to your local directories.
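A rough sketch of such a loop, assuming the URLs sit one per line in a file named picasalist (as in the script below) and that ImageMagick's mogrify is the watermarking tool; the font and text positions are just placeholders:

```shell
# For each URL: download it, then stamp the URL itself onto the
# resulting image as the watermark text.
while IFS= read -r url; do
    fname=${url##*/}            # filename = everything after the last "/"
    wget -c "$url" || continue  # skip the watermark step if the download fails
    mogrify -font helvetica -pointsize 12 -gravity southwest \
        -draw "fill black text 1,1 '$url' fill white text 2,0 '$url'" \
        "$fname"
done < picasalist
```

The `${url##*/}` parameter expansion does the same job as the rev/cut/rev pipeline without spawning extra processes.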
{This part downloads photos based on the user's query, in this case "Apples". The URLs which are accessed during the download process are then indexed in the file "picasalist".}
GET "http://picasaweb.google.com/data/feed/base/all?alt=rss&kind=photo&access=public&filter=1&q=Apples&hl=en_US" | sed 's/</\n</g' | grep media:content |sed 's/.*url='"'"'\([^'"'"']*\)'"'"'.*$/\1/' > picasalist;
{This part watermarks the images with pre-defined text}
wget -c -i picasalist; mogrify -font helvetica -pointsize 12 -gravity southwest -draw 'fill black text 1,1 "Apples" fill white text 2,0 "Apples"' *.jpg
Now I just need to be able to string these together and add the URL-to-watermark feature.
I also have a piece of code that isolates the filename from the URL:
rev picasalist | cut -d/ -f1 | rev
Theoretically, one could compare the filename to the addresses in "picasalist", and then pass the corresponding URL off to the mogrify command and presto, mission accomplished! I just wish my technical ability were on par with my aspirations, lol. I have to say though, the kind people on these forums have always pointed me in the right direction.
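That comparison idea can be sketched like this, assuming the downloads already sit in the current directory and picasalist holds one URL per line (the mogrify options are carried over from the earlier post; note that grep treats the dots in a filename as "any character", which is harmless here):

```shell
# For each downloaded image, look up the URL in picasalist whose last
# path component matches the filename, then watermark the image with it.
for f in *.jpg; do
    url=$(grep "/$f\$" picasalist | head -n 1)
    [ -n "$url" ] || continue   # no matching URL recorded; skip this file
    mogrify -font helvetica -pointsize 12 -gravity southwest \
        -draw "fill black text 1,1 '$url' fill white text 2,0 '$url'" \
        "$f"
done
```

One caveat: if two URLs end in the same filename, wget will have renamed the second download (e.g. xyz.jpg.1), so the lookup only finds the first.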