wget output file names

Hi,

I have a list of URLs in my input.txt file, like this:

input.txt

http://unix.com/index.html?acc=OSR765454&file=filename1.gz
http://unix.com/index.html?acc=OBR765454&file=filename111.gz
http://unix.com/index.html?acc=ORS765454&file=filename1111.gz
http://unix.com/index.html?acc=OST76454&file=filename11111.gz
http://unix.com/index.html?acc=OS5454&file=filename1111111.gz

I am using this command:

wget -i input.txt

My output files end up with weird file names, like this:

ls in the download folder shows:

index.html?acc=OSR765454&file=filename1.gz
index.html?acc=OBR765454&file=filename111.gz
index.html?acc=ORS765454&file=filename1111.gz
index.html?acc=OST76454&file=filename11111.gz
index.html?acc=OS5454&file=filename1111111.gz

How do I change this so I just get the file names, like filename1.gz, filename111.gz, etc.?

Thanks


Hi, I just figured out the solution.

Append -O desired_filename to each URL in the input file.

For example:

http://unix.com/index.html?acc=OSR765454&file=filename1.gz -O filename1.gz

Now, issue this command:

cat inputlist.txt | xargs wget

You will get the file saved as filename1.gz.

Hope this helps someone.


Thanks for posting the solution!


Useless Use of Cat

xargs ... < inputfile
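
For example, assuming inputfile holds one URL per line:

xargs wget < inputfile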

Hi guys,

My solution doesn't work. It leaves you with only the last downloaded file.

Even @Corona's solution is still doing the same.

Any other possibilities?

while read URL; do
  # ${URL##*=} strips everything up to the last '=', leaving just the file name
  wget $URL -O ${URL##*=}
done < input.txt

Ends up being:

wget http://unix.com/index.html?acc=OSR765454&file=filename1.gz -O filename1.gz
wget http://unix.com/index.html?acc=OBR765454&file=filename111.gz -O filename111.gz
wget http://unix.com/index.html?acc=ORS765454&file=filename1111.gz -O filename1111.gz
wget http://unix.com/index.html?acc=OST76454&file=filename11111.gz -O filename11111.gz
wget http://unix.com/index.html?acc=OS5454&file=filename1111111.gz -O filename1111111.gz

Hi Scott,

Thanks for your time.

I have more than 400 links.

So the best possible solution would be to put all those links in a file and use wget -i inputlist.txt, while somehow changing the output file names.

Except that you can't tell wget how to filter the URLs to generate the file names. I believe Scott's solution is the best you can do. Besides, the network I/O is probably going to be the bottleneck anyway, not the shell loop.

Alternatively, you can stick with your original approach, which generates the undesirable filenames, and rename the files afterwards (perhaps with the rename utility).
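
For example, something like this shell loop could handle the renaming in one pass (a rough sketch, assuming every downloaded file begins with index.html and the wanted name is whatever follows the last =):

for f in index.html*; do
    # keep only the part after the last '='
    mv -- "$f" "${f##*=}"
done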

In my opinion, Scott's solution is preferable. Keep it simple. :)

Regards,
Alister


Now,

I have downloaded the files as Alister suggested. My files are named like this:

index.html?acc=OSR765454&file=filename1.gz
index.html?acc=OBR765454&file=filename111.gz
index.html?acc=ORS765454&file=filename1111.gz
index.html?acc=OST76454&file=filename11111.gz
index.html?acc=OS5454&file=filename1111111.gz

I was using the following command to rename them:

mv index.html?acc=OSR765454&file=filename1.gz filename1.gz

The error I see is the following:

[1] 14709
mv: missing destination file operand after `index.html?acc=OSR765454'
Try `mv --help' for more information.
-bash: filename1.gz: command not found
[1]+  Exit 1                  mv index.html?acc=OSR765454

Any thoughts?

The & is a special character to the shell. The command is cut off at that point, so the shell takes everything before it as a command to run in the background:

[1]+  Exit 1                  mv index.html?acc=OSR765454

Afterwards, it runs the following command:

file=filename1.gz filename1.gz

That creates a variable named file with a value of filename1.gz, intended for the environment of the command filename1.gz (which of course doesn't exist, so you get that command not found error).

To fix that mv command, quote any arguments containing shell metacharacters. In your example, aside from the &, the ? is also special.
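
For example, quoting the whole first argument takes care of both:

mv 'index.html?acc=OSR765454&file=filename1.gz' filename1.gz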

Don't blame this on me. ;) I never suggested doing this. I merely mentioned that you could try using rename if you insisted on using wget -i. I suggested keeping it simple with Scott's solution. I don't understand why you don't like that simple while loop. Did you run it? If so, was it unacceptably slow? Did it give you an erroneous result?

Regards,
Alister

Alister,

I do love Scott's solution, but the point is that, after I get a bunch of those wget commands, I have to run them individually.

But anyway, I used the following command:

mv index.html'?'acc=OSR765454'&'file=filename1.gz filename1.gz

And it works like a charm. Do you have a simple way to do the above task in batch on all the files in the directory using rename? I can't write the mv command by hand for all the files in my directory; there are around 400 of them, and the OSR number after acc= varies. Can you suggest something using sed and rename, or sed and mv?

Thanks for your time.

Why manually? In your original post you stated that you had the list of URLs in a file. Is that not the case? Are you not now using that file as an option-argument, like this:

wget -i input.txt

If so, just use this instead (I took the liberty of adding -r and quotes to Scott's code):
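
while read -r URL; do
  wget "$URL" -O "${URL##*=}"
done < input.txt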

It's just as automated.

Or have I missed something?

Regards,
Alister

Alister,

You missed a point.

I did check my output files after editing input.txt to the following:

index.html?acc=OSR765454&file=filename1.gz -O filename1.gz
index.html?acc=OBR765454&file=filename111.gz -O filename111.gz
.....

This input.txt has to be given to the wget -i command.

I did that, and all I see is a single output file instead of one per URL. If all 5 files together add up to 10GB, I see a single file filename1.gz that is 10GB.

Hope you got my point.

Scott's solution does not use -i and is intended to work with the input.txt as initially specified:
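
while read URL; do
  wget $URL -O ${URL##*=}
done < input.txt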

Did you try his while loop with that input data? It should work, and no file renaming should be needed.

Regards,
Alister