wget output file names

Hi,

I have a list of URLs in my input.txt file, like this:

input.txt

http://unix.com/index.html?acc=OSR765454&file=filename1.gz
http://unix.com/index.html?acc=OBR765454&file=filename111.gz
http://unix.com/index.html?acc=ORS765454&file=filename1111.gz
http://unix.com/index.html?acc=OST76454&file=filename11111.gz
http://unix.com/index.html?acc=OS5454&file=filename1111111.gz

I am using this command:

wget -i input.txt

My output files end up with weird file names, like this:

ls in the download folder shows:

index.html?acc=OSR765454&file=filename1.gz
index.html?acc=OBR765454&file=filename111.gz
index.html?acc=ORS765454&file=filename1111.gz
index.html?acc=OST76454&file=filename11111.gz
index.html?acc=OS5454&file=filename1111111.gz

How do I change this so I just get the file names, like filename1.gz, filename111.gz, etc.?

Thanks


Hi, I just figured out the solution.

Append -O desired_filename to each URL in the input file.

For example:

http://unix.com/index.html?acc=OSR765454&file=filename1.gz -O filename1.gz

Now, issue this command:

cat inputlist.txt | xargs wget

You will get the file saved as filename1.gz.

Hope this helps someone.


Thanks for posting the solution!


Useless Use of Cat

xargs ... < inputfile
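
For example, assuming inputfile holds one URL per line:

xargs wget < inputfile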

Hi guys,

My solution doesn't work. It leaves you with only the last downloaded file.

Even @Corona's solution is still doing the same.

Any other possibilities?

while read URL; do
  # ${URL##*=} strips everything up to the last '=', leaving just the file name
  wget $URL -O ${URL##*=}
done < input.txt

Ends up being:

wget http://unix.com/index.html?acc=OSR765454&file=filename1.gz -O filename1.gz
wget http://unix.com/index.html?acc=OBR765454&file=filename111.gz -O filename111.gz
wget http://unix.com/index.html?acc=ORS765454&file=filename1111.gz -O filename1111.gz
wget http://unix.com/index.html?acc=OST76454&file=filename11111.gz -O filename11111.gz
wget http://unix.com/index.html?acc=OS5454&file=filename1111111.gz -O filename1111111.gz

Hi Scott,

Thanks for your time.

I have more than 400 links.

So the best possible solution would be to put all those links in a file and use wget -i inputlist.txt, while somehow changing the output file names.

Except that you can't tell wget how to filter the URLs to generate the file names. I believe Scott's solution is the best you can do. Besides, the network I/O is probably going to be the bottleneck anyway, not the shell loop.

Alternatively, you can stick with your original approach, which generates the undesirable filenames, and rename the files afterwards (perhaps with the rename utility).
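
For example, something like this shell loop could handle the renaming in one pass (a rough sketch, assuming every downloaded file begins with index.html and the wanted name is whatever follows the last =):

for f in index.html*; do
    # keep only the part after the last '='
    mv -- "$f" "${f##*=}"
done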

In my opinion, Scott's solution is preferable. Keep it simple. :)

Regards,
Alister


Now,

I have downloaded the files as Alister suggested. My files are named like this:

index.html?acc=OSR765454&file=filename1.gz
index.html?acc=OBR765454&file=filename111.gz
index.html?acc=ORS765454&file=filename1111.gz
index.html?acc=OST76454&file=filename11111.gz
index.html?acc=OS5454&file=filename1111111.gz

I was using the following command to rename them:

mv index.html?acc=OSR765454&file=filename1.gz filename1.gz

The error I see is the following:

[1] 14709
mv: missing destination file operand after `index.html?acc=OSR765454'
Try `mv --help' for more information.
-bash: filename1.gz: command not found
[1]+  Exit 1                  mv index.html?acc=OSR765454

Any thoughts?

The & is a special character to the shell. The command is cut off at that point, so the shell takes everything before it as a command to run in the background:

[1]+  Exit 1                  mv index.html?acc=OSR765454

Afterwards, it runs the following command:

file=filename1.gz filename1.gz

That creates a variable named file with a value of filename1.gz, intended for the environment of the command filename1.gz (which of course doesn't exist, so you get that command not found error).

To fix that mv command, quote any arguments containing shell metacharacters. In your example, aside from the &, the ? is also special.
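
For example, quoting the whole first argument takes care of both:

mv 'index.html?acc=OSR765454&file=filename1.gz' filename1.gz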

Don't blame this on me. ;) I never suggested doing this. I merely mentioned that you could try using rename if you insisted on using wget -i. I suggested keeping it simple with Scott's solution. I don't understand why you don't like that simple while loop. Did you run it? If so, was it unacceptably slow? Did it give you an erroneous result?

Regards,
Alister

Alister,

I do love Scott's solution, but the point is that, after I get a bunch of those wget commands, I have to run them individually.

But anyway, I used the following command:

mv index.html'?'acc=OSR765454'&'file=filename1.gz filename1.gz

And it works like a charm. Do you have a simple way to do the above task in batch on all the files in the directory using rename? I can't write the mv command by hand for all the files in my directory; there are around 400 of them, and the OSR number after acc= varies. Can you suggest something using sed and rename, or sed and mv?

Thanks for your time.

Why manually? In your original post you stated that you had the list of URLs in a file. Is that not the case? Are you not now using that file as an option-argument, like this:

wget -i input.txt

If so, just use this instead (I took the liberty of adding -r and quotes to Scott's code):
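
while read -r URL; do
  wget "$URL" -O "${URL##*=}"
done < input.txt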

It's just as automated.

Or have I missed something?

Regards,
Alister

Alister,

You missed a point.

I did check my output files after editing input.txt to the following:

index.html?acc=OSR765454&file=filename1.gz -O filename1.gz
index.html?acc=OBR765454&file=filename111.gz -O filename111.gz
.....

This input.txt has to be given to the wget -i command.

I did that, and all I see is a single output file instead of one per URL. If all 5 files together add up to 10GB, I see a single file filename1.gz that is 10GB.

Hope you got my point.

Scott's solution does not use -i and is intended to work with the input.txt as initially specified:
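
while read URL; do
  wget $URL -O ${URL##*=}
done < input.txt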

Did you try his while loop with that input data? It should work, and no file renaming should be needed.

Regards,
Alister