i am trying to recursively save a remote FTP server but exclude the files immediately under a directory called directory1
wget -r -N ftp://user:pass@hostname/directory1
I want to keep these, which may have more files under them
directory1/dir1/file.jpg
directory1/dir2/file.jpg
directory1/dir3/file.jpg
but I want to exclude all the files directly under directory1, as these are the unsorted files
directory1/file1.jpg
directory1/file2.jpg
How can I exclude all the files immediately under directory1 but keep all other files?
thanks!
If you can get a list of directories, you can feed that into wget with the --no-parent option.
the directories are constantly changing and it is a cron job that runs regularly
so there aren't always the same directories
also there would be hundreds of directories
i want to get these directories and everything under them but not the files that are above them
So? Get the listing and use it.
# Retrieve .listing file
wget --spider --no-remove-listing ftp://user:pass@wherever
# Extract the useful directories from it
awk '{ sub(/\r/, ""); }
/^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@wherever/" $9 }' .listing
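A commented version of the same awk filter, for clarity (sample listing lines are inlined via a here-document so it runs standalone; in real use you'd point it at the .listing file, and the host/credentials are just the placeholders from this thread):

```shell
awk '
  { sub(/\r/, "") }              # strip the trailing carriage return FTP listings often have
  /^d/ &&                        # keep only lines whose mode field starts with "d" (directories)
  ($9 != ".") && ($9 != "..") {  # skip the current/parent directory entries
    # field 9 of an ls-style listing is the name column
    print "ftp://user:pass@hostname/directory1/" $9
  }
' <<'EOF'
drwxr-xr-x    2 ftp  ftp   4096 Jan 01 00:00 dir1
drwxr-xr-x    2 ftp  ftp   4096 Jan 01 00:00 .
-rw-r--r--    1 ftp  ftp   1234 Jan 01 00:00 file1.jpg
EOF
```

Only the dir1 line survives the filter, so the output is one URL per subdirectory.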
thanks! that does seem to work, but your commands are more advanced than what i know
for example i don't really understand what exactly the awk is doing, but it does seem to be getting the directories
the first line saves the files into index.html
and the second one prints outs a lot of ftp:// lines
how do i feed that into the wget?
thanks!
If it's saving index.html, you forgot the --spider.
You can feed wget a list of URL's with awk '{...}' | wget -I - ...
i get an error:
".listing has sprung into existence."
it then saves it into index.html and index.html.1, etc. is there a way to make it go to the same file each time?
thanks!
Show exactly what you are doing.
corona688:
If it's saving index.html, you forgot the --spider.
You can feed wget a list of URL's with awk '{...}' | wget -I - ...
I put the --spider but it still says that
so do i run wget with the spider line
then again with it feeding into it?
like awk | wget?
or is that all just one command?
thanks!
Show exactly what you are doing, word for word, letter for letter, keystroke for keystroke.
first i run this:
wget --spider --no-remove-listing ftp://user:pass@hostname/directory/
i think that worked, it made
=> `.listing'
then
awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@hostname/directory/" $9 }' .listing
this gives the list of directories
if i run this it says
awk: cmd. line:1: fatal: cannot open file `.listing' for reading (No such file or directory)
wget: missing URL
awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@hostname/directory/" $9 }' .listing | wget -r -N -nH --cut-dirs=1
i'm not sure what the syntax for the feed is, i'm checking it now!
It means exactly what it says: .listing is not there. Probably you didn't run the first command, or ran it in a different directory.
You forgot the -I - on the last command, also. I'd also suggest -x, so it saves files into folders based on the URL.
i'd like to save all the directories into the same directory so that is ok
it says missing URL, but if the URL is being fed into it from the awk, what goes after the -I?
thanks!
awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@host/directory1/" $9 }' .listing | wget -I -r -N -nH --cut-dirs=1
You put exactly what I said, -I -
The - tells it to read from stdin.
when i put this
awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@hostname/directory1/" $9 }' .listing | wget -I - -r -N -nH --cut-dirs=1
it says
wget: missing URL
Usage: wget [OPTION]... ...
Try `wget --help' for more options.
i think it's really close but a small syntax thing
thanks!
Should be -i instead of -I, my mistake.
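Putting the corrected pieces together, a sketch of the whole run (host, credentials and directory are the placeholders from this thread, not tested against a live server; it's wrapped in a function so nothing touches the network until you call it, and it must be run from the same working directory each time so step 2 finds the .listing from step 1):

```shell
mirror_subdirs() {
  base="ftp://user:pass@hostname/directory1/"   # placeholder host/credentials
  # 1. Fetch only the top-level listing; --spider downloads nothing,
  #    --no-remove-listing leaves the .listing file behind.
  wget --spider --no-remove-listing "$base"
  # 2. Turn directory entries into full URLs and feed them to wget's
  #    stdin via -i - (lowercase i, as corrected above).
  awk -v base="$base" '
    { sub(/\r/, "") }
    /^d/ && ($9 != ".") && ($9 != "..") { print base $9 }
  ' .listing |
  wget -i - -r -N -nH --cut-dirs=1
}
# mirror_subdirs   # uncomment/call to actually run (needs network access)
```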
yes that seems to work
thank you so much!!