Wget - how to ignore files in immediate directory?

I am trying to recursively save a remote FTP server, but exclude the files immediately under a directory, directory1:

wget -r -N ftp://user:pass@hostname/directory1

I want to keep these, which may have more files under them:

directory1/dir1/file.jpg
directory1/dir2/file.jpg
directory1/dir3/file.jpg

but I want to exclude all the files directly under directory1, as these are the unsorted files:

directory1/file1.jpg
directory1/file2.jpg

How can I exclude all the files immediately under directory1 but keep all other files?

Thanks!

If you can get a list of directories, you can feed that into wget with the --no-parent option.
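Something like this, as a sketch (dirlist.txt is a hypothetical file holding one subdirectory URL per line):

# recurse into each listed subdirectory; --no-parent keeps wget from
# climbing back up into directory1 and grabbing the loose files there
wget -r -N --no-parent -i dirlist.txt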


The directories are constantly changing, and it is a cron job that runs regularly, so there aren't always the same directories. Also, there would be hundreds of directories.
I want to get these directories and everything under them, but not the files above them.

So? Get the listing and use it.

# Retrieve .listing file
wget --spider --no-remove-listing ftp://user:pass@wherever
# Extract the useful directories from it
awk '{ sub(/\r/, ""); }
/^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@wherever/" $9 }' .listing
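If it helps: .listing is a Unix-style ls -l listing, so field 9 of each line is the entry name. The same script with comments:

awk '
  # every line: strip the trailing carriage return FTP servers send
  { sub(/\r/, "") }
  # ls-style directory entries start with "d"; skip . and ..;
  # field 9 is the entry name, printed as a full URL
  /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@wherever/" $9 }
' .listing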

Thanks! That does seem to work, but your commands are more advanced than what I know.
For example, I don't really understand what exactly the awk is doing, but it does seem to be getting the directories.

The first line saves the files into index.html,
and the second one prints out a lot of ftp:// lines.
How do I feed that into the wget?

Thanks!

If it's saving index.html, you forgot the --spider.

You can feed wget a list of URLs with awk '{...}' | wget -I - ...


I get an error:

.listing has sprung into existence.

It then saves it into index.html and index.html.1, etc. Is there a way to make it go to the same file each time?
Thanks!

Show exactly what you are doing.


I put in the --spider, but it still says that.

So I run wget with the spider line,
then run it again with the awk feeding into it?
Like awk | wget?
Or is that all just one command?
Thanks!

Show exactly what you are doing, word for word, letter for letter, keystroke for keystroke.


First I run this:

wget --spider --no-remove-listing ftp://user:pass@hostname/directory/

I think that worked; it made

=> ‘.listing’

Then:

awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@hostname/directory/" $9 }' .listing 

This gives the list of directories.

If I run this, it says:

awk: cmd. line:1: fatal: cannot open file `.listing' for reading (No such file or directory)
wget: missing URL

awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@hostname/directory/" $9 }' .listing  |  wget -r -N  -nH --cut-dirs=1

I'm not sure what the syntax for the feed is; I'm checking it now!

It means exactly what it says: .listing is not there. Probably you didn't run the first command, or ran it in a different directory.

You forgot the -I - on the last command, also. I'd also suggest -x, so it saves files into folders based on the URL.
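-x is --force-directories; a quick sketch of what it changes (urls.txt standing in for the awk output):

# without -x, every file fetched from the list lands in the current directory;
# with -x, the URL's path is recreated locally, e.g.
#   ftp://hostname/directory1/dir1/file.jpg -> hostname/directory1/dir1/file.jpg
wget -x -N -i urls.txt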


I'd like to save all the directories into the same directory, so that is OK.

It says missing URL, but if the URL is being fed into it from the awk, what goes after the -I?
Thanks!

awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@host/directory1/" $9 }' .listing  |  wget -I -r -N  -nH --cut-dirs=1 

You put exactly what I said, -I -

The - tells it to read from stdin.


When I put this:

awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@hostname/directory1/" $9 }' .listing  |  wget -I - -r -N  -nH --cut-dirs=1 

it says:

wget: missing URL
Usage: wget [OPTION]... ...

Try ‘wget --help’ for more options.

I think it's really close, but it's a small syntax thing.
Thanks!

Should be -i instead of -I, my mistake.
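So, pieced together, the whole thing is:

# step 1: grab the listing without downloading anything
wget --spider --no-remove-listing ftp://user:pass@hostname/directory1/
# step 2: turn the directory entries into URLs and feed them to wget on stdin
awk '{ sub(/\r/, ""); } /^d/ && ($9 != ".") && ($9 != "..") { print "ftp://user:pass@hostname/directory1/" $9 }' .listing | wget -i - -r -N -nH --cut-dirs=1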


Yes, that seems to work.
Thank you so much!!