FTP in shell script and selecting files for upload

NickeZ28 · June 20, 2015, 7:38am

Hi,

I'm a newbie with programming and shell scripting. I'm running OS X (Darwin) on a Mac.

I would like to create a shell script that would:

1. Search a volume and directory (including subdirectories) for a file where:
   - the filename ends with "_Highres.pdf", and
   - the file creation date is no older than 2 days.
2. Upload this file to an FTP server.

I managed to figure out part 2 myself, like this:

#!/bin/sh
USER=userid
PASSWD=userpw
HOST=ftphost
ftp -n $HOST <<SCRIPT
user $USER $PASSWD
binary
put some.file
quit
SCRIPT

But I'm stuck on replacing "some.file" above with the file that matches item 1 above.

I hope someone can give me some hints in the right direction.

Thanks,

Niklas

Aia · June 20, 2015, 11:41am

Using the utility find might help you search for that file.

find /path/to/directory -name "_Highres.pdf" -type f -ctime -2

find : the program to search for files, links and directories

/path/to/directory : location to start the search

-name "_Highres.pdf" : searches by name

-type f : it has to be a file

-ctime -2 : it has to be less than two days old

Now, that might return one, several, or no files meeting those conditions. It all depends on what you have in /path/to/directory. Something to think about when trying to upload.

Don_Cragun · June 20, 2015, 3:05pm

Aia's suggestion is close, but -name "_Highres.pdf" will only look for files named exactly _Highres.pdf, not all files with names ending with _Highres.pdf.

Note also that most UNIX filesystems do not have a file creation date timestamp. The -ctime -2 find primary will limit the results to files whose status has changed in the last two days. If you use -mtime -2 instead, it will limit the results to files whose contents have changed in the last two days. If you are using a system that has a filesystem that does have file creation date timestamps, the man page for find on your system may list a different primary that can be used to check those file creation timestamps.

Try:

find /path/to/directory -name '*_Highres.pdf' -type f -mtime -2
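To compare the two -name patterns without touching real data, here is a minimal sketch that builds a throwaway directory and counts what each expression finds. The file names and the backdated timestamp are made up purely for illustration.

```shell
#!/bin/sh
# Build a scratch directory tree so the two find expressions can be compared safely.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
: > "$demo/_Highres.pdf"              # name is exactly _Highres.pdf
: > "$demo/sub/report_Highres.pdf"    # name only ends with _Highres.pdf
: > "$demo/sub/old_Highres.pdf"       # ends with _Highres.pdf, but will be backdated

# Backdate one file to 2000-01-01 00:00 so that -mtime -2 rejects it.
touch -t 200001010000 "$demo/sub/old_Highres.pdf"

# Exact name versus "ends with" pattern; tr strips the padding some wc versions add.
exact=$(find "$demo" -name '_Highres.pdf' -type f -mtime -2 | wc -l | tr -d ' ')
suffix=$(find "$demo" -name '*_Highres.pdf' -type f -mtime -2 | wc -l | tr -d ' ')

echo "exact name match: $exact"     # prints 1
echo "suffix match:     $suffix"    # prints 2

rm -rf "$demo"
```

On OS X, the BSD find man page also documents a -Btime primary that tests the inode creation (birth) time. It is not part of POSIX, so check man find on your own system before relying on it.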

Presumably, you'll want to read the output from find in a loop and call ftp for each file found. Or, gather a list and, if the list is not empty, invoke ftp using mput instead of put.
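As a rough sketch of the "gather a list" idea, the function below turns a list of paths on standard input into an ftp command script shaped like the here-document from the first post. The function name build_ftp_commands is hypothetical, and it emits one put per file rather than mput so that each remote name can be given explicitly. It also assumes the ftp client accepts double-quoted arguments for names containing spaces, which is worth confirming in man ftp.

```shell
#!/bin/sh
# build_ftp_commands USER PASSWD   (hypothetical helper, not a standard tool)
# Reads one local path per line on stdin and prints an ftp command script
# that logs in, switches to binary mode, and uploads each file under its
# base name.
build_ftp_commands() {
    printf 'user %s %s\n' "$1" "$2"
    printf 'binary\n'
    while IFS= read -r path
    do
        printf 'put "%s" "%s"\n' "$path" "$(basename "$path")"
    done
    printf 'quit\n'
}

# Possible use, with USER, PASSWD and HOST set as in the first post:
#   list=$(find /path/to/directory -name '*_Highres.pdf' -type f -mtime -2)
#   if [ -n "$list" ]; then
#       printf '%s\n' "$list" | build_ftp_commands "$USER" "$PASSWD" | ftp -n "$HOST"
#   fi
```

Printing the command script first also makes it easy to inspect what would be sent before piping it into ftp.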

NickeZ28 · June 20, 2015, 4:14pm

Thanks, I will try using this one:

find /path/to/directory -name '*_Highres.pdf' -type f -mtime -2

The files that I want to send will end with "_Highres.pdf". I'll set the script to run each night at 00:05, and usually there will be just one file found each night. So I don't have to worry about looping, I guess?

I could also change -mtime -2 to go back just one day, as in -mtime -1, since cron will run the script every night. It doesn't matter if I send duplicate files, though; it's better to do that than to miss any files, so maybe it's better to keep it at -2.

After some searching, I will give up on the ftp command and go with cURL instead, which seemed simpler; I could get the whole command in just a single line.

So my full script will be:

#!/bin/sh
FILE=$(find /path/to/directory -name "*_Highres.pdf" -type f -mtime -2)
curl --upload-file $FILE ftp://user:password@ftp.someserver.com
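On the -mtime -1 versus -mtime -2 question, one alternative to a fixed window is to remember when the last run happened and ask find for anything newer. The sketch below assumes a marker file whose path the caller chooses; the function name list_new_since and the marker file are hypothetical and not something proposed in this thread.

```shell
#!/bin/sh
# list_new_since DIR STAMP   (hypothetical helper)
# Print matching files modified after the marker file STAMP was last touched.
# On the first run, when no marker exists yet, fall back to the two-day
# window discussed in the thread.
list_new_since() {
    if [ -e "$2" ]; then
        find "$1" -name '*_Highres.pdf' -type f -newer "$2"
    else
        find "$1" -name '*_Highres.pdf' -type f -mtime -2
    fi
}

# Possible nightly sequence (all paths are placeholders):
#   touch /path/to/stamp.new                 # record when this run started
#   list_new_since /path/to/directory /path/to/stamp   # feed this to the upload step
#   mv /path/to/stamp.new /path/to/stamp     # the next run starts from here
```

Touching the new marker before the search, and only moving it into place afterwards, means a file written while the script is running is picked up by the next run instead of being skipped.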

Don_Cragun · June 20, 2015, 4:26pm


Capturing the output of find in FILE and expanding it unquoted, as in the script above (especially when you expect one file per day and are looking for files from the last two days), seems like a disaster waiting to happen. Try:

#!/bin/sh
find /path/to/directory -name '*_Highres.pdf' -type f -mtime -2 |
while IFS= read -r FILE
do      curl --upload-file "$FILE" ftp://user:password@ftp.someserver.com
done

which should work correctly when there are no files, when there is one file, when there are two files, and even if there are more files.

Or, if you really want a 1-liner:

find /path/to/directory -name '*_Highres.pdf' -type f -mtime -2 -exec curl --upload-file "{}" ftp://user:password@ftp.someserver.com \;
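Before pointing either version at a real server, it can help to rehearse the loop in a dry-run mode. The following is only a sketch: upload_one, upload_recent, DRY_RUN and FTP_URL are names invented here. It also assumes the login details are kept in ~/.netrc, which curl reads when given --netrc, rather than in the URL. Like the while read loop above, it does not cope with file names that contain newlines.

```shell
#!/bin/sh
# Hypothetical settings: the server URL ends in a slash so curl appends the
# local file name, and DRY_RUN=1 only prints what would be uploaded.
FTP_URL=${FTP_URL:-ftp://ftp.someserver.com/}
DRY_RUN=${DRY_RUN:-1}

# upload_one FILE: upload a single file, or just report it in dry-run mode.
upload_one() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'would upload: %s\n' "$1"
    else
        curl --netrc --upload-file "$1" "$FTP_URL"
    fi
}

# upload_recent DIR: handle zero, one, or many matching files under DIR.
upload_recent() {
    find "$1" -name '*_Highres.pdf' -type f -mtime -2 |
    while IFS= read -r file
    do
        upload_one "$file"
    done
}

# Run only when a directory is given on the command line.
if [ $# -gt 0 ]; then
    upload_recent "$1"
fi
```

Running it once with the default DRY_RUN=1 shows exactly which files the nightly job would send; setting DRY_RUN=0 switches to the real curl call.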

NickeZ28 · June 20, 2015, 4:36pm


Perfect! thanks!

Hmm. I would also have liked a way to find files (PDF documents) that contain 5 pages or more. But I guess that doesn't really qualify as a question in here, since there isn't such a shell command?

Niklas

Don_Cragun · June 20, 2015, 6:20pm

You could try some heuristic based on assuming that a PDF file containing 5 pages will be in a regular file larger than some number and that smaller files will contain fewer pages, but that would depend on very specific knowledge of the content and structure of the PDF files. One PDF page containing a high-res image could be much larger than a 100-page text document saved in PDF format.

There may also be PDF processing utilities on your system that can return the number of pages in a PDF file, but there aren't any such utilities in the standards, so I'm afraid I can't make any specific suggestions on a way to do that.

drl · June 21, 2015, 10:24am

Hi.

I have used pdftk in the context of Linux to report page counts of pdf documents. While looking for something else, I see that there is a package available for OS X, but I have not tried it on an OS X system I use:

OS, ker|rel, machine: Apple/BSD, Darwin 9.8.0, Power Macintosh
Distribution        : Mac OS X 10.5.8 (leopard workstation)

Best wishes ... cheers, drl

Fink - Package Database - Browse (Search = 'pdftk')

Fink - Package Database - Package pdftk (Handy tool for manipulating PDF)
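If pdftk does install cleanly, a page-count filter might look roughly like the sketch below. It assumes that pdftk's dump_data report contains a line of the form "NumberOfPages: N"; the function names page_count and has_min_pages are hypothetical, and none of this has been tried on OS X here.

```shell
#!/bin/sh
# page_count FILE: print the page count that pdftk reports for FILE
# (prints nothing if pdftk fails or the line is missing).
page_count() {
    pdftk "$1" dump_data 2>/dev/null |
    awk '/^NumberOfPages:/ { print $2; exit }'
}

# has_min_pages FILE N: succeed only if FILE has at least N pages.
has_min_pages() {
    n=$(page_count "$1")
    [ -n "$n" ] && [ "$n" -ge "$2" ]
}
```

Inside the upload loop from earlier in the thread, the test could then guard the transfer, for example: has_min_pages "$FILE" 5 && curl --upload-file "$FILE" ftp://user:password@ftp.someserver.com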

NickeZ28 · June 21, 2015, 2:50pm

Wow thanks. Looks like a really good set of tools that I can hopefully make good use of.