Bash script to monitor a directory and its subdirectories for new PDFs

markus1981 · February 27, 2015, 3:27am

I need a bash script that monitors folders for new PDF files and creates an XML file for an RSS feed, with the newest files at the top of the list. I have a script, but it reports errors.

#!/bin/bash

SYSDIR="/var/www/html/Intranet"
HTTPLINK="http://TYPE.IP.ADDRESS.HERE/pdfs"
FEEDTITLE="Najnoviji dokumenti na Intranetu OUG"
FEEDLINK="http://TYPE.IP.ADDRESS.HERE/pdfs"
FEEDDESC="Novi dokumenti"
RSSDIR="/var/www/html/rss"
#DESC="`date`"



function testing_variables {
        if [ ! -d ${RSSDIR} ]; then
                echo -e 'ERROR: $RSSDIR does not exists!\nPlease create a directory and set the right path for $RSSDIR variable!'
                exit 1
        fi

        if [ ! -d ${SYSDIR} ]; then
                echo -e 'ERROR: $SYSDIR does not exists!\nPlease create a directory and set the right path for $SYSDIR variable!'
        fi
}

function rss_header {
### RSS HEADER
echo "<?xml version=\"1.0\"?>
<rss version=\"2.0\">
  <channel>
        <title>${FEEDTITLE}</title>
        <link>${FEEDLINK}
        <description>${FEEDDESC}</description>" > $1
}

function rss_body {
#RSS BODY
for FILES in `find ${SYSDIR} -type f -name "*.pdf" | xargs ls -t | grep -i ${2}`; do
NAME="`basename $FILES`"
#PARENTDIR="`dirname $FILES | awk -F "/" '{print $NF}'`"

echo "  <item>
                <title>${NAME}</title>
                <link>${HTTPLINK}/${2}/${NAME}
<!--                <description>${DESC}</description> -->
        </item>" >> ${1}
done
}

function rss_footer {
### RSS FOOTER
echo "</channel></rss>" >> ${1}
}


### Main code ###
 
for FILES in `find ${SYSDIR} -type f -name "*.pdf" | xargs ls -t`; do
        PARENTDIR="`dirname $FILES | awk -F "/" '{print $NF}'`"
 
        rss_header ${RSSDIR}/${PARENTDIR}.xml
        rss_body   ${RSSDIR}/${PARENTDIR}.xml ${PARENTDIR}
        rss_footer ${RSSDIR}/${PARENTDIR}.xml
done

It reports an error on line 13: "syntax error near unexpected token '$'{\r' '
and 'function testing_variables {

Could someone please help?

agent.kgb · February 27, 2015, 4:00am

Don't use Notepad for scripts, use vi! The \r in the error message is a carriage return from DOS line endings.

tr -d '\r' script.sh >script1.sh

Don_Cragun · February 27, 2015, 5:36am

One could also ask why you define a function that is never called???
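For illustration, here is a minimal sketch of how such a check could be written and then actually called. It is not the original author's code: the name check_directories is made up, and it takes the directories as arguments instead of reading $SYSDIR and $RSSDIR directly. Note also that the original function puts $RSSDIR inside single quotes, so the variable name is printed literally instead of its value.

```shell
#!/bin/bash
# Hypothetical replacement for testing_variables: report every missing
# directory on stderr and return non-zero if any of them is missing.
check_directories() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            printf 'ERROR: directory %s does not exist.\n' "$dir" >&2
            status=1
        fi
    done
    return "$status"
}

# A function does nothing until it is called. In the main section of the
# script, before any loop, the call could look like this:
#   check_directories "$SYSDIR" "$RSSDIR" || exit 1
```

With the call in place, the script stops with a readable message instead of failing later when it tries to write into a directory that is not there.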

markus1981 · February 27, 2015, 6:45am

I am not very familiar with the code in this script; I just adapted an existing script from a script library and changed the folder names and file type. Could you please review the script and correct the errors?

Don_Cragun · February 27, 2015, 6:58am

Now that you have made changes to your script based on the suggestions you have already received, what errors are you getting? What does your script look like now? What is it doing wrong?

Or, are you just saying that you want the UNIX and Linux Forums to serve as your unpaid programming staff?

markus1981 · February 27, 2015, 7:14am

I commented out the function testing_variables line.
When I run the script I get:

syntax error near unexpected token near '$'do\r' '
'for FILES in 'find ${SYSDIR} -type f -name "*.pdf" | xargs ls -t | grep -i ${2}' ; do

No, I just expect this forum to help me learn how to adjust this code. Thanks in advance.

Don_Cragun · February 27, 2015, 7:18am

With that error (again), you clearly did not follow agent.kgb's advice in message #2 in this thread.

markus1981 · February 27, 2015, 7:45am

I did that, and it says: tr: extra operand `feedgen.sh`
only one string may be given when deleting without squeezing repeats.

The result is an empty feedgen1.sh file.

Don_Cragun · February 27, 2015, 7:57am

So try:

tr -d '\r' < feedgen.sh > feedgen1.sh

If you try something and it doesn't work, it would help if you tell us it didn't work instead of having us assume that everything that was suggested worked.
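As a small self-contained demonstration of why the redirection matters: tr never takes file names as operands, it only filters standard input. The sketch below works on a throwaway copy in a temporary directory (the file contents are invented for the demo) and counts the lines containing a carriage return before and after the conversion.

```shell
#!/bin/bash
# Build a tiny script with DOS (CRLF) line endings in a scratch directory.
tmp=$(mktemp -d)
printf '#!/bin/bash\r\necho hello\r\n' > "$tmp/feedgen.sh"

cr=$(printf '\r')                           # a literal carriage return

# Count the lines that contain a carriage return (2 for this file).
grep -c "$cr" "$tmp/feedgen.sh" || true

# tr reads standard input only, so the file must be redirected into it.
tr -d '\r' < "$tmp/feedgen.sh" > "$tmp/feedgen1.sh"

# After the conversion no line contains a carriage return (count is 0;
# grep exits non-zero when nothing matches, hence the "|| true").
grep -c "$cr" "$tmp/feedgen1.sh" || true

bash "$tmp/feedgen1.sh"                     # the cleaned copy now runs
```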

markus1981 · February 27, 2015, 8:21am

I fixed the errors with the dos2unix command, but when I run the script there is no output in the rss folder, and no errors are reported. I put a PDF in SYSDIR before running the script.

Don_Cragun · February 27, 2015, 10:56pm

You have a loop over all PDF files in and under $SYSDIR in your main loop that includes a call to rss_body which includes a loop over all PDF files in and under $SYSDIR . This is almost certainly not what you want, but without a better description of where PDF files are located in your file hierarchy and what XML files you're trying to create, I'm not clear on what you want to accomplish.

What OS are you using? How are you invoking feedgen1.sh ?

Are you getting any output at all from feedgen1.sh ?

Are any files being created by your current script? (And, if so, what is in them?)

What is the output from the commands:

ls -l feedgen1.sh
find $SYSDIR -type f -name '*.pdf' -exec ls -l {} +
find $RSSDIR -type d -exec ls -ld {} +
find $RSSDIR -type f -name '*.xml' -exec ls -l {} +

Is the output you see from the above commands representative of the locations of the PDF files you want to report in your XML files (or do you just have 1 or 2 PDF files installed for testing)? Is the output you see from the above commands representative of the directory structure you hope to see under $RSSDIR ?

What XML files are you hoping to create from the output shown by the above commands?

The variable DESC is unset in this script, but $DESC is used in rss_body . What is the description tag in your XML files supposed to contain for your PDF files?

markus1981 · February 28, 2015, 2:31am

The files will be located in subfolders inside the Intranet folder specified in the script. I will upload them manually, daily. I want the script to create an XML file for an RSS feed listing the newly uploaded files. The OS is Ubuntu Server 14.04, and the script is invoked by a cron job. There is no output at all. The PDFs will be uploaded into the folders, not created by the script. I will try your code on Monday, at work.

Don_Cragun · February 28, 2015, 3:32am

OK. So you will have PDF files in multiple folders in the file hierarchy rooted in $SYSDIR .

What are you hoping to produce? Are you trying to produce:

one XML file in $RSSDIR for all of the PDF files in and under $SYSDIR ,
one XML file in $RSSDIR for each directory in and under $SYSDIR containing one or more PDF files, or
one XML file in a subdirectory under $RSSDIR corresponding to each subdirectory under $SYSDIR containing one or more PDF files?

What data is supposed to be included with the <description> tags in your XML file(s) for each PDF file?

Peasant · February 28, 2015, 3:51am

If you are on Linux, you might consider using inotifywait in your main section.
Something like :

#!/bin/bash
DIR=/dir/to/watch
# -m: keep watching; -e create: report newly created entries; %f: file name only
inotifywait -m -e create --format %f "$DIR" | while IFS= read -r File
do
	case ${File##*.} in
	[Pp][Dd][Ff])
		printf "%s\n" "Found pdf file $File with ext ${File##*.}" # here you will call function per detected pdf filename, log, handle errors
	;;
	esac
done
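Since the PDFs in this thread live in subfolders, a recursive variant may be closer to what is needed. The following is only a sketch under a few assumptions: inotifywait is installed (on Ubuntu it comes with the inotify-tools package), the function names is_pdf and watch_pdfs are made up, and the printf is a placeholder for whatever should happen per file. It listens for close_write and moved_to instead of create, so a file is reported once it has been fully written or moved into place.

```shell
#!/bin/bash
# Succeed if the given path ends in .pdf, in any letter case.
is_pdf() {
    case $1 in
        *.[Pp][Dd][Ff]) return 0 ;;
        *)              return 1 ;;
    esac
}

# watch_pdfs DIR: report every PDF that appears in DIR or below it.
watch_pdfs() {
    # -m keeps running, -r descends into subdirectories,
    # %w%f prints the watched directory followed by the file name.
    inotifywait -m -r -e close_write -e moved_to --format '%w%f' "$1" |
    while IFS= read -r path; do
        if is_pdf "$path"; then
            printf 'New PDF: %s\n' "$path"   # placeholder: rebuild the feed here
        fi
    done
}

# Example call (runs until interrupted):
#   watch_pdfs /var/www/html/Intranet
```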

markus1981 · February 28, 2015, 6:19am

I need one XML file for all PDFs in and under the specified directory.
For the description tags I need the PDF name and the folder name two levels up. The items should be sorted by date, newest first, with links to the files.
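The wording "folder name for two levels up" can be read in more than one way. Purely as an illustration, the sketch below assumes it means the names of the two directories directly above the file; the function name describe_pdf, the output format, and the sample path are all invented here. It uses only shell parameter expansion, so no external commands are needed.

```shell
#!/bin/bash
# describe_pdf PATH: print "file (grandparent/parent)" for a PDF path.
describe_pdf() {
    file=${1##*/}               # last path component, e.g. report.pdf
    parent=${1%/*}              # path with the file name removed
    grandparent=${parent%/*}    # path with the parent directory removed
    printf '%s (%s/%s)\n' "$file" "${grandparent##*/}" "${parent##*/}"
}

describe_pdf /var/www/html/Intranet/pdf/IEEE/report.pdf
# prints: report.pdf (pdf/IEEE)
```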

Don_Cragun · February 28, 2015, 3:36pm

Your script is creating 3 XML tags per PDF file in the rss_body function:

the data stored between <title> and </title> tags is the final component of the absolute pathname of the PDF file,
the data stored after the <link> tag (there is no closing </link> tag) is the string stored in the shell variable $HTTPLINK followed by a slash, the last directory in the absolute pathname of a PDF file (not necessarily the directory of the current PDF file's pathname), followed by a slash and the final component of the absolute pathname of the PDF file, and
the data stored between <description> and </description> tags is an empty string.

Should there be a </link> tag after the data you insert following the <link> tag?

Please show an explicit example of the data that you want created for the following PDF file:

-rw-r--r--  1 dwc  staff  2895323 Oct 23  2013 /var/www/html/Intranet/pdf/IEEE/20601-Rev-D7r02-clean.pdf

Your current code is creating one XML file for each different final directory name in the PDF pathnames found. One of these XML files is created for each PDF file found. If another PDF file with the same final directory name is found, it overwrites the previous XML file. (This is just slow if there is only one directory under $SYSDIR with that name. If there are two or more directories with the same final component name, there could be several problems.)

Furthermore, if there are more PDF files than xargs will process in a single invocation of ls, your files will NOT be sorted from newest to oldest; there will be groups of PDF files sorted in timestamp order, but the complete list might not be correctly ordered. So, to get a time-ordered list of files, we either need to gather the data needed for each file into single lines in a file that we can sort by timestamp, or we need to create files in a single directory with the same timestamps as your PDF files that we can then sort using ls -t . Creating a single file will probably be faster if you have an easy way to convert file timestamps into text. If not, we can use touch to copy file timestamps to other files.
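A rough sketch of the first approach (one line per file, sorted by timestamp) might look like the following. It is not a tested replacement for the script in this thread: it assumes GNU find, which can print the modification time itself through -printf, it assumes file names contain no tabs or newlines, and the function names are hypothetical. The links are also not URL-encoded, so file names with spaces would need extra handling.

```shell
#!/bin/bash
# Escape the characters that are special in XML text.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print every PDF under directory $1, newest first.
# find emits "<mtime in seconds><TAB><path>" per file, so a single sort
# orders the whole list regardless of how many files there are.
list_pdfs_newest_first() {
    LC_ALL=C find "$1" -type f -iname '*.pdf' -printf '%T@\t%p\n' |
        LC_ALL=C sort -rn |
        cut -f2-
}

# make_feed SYSDIR BASEURL TITLE OUTFILE: write one RSS file for all PDFs.
make_feed() {
    local sysdir="${1%/}" baseurl="$2" title="$3" out="$4" path rel
    {
        printf '<?xml version="1.0"?>\n<rss version="2.0">\n  <channel>\n'
        printf '    <title>%s</title>\n' "$(xml_escape "$title")"
        printf '    <link>%s</link>\n' "$baseurl"
        printf '    <description>%s</description>\n' "$(xml_escape "$title")"
        list_pdfs_newest_first "$sysdir" | while IFS= read -r path; do
            rel=${path#"$sysdir"/}          # path relative to SYSDIR
            printf '    <item>\n'
            printf '      <title>%s</title>\n' "$(xml_escape "${path##*/}")"
            printf '      <link>%s/%s</link>\n' "$baseurl" "$(xml_escape "$rel")"
            printf '    </item>\n'
        done
        printf '  </channel>\n</rss>\n'
    } > "$out.tmp" && mv "$out.tmp" "$out"  # swap the finished feed into place
}

# Example call; the output file name feed.xml is an assumption:
#   make_feed "$SYSDIR" "$HTTPLINK" "$FEEDTITLE" "$RSSDIR/feed.xml"
```

Writing to a temporary file and renaming it means a feed reader never sees a half-written file, which matters when the script runs from cron.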

Does your system have a stat utility?

markus1981 · March 2, 2015, 2:47am

Output from commands:

ls -l feedgen.sh
 -rwxr-x--- 1 root root 1529 Feb 27 14:27 feedgen.sh

find $SYSDIR -type f -name '*.pdf' -exec ls -l {} +
no output

> find $RSSDIR -type d -exec ls -ld {} +
drwxr-xr-x 2 root root 4096 Feb 27 14:12 .

> find $RSSDIR -type f -name '*.xml' -exec ls -l {} +
no output

I have the stat utility, I think. Does the code work on your system? On my system there is still no output at all after running it.

Don_Cragun · March 2, 2015, 5:30am

The > at the start of the last two command lines you typed is a secondary prompt, indicating that you probably mistyped or omitted a quote in the first find command. Press the Control key (labeled Ctrl, Cntl, or Ctl depending on who made your keyboard) and the c key at the same time to generate an interrupt signal and get back to your primary prompt. Then copy the find command I requested and paste it into your shell.

And, for the third time, what operating system are you using?

markus1981 · March 2, 2015, 6:06am

I copied your commands into the Webmin command shell when I tried them, so I don't think I mistyped anything. The OS is Ubuntu Server 14.04.

Don_Cragun · March 2, 2015, 4:51pm

I'm sorry. I completely misunderstood your problem. I thought you wanted help with a bash shell script. Now it appears that you are unable to run simple bash shell commands on the system where you want to run that bash shell script with the environment variables set as they are used in that script.

If you look back through this thread, you'll see that I have asked several questions that you still have not answered. Without answers to those questions (and a real sample of the desired output based on at least two PDF files from different directories), I can't figure out what needs to be done to satisfy your requirements.
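On the point about variables: SYSDIR and RSSDIR are assigned inside the script, so in a separate interactive shell they may well be empty. That is only a guess about what happened here, not something confirmed in the thread, but with GNU find an unquoted empty $RSSDIR simply disappears from the command line and find searches the current directory instead. The sketch below passes the directories explicitly and refuses to run when one is missing; the function name diagnose is made up.

```shell
#!/bin/bash
# diagnose SYSDIR RSSDIR: run the three diagnostic commands with explicit,
# quoted paths. ${N:?message} aborts with an error if the argument is
# missing or empty instead of silently searching somewhere else.
diagnose() {
    sysdir=${1:?usage: diagnose SYSDIR RSSDIR}
    rssdir=${2:?usage: diagnose SYSDIR RSSDIR}
    find "$sysdir" -type f -name '*.pdf' -exec ls -l {} +
    find "$rssdir" -type d -exec ls -ld {} +
    find "$rssdir" -type f -name '*.xml' -exec ls -l {} +
}

# Example call, using the values from the script at the top of the thread:
#   diagnose /var/www/html/Intranet /var/www/html/rss
```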