How to find out why a crontab job is not executed, and how to fix it?

Ophiuchus · March 5, 2019, 11:03pm

I have set up several cron jobs. I recently added a new one that copies the previous day's file from another server and runs every day (for example at 04:00 am), but when I check the next day the file hasn't been copied.

I'm working on a GNU/Linux CentOS system (kernel 2.6.32).

The files that I need to copy are created on the origin server before 23:00.

My crontab looked like this (the first line is the job that fails; the script that copies the files is script1.sh):

	
[root@srvc ~]# crontab -l
0 6 * * * /path/to/scripts/script1.sh
0 5 * * * /path/to/scripts/script3.sh
0 8 * * 1 /path/to/scripts/script4.sh

When I checked the next day at about 11:00 am, I saw that the file hadn't been copied. I changed the cron job to run at 11:10 am and the file was copied successfully.

    
[root@srvc ~]# crontab -l
10 11 * * * /path/to/scripts/script1.sh

The script1.sh content is this:

#!/bin/bash
 
dyear=`date +'%Y' -d "1 day ago"`
dmonth=`date +'%b' -d "1 day ago"`
ddate=`date +%Y-%m-%d -d "1 day ago"`

sshpass -p 'ThePassword' scp -r root@X.X.X.X:/path/to/files/*$ddate* /Destination/path/$dyear/$dmonth/

The files to be copied have in their name the format Logfile.2018-01-17

How can I find out why the cron job fails, and how do I fix it?

Thanks in advance.

MadeInGermany · March 6, 2019, 3:07am

/path/to/files/*$ddate* is expanded on the calling host i.e. the destination host.
Quote the * characters to expand on the source host.
/path/to/files/\*$ddate\* or
/path/to/files/"*$ddate*" or
"/path/to/files/*$ddate*"
Note that within the " (double quotes) the $ddate is still expanded on the calling host.
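
To see what each quoting form actually hands to scp, a small local sketch may help. It replaces scp with a hypothetical helper, show_args, that only prints its arguments, and it uses a fixed example date; no network is involved. Note that the shell matches an unquoted pattern against the whole word, including the root@X.X.X.X: prefix, so it is expanded locally only if a local path with that literal name exists.

```shell
#!/bin/bash
# show_args is a hypothetical stand-in for scp: it prints each argument it receives.
show_args() { printf '<%s>\n' "$@"; }

ddate=2019-03-06   # fixed example value instead of `date -d "1 day ago"`

# Work in an empty scratch directory so no local file can match by accident.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

unquoted=$(show_args root@X.X.X.X:/path/to/files/*$ddate*)
escaped=$(show_args root@X.X.X.X:/path/to/files/\*$ddate\*)
partial=$(show_args root@X.X.X.X:/path/to/files/"*$ddate*")
quoted=$(show_args "root@X.X.X.X:/path/to/files/*$ddate*")

# Print what each variant would pass on to scp.
printf '%s\n' "$unquoted" "$escaped" "$partial" "$quoted"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

With no matching local path, all four variants pass the same literal pattern to the command, and it is then the remote side that expands it. The quoted forms simply make that independent of whatever happens to exist locally.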

RudiC · March 6, 2019, 4:41am

Is that behaviour reproducible? What's the difference between the script executing at 6:00h and at 11:10h? Anything in the log files? If not, modify the script to log its steps. Is it possible that the files do not (yet) exist on the source node at 6:00h?

bakunin · March 6, 2019, 5:19am

This was my first thought too. How about doing the following to investigate:

change the script like this:

#!/bin/bash
set -xv 
dyear=`date +'%Y' -d "1 day ago"`
dmonth=`date +'%b' -d "1 day ago"`
ddate=`date +%Y-%m-%d -d "1 day ago"`

sshpass -p 'ThePassword' scp -r root@X.X.X.X:/path/to/files/*$ddate* /Destination/path/$dyear/$dmonth/

Then modify your crontab like this:

10 11 * * * /path/to/scripts/script1.sh > /path/to/cronlog.log 2> /path/to/cronlog.err

and have a look at what is logged. My first suspect would be the unquoted globs too, like MadeInGermany already said.

Two things to notice: if you create cron jobs you should ALWAYS redirect their stdout and their stderr, either to a (log) file or to /dev/null if you are not interested. Otherwise any output the script might generate creates a mail to root, which you probably want to avoid.
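
As a sketch of that advice, the script can also do the redirection itself, so that the crontab line stays short. The log path, the CRON_LOG variable and the helper names log and run_job are assumptions chosen for illustration; pick any location the cron user can write to.

```shell
#!/bin/bash
# Hypothetical log location; override it by setting CRON_LOG in the environment.
logfile=${CRON_LOG:-/tmp/script1.cronlog}

# log: print a message prefixed with a timestamp.
log() { printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"; }

# run_job: placeholder for the real work of the cron script.
run_job() {
    log "job started"
    # The real commands (for example the scp call) would go here.
    # This ls fails on purpose, to show that error messages are captured too.
    ls /nonexistent-example-path
    log "job finished, last exit status $?"
}

# Append both stdout and stderr of the whole job to the log file.
run_job >>"$logfile" 2>&1
```

Appending with >> keeps the history of earlier runs, which is useful when a job fails only on some days; the crontab examples in this thread use > and therefore overwrite the log on every run.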

Second, you should really, really do away with sshpass. Even its developers admit that using it is ill-advised and that it is offered only as a last resort. While such a process is running, anyone can see the password in cleartext in the process list, not to mention in the script file itself. You can protect the script file against being read by everybody, but the output of ps is public information.

You can easily try it yourself: open two terminal windows as a normal user to some host. Issue in one of them:

sshpass -p 'ThePassword' ssh root@X.X.X.X sleep 1000

Now issue in the other window, while this runs

ps -fe | grep sh

and you will see the password there.

I hope this helps.

bakunin

RudiC · March 6, 2019, 5:23am

Why would it fail at 6:00h but work at 11:10h, then?

Ophiuchus · March 6, 2019, 7:48am

Hi to all,

Thanks for your answers.

IMHO the issue is not the expansion of the variable because it worked at 11:10 but not at 06:00 and the script1.sh is the same.

The files to be copied are created before 23:00 hours, so that wouldn't be the reason either.

To redirect the log as bakunin says, how do I know which path to use for cronlog.log and cronlog.err?

RudiC · March 6, 2019, 9:15am

That is up to you to choose. For temporary debugging you might use your own home directory; for permanent logging, /var/log lends itself.

Don_Cragun · March 6, 2019, 10:38am

I know it is unlikely, but has anyone checked whether or not the server is running at 6am on the day(s) when the cron job doesn't run? If there is some kind of daily maintenance operation that takes the server out of normal multi-user mode during that timeframe, cron jobs that were missed while not in multi-user mode will not be started if the system comes back up to normal multi-user mode after the job's scheduled start time.

Ophiuchus · March 6, 2019, 11:55am

Hi Don,

Thanks for your answer.

For now, with the help of bakunin and RudiC, I've set up the cron job this way:

  # crontab -l
0 6 * * * /path/to/scripts/script1.sh > /root/CronLog/cronlog.log 2> /root/CronLog/cronlog.err

The issue is that the cron job has failed every day of the week.

Do you know how to check if the source server was running a maintenance operation or something else when the cron job failed?

The source server model/version is

SunOS Generic_118833-22 sun4u sparc SUNW,Sun-Fire-V245

Thanks

RudiC · March 6, 2019, 12:55pm

Did you add the set -vx to the script so it prints useful info?
For the server maintenance, you check the server system logs. You could also search for the relevant cron entries there.
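
A minimal sketch of that search, assuming the usual default locations: on CentOS the cron daemon normally logs to /var/log/cron, and on Solaris to /var/cron/log. The function name cron_runs is made up for this example, and reading these files typically requires root.

```shell
#!/bin/bash
# cron_runs PATTERN [LOGFILE]
# Print the cron log lines that mention PATTERN (for example a script name).
# LOGFILE defaults to /var/log/cron (CentOS); on Solaris try /var/cron/log.
cron_runs() {
    local pattern=$1
    local logfile=${2:-/var/log/cron}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    # -F: treat the pattern as a fixed string, not as a regular expression.
    grep -F -- "$pattern" "$logfile"
}

# Example call: was script1.sh started at all, and when?
cron_runs script1.sh || echo "no entry found (or log not readable)"
```

If the job shows up in the log at the scheduled time, cron did start it and the problem lies in the script or on the remote side; if it does not show up, the scheduling itself deserves a closer look.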

MadeInGermany · March 6, 2019, 1:25pm

You have got Solaris. (BTW a very old Solaris 10, certainly never updated.)
Solaris /usr/bin/date has no -d option. The trailing -d is ignored! (An initial -d would give an error).
Either you have got GNU date installed, then please add the path to it, usually /usr/local/bin/date or /usr/sfw/bin/date or /opt/csw/bin/date
Or install GNU date.
Or try the TZ trick:

dyear=`TZ=$TZ+24 /usr/bin/date +'%Y'`
dmonth=`TZ=$TZ+24 /usr/bin/date +'%b'`
ddate=`TZ=$TZ+24 /usr/bin/date +%Y-%m-%d`

Ophiuchus · March 6, 2019, 1:41pm

Hi RudiC. Yes, I've added set -vx, so I'll need to check tomorrow what appears in those logs. Thanks for the help.

--- Post updated at 02:41 PM ---

MadeInGermany:

Thanks for your suggestions, but since I compute the date, year and month on the destination server (CentOS), where the cron job is defined, I don't have issues with the date command.

My question about SunOS was rather how to find out, if possible, whether it runs some maintenance operation in a timeframe that includes the hour when the cron job is scheduled on the CentOS server (04:00 or 06:00 am).

If not, I'll wait until tomorrow to see whether something appears in the log.

Thanks again.

Don_Cragun · March 6, 2019, 3:28pm

I don't remember the name of the log file that would contain the information about when the system changes states, but there has to be one. And, as has already been stated, look for cron's log file as well.

A quick and simple check would be to see what output you get from the command:

uptime

when run on that server. If it says the system has been up longer than when your last 6am cron job was skipped, that isn't your problem.

Ophiuchus · March 6, 2019, 4:50pm

Don_Cragun:

Thanks for your answer Don.

On the SunOS server that I copy the files from, the uptime command gives this output:

# uptime
  4:28pm  up 839 day(s),  5:42,  1 user,  load average: 0.08, 0.07, 0.07

Don_Cragun · March 6, 2019, 5:03pm

OK. That server has been running for about 2.3 years without a reboot (not uncommon for Sun servers). So that isn't your issue.

bakunin · March 7, 2019, 3:50am

You probably mean the file wtmp, which is typically located in /var/tmp. It is NOT a plain text file. Regardless of where exactly it is (placement varies across OSes, but it is always somewhere in /var), you can print its content in a formatted way by using the last command.

Note that in some (admittedly rather rare) cases the output of uptime can be misleading, because in principle it is possible to tamper with the system date: start a server with the system clock set to 1980, change the date to 2019, and it may look as if the system has been up for nearly 40 years. The last command prints time stamps that were already written to the wtmp log, so it cannot be misled in this way.

I hope this helps.

bakunin

bakunin · March 7, 2019, 4:00am

A possible reason might be that globs are expanded ONLY if there is something they can be expanded to:

$ touch myfileA
$ touch myfileB
$ echo myfile*
myfileA myfileB
$ echo filedoesnotexist*
filedoesnotexist*

Notice that in the second case the asterisk is preserved. It would, in the context above, then be transferred (and maybe expanded, depending on a matching file existing there) by the remote system. In other words, the whole construct is totally unpredictable because it depends on the presence as well as the absence of certain files locally AND remotely.

I hope this helps.

bakunin

MadeInGermany · March 7, 2019, 4:47am

Regarding the "system boot":
the wtmp file is certainly not in /var/tmp/ because that is world-writable and might cause security problems.
It is usually located in /var/log/ or /var/adm/
One can filter for the "reboot" records with

last reboot

(Linux SuSE by default frequently rotates wtmp regardless of its size, so the last command is rendered almost useless.)
Further the last boot is stored in the utmp file. The who command does not print it; one needs

who -b

BTW on SysV-init-compatible systems the current run level is printed with

who -r

bakunin · March 7, 2019, 10:32am

You are right, I meant to write /var/log but somehow wrote /var/tmp.

bakunin

Ophiuchus · March 7, 2019, 12:38pm

Hello again,

The cron job was set to 06:00 am, but the file wasn't copied again, and cronlog.err says this:

cat /path/to/cronlog.err
sshpass -p 'ThePassword' scp -r root@X.X.X.X:/path/to/files/*$ddate* /Destination/path/$dyear/$dmonth/
+ sshpass -p 'ThePassword' scp -r 'root@X.X.X.X:/path/to/files/*2019-03-06*' /Destination/path/2019/Mar/
scp: /path/to/files/*2019-03-06*: No such file or directory

I checked the source server (SunOS) yesterday at 23:30 and again after 01:00 today, and the file logfile.2019-03-06 had not been created at either time.

Checking the source server now, however, the logfile appears to have been created yesterday at 22:00, even though, as I said above, the file wasn't there after 23:00.

# pwd
/path/to/LogFiles/
# ls -l | tail -5
-rw-r--r--   1 root     root      447624 Mar  2 21:16 logfile.2019-03-02
-rw-r--r--   1 root     root      163406 Mar  3 21:14 logfile.2019-03-03
-rw-r--r--   1 root     root      480599 Mar  4 22:58 logfile.2019-03-04
-rw-r--r--   1 root     root      660980 Mar  5 23:42 logfile.2019-03-05
-rw-r--r--   1 root     root      376530 Mar  6 22:00 logfile.2019-03-06

So the file on SunOS looks as if it was created yesterday, but it was probably created after 06:00 today; I don't know. It confuses me, since the date shown is March 6 at 22:00.

Is there a way on SunOS to check when the file was really created in the /path/to/LogFiles/ directory?
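
A caveat and a sketch that may help here: classic Unix filesystems such as UFS do not record a creation time at all. ls -l shows the last modification time (mtime), which a later rename or move does not change, while ls -lc shows the last inode change time (ctime), which on most systems is updated by a rename, chmod or chown. A file that was written until 22:00 under another name and renamed to logfile.2019-03-06 only after 06:00 would therefore still show 22:00 in ls -l. Whether that is what happens on this server is an assumption to verify; the demo below only illustrates the mechanism on a scratch file with made-up names.

```shell
#!/bin/bash
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a scratch file and backdate its modification time to 2019-03-06 22:00.
touch -t 201903062200 logfile.tmp

# Rename it, as a log rotation job might do hours later.
mv logfile.tmp logfile.example

ls -l  logfile.example    # shows the backdated mtime
ls -lc logfile.example    # shows the ctime, i.e. when the inode last changed (now)

# GNU stat (an assumption: not part of a stock Solaris 10) prints both as epoch seconds.
mtime=$(stat -c %Y logfile.example)
ctime=$(stat -c %Z logfile.example)
echo "mtime=$mtime ctime=$ctime"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

On the Solaris side the same comparison would be ls -l versus ls -lc in /path/to/LogFiles/. If the two times differ by hours, something changed the inode (for example a rename) long after the content was last written.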