Parsing cron data with awk

Elvirnith · February 4, 2013, 4:06pm

Heya,

I'm currently working on a script so I can see which cron jobs, if any, on a system are executing more often than every 15 minutes (that is, every 1 to 14 minutes). This is the only data I'm interested in. So far I have the following:

#!/bin/bash

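# Print entries whose minute field contains "*" or is a single number from 1 to 14.
# (The leftover "echo -e ... |" test input was dropped: the file redirect overrode it.)
for file in /var/spool/cron/*; do
  awk 'index($1, "*") || ($1 ~ /^[0-9]+$/ && $1 != 0 && $1 < 15)' "$file"
done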

That being said, I'm a bit stuck. That lists the data for minutes, but if a cron job is like so:

5 2 * * * echo "test"

It will still list that cron job, since it's not taking into account the other columns. Is there a decent way to search out only the cron jobs that meet said criteria while excluding those which execute on the hour/day/week?

Any help is appreciated. Thanks!

Chubler_XL · February 4, 2013, 4:30pm

A cronjob that runs every 6 mins would look like (V7 standard)

0,6,12,18,24,30,36,42,48,54 * * * *

or

0-54/6 * * * *

You should also consider some of these (should they be listed or not?):

* * * * 2 # every min on tuesdays
20-30 * * * * # every min between 20 and 30 past hour
2-58/2 9-17 * * * # every 2 mins 9am - 5pm
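
To see which minutes entries like these actually fire on, one option is to expand the minute field into an explicit list. The following is only a rough sketch: expand_minutes is a made-up helper name, it assumes a cron that accepts lists, ranges and the /step extension, it does no validation, and it treats a bare N/step as the single value N, which may not match your cron.

```shell
#!/bin/sh
# expand_minutes FIELD (hypothetical helper)
# Print the minutes (0-59) matched by a crontab minute field, one per line,
# in ascending order. Handles "*", N, A-B, comma lists and an optional /STEP.
expand_minutes() {
  awk -v field="$1" 'BEGIN {
    n = split(field, parts, ",")
    for (i = 1; i <= n; i++) {
      range = parts[i]
      step = 1
      # Peel off an optional /STEP suffix.
      if (split(range, s, "/") == 2) { range = s[1]; step = s[2] + 0 }
      if (step < 1) step = 1
      if (range == "*") { lo = 0; hi = 59 }
      else if (split(range, r, "-") == 2) { lo = r[1] + 0; hi = r[2] + 0 }
      else { lo = hi = range + 0 }
      for (m = lo; m <= hi && m <= 59; m += step) hit[m] = 1
    }
    # Walk 0..59 so the output is sorted and free of duplicates.
    for (m = 0; m <= 59; m++) if (m in hit) print m
  }'
}
```

For example, expand_minutes '0-54/6' should print the same ten minutes as the comma list above (0, 6, 12 and so on up to 54), one per line.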

Elvirnith · February 4, 2013, 4:39pm

Ah, I didn't even consider those... yeah, that's going to make it a lot more complicated than it already is.

Do you know of any docs / examples out there which cover specifically what I'm trying to do here? I've been reading awk docs throughout the day, but I've yet to find anything relevant. Just a piece here or there, but nothing that's helped me form a whole so far.

Chubler_XL · February 4, 2013, 5:04pm

There don't appear to be many resources around for processing crontab entries.

Do you need to know about any job that MAY run less than 15 mins apart, or whether, for a given day/hour, the job will run more than once in a 15 min period? Think about something that is set up to run every min but only on Fri 13th: should this always be listed by your script, or only when the run date is actually Friday 13th?

Elvirnith · February 4, 2013, 9:43pm

The script is only going to run once a week. I'm not interested in anything that runs every 15 minutes or less often. If a job runs on Friday at 2 pm, it should be ignored. Now, if a job is supposed to run on Friday every 5 minutes, then that of course could be something I want to see.

That being said, there are so many potential combinations that some of them are unrealistic for the scope of what I'm trying to do here. I basically just need to say that, if a script is set to execute between 1 and 14 minutes in frequency every hour, every day of the week then it should be flagged. However, if it's set to execute every hour at 5 past the hour then it should be ignored. Likewise with daily/weekly.

I'm wondering if potentially using additional regexes for the other columns would be a viable way to go, i.e. if * or 1 - 9 is found in columns 2, 3, 4, or 5 then exclude this cron from the list.
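
As a rough illustration of that column check, here is a sketch that keeps only entries whose hour, day-of-month, month and day-of-week fields are all "*", and drops entries whose minute field is a single number (those run at most once an hour). The function name hourly_daily_candidates is made up, and the rule "all four other fields must be *" is an assumption about what "every hour, every day of the week" should mean. It does not measure the gap between minutes, so something like 0,30 would still get through.

```shell
#!/bin/sh
# hourly_daily_candidates (hypothetical helper)
# Read crontab lines on stdin and print those that run every hour of every
# day and whose minute field is more than a single fixed minute.
hourly_daily_candidates() {
  awk '
    NF < 6 || $1 ~ /^#/     { next }  # blank, comment, or no command
    $1 !~ "^[0-9*,/-]+$"    { next }  # not a schedule line (VAR=value, @daily, ...)
    $1 ~ /^[0-9]+$/         { next }  # single minute: at most once an hour
    $2 == "*" && $3 == "*" && $4 == "*" && $5 == "*"   # print the survivors
  '
}
```

It could be fed the spool files directly, for example: cat /var/spool/cron/* | hourly_daily_candidates. With this filter, 5 2 * * * echo "test" is dropped, while an entry such as */5 * * * * is kept.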

Don_Cragun · February 4, 2013, 10:17pm

chubler_xl:

A cronjob that runs every 6 mins would look like (V7 standard)
0,6,12,18,24,30,36,42,48,54 * * * *
or
0-54/6 * * * *
You should also consider some of these (should they be listed or not?):
* * * * 2 # every min on tuesdays
20-30 * * * * # every min between 20 and 30 past hour
2-58/2 9-17 * * * # every 2 mins 9am - 5pm

Many systems still don't support / in the time fields (so check your cron and crontab man pages before worrying about /).

If your system supports / and one appears in the first field and the number after the / is less than 15, flag it.
Otherwise, if there is a - in the first field, flag it.
Otherwise, if there are one or more commas in the first field, sort the values separated by the commas, and if the difference between any two adjacent values in the sorted list is less than 15, flag it. If the gap across the hour boundary, that is (1st value in the list + 60) minus the last value in the list, is less than 15, you have to determine whether the hour field will ever yield consecutive hours. If it does, flag it. (Note that consecutive hours could also be hour 23 on one day and hour 0 on the next day, ...) Sorting the values in the minutes field is necessary because a minutes field of:

0,20,40,10,50,30

results in the job running every 10 minutes, but no two adjacent values in the unsorted list are within 15 minutes of each other.
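
The three rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not a finished tool: minute_field_flag is a made-up name, a bare * in the minute field is flagged (carried over from the original script, not one of the rules above), and the wrap-around case is only flagged when the hour field is exactly *, which is a simplification of the "consecutive hours" test.

```shell
#!/bin/sh
# minute_field_flag MINUTE_FIELD HOUR_FIELD (hypothetical helper)
# Print "flag" if the minute field suggests runs less than 15 minutes apart,
# otherwise print "ok".
minute_field_flag() {
  awk -v min="$1" -v hour="$2" 'BEGIN {
    if (min == "*") { print "flag"; exit }       # assumption: every minute
    if (index(min, "/")) {                       # rule 1: step below 15
      k = split(min, p, "/")
      res = (p[k] + 0 < 15) ? "flag" : "ok"
      print res; exit
    }
    if (index(min, "-")) { print "flag"; exit }  # rule 2: any range
    n = split(min, v, ",")                       # rule 3: comma list
    for (i = 2; i <= n; i++) {                   # numeric insertion sort
      x = v[i] + 0
      for (j = i - 1; j >= 1 && v[j] + 0 > x; j--) v[j + 1] = v[j]
      v[j + 1] = x
    }
    for (i = 2; i <= n; i++)
      if (v[i] - v[i - 1] < 15) { print "flag"; exit }
    # Gap across the hour boundary; simplified to hour == "*" only.
    if (n > 1 && v[1] + 60 - v[n] < 15 && hour == "*") { print "flag"; exit }
    print "ok"
  }'
}
```

With the example above, minute_field_flag '0,20,40,10,50,30' '*' should print flag, since the sorted list has 10-minute gaps, while minute_field_flag '0,30' '*' should print ok.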

The logic isn't that hard, but there are enough special cases about which fields are considered depending on which fields have values specified in a crontab entry that this is not going to be a trivial task.

Good luck.