Need help looking for missing hours.

I have a file that should cover a day's worth of stats. At the beginning of each 15-minute report there is a unique header that looks like the example below. The "0000" and "0015" in the header line change to show which 15-minute interval the report covers, and of course the "A20130604" changes from day to day based on the date. My thinking is that there should be 96 headers for a complete day.

A20130604.0000-0600-0015-0600_SubNetwork=DMME1,ManagedElement=ces-1

I am trying to write a quick "if then else" script to let me know if I have all the reports. I have tried the following:

#!/bin/ksh

x='cat /home/fsan/output/samfile | grep DMME | wc -l'
y=96

if [ "$x" -le 95 ]
then
        echo "You are missing files"
else
        echo "All hourly files appear to be present"
fi

When I run the file I get:
samcheck[4]: cat /home/fsan/output/samfile | grep DMME | wc -l: bad number

I have also tried changing the

if [ "$x" -le 95 ]

line to

if [ "$x" < $y ]

Then I get:

samcheck[5]: 96: cannot open

Any help on what I am doing wrong would be amazing!! I also wouldn't mind help on making a more dynamic script that can maybe tell me which hours I am missing.

Here you store the string in x

x='cat /home/fsan/output/samfile | grep DMME | wc -l'
echo $x
cat /home/fsan/output/samfile | grep DMME | wc -l

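You have mixed up backticks `code` with single quotes 'code'. Single quotes just store the text; backticks (or $(...)) run the command and store its output.

Good way to do it: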

x=$(cat /home/fsan/output/samfile | grep DMME | wc -l)

also works

x=`cat /home/fsan/output/samfile | grep DMME | wc -l`

other ways:

x=$(awk '/DMME/ {a++} END {print a+0}' /home/fsan/output/samfile)
x=$(awk '/DMME/' /home/fsan/output/samfile | wc -l)
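
For reference, here is a minimal sketch of the original check with the command substitution fixed. The second error ("96: cannot open") comes from the shell treating < inside [ ] as input redirection, so it tries to read from a file named 96; numeric tests need -lt, -le, -eq and so on. The function name check_reports and its file argument are made up for illustration and are not from the posts above.

```shell
# Hypothetical helper: count header lines and compare against the expected 96.
check_reports() {
    file=$1
    expected=96                      # 24 hours * 4 reports per hour

    if [ ! -r "$file" ]
    then
        echo "Cannot read $file" >&2
        return 2
    fi

    # grep -c prints the number of matching lines (0 if there are none)
    count=$(grep -c DMME "$file")

    # -lt is the numeric "less than" test; a bare < would be a redirection
    if [ "$count" -lt "$expected" ]
    then
        echo "You are missing files ($count of $expected found)"
    else
        echo "All 15-minute reports appear to be present"
    fi
}
```

It would be called as: check_reports /home/fsan/output/samfile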

What about

[ $(grep -c DMME /home/fsan/output/samfile) -eq 96 ] && echo good || echo bad

?

I can't answer your question regarding which files are missing as I don't understand the time pattern in your file name. Pls explain.


The time pattern is based on a 24 hour clock

A20130604.0000-0600-0015-0600_SubNetwork=DMME1,ManagedElement=ces-1

A20130604 is the date field:
A = the stream (we only have one stream, but if we had multiple it would change)
20130604 = YYYYMMDD

0000-0600-0015-0600

0000 = The start time in the 15 minute report where 0000 is 12AM
0600 = Our system number so it will NEVER change
0015 = The end time in the 15 minute report where 0015 is 12:15AM
0600 = Our system number so it will NEVER change

All the rest:
_SubNetwork=DMME1,ManagedElement=ces-1 is just system info and will never change either.

So the 0000 should count up to 2345 (11:45 PM)?

Let me start off by saying thank you to Jotne and RudiC for your quick and helpful answers. I greatly appreciate it.

To RudiC's question:
Correct, the first 0000 should count up to 2345 in increments of 15 minutes, for example: 0000, 0015, 0030, 0045, 0100, 0115

so the time lines will look like this:

0000-0600-0015-0600
0015-0600-0030-0600
0030-0600-0045-0600
0045-0600-0100-0600
0100-0600-0115-0600
0115-0600-0130-0600
0130-0600-0145-0600
0145-0600-0200-0600

and so on up to the final report of the day being

2345-0600-0000-0600
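
The pattern above can be sketched as a small generator that prints all 96 expected time fields for one day, including the wrap from 2345 back to 0000. This is only an illustration; the function name expected_intervals is made up, and 0600 is hard-coded because it is the system number that never changes.

```shell
# Hypothetical helper: print the 96 expected "start-sys-end-sys" fields in order.
expected_intervals() {
    awk 'BEGIN {
        # m is the start time in minutes since midnight: 0, 15, ..., 1425
        for (m = 0; m < 1440; m += 15) {
            e = (m + 15) % 1440      # end time; wraps to 0000 after 2345
            printf("%02d%02d-0600-%02d%02d-0600\n",
                   int(m / 60), m % 60, int(e / 60), e % 60)
        }
    }'
}
```

Counting in whole minutes and converting to HHMM only when printing avoids having to special-case the step from xx45 to the next hour.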

You might want to try

awk     '/DMME/ {ACT=substr($0,11,4)
                 if (ACT - EXP) print EXP
                 CNT += .25
                 EXP  = sprintf ("%04d", int(CNT)*100 + (CNT - int(CNT))*60)
                }
         END    {for (CNT; CNT<24; CNT+=.25) printf ("%04d\n", int(CNT)*100 + (CNT - int(CNT))*60)
                }
        ' file

---------- Post updated at 21:35 ---------- Previous update was at 20:48 ----------

Don't! It does not synchronize well once a line/time is lost. Try this:

awk     'BEGIN  {for (CNT=0; CNT<24; CNT+=.25) Arr[sprintf ("%04d", int(CNT)*100 + (CNT - int(CNT))*60)]}
         /DMME/ {delete Arr[substr($0,11,4)]}
         END    {for (i in Arr) print i}
        ' file 

Pipe the result through sort if you need the items in ascending sequence ("for (i in Arr)" does not guarantee any particular order).
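
If the extra sort is unwanted, one possible variant is to record the start times that were seen and walk the 96 expected times in order in the END block. This is a sketch under the same assumption as the script above (the start time sits in characters 11-14 of each header line); the function name missing_intervals is made up for illustration.

```shell
# Hypothetical helper: list missing start times for one day in ascending order.
missing_intervals() {
    awk '
    # Remember every start time that appears in a header line
    /DMME/ { seen[substr($0, 11, 4)] = 1 }
    END {
        # Walk the day in 15-minute steps and report the gaps
        for (m = 0; m < 1440; m += 15) {
            t = sprintf("%02d%02d", int(m / 60), m % 60)
            if (!(t in seen)) print t
        }
    }' "$1"
}
```

It would be called as: missing_intervals /home/fsan/output/samfile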


Rudi's 2nd awk script is great for the given problem. If at some point in the future you're interested in processing a samfile containing data for multiple days and/or multiple streams (or if you want to know whether there are multiple input lines for a given stream on a given date that have the same start time), you could try the following more complex awk script:

awk -F'[-.]' '
/DMME/ {# Note that the following tests will not catch start times that are not
        # even multiples of 15 minutes and will not catch end times like 0160
        # when paired with a start time like 0145.  If these issues are a
        # concern, I leave fixing it as an exercise for the reader.
        # However, if any start time that is a multiple of 15 minutes is not
        # present that will be caught during END processing.
        if($2 + 15 != $4 &&
                (substr($2, 3) != 45 ||
                        (((substr($2, 1, 2) + 1) % 24) "00") + 0 != $4)) {
                printf("Skipping: %s\n\t%s not 15 minutes after %s\n",
                        $0, $4, $2)
                next
        }
        if(start[$1, $2]++) {
                printf("Skipping: %s\n\t", $0) 
                printf("%d entries seen for %s stream %s @ start time %s\n",
                        start[$1, $2], substr($1, 2), substr($1, 1, 1), $2)
                next
        }
        sd[$1]++
}
END {   for(i in sd) {
                printf("Processing %d entries for %s stream %s:\n",
                        sd[i], substr(i, 2), substr(i, 1, 1))
                for(h = 0; h <= 23; h++)
                        for(m = 0; m < 60; m +=15) {
                                st = sprintf("%02d%02d", h, m)
                                if(!((i, st) in start)) {
                                        printf("\tStart time %s missing\n", st)
                                        cm++
                                }
                        }
                if(cm)  cm = 0
                else    printf("\tOK.\n")
        }
}' /home/fsan/output/samfile

With a samfile that contains data for multiple streams on one date and two dates for one of the streams (with a few errors sprinkled in for fun), the output produced is:

Skipping: A20130604.0000-0600-0016-0600_SubNetwork-DMME1ManagedElement=ces-1
	0016 not 15 minutes after 0000
Skipping: A20130605.2345-0600-2400-0600_SubNetwork-DMME1ManagedElement=ces-1
	2400 not 15 minutes after 2345
Skipping: A20130605.2345-0600-0000-0600_SubNetwork-DMME1ManagedElement=ces-1
	2 entries seen for 20130605 stream A @ start time 2345
Processing 96 entries for 20130604 stream B:
	OK.
Processing 95 entries for 20130604 stream A:
	Start time 0000 missing
Processing 96 entries for 20130605 stream A:
	OK.