I have an awk command that reads a file, counts how often each value of the date/time field occurs, and appends the totals to a log:
# If field 7 is "input_message" in the input log,
# count the frequency of each field-4 date/time value, then append the totals to the log.
awk '$7 == "input_message" { A[$4]++ } END { for (i in A) print i "," A[i] "," }' "$IN_FILE" >> "$OUTPUT_LOG"
The $IN_FILE input file looks like this:
31.44.217.2 - A [02/Sep/2016:09:44:52 +0100] "POST input_message HTTP" 200 -
31.44.217.2 - A [02/Sep/2016:09:44:52 +0100] "POST input_message HTTP" 200 -
31.44.217.2 - A [02/Sep/2016:09:44:53 +0100] "POST input_message HTTP" 200 -
31.44.217.2 - A [02/Sep/2016:09:45:01 +0100] "POST input_message HTTP" 200 -
31.44.218.2 - A [02/Sep/2016:09:50:52 +0100] "POST input_message HTTP" 200 -
The $OUTPUT_LOG output file looks like this, giving a count for each distinct date/time value:
[02/Sep/2016:09:44:52,2,
[02/Sep/2016:09:44:53,1,
[02/Sep/2016:09:45:01,1,
[02/Sep/2016:09:50:52,1,
I would like to change it to count 10-minute time frames: for example, instead of reading $4 as [02/Sep/2016:09:44:52, it should read it as [02/Sep/2016:09:4 (ignoring the 4:52 at the end).
This would then output a count per 10-minute period rather than a count per second. I'm trying to figure out how to ignore the last 4 characters of $4 to do this.
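One possible approach, sketched here under the assumption that every $4 ends in a fixed-width HH:MM:SS timestamp as in the sample above: awk's standard substr() and length() functions can drop the last 4 characters, so "[02/Sep/2016:09:44:52" becomes the bucket key "[02/Sep/2016:09:4", which covers 09:40:00 to 09:49:59. The helper names sample_log and bucket_counts are hypothetical. Because awk does not guarantee any particular order for "for (k in count)", the output is piped through sort.

```shell
#!/bin/sh
# Hypothetical helper: emits the sample lines from the question.
# In real use, read "$IN_FILE" instead.
sample_log() {
cat <<'EOF'
31.44.217.2 - A [02/Sep/2016:09:44:52 +0100] "POST input_message HTTP" 200 -
31.44.217.2 - A [02/Sep/2016:09:44:52 +0100] "POST input_message HTTP" 200 -
31.44.217.2 - A [02/Sep/2016:09:44:53 +0100] "POST input_message HTTP" 200 -
31.44.217.2 - A [02/Sep/2016:09:45:01 +0100] "POST input_message HTTP" 200 -
31.44.218.2 - A [02/Sep/2016:09:50:52 +0100] "POST input_message HTTP" 200 -
EOF
}

# Hypothetical helper: reads log lines on stdin and prints one
# "bucket,count," line per 10-minute period, sorted by bucket.
bucket_counts() {
  awk '$7 == "input_message" {
      # Assumes $4 ends in "HH:MM:SS"; dropping the last 4 characters
      # ("M:SS") leaves the tens-of-minutes digit as the bucket key.
      key = substr($4, 1, length($4) - 4)
      count[key]++
  }
  END {
      for (k in count) print k "," count[k] ","
  }' | sort
}

sample_log | bucket_counts
```

For the sample data this prints "[02/Sep/2016:09:4,4," followed by "[02/Sep/2016:09:5,1,". Against the real file the call would be bucket_counts < "$IN_FILE" >> "$OUTPUT_LOG".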
Any help appreciated.