awk

I have an awk command that reads a file, counts how often each value in the "date/time" column occurs, and writes the results to a log.

# if field7 is "input_message" in the input log
# count frequency of the field4 date and time field then print out totals to log 
awk '$7=="input_message"{A[$4]++} END{for(i in A) print i","A[i]","}' "$IN_FILE" >> "$OUTPUT_LOG"
  

The $IN_FILE input file looks like this

31.44.217.2 - A [02/Sep/2016:09:44:52 +0100] "POST input_message HTTP" 200 -  
31.44.217.2 - A [02/Sep/2016:09:44:52 +0100] "POST input_message HTTP" 200 -  
31.44.217.2 - A [02/Sep/2016:09:44:53 +0100] "POST input_message HTTP" 200 -  
31.44.217.2 - A [02/Sep/2016:09:45:01 +0100] "POST input_message HTTP" 200 -  
31.44.218.2 - A [02/Sep/2016:09:50:52 +0100] "POST input_message HTTP" 200 -  

The $OUTPUT_LOG output file looks like this and gives a total count of the frequency of each date/time field

[02/Sep/2016:09:44:52,2,
[02/Sep/2016:09:44:53,1,
[02/Sep/2016:09:45:01,1,
[02/Sep/2016:09:50:52,1,

I would like to change it so that it counts 10-minute time frames, e.g. instead of reading $4 as [02/Sep/2016:09:44:52 it reads it as [02/Sep/2016:09:4 (ignoring the 4:52 at the end).

This would then output counts for each 10-minute period instead of counts for every second. I'm trying to figure out how to ignore the last 4 characters of $4 to do this.

Any help appreciated.

Hello finn,

Could you please try the following and let me know if this helps.

awk '{A[substr($4,2,16)]++} END{for(i in A){print i "," A[i] ","}}'  Input_file

Output will be as follows.

02/Sep/2016:09:4,4,
02/Sep/2016:09:5,1,

Thanks,
R. Singh


Spot on many thanks Ravinder!

Wouldn't it be nicer to print the full minutes and seconds as well, to make the time recognizable?
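
One way to do that, as a rough sketch rather than a definitive command: keep the same substr() key and append "0:00" so each bucket prints as the start of its 10-minute window. The function name count_by_10min and the array name bucket are made up for illustration. The $7 filter from the original command is kept, and sort is only there because awk does not guarantee the order of "for (key in array)".

```shell
# Hypothetical helper name; wraps the awk one-liner so it can be reused.
count_by_10min() {
    awk '$7 == "input_message" {
             # "[02/Sep/2016:09:44:52" -> "02/Sep/2016:09:4" -> "02/Sep/2016:09:40:00"
             bucket[substr($4, 2, 16) "0:00"]++
         }
         END { for (key in bucket) print key "," bucket[key] "," }' "$@"
}

# Sample lines copied from the question above.
count_by_10min <<'EOF' | sort
31.44.217.2 - A [02/Sep/2016:09:44:52 +0100] "POST input_message HTTP" 200 -
31.44.217.2 - A [02/Sep/2016:09:44:52 +0100] "POST input_message HTTP" 200 -
31.44.217.2 - A [02/Sep/2016:09:44:53 +0100] "POST input_message HTTP" 200 -
31.44.217.2 - A [02/Sep/2016:09:45:01 +0100] "POST input_message HTTP" 200 -
31.44.218.2 - A [02/Sep/2016:09:50:52 +0100] "POST input_message HTTP" 200 -
EOF
```

With the sample input this prints:

02/Sep/2016:09:40:00,4,
02/Sep/2016:09:50:00,1,

To use it on the real file, something like count_by_10min "$IN_FILE" >> "$OUTPUT_LOG" should work.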


Good idea, thanks. I noticed that when I opened the resulting output file in Excel, it didn't auto-format the time correctly and the column needed a custom format. Following your suggestion will mean that is not necessary. Cheers.