Using awk to extract data from log file for exact period of time

Hello

As part of a script I am using awk to pull entries from /var/log/messages for a specific period of time, for example the past 7 days.
It works fine within the current month, but not when the range spans two months, for example Mar (previous month) and Apr. It returns data from the current month (Apr) only.
I tried offsets like "-40320 min" or "28 days ago"; nothing works.

awk -v d1="$(date --date="-40320 min" "+%m %_d %H:%M")" \
         -v d2="$(date "+%m %_d %H:%M")" '$0 > d1 && $0 < d2 ||
         $0 ~ d2' $log_file

Please advise what's wrong there.
Thanks.

@Kazarkin , welcome

Can you show the actual date strings produced by these date commands, and a sample of the input file being scanned?

Running the commands, I see the following:

date --date="-40320 min" "+%m %_d %H:%M"
03  4 11:57

date "+%m %_d %H:%M"
04  1 12:59

Thanks

The part of the file:

Mar  3 08:20:06 serverA sshd[2124]:  ...
Apr  1 12:34:45 serverA sshd[184593]: ....

The output of date commands:

date "+%m %_d %H:%M"
04  1 08:40
date --date="-40320 min" "+%m %_d %H:%M"
03  4 07:41

I also tried to change date format and used the following:

date "+%b %e"
Apr  1
date --date="-40320 min" "+%b %e"
Mar  4

But the script still matches only the current month (April) in the file and ignores the previous month (March).

Thanks.

The > and < operators compare strings lexicographically, and Apr comes before Mar in the alphabet, so as a string range "Mar  4" to "Apr  1" is empty.

Would be easier if your log had YYYY-MM-DD format...
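
A small sketch of that point, using made-up sample lines in the same layout as the log excerpt above. The variable names (sample, by_name, by_number) and the MMDD key are illustrative assumptions, not part of the original script. The first awk compares month names as strings and selects nothing; the second maps the month name to a number first, so the comparison is numeric.

```shell
sample='Mar  3 08:20:06 serverA sshd[2124]: a
Mar 20 10:00:00 serverA sshd[2200]: b
Apr  1 12:34:45 serverA sshd[184593]: c'

# 1) Month names compared as strings: no line is both > "Mar  4" and < "Apr  2".
by_name=$(printf '%s\n' "$sample" |
  awk -v d1="Mar  4" -v d2="Apr  2" '$0 > d1 && $0 < d2')

# 2) Month mapped to a number first: key = MMDD, compared numerically.
by_number=$(printf '%s\n' "$sample" |
  awk -v d1=304 -v d2=402 '
    { m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $1) + 2) / 3
      key = m * 100 + $2 }
    key >= d1 && key <= d2')

printf 'by name:   [%s]\nby number: [%s]\n' "$by_name" "$by_number"
```

The string version prints an empty result, while the numeric version keeps the Mar 20 and Apr 1 lines.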

Do you have journalctl?


Yes, I have it, but it holds data for the past 3 days only. That's why I am trying to get the data from /var/log/messages using grep or awk.
As far as I can tell, grep can't select lines by time interval.

I feel like there has to be a better way than what I'm describing below. But nothing beyond Perl / Python / etc. comes to mind.

But a convoluted shell / awk answer is:

  1. convert the month (%b), day (%e), and time (%H:%M:%S) to seconds since the epoch (%s)
  2. compare numbers to numbers to find the range you want
  3. convert the seconds since the epoch (%s) back to month (%b), day (%e), and time (%H:%M:%S).

Here is a proof of concept that I just hacked together on my system to do steps #1 and #3 above.

tail /var/log/messages | awk '{cmd = "date \"+%s\" -d \""$1" "$2" "$3"\""; cmd | getline date; close(cmd); sub(/... .. ..:..:../, date, $0); print}' | awk '{cmd = "date \"+%b %e %H:%M:%S\" -d @"$1; cmd | getline date; close(cmd); sub(/^[0-9]+/, date, $0); print}'

Here's the first awk broken up:

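awk '{
   # builds e.g.: date "+%s" -d "Apr 1 12:13:05"  (GNU date assumes the current year)
   cmd = "date \"+%s\" -d \""$1" "$2" "$3"\""
   cmd | getline date
   close(cmd)   # release the pipe so every line gets a fresh date call
   sub(/... .. ..:..:../, date, $0)
   print
}'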

The awk + date above converts this:

Apr  1 12:13:05 omega last message buffered 1 times

to this:

1711991585 omega last message buffered 1 times

Here's the second awk broken up:

awk '{
   # $1 is the epoch timestamp; "-d @N" makes GNU date format that instant
   cmd = "date \"+%b %e %H:%M:%S\" -d @" $1
   cmd | getline date
   close(cmd)
   sub(/^[0-9]+/, date, $0)
   print
}'

The awk + date above converts this:

1711991585 omega last message buffered 1 times

to this:

Apr  1 12:13:05 omega last message buffered 1 times

You should be able to easily insert another awk in between the two that does a numeric comparison of the date & time in seconds since epoch (%s) form.

It should be relatively trivial to make the interstitial awk automatically select a time range if you want the last X number of days. -- Make date (read: the computer) do the work to calculate the start and stop dates.

Let me know if you want help with the interstitial awk (I'm on my lunch break and don't have much time).
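
A minimal sketch of what that interstitial step could look like, run here on hand-made epoch-stamped lines rather than a real log. The names stamped, start, stop and selected are hypothetical, and the bounds are fixed numbers only so the example is reproducible.

```shell
# Epoch-stamped lines, in the shape the first awk produces (sample data).
stamped='1711900000 omega too old
1711991585 omega last message buffered 1 times
1712100000 omega too new'

# Hypothetical bounds; in a real script let GNU date compute them, e.g.
#   start=$(date -d '7 days ago' +%s); stop=$(date +%s)
start=1711950000
stop=1712000000

# Interstitial filter: a plain numeric comparison on the first field.
selected=$(printf '%s\n' "$stamped" |
  awk -v start="$start" -v stop="$stop" '$1 >= start && $1 <= stop')

printf '%s\n' "$selected"
```

In the full pipeline this filter would sit between the two awk commands, so only the selected lines are converted back to month, day and time.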


If the OP/@Kazarkin has gawk, this conversion back and forth can be done natively (no date command spawned from within awk) using the gawk built-ins systime() and mktime(), and comparing epoch times.

NOTE: I'd cache the converted times, to save on some CPU cycles.
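
One way the caching idea might look, as a sketch only. It assumes gawk is installed (mktime is not in POSIX awk), assumes the year 2024 because syslog lines carry no year, and sets TZ=UTC purely to make the demo reproducible. The function name epoch and the cache array are made up for illustration.

```shell
sample='Apr  1 12:13:05 omega last message buffered 1 times
Apr  1 12:13:05 omega another line in the same second'

if command -v gawk >/dev/null 2>&1; then
  converted=$(printf '%s\n' "$sample" | TZ=UTC gawk -v year=2024 '
    function epoch(mon, day, hms,    key, t, m) {
      key = mon " " day " " hms
      if (!(key in cache)) {          # convert each distinct stamp only once
        t = hms
        gsub(/:/, " ", t)             # mktime wants "YYYY MM DD HH MM SS"
        m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", mon) + 2) / 3
        cache[key] = mktime(year " " m " " day " " t)
      }
      return cache[key]
    }
    { ts = epoch($1, $2, $3); sub(/^... .. ..:..:../, ts); print }')
  printf '%s\n' "$converted"
fi
```

Busy logs often have many lines within the same second, so the second and later lines with an identical stamp are served from the array instead of calling mktime again.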

@Kazarkin

Here's a rudimentary script to pull rows from a log file between two dates using GNU awk.

Pay attention to the date format you pass in: it must include the year and a numeric month, as in the example below. That keeps the script simple, because mktime wants numbers. Feel free to use another format; I chose this one because it is easy to experiment with.

"2024 04 1 22:12:30"

cat kazarkin.awk 
BEGIN {
	# needed to cvt log file date mth to numeric for mktime fn
	mths["Jan"]=1; mths["Feb"]=2; mths["Mar"]=3; mths["Apr"]=4;  mths["May"]=5;  mths["Jun"]=6;
	mths["Jul"]=7; mths["Aug"]=8; mths["Sep"]=9; mths["Oct"]=10; mths["Nov"]=11; mths["Dec"]=12
	
	#
	# convert start and end datetimes to timestamps
	# for mktime the format needs to be YYYY MM DD HH MM SS
	# stripped of colons !
	# 
	gsub(":"," ", from )     
	gsub(":"," ", to )     
	startYr = substr(from,1,4)
	fromTS  = mktime(from)
	endTS   = mktime(to)
}

{
	#
	# Extract the datetime from the log line
	# 
	# YOU NEED TO ENSURE THE FORMAT OF THE INCOMING MESSAGES MAP TO THE 
	# DATE FORMAT AS BELOW, some may use 3 fields, some 4 ... 
	#
	logTS = startYr " " mths[$1] " " $2 " " $3
	gsub(":"," ", logTS )  # for mktime the format needs to be YYYY MM DD HH MM SS
	
	log_ts = mktime(logTS)

	if (log_ts >= fromTS && log_ts <= endTS)
		print
}

awk -v from="2024 04 1 22:12:30" -v to="2024 04 01 22:22:00" -f kazarkin.awk syslog > results.txt

cat results.txt
Apr  1 22:12:34 Z800 junglegingled[301778]: 2024/04/01 22:12:34 [Info] HTTP CALL Duration: 211.013204ms
Apr  1 22:12:34 Z800 junglegingled[301778]: Request: HTTP/3 GET https://api.junglegingle.com/v1/servers?filters[servers.id]=0 map[Accept-Encoding:[gzip, deflate] Content-Length:[] Content-Type:[application/json] User-Agent:[NordApp Linux 3.17.3 Linux 5.4.0-174-generic]]
Apr  1 22:12:34 Z800 junglegingled[301778]: Response: HTTP/3.0 200 - map[Alt-Svc:[h3=":443"; ma=86400] Cache-Control:[public, max-age=30, s-maxage=30, stale-if-error=120] Cf-Cache-Status:[DYNAMIC] 
Apr  1 22:12:34 Z800 junglegingled[301778]: 2024/04/01 22:12:34 [Info] HTTP CALL Duration: 169.712661ms
Apr  1 22:12:34 Z800 junglegingled[301778]: Request: HTTP/3 GET https://api.junglegingle.com/v1/servers?limit=1073741824&filters[servers.status]=online&fields[servers.id]&fields[servers.name]&fiel
Apr  1 22:12:34 Z800 junglegingled[301778]: Response: HTTP/3.0 200 - map[Alt-Svc:[h3=":443"; ma=86400] Cache-Control:[public, max-age=30, s-maxage=30, stale-if-error=120] Cf-Cache-Status:[DYNAMIC
Apr  1 22:12:40 Z800 junglegingled[301778]: 2024/04/01 22:12:40 [Warning] TELIO(v4.1.2): "telio_lana::event_log":64 [Lana] couldn't find foreign context field device_info.brand
Apr  1 22:12:40 Z800 junglegingled[301778]: 2024/04/01 22:12:40 [Warning] TELIO(v4.1.2): "telio_lana::event_log":64 [Lana] couldn't find foreign context field device_info.location.city
Apr  1 22:12:40 Z800 junglegingled[301778]: 2024/04/01 22:12:40 [Warning] TELIO(v4.1.2): "telio_lana::event_log":64 [Lana] couldn't find foreign context field device_info.location.country
Apr  1 22:13:42 Z800 junglegingled[301778]: 2024/04/01 22:13:42 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:13:57 Z800 junglegingled[301778]: 2024/04/01 22:13:57 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:14:02 Z800 junglegingled[301778]: 2024/04/01 22:14:02 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:14:02 Z800 junglegingled[301778]: 2024/04/01 22:14:02 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:14:07 Z800 junglegingled[301778]: 2024/04/01 22:14:07 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:14:07 Z800 junglegingled[301778]: 2024/04/01 22:14:07 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:17:32 Z800 junglegingled[301778]: 2024/04/01 22:17:32 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:17:32 Z800 junglegingled[301778]: 2024/04/01 22:17:32 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:17:59 Z800 rtkit-daemon[1552]: Supervising 4 threads of 3 processes of 1 users.
Apr  1 22:17:59 Z800 rtkit-daemon[1552]: Successfully made thread 481298 of process 3041 owned by '1000' RT at priority 5.
Apr  1 22:17:59 Z800 rtkit-daemon[1552]: Supervising 5 threads of 3 processes of 1 users.
Apr  1 22:20:58 Z800 junglegingled[301778]: 2024/04/01 22:20:58 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:20:58 Z800 junglegingled[301778]: 2024/04/01 22:20:58 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:03 Z800 junglegingled[301778]: 2024/04/01 22:21:03 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:21:03 Z800 junglegingled[301778]: 2024/04/01 22:21:03 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:08 Z800 junglegingled[301778]: 2024/04/01 22:21:08 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:21:08 Z800 junglegingled[301778]: 2024/04/01 22:21:08 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:13 Z800 junglegingled[301778]: 2024/04/01 22:21:13 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:21:13 Z800 junglegingled[301778]: 2024/04/01 22:21:13 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:18 Z800 junglegingled[301778]: 2024/04/01 22:21:18 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:21:18 Z800 junglegingled[301778]: 2024/04/01 22:21:18 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:23 Z800 junglegingled[301778]: 2024/04/01 22:21:23 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:21:23 Z800 junglegingled[301778]: 2024/04/01 22:21:23 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:28 Z800 junglegingled[301778]: 2024/04/01 22:21:28 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:21:28 Z800 junglegingled[301778]: 2024/04/01 22:21:28 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:33 Z800 junglegingled[301778]: 2024/04/01 22:21:33 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:21:33 Z800 junglegingled[301778]: 2024/04/01 22:21:33 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:38 Z800 junglegingled[301778]: 2024/04/01 22:21:38 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:21:38 Z800 junglegingled[301778]: 2024/04/01 22:21:38 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:43 Z800 junglegingled[301778]: 2024/04/01 22:21:43 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
Apr  1 22:21:53 Z800 junglegingled[301778]: 2024/04/01 22:21:53 [Info] TELIO(v4.1.2): "telio::device::wg_controller":142 Inserting peer: Some(ipAddresses)
Apr  1 22:21:58 Z800 junglegingled[301778]: 2024/04/01 22:21:58 [Info] TELIO(v4.1.2): "telio::device::wg_controller":137 Removing peer: Some(ipAddresses)
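
For the original "past 7 days" question, the two bounds could be generated by date instead of typed by hand. This is a sketch assuming GNU date (for -d); the variable names fmt, from and to are only illustrative.

```shell
# Format expected by kazarkin.awk: year, numeric month, day, then HH:MM:SS.
fmt='+%Y %m %d %H:%M:%S'
from=$(date -d '7 days ago' "$fmt")
to=$(date "$fmt")
printf 'from="%s" to="%s"\n' "$from" "$to"

# Then, with gawk and the script above in the current directory:
#   gawk -v from="$from" -v to="$to" -f kazarkin.awk /var/log/messages
```

Note that the script takes the year for every log line from the start date, so a range that crosses New Year would probably need extra handling.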

Come back with questions if you have any.


Thanks a lot everyone.
This is exactly what I was looking for.

@Kazarkin, if you only want to deal with month, day, hour, minute and second, try the example below.

As with all submissions, you need to test it!

(I could have posted this with the first script but didn't want to confuse things.)

cat kazarkinALT.awk 
BEGIN {
	mths["Jan"]=1; mths["Feb"]=2; mths["Mar"]=3; mths["Apr"]=4;  mths["May"]=5;  mths["Jun"]=6;
	mths["Jul"]=7; mths["Aug"]=8; mths["Sep"]=9; mths["Oct"]=10; mths["Nov"]=11; mths["Dec"]=12
	
	split(from,fdate,/[ :]/);
	fromTS = int(sprintf("%02d%02d%02d%02d%02d", fdate[1], fdate[2], fdate[3], fdate[4], fdate[5] ))

	split(to,tdate,/[ :]/);
	endTS = int(sprintf("%02d%02d%02d%02d%02d", tdate[1], tdate[2], tdate[3], tdate[4], tdate[5] ))
}

{
	split(mths[$1] " " $2 " " $3,logDs,/[ :]/)
	log_ts = int(sprintf("%02d%02d%02d%02d%02d", logDs[1], logDs[2], logDs[3], logDs[4], logDs[5] ))
	
	if (log_ts >= fromTS && log_ts <= endTS)
		print
}

awk -v from="04 1 23:14:55" -v to="04 01 23:18:40" -f kazarkinALT.awk syslog
Apr  1 23:14:55 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:14:55 [Info] TELIO(anIPaddr)
Apr  1 23:14:55 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:14:55 [Info] TELIO(anIPaddr)
Apr  1 23:17:01 Z800 CRON[483871]: (anIPaddr)
Apr  1 23:18:21 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:18:21 [Info] TELIO(anIPaddr)
Apr  1 23:18:21 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:18:21 [Info] TELIO(anIPaddr)
Apr  1 23:18:26 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:18:26 [Info] TELIO(anIPaddr)
Apr  1 23:18:26 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:18:26 [Info] TELIO(anIPaddr)
Apr  1 23:18:31 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:18:31 [Info] TELIO(anIPaddr)
Apr  1 23:18:31 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:18:31 [Info] TELIO(anIPaddr)
Apr  1 23:18:36 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:18:36 [Info] TELIO(anIPaddr)
Apr  1 23:18:36 Z800 giggleAPPLICATIONd[301778]: 2024/04/01 23:18:36 [Info] TELIO(anIPaddr)

Thanks a lot.

A slightly simplified date/time conversion function, which should be easy to adjust for other date/time formats:

#Apr  1 23:14:55
function toTs(m,d,t,    mthNr) {
    gsub(/[^0-9]/,"",t)
    mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",m) + 2) / 3
    return sprintf("%02d%02d%06d",mthNr,d,t)   # %06d keeps the leading zero of times before 10:00:00
}

{print toTs($1, $2, $3) }
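
A sketch of how this function might be used as a range filter, on invented sample lines. The from/to values and the names picked, lo and hi are assumptions for illustration. The time part is formatted with %06d here so every key has the same width (MMDDHHMMSS), which matters because the keys returned by sprintf are compared as strings.

```shell
sample='Mar 28 23:59:59 Z800 CRON[1]: before range
Mar 30 08:20:06 Z800 CRON[2]: in range
Apr  1 23:14:55 Z800 CRON[3]: in range
Apr  2 00:00:01 Z800 CRON[4]: after range'

picked=$(printf '%s\n' "$sample" |
  awk -v from="Mar 29 00:00:00" -v to="Apr  1 23:59:59" '
  function toTs(m, d, t,    mthNr) {
    gsub(/[^0-9]/, "", t)
    mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec", m) + 2) / 3
    return sprintf("%02d%02d%06d", mthNr, d, t)   # fixed width: MMDDHHMMSS
  }
  BEGIN {
    split(from, f, " "); lo = toTs(f[1], f[2], f[3])
    split(to,   u, " "); hi = toTs(u[1], u[2], u[3])
  }
  { ts = toTs($1, $2, $3) }
  ts >= lo && ts <= hi')

printf '%s\n' "$picked"
```

This keeps the Mar 30 and Apr 1 lines. Like the script above, it has no notion of the year, so it only suits ranges within one calendar year.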