Newbie looking for how to grep times more than 10 seconds apart

Markham · September 18, 2016, 12:14am

I am new to grep and Linux and would like to know whether grep can pick out the lines in which the two times on the line are more than 10 seconds apart.

Example

2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
2016-09-17 12:02:26  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 12:02:25,

We need the grep script to pick out the first line, because its two times are more than 10 seconds apart, but not the second line, because its times are within 10 seconds of each other.

Any assistance would be appreciated.

I do not even know where to start, and I spent hours googling this only to get completely confused.

Thanks in advance.

Mark

Don_Cragun · September 18, 2016, 1:07am

Hi Mark,
Welcome to the UNIX & Linux Forums.

For your first question: no, grep can't do this. The grep utility selects lines matching fixed strings, basic regular expressions, or extended regular expressions. It cannot perform arithmetic calculations.

Are the timestamps on a given line always on the same date? The arithmetic needed to compare two HH:MM:SS values is relatively simple when all times are on the same date. If timestamps in a line can cross the midnight barrier, the arithmetic is more complex.
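To illustrate the same-date case: each HH:MM:SS value can be turned into seconds since midnight, and the two results subtracted. A minimal bash sketch, assuming both times fall on the same date; `to_seconds` is a hypothetical helper name, not an existing command:

```shell
#!/usr/bin/env bash
# Convert an HH:MM:SS string to seconds since midnight.
# Hypothetical helper; assumes both times being compared are on the same date.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  # 10# forces base 10 so values such as "08" or "09" are not read as octal.
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

end=$(to_seconds 19:30:57)      # logging time at the start of the first sample line
start=$(to_seconds 19:30:41)    # value after "time:" on the same line
echo $(( end - start ))         # prints 16
```

For the first sample line this gives 16 seconds, so it would be selected; the second line (12:02:26 vs 12:02:25) gives 1 second and would not.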

What is the name of your input file?

Please show us what output you want to produce from your sample input file (in CODE tags, please).

What shell are you using?

Scrutinizer · September 18, 2016, 1:21am

Like Don Cragun said, this cannot be done with grep.

Here is an awk approach you could try; it also handles a date change at midnight:

awk -F, '                                 # set the input field separator (FS) to a comma
  {
    n=split($2,B," ")                     # use split() twice to convert begin time t1 to seconds
    split(B[n], T, ":")
    t1=T[1]*3600 + T[2]*60 + T[3]
    split($1,E," ")                       # use split() twice to convert end time t2 to seconds
    split(E[2], T, ":")
    t2=T[1]*3600 + T[2]*60 + T[3]
  } 
  E[1]>B[n-1] {                           # if there is a date change
    t2+=3600*24                           # add the number of seconds in a day to t2
  }
  (t2-t1)>10                              # if the difference is more than 10 seconds, print the line.
' file
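To try the program above without a log file, it can be wrapped in a shell function and fed a few lines through a here-document. This is only a test harness around Scrutinizer's logic: the function name `slow_lines` and the last two input lines are invented for illustration. Those two lines cross midnight; only the one whose times are 15 seconds apart should be printed:

```shell
#!/usr/bin/env bash
# Scrutinizer's awk program, wrapped in a function (hypothetical name) that reads stdin.
slow_lines() {
  awk -F, '
    {
      n = split($2, B, " ")            # B[n-1] = start date, B[n] = start time
      split(B[n], T, ":")
      t1 = T[1]*3600 + T[2]*60 + T[3]
      split($1, E, " ")                # E[1] = logged date, E[2] = logged time
      split(E[2], T, ":")
      t2 = T[1]*3600 + T[2]*60 + T[3]
    }
    E[1] > B[n-1] { t2 += 3600*24 }    # logged on the next day: add one day
    (t2 - t1) > 10
  '
}

# Expected: the first line and the 23:59:50 line are printed.
slow_lines <<'EOF'
2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
2016-09-17 12:02:26  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 12:02:25,
2016-09-18 00:00:05  INFO: [D3B4AEB3] id: 4562079400, time: 2016-09-17 23:59:50,
2016-09-18 00:00:05  INFO: [D3B4AEB3] id: 4562079401, time: 2016-09-17 23:59:58,
EOF
```

Without the `E[1] > B[n-1]` correction, the 15-second line would give 5 - 86390 = -86385 and would be missed.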

drl · September 18, 2016, 8:06am

Hi.

If there is a concern about date differences over days, months, or years, then the date-aware package dateutils can be used. The package can be found in many Linux distribution repositories, or on GitHub as hroptatyr/dateutils ("nifty command line date and time utilities; fast date calculations and conversion in the shell").

We convert the dates into a generic form, then find the absolute value of the difference in seconds, printing the (saved) line if greater than 10:

#!/usr/bin/env bash

# @(#) s1       Demonstrate date/time difference, dconf, ddiff.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C dateutils.dconv dateutils.ddiff

# Function absolute value.
abs() { v1="$1"; [ "$v1" -lt 0 ] && echo "${v1:1}" || echo "$v1" ; }

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

#         1        2     3          4   5           6     7         8          9
#2016-09-17 19:30:57 INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,

pl " Results:"
while IFS= read -r line
do
  read d1 t1 j3 j4 j5 j6 j7 d8 t9 <<< "$line"
  reference=$( dateutils.dconv "$d1 $t1" )
  other=$( dateutils.dconv "$d8 $t9" )
  db " reference is :$reference:, other is :$other:"
  difference=$( dateutils.ddiff -f "%S%n" $reference $other )
  db " Difference in time is :$difference:"
  positive=$( abs "$difference" )
  db " absolute value of :$difference: is :$positive:"
  [ "$positive" -gt 10 ] && echo "$line"
done < $FILE

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.4 (jessie) 
bash GNU bash 4.3.30
dateutils.dconv dconv 0.3.1
dateutils.ddiff ddiff 0.3.1

-----
 Input data file data1:
2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
2016-09-17 12:02:26  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 12:02:25,

-----
 Results:
2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,

To see intermediate values, swap the order of the two db() definitions so that the printing version is the one that takes effect, which gives (in part):

 Results:
 db,  reference is :2016-09-17T19:30:57:, other is :2016-09-17T19:30:41:
 db,  Difference in time is :-16:
 db,  absolute value of :-16: is :16:
2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
 db,  reference is :2016-09-17T12:02:26:, other is :2016-09-17T12:02:25:
 db,  Difference in time is :-1:
 db,  absolute value of :-1: is :1:
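If dateutils is not available, GNU date (part of coreutils on Ubuntu) can do a similar conversion with its -d option and the +%s format, which prints seconds since the epoch. This swaps in a different tool from the one used above, so treat it as a sketch: it assumes GNU date and the line layout from the first post, sets TZ=UTC so a daylight-saving change cannot distort the difference, and puts a comma in IFS so the trailing comma does not end up in t9. The function name is hypothetical. It runs date twice per line, so it will be slow on large logs.

```shell
#!/usr/bin/env bash
# Print each line whose two timestamps are more than 10 seconds apart (in either order).
# Assumes GNU date and the field layout of the sample lines; the function name is made up.
slow_lines_date() {
  local line d1 t1 j3 j4 j5 j6 j7 d8 t9 a b delta
  while IFS= read -r line; do
    # Space, tab and comma as separators, so t9 is "19:30:41" rather than "19:30:41,".
    IFS=$' \t,' read -r d1 t1 j3 j4 j5 j6 j7 d8 t9 <<< "$line"
    # Convert both timestamps to seconds since the epoch; skip lines date cannot parse.
    a=$(TZ=UTC date -d "$d1 $t1" +%s) || continue
    b=$(TZ=UTC date -d "$d8 $t9" +%s) || continue
    delta=$(( a - b ))
    (( delta < 0 )) && delta=$(( -delta ))   # absolute value
    if (( delta > 10 )); then
      printf '%s\n' "$line"                  # original line, spacing preserved
    fi
  done
}

slow_lines_date <<'EOF'
2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
2016-09-17 12:02:26  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 12:02:25,
EOF
```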

See the man pages and Google results for details ... cheers, drl

Scrutinizer · September 18, 2016, 9:58am

@drl: I noticed there is a trailing comma in the t9 variable. I have not tested with the dateutils package, but I could imagine that this might have undesirable effects.

$ read d1 t1 j3 j4 j5 j6 j7 d8 t9 < file; echo "$t9"
19:30:41,

This could be mitigated like this:

$ IFS=$' \t\n,' read d1 t1 j3 j4 j5 j6 j7 d8 t9 < file; echo "$t9"
19:30:41

Markham · September 18, 2016, 11:35am

Don,

The dates change every day.

They are server log files that record access by IoT devices, and the file name changes every day.

I do not know what "CODE tags" are or how to use them. I want the first line to be part of the output, and not the second line.

I have Ubuntu Server 16.04.

---------- Post updated at 11:35 AM ---------- Previous update was at 11:18 AM ----------

I want to thank everyone for the suggestions.

I will have to review them and test them to see what works.

Again, I am a newbie to unix and appreciate all the suggestions. I never thought that it would be easy. I am not a programmer, so I have a lot to digest.

Don_Cragun · September 18, 2016, 3:27pm

Hi Mark,
I know that dates change every day. What I don't know is whether or not the starting time and ending times in your input data are ever on different dates. As I said before, if the timestamps being compared are always on the same date, your problem is much simpler than if the timestamps can be on different dates. The following tutorial explains how to use CODE and ICODE tags:

Markham · September 18, 2016, 3:33pm

Don,

Sorry. The times are always on the same day.

I think I asked for more details than I can understand. I got lost on that video after the first minute or two. I am so new to UNIX that these commands have NO meaning to me.

Mark

greet_sed · September 18, 2016, 4:23pm

Hi,

If the fields are fixed and the times are always on the same day, you can try this:

 awk '{split($2,a,":");sub(/\,/,"",$NF);split($NF,b,":");if ( (a[3]-b[3])>10) { printf "%s ",$0","; }}' inputfile

Markham · September 18, 2016, 4:36pm

Greet_sed,

That worked perfectly except that it put them all on one line.
Is there a way to separate them out one per line?

Mark

Don_Cragun · September 18, 2016, 6:06pm

Hi Mark,
I'm glad that you think greet_sed's script did what you want, but it seems a little bit too simple to me. If we take a slightly different sample file (expanded from your example in post #1 in this thread):

2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
2016-09-17 12:02:26  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 12:02:25,
2016-09-17 19:31:57  INFO: [D3B4AEB3] id: 4562079300, time: 2016-09-17 19:30:57,
2016-09-17 20:30:57  INFO: [D3B4AEB3] id: 4562079301, time: 2016-09-17 19:30:57,
2016-09-17 19:30:07  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:20:58,
2016-09-17 19:40:01  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:39:50,

We see that the code suggested by Scrutinizer in post #3 in this thread produces the output:

2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
2016-09-17 19:31:57  INFO: [D3B4AEB3] id: 4562079300, time: 2016-09-17 19:30:57,
2016-09-17 20:30:57  INFO: [D3B4AEB3] id: 4562079301, time: 2016-09-17 19:30:57,
2016-09-17 19:30:07  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:20:58,
2016-09-17 19:40:01  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:39:50,

(with each input line output on a separate line) which seems to me to be correct.

The code greet_sed suggested, however, only produces the output:

2016-09-17 19:30:57 INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,

with no line terminator. Note that it also converts all sequences of multiple blanks on an input line to a single space on output lines (in this case changing the two spaces before INFO to a single space). I don't know if this will matter to whatever will be looking at your output.
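Both effects come from standard awk behavior, which a small bash sketch can confirm (the variable names are illustrative). Changing any field, as the sub() call on $NF does, makes awk rebuild $0 from its fields joined by OFS, a single space by default. And printf writes only what its format string contains, so "%s " never adds a newline, whereas print ends each record with ORS, a newline by default.

```shell
#!/usr/bin/env bash
# One of the sample lines from the first post (note the two spaces before INFO:).
line='2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,'

# Assigning to a field (sub() on $NF counts) makes awk rebuild $0 joined by OFS,
# which defaults to a single space, so the double space is lost.
rebuilt=$(printf '%s\n' "$line" | awk '{ sub(/,/, "", $NF); print $0 }')

# If no field is modified, $0 is printed exactly as it was read.
kept=$(printf '%s\n' "$line" | awk '{ print $0 }')

# printf emits only its format string, so "%s " puts every record on one line;
# print appends ORS (a newline by default) after each record.
joined=$(printf 'a\nb\n' | awk '{ printf "%s ", $0 }')
split_out=$(printf 'a\nb\n' | awk '{ print $0 }')

printf 'rebuilt:  %s\n' "$rebuilt"
printf 'kept:     %s\n' "$kept"
printf 'joined:   [%s]\n' "$joined"
printf 'one per line:\n%s\n' "$split_out"
```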

You can't just look at the seconds field to determine if two timestamps are within ten seconds of each other. As shown above, greet_sed's code does not detect when the start and end times are one minute apart, one hour apart, or even eleven seconds apart if the two times are not in the same minute.
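To see this concretely, the following bash/awk sketch prints, for each line, the seconds-only difference (what greet_sed's test computes) next to the full HH:MM:SS difference (what Scrutinizer's code computes). The function name is made up, and it assumes same-date timestamps with the start time in the last field, as in the sample above. On Don's six sample lines, the fifth line gives -51 by seconds alone but 549 in full, and the sixth gives -49 versus 11.

```shell
#!/usr/bin/env bash
# Show seconds-only vs. full HH:MM:SS differences for each line (same-date data assumed).
compare_diffs() {
  awk '{
    split($2, e, ":")                 # logged (end) time
    start = $NF
    sub(/,$/, "", start)              # drop the trailing comma from the start time
    split(start, s, ":")
    secs_only = e[3] - s[3]
    full = (e[1]*3600 + e[2]*60 + e[3]) - (s[1]*3600 + s[2]*60 + s[3])
    printf "%s - %s  seconds-only=%d  full=%d\n", $2, start, secs_only, full
  }'
}

compare_diffs <<'EOF'
2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
2016-09-17 12:02:26  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 12:02:25,
2016-09-17 19:31:57  INFO: [D3B4AEB3] id: 4562079300, time: 2016-09-17 19:30:57,
2016-09-17 20:30:57  INFO: [D3B4AEB3] id: 4562079301, time: 2016-09-17 19:30:57,
2016-09-17 19:30:07  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:20:58,
2016-09-17 19:40:01  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:39:50,
EOF
```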

Now that we know that all timestamps in your input data will occur on a single date, we can slightly simplify Scrutinizer's code and get the same results:

awk '
{	# Split end time field into hours, minutes, and seconds.
	split($2, time, ":")
	# Convert hour, minutes, and seconds to seconds since midnight.
	end_time = time[1] * 3600 + time[2] * 60 + time[3]
	# Split start time field into hours, minutes, and seconds.
	split($NF, time, ":")
	# Convert hour, minutes, and seconds to seconds since midnight.
	start_time = time[1] * 3600 + time[2] * 60 + time[3]
}
# If the end time is more than ten seconds after the start time, print the line.
end_time - start_time > 10
' Example

Markham · September 18, 2016, 6:11pm

Don,

I appreciate you trying to assist, but you might as well explain the theory of relativity, because everything you said makes about as much sense to me.

I respectfully appreciate that you are trying to explain it, but I lost you after "Hi Mark".

Don_Cragun · September 18, 2016, 6:39pm

Mark,
OK. Let me simplify it... The code greet_sed suggested does not work. The code Scrutinizer suggested in post #3 in this thread does work.

Now that we know that all of your timestamps are on a single date, we can simplify Scrutinizer's code (and strip out the comments since you apparently don't want an explanation of how anything works and don't want to be distracted by our attempts to help you) to just:

awk '
{	split($2, t, ":")
	e = t[1] * 3600 + t[2] * 60 + t[3]
	split($NF, t, ":")
	s = t[1] * 3600 + t[2] * 60 + t[3]
}
e - s > 10
' Example

I sincerely apologize for trying to help you understand that greet_sed's code does not work and for showing you examples that show failures in the output greet_sed's code produces.

Markham · September 18, 2016, 6:43pm

Don,

I appreciate all the effort that you went to and I now see that his code did not work.

I tried your last suggestion, and it is also returning lines whose times are less than 10 seconds apart.

I must be doing something wrong.

Mark

Don_Cragun · September 18, 2016, 6:55pm

My guess would be that you are getting the wrong results because the data format has changed, but we can't know what went wrong unless you show us an example of an input line for which my last suggestion gave you the wrong results.

Markham · September 18, 2016, 6:57pm

Don,

Can you send me a private message so that I can share things with you in private?

Mark

Don_Cragun · September 18, 2016, 7:07pm

Mark,
Forum rules prohibit hiding technical discussions in private e-mail. Hiding technical conversations in private e-mail hides details about this public conversation from everyone else reading this thread and keeps you from getting input from others on this forum who may be able to provide you with a working solution. In addition to that, I am obviously unable to speak in a language you understand. Maybe someone else seeing our continued discussion will be able to help you much better than I have been able to do.

The request is very simple: Show us sample input for which the script I suggested (or the script Scrutinizer suggested) does not produce the correct output.

Markham · September 18, 2016, 7:18pm

Don,

The reason is that we use some information in our logs that is not to be made visible to the public. I will try to obfuscate the data so that the content is valid but the data itself is still private.

---------- Post updated at 07:15 PM ---------- Previous update was at 07:13 PM ----------

Don,

Here is the data:

2016-09-17 10:30:36  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 10:30:35, lat: 51.00000, lon: -112.00000, speed: 64.3, course: 130.0
2016-09-17 10:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 07:30:55, lat: 51.00000, lon: -112.00000, speed: 0.0, course: 0.0
2016-09-17 10:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 10:30:55, lat: 51.00000, lon: -112.00000, speed: 0.0, course: 0.0
2016-09-17 10:31:16  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 10:31:15, lat: 51.00000, lon: -112.00000, speed: 64.3, course: 130.0

---------- Post updated at 07:18 PM ---------- Previous update was at 07:15 PM ----------

I did not want to undermine the great benefits of this forum nor to make anyone feel that there was a language difference. I wanted to protect some data that contained sensitive information.

Mark

Don_Cragun · September 18, 2016, 7:21pm

Hi Mark,
You should not be sending company private data outside of your company to this forum or to me. Obfuscated data is fine, but we do need to see the actual format of your data. If the data you showed us in earlier posts was not representative of your actual data, there is a very low chance that the code we provided to work on your sample data will work on your actual data.
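Comparing the obfuscated sample with the earlier one suggests why the last suggestion misbehaves: each line now ends with lat, lon, speed and course values, so $NF is the course (130.0 or 0.0), not the time. With a course of 0.0 the start time works out to 0, so a line such as the third sample line (2 seconds apart) is printed. A sketch that instead takes the value two fields after the literal word time: (the date, then the time) might look like this; it still assumes both timestamps are on the same date, and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Print lines whose logged time is more than 10 seconds after the "time:" value.
# Assumes same-date timestamps; find_slow_lines is a hypothetical name.
find_slow_lines() {
  awk '{
    start = ""
    # Look for the literal field "time:"; the start time is two fields later.
    for (i = 1; i <= NF - 2; i++)
      if ($i == "time:") { start = $(i + 2); break }
    if (start == "") next              # no "time:" field on this line
    sub(/,$/, "", start)               # drop the trailing comma
    split($2, t, ":"); e = t[1]*3600 + t[2]*60 + t[3]
    split(start, t, ":"); s = t[1]*3600 + t[2]*60 + t[3]
    if (e - s > 10) print              # $0 is untouched, so spacing is preserved
  }'
}

# Prints only the second line (10:30:57 vs 07:30:55).
find_slow_lines <<'EOF'
2016-09-17 10:30:36  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 10:30:35, lat: 51.00000, lon: -112.00000, speed: 64.3, course: 130.0
2016-09-17 10:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 07:30:55, lat: 51.00000, lon: -112.00000, speed: 0.0, course: 0.0
2016-09-17 10:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 10:30:55, lat: 51.00000, lon: -112.00000, speed: 0.0, course: 0.0
2016-09-17 10:31:16  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 10:31:15, lat: 51.00000, lon: -112.00000, speed: 64.3, course: 130.0
EOF
```

The same function also works on the two-timestamp lines from the first post, since there the start time is also two fields after time:.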

Don

Markham · September 18, 2016, 7:26pm

Don,

I thought that any other data on the line would not be an issue. I would never provide any of the sensitive data, but wanted to try to keep as much of it out of public scrutiny.

I deeply apologize for these actions.

I posted the obfuscated data above this post, probably while you were responding.

See above.

Mark