Hi Mark,
I'm glad that you think greet_sed's script does what you want, but it seems a little too simple to me. Consider a slightly different sample file (expanded from your example in post #1 in this thread):
2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
2016-09-17 12:02:26  INFO: [D3B4AEB3] id: 4562079193, time: 2016-09-17 12:02:25,
2016-09-17 19:31:57  INFO: [D3B4AEB3] id: 4562079300, time: 2016-09-17 19:30:57,
2016-09-17 20:30:57  INFO: [D3B4AEB3] id: 4562079301, time: 2016-09-17 19:30:57,
2016-09-17 19:30:07  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:20:58,
2016-09-17 19:40:01  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:39:50,
We see that the code suggested by Scrutinizer in post #3 in this thread produces the output:
2016-09-17 19:30:57  INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
2016-09-17 19:31:57  INFO: [D3B4AEB3] id: 4562079300, time: 2016-09-17 19:30:57,
2016-09-17 20:30:57  INFO: [D3B4AEB3] id: 4562079301, time: 2016-09-17 19:30:57,
2016-09-17 19:30:07  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:20:58,
2016-09-17 19:40:01  INFO: [D3B4AEB3] id: 4562079302, time: 2016-09-17 19:39:50,
(with each input line output on a separate line) which seems to me to be correct.
The code greet_sed suggested, however, only produces the output:
2016-09-17 19:30:57 INFO: [D3B4AEB3] id: 4562079216, time: 2016-09-17 19:30:41,
with no line terminator. Note that it also converts all sequences of multiple blanks on an input line to a single space on output lines (in this case changing the two spaces before INFO to a single space). I don't know if this will matter to whatever will be looking at your output.
You can't just look at the seconds field to determine if two timestamps are within ten seconds of each other. As shown above, greet_sed's code does not detect when the start and end times are one minute apart, one hour apart, or even eleven seconds apart if the two times are not in the same minute.
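To make that concrete, here is a minimal sketch of the arithmetic for two of the lines in the sample above. The function name diff_secs is just something I made up for this illustration (it is not from anyone's code in this thread), and it assumes both times fall on the same date:

```shell
# Hypothetical helper: difference in seconds between two HH:MM:SS times
# on the same date, computed from seconds since midnight.
diff_secs() {
    awk -v t_end="$1" -v t_start="$2" 'BEGIN {
        split(t_end, e, ":")
        split(t_start, s, ":")
        d = (e[1] * 3600 + e[2] * 60 + e[3]) - (s[1] * 3600 + s[2] * 60 + s[3])
        print d
    }'
}

diff_secs 19:40:01 19:39:50    # prints 11; the seconds fields alone give 01 - 50 = -49
diff_secs 19:31:57 19:30:57    # prints 60; the seconds fields alone give 57 - 57 = 0
```

Both of these lines should be reported, but a comparison of the seconds fields alone misses them.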
Now that we know that all timestamps in your input data will occur on a single date, we can slightly simplify Scrutinizer's code and get the same results:
awk '
{
	# Split end time field into hours, minutes, and seconds.
	split($2, time, ":")
	# Convert hours, minutes, and seconds to seconds since midnight.
	end_time = time[1] * 3600 + time[2] * 60 + time[3]
	# Split start time field into hours, minutes, and seconds.
	# (The trailing comma on the last field is ignored when awk converts
	# the seconds string to a number.)
	split($NF, time, ":")
	# Convert hours, minutes, and seconds to seconds since midnight.
	start_time = time[1] * 3600 + time[2] * 60 + time[3]
}
# If the end time is more than ten seconds after the start time, print the line.
end_time - start_time > 10
' Example
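If you might want to change the ten second limit later, the same idea can be wrapped in a shell function. This is only a sketch under the same single-date assumption. The names slow_entries, secs, and limit are my own inventions for this example, and it strips the trailing comma explicitly instead of relying on awk's string-to-number conversion:

```shell
# Hypothetical wrapper: print lines of log file $1 whose end time (field 2)
# is more than $2 seconds (default 10) after the start time (last field).
slow_entries() {
    awk -v limit="${2:-10}" '
    # Convert an HH:MM:SS string (optionally followed by a comma)
    # to seconds since midnight.
    function secs(t,    p) {
        sub(/,$/, "", t)
        split(t, p, ":")
        return p[1] * 3600 + p[2] * 60 + p[3]
    }
    secs($2) - secs($NF) > limit
    ' "$1"
}
```

With this defined, `slow_entries Example` should behave like the script above, and `slow_entries Example 60` would only print lines where the end time is more than a minute after the start time.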