How to extract start/end times from a log file to a CSV file?

Hi,

I have a log file (log.txt) that contains one date/time per line.
I need a script that writes a CSV file (out.csv) grouping consecutive times (each exactly 1 minute after the previous one) into periods, stating the start time and end time of each period.

Sample log file (log.txt):

14-11-2014 05:23
14-11-2014 05:24
14-11-2014 05:25
14-11-2014 05:26
16-11-2014 13:01
16-11-2014 13:02
16-11-2014 13:03
23-11-2014 03:00
23-11-2014 05:24
23-11-2014 05:25

The output CSV file should look something like this:

14-11-2014 05:23,14-11-2014 05:26,4
16-11-2014 13:01,16-11-2014 13:03,3
23-11-2014 03:00,23-11-2014 03:00,1
23-11-2014 05:24,23-11-2014 05:25,2

The first column of each line is the start time, the second column is the end time, and the third column is the number of one-minute entries in the period, counting both the start and the end time (so 05:23 to 05:26 gives 4).

Could you please help me with this?

Thanks in advance.

Zizo

Enjoy...

#!/bin/bash
# Groups the lines of log.txt by date (first field) and prints
# date,first time,last time,number of lines for each date.
numOfRecords=0
read -r prevDay startTime < log.txt     # date and time of the first line
endTime=$startTime
while read -r curDay curTime
do
        if [[ $curDay != "$prevDay" ]]
        then
                # new date: print the finished group and start a new one
                echo "$prevDay,$startTime,$endTime,$numOfRecords"
                startTime=$curTime
                endTime=$curTime
                numOfRecords=1
        else
                numOfRecords=$((numOfRecords+1))
                endTime=$curTime
        fi
        prevDay=$curDay
done < log.txt
# print the last group
echo "$prevDay,$startTime,$endTime,$numOfRecords"

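This depends on a recent shell (bash 4.2 or later, for printf's %(datefmt)T format) and GNU date:

# Reorder DD-MM-YYYY HH:MM into YYYY-MM-DD HH:MM so GNU date can parse it,
# then convert every line to epoch seconds
awk '{print $3"-"$2"-"$1" "$4":"$5}' FS="[-: ]" log.txt |
        date -f- +%s |
        { read OLDP                                     # first timestamp opens the first period
          printf "%(%d-%m-%Y %H:%M)T, " $OLDP; CNT=1
          while read EP
                do if [ $((EP - OLDP)) -gt 60 ]         # gap of more than one minute:
                   then                                 # close the period, open a new one
                        printf "%(%d-%m-%Y %H:%M)T, %s\n%(%d-%m-%Y %H:%M)T, " $OLDP $CNT $EP
                        CNT=0
                   fi
                   OLDP=$EP
                   ((CNT++))
                done
          printf "%(%d-%m-%Y %H:%M)T, %s\n" $OLDP $CNT  # close the last period
        }

Output: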
14-11-2014 05:23, 14-11-2014 05:26, 4
16-11-2014 13:01, 16-11-2014 13:03, 3
23-11-2014 03:00, 23-11-2014 03:00, 1
23-11-2014 05:24, 23-11-2014 05:25, 2
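The first two stages of the pipeline above do the parsing: awk splits each line on `-`, `:` and space and reorders it into YYYY-MM-DD HH:MM, which GNU date accepts, and `date -f-` converts each line read from standard input into epoch seconds. A minimal sketch of just those stages as a hypothetical helper `to_epoch`, assuming GNU date; the resulting numbers depend on the local time zone:

```shell
#!/bin/bash
# Hypothetical helper: read "DD-MM-YYYY HH:MM" lines on stdin and print one
# epoch-seconds value per line. Assumes GNU date (for -f-).
to_epoch() {
    # Split on '-', ':' and ' ' and reorder the date into YYYY-MM-DD HH:MM
    awk -F'[-: ]' '{ print $3 "-" $2 "-" $1 " " $4 ":" $5 }' |
        date -f- +%s   # -f- : read one date string per line from stdin
}
```

With `export TZ=UTC`, `printf '14-11-2014 05:23\n14-11-2014 05:24\n' | to_epoch` prints 1415942580 and 1415942640. Consecutive minutes differ by exactly 60 seconds, which is why the loop starts a new period only when the difference is greater than 60.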

Thanks a lot for your replies. I really appreciate your help.

However, I still have problems with both approaches.

@Igal Malka, I tried the script you provided, but it only compares dates (not times), so it doesn't catch non-sequential times within the same day.

14-11-2014 05:23 
14-11-2014 05:24 
14-11-2014 05:25 
14-11-2014 05:26 
16-11-2014 13:01 
16-11-2014 13:02 
16-11-2014 13:03 
23-11-2014 03:00 
23-11-2014 05:24 
23-11-2014 05:25

When I tried the script, the result was:

14-11-2014,05:23,05:26,4
16-11-2014,13:01,13:03,3
23-11-2014,03:00,05:25,3

It checked only the date and treated 23-11-2014 as a single 3-minute period, whereas it is actually 1 minute at 03:00 and 2 minutes from 05:24 to 05:25.
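The distinction above comes down to comparing each timestamp with the previous one in minutes instead of comparing only the date field. A minimal pure-bash sketch of that comparison, assuming every line is DD-MM-YYYY HH:MM, the lines are sorted, and times are plain wall-clock minutes with no time-zone or daylight-saving adjustment. `days_from_civil` (a port of Howard Hinnant's days_from_civil date algorithm to bash arithmetic) and `group_minutes` are hypothetical helper names, and no external date command is used:

```shell
#!/bin/bash
# Sketch: group consecutive one-minute entries without calling date(1).
# Assumes lines are "DD-MM-YYYY HH:MM", sorted, in naive wall-clock time.

# Days since 1970-01-01 for a Gregorian year/month/day (days_from_civil).
# 10# forces base 10 so fields such as "08" and "09" are not read as octal.
days_from_civil() {
    local y=$((10#$1)) m=$((10#$2)) d=$((10#$3))
    y=$(( y - (m <= 2) ))               # treat Jan/Feb as months of the previous year
    local era=$(( y / 400 ))            # 400-year cycle (years here are positive)
    local yoe=$(( y - era * 400 ))      # year of era, 0..399
    local doy=$(( (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1 ))  # March-based day of year
    local doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))            # day of era
    echo $(( era * 146097 + doe - 719468 ))
}

# Read log lines on stdin and print "start,end,count" for each run of
# entries that are at most one minute apart.
group_minutes() {
    local line dd mm yyyy HH MM cur prev='' start='' prevLine='' count=0
    # default IFS trims stray leading/trailing spaces; || handles a missing final newline
    while read -r line || [[ -n $line ]]; do
        [[ -z $line ]] && continue      # skip blank lines
        IFS='-: ' read -r dd mm yyyy HH MM <<< "$line"
        cur=$(( $(days_from_civil "$yyyy" "$mm" "$dd") * 1440 + 10#$HH * 60 + 10#$MM ))
        if [[ -n $prev ]] && (( cur - prev <= 1 )); then
            count=$(( count + 1 ))      # still in the same run
        else
            if [[ -n $prev ]]; then     # gap found: close the previous run
                echo "$start,$prevLine,$count"
            fi
            start=$line
            count=1
        fi
        prev=$cur
        prevLine=$line
    done
    if [[ -n $prev ]]; then             # close the final run
        echo "$start,$prevLine,$count"
    fi
}
```

After sourcing the two functions, `group_minutes < log.txt > out.csv` writes the CSV; on the question's sample data it prints exactly the four lines requested there. Using `<= 1` rather than `== 1` mirrors the `-gt 60` test in RudiC's script, so a repeated timestamp stays in the current period instead of starting a new one.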

@RudiC, I tried the script you provided, but it gave me the following printf errors:

-bash: printf: `(': invalid format character
-bash: printf: `(': invalid format character
-bash: printf: `(': invalid format character
-bash: printf: `(': invalid format character
-bash: printf: `(': invalid format character

Note: my OS is:

Linux  #1 SMP Tue Feb 18 11:42:11 EST 2014 x86_64 x86_64 x86_64 GNU/Linux

Thanks,
Zizo

As I said, it needs a recent shell whose printf supports the %(datefmt)T format (bash 4.2 or later). You can still use the date command to convert the epoch seconds to your preferred date format.
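Both routes can be combined. A minimal sketch, with `fmt_epoch` as a hypothetical helper name: it uses printf's %(datefmt)T when BASH_VERSINFO reports bash 4.2 or later, and otherwise falls back to `date -d @SECONDS` (this assumes GNU coreutils date):

```shell
#!/bin/bash
# Hypothetical helper: format epoch seconds ($1) as DD-MM-YYYY HH:MM in local time.
fmt_epoch() {
    if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 2) )); then
        # bash 4.2+: printf formats the time itself, no extra process
        printf '%(%d-%m-%Y %H:%M)T\n' "$1"
    else
        # older bash: fall back to GNU date's -d @SECONDS syntax
        date -d "@$1" +'%d-%m-%Y %H:%M'
    fi
}
```

Run with TZ=UTC, `fmt_epoch 1415942580` prints 14-11-2014 05:23. Checking the version up front, rather than calling printf and catching the failure, keeps the "invalid format character" messages out of the output.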

Unfortunately, I'm stuck.

Here is a modification of RudiC's script (I unrolled the pipeline just to make it clearer to myself) that doesn't use printf's %(datefmt)T format:

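# Convert each DD-MM-YYYY HH:MM line of the input file ($1) to epoch seconds
awk '{print $3"-"$2"-"$1" "$4":"$5}' FS="[-: ]" "${1}" \
| date -f- +%s \
| {
  # Print "start end count" (epoch seconds) for each run of entries
  # no more than 60 seconds apart
  read oldT
  echo -n "${oldT} "
  N=1
  while read curT; do
    if [[ $(( curT - oldT )) -gt 60 ]]; then
      echo ${oldT} ${N}
      echo -n "${curT} "
      N=0
    fi
    oldT=${curT}
    (( N++ ))
  done
  echo ${oldT} ${N}
} \
| while read oldT newT N; do
  # Convert start and end back to DD-MM-YYYY HH:MM and print one CSV line
  echo $(date +%d-%m-%Y\ %H:%M -d @${oldT}),$(date +%d-%m-%Y\ %H:%M -d @${newT}),${N}
done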

See the date(1) manpage if your version of date doesn't support -d @seconds.
Testing with a larger dataset:

14-11-2014 05:23
14-11-2014 05:24
14-11-2014 05:25
14-11-2014 05:26
16-11-2014 13:01
16-11-2014 13:02
16-11-2014 13:03
23-11-2014 03:00
23-11-2014 05:24
23-11-2014 05:25
24-11-2014 08:24
24-11-2014 08:25
25-11-2014 08:58
25-11-2014 08:59
25-11-2014 09:00
25-11-2014 09:01
31-12-2014 23:47
31-12-2014 23:48
31-12-2014 23:49
    snip
01-01-2015 00:05
01-01-2015 00:06
01-01-2015 00:07

yielded:

14-11-2014 05:23,14-11-2014 05:26,4
16-11-2014 13:01,16-11-2014 13:03,3
23-11-2014 03:00,23-11-2014 03:00,1
23-11-2014 05:24,23-11-2014 05:25,2
24-11-2014 08:24,24-11-2014 08:25,2
25-11-2014 08:58,25-11-2014 09:01,4
31-12-2014 23:47,01-01-2015 00:07,21

(thanks to RudiC for giving me a starting point)


Thank you very much, I really appreciate your effort. :)
That worked fine for me.