Pattern Matching and creating output

Hi Unix Forum,

My requirement

I have two sets of patterns, UBA and CIE. Each has several phases, and every phase has a start and an end time. The phases do not appear in the log in any fixed order.

I want the output in the format mentioned below.

E.g.: Mangolia Alien 03:04:56 Phase 0 started (10 seconds)
In the above statement, 03:04:56 is the Start_Time for Phase 0.

I hope my requirement is clear.

Below is a sample log. It may not make complete sense, as I have changed it to protect the data.

 Log (This log is not exact but just to indicate the pattern)
  <<Any number of lines inbetween>>
[YYYY-MM-DD hh:mm:ss] Standard Output for '/Task: CIE':
--------------------------------------------------------------------------------
Mangolia Alien 03:04:56   Phase 0 started (10 seconds)
 <<Any number of lines inbetween>>
 --------------------------------------------------------------------------------
[YYYY-MM-DD 03:04:58] Standard Output for '/Task: CIE':
--------------------------------------------------------------------------------
Mangolia Alien 03:04:58   Phase 1 started (0 seconds)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:07] Standard Output for '/Task: CIE':
--------------------------------------------------------------------------------
Mangolia Alien 03:05:07   Phase 2 started (7 seconds)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:12] Standard Output for '/Task: UBA':
--------------------------------------------------------------------------------
Mangolia Alien 03:05:12   Phase 0 started (14 seconds)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:16] Standard Output for '/Task: CIE':
Mangolia Alien 03:05:16   Phase 2 ended (16 seconds)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:19] Standard Output for '/Task: UBA':
Mangolia Alien 03:05:19   Phase 0 ended (21 seconds)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:20] Standard Output for '/Task: UBA':
--------------------------------------------------------------------------------
Mangolia Alien 03:05:20   Phase 1 started (1 second)
 <<Any number of lines inbetween>>
[YYYY-MM-DD 03:05:21] Standard Output for '/Task: UBA':
Mangolia Alien 03:05:21   Phase 1 ended (2 seconds)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:23] Standard Output for '/Task: CIE':
--------------------------------------------------------------------------------
Mangolia Alien 03:05:23   Phase 3 started (4 seconds)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:25] Standard Output for '/Task: CIE':
Mangolia Alien 03:05:25   Phase 3 ended (6 seconds)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:27] Standard Output for '/Task: CIE':
--------------------------------------------------------------------------------
Mangolia Alien 03:05:27   Phase 4 started (1 second)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:29] Standard Output for '/Task: UBA':
--------------------------------------------------------------------------------
Mangolia Alien 03:05:29   Phase 2 started (6 seconds)
 <<Any number of lines inbetween>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:31] Standard Output for '/Task: CIE':
Mangolia Alien 03:05:31   Phase 4 ended (4 seconds)
 <<Any number of lines in between>>
--------------------------------------------------------------------------------
[YYYY-MM-DD 03:05:32] Standard Output for '/Task: UBA':
Mangolia Alien 03:05:32   Phase 2 ended (8 seconds)
 <<Any number of lines inbetween>>
 

Very urgent... please let's discuss and find a good solution.

Note:
I tried grep -B3 and grep -A3 to first separate UBA and CIE into two files and then fetch the respective start and end times.

But the -B and -A options are not available in grep on AIX.

Thanks,
TechGyaann

---------- Post updated at 06:30 PM ---------- Previous update was at 04:55 PM ----------

Can someone look into this and assist?

@Don Cragun or any other Mods please comment.

How about this:

awk -F "[ \\\]]*" '
function difftime(st,et) {
   split(st,sa,":")
   split(et,ea,":")
   return "(" ea[1]*3600-sa[1]*3600 + ea[2]*60-sa[2]*60 + ea[3]-sa[3] " seconds)"
}
/Task: / { T=$NF; gsub(/['\'':]*$/,"",T) }
/Phase .*started/ { TL[T];TS[T,$5]=$3}
/Phase .*ended/ { TE[T,$5]=$3}
END{
  for(task in TL) {
      print task
      for(phase=0; (task SUBSEP phase) in TS; phase++) {
         if((task SUBSEP phase) in TE)
         print "Phase " phase, TS[task,phase], TE[task,phase],\
               difftime(TS[task,phase],TE[task,phase])
         else print "Phase " phase, TS[task,phase], "Unfinished"
      }
  }
} ' OFS=" " infile

Output:

CIE
Phase 0 03:04:56 Unfinished
Phase 1 03:04:58 Unfinished
Phase 2 03:05:07 03:05:16 (9 seconds)
Phase 3 03:05:23 03:05:25 (2 seconds)
Phase 4 03:05:27 03:05:31 (4 seconds)
UBA
Phase 0 03:05:12 03:05:19 (7 seconds)
Phase 1 03:05:20 03:05:21 (1 seconds)
Phase 2 03:05:29 03:05:32 (3 seconds)

Thanks @Chubler_XL.
I will go through the code, try to execute it, and comment more on it.

May I doubt that making a problem "very urgent" (although this is highly deprecated here!), bumping it up as well, and then not coming back for a week should be considered good style?

Chubler_XL, can you please explain the code to me so that I can understand it?

Eg: File creation , csv Post: 302963081

function difftime(st,et) {
   split(st,sa,":")
   split(et,ea,":")
   return "(" ea[1]*3600-sa[1]*3600 + ea[2]*60-sa[2]*60 + ea[3]-sa[3] " seconds)"
}

difftime(starttime, endtime): takes two times in "hh:mm:ss" format and returns "(Y seconds)", where Y is the number of seconds by which endtime is ahead of starttime. Both times are assumed to fall on the same day.
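
To sanity-check the arithmetic, difftime() can be run on its own. This is only a sketch: phase_seconds is a wrapper name made up for illustration, and the extra parentheses and the local array parameters (sa, ea) are additions that are not in the posted script. Like the original, it assumes both times are on the same day.

```shell
# phase_seconds is a hypothetical helper, not part of the original script.
phase_seconds() {
  awk -v s="$1" -v e="$2" '
  # sa and ea after the extra spaces are locals, so they do not leak out.
  function difftime(st, et,    sa, ea) {
     split(st, sa, ":")
     split(et, ea, ":")
     return "(" ((ea[1]-sa[1])*3600 + (ea[2]-sa[2])*60 + (ea[3]-sa[3])) " seconds)"
  }
  BEGIN { print difftime(s, e) }'
}

phase_seconds 03:05:07 03:05:16   # prints (9 seconds), CIE Phase 2 in the sample
phase_seconds 03:04:56 03:05:16   # prints (20 seconds), crosses a minute boundary
```

The second call shows why the seconds term may go negative (16-56) without harm: the minutes term adds 60 and the sum comes out right.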

/Task: / { T=$NF; gsub(/['\'':]*$/,"",T) }

Matches lines that contain "Task: ", e.g.:
[YYYY-MM-DD 03:04:58] Standard Output for '/Task: CIE':

T is assigned the last field of the line ("CIE':" for example).
The gsub() call removes the trailing ' and : characters from T (giving "CIE").

Note: because the awk program is enclosed in single quotes, a literal ' inside it has to be written as '\'': close the quoted string, add an escaped quote (\'), then start a new quoted string.
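
The quoting can be tried in isolation on one header line from the sample log. The second command is an alternative that is not in the script above: it writes the quote as the octal escape \047, assuming your awk accepts octal escapes inside regular expressions. Default field splitting is used here for brevity, which is enough to get the last field.

```shell
line="[YYYY-MM-DD 03:04:58] Standard Output for '/Task: CIE':"

# Form used in the script: close the string, add an escaped quote, reopen it.
printf '%s\n' "$line" | awk '/Task: / { T = $NF; gsub(/['\'':]*$/, "", T); print T }'

# Alternative spelling: \047 is the octal code for a single quote.
printf '%s\n' "$line" | awk '/Task: / { T = $NF; gsub(/[\047:]*$/, "", T); print T }'
```

Both commands should print CIE.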

/Phase .*started/ { TL[T];TS[T,$5]=$3}

Matches lines containing "Phase " followed by "started", e.g.:

Mangolia Alien 03:05:23   Phase 3 started (4 seconds)

TL[T] builds an array TL[] whose indexes are all the task names seen.
TS[T, $5] builds a two-dimensional array TS with (task, phase number) as the index and the start time as the value.

/Phase .*ended/ { TE[T,$5]=$3}

Builds a second two-dimensional array TE with (task, phase number) as the index and the end time as the value.
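
A small sketch of how these two-dimensional indexes work may help. awk stores TS[T,$5] as a single string key made of the two parts joined by SUBSEP, which is why the END block can test (task SUBSEP phase) in TS. The values below are taken from the sample log; the array name part is only used for this illustration.

```shell
awk 'BEGIN {
  TS["CIE", 2] = "03:05:07"            # same element as TS["CIE" SUBSEP 2]

  if (("CIE" SUBSEP 2) in TS) print "found by SUBSEP key"
  if (("CIE", 2) in TS)       print "found by (i,j) key"

  # Walking the array gives the joined keys; split() recovers the parts.
  for (k in TS) { split(k, part, SUBSEP); print part[1], part[2], TS[k] }
}'
```

Testing with "in" does not create the element, unlike referring to TS[task,phase] directly, so it is safe to use for lookups.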

END{
  for(task in TL) {
      print task
      for(phase=0; (task SUBSEP phase) in TS; phase++) {
         if((task SUBSEP phase) in TE)
         print "Phase " phase, TS[task,phase], TE[task,phase],\
               difftime(TS[task,phase],TE[task,phase])
         else print "Phase " phase, TS[task,phase], "Unfinished"
      }
  }
}

After the whole file has been read, the END block goes through each task stored in the TL array.

for(phase=0; (task SUBSEP phase) in TS; phase++)
Starting with phase=0, loop while a start entry exists in TS[] (task start) for the current task, incrementing the phase number at the end of each pass.

If there is an entry in TE[] (task end) for this task and phase, print the start and end times and call difftime() to display the seconds.
Otherwise, display the start time and "Unfinished".
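
One consequence of that loop is worth seeing on a small input: it stops at the first phase number that has no start entry, so phases after a gap are not reported. The sketch below is a cut-down variant, not the posted script: phase_report is a made-up name, the field separator is written as '[] ]+' (one or more spaces or ] characters), difftime() is left out, and the last "Phase 3" line is invented purely to create a gap.

```shell
# phase_report is a hypothetical wrapper around a trimmed version of the script.
phase_report() {
  awk -F '[] ]+' '
  /Task: /          { T = $NF; gsub(/['\'':]*$/, "", T) }
  /Phase .*started/ { TL[T]; TS[T, $5] = $3 }
  /Phase .*ended/   { TE[T, $5] = $3 }
  END {
    for (task in TL)
      for (phase = 0; (task SUBSEP phase) in TS; phase++)
        print task, "Phase " phase, TS[task, phase],
              (((task SUBSEP phase) in TE) ? TE[task, phase] : "Unfinished")
  }'
}

# UBA lines from the sample log, plus one invented Phase 3 line (no Phase 2).
sample_log="[YYYY-MM-DD 03:05:12] Standard Output for '/Task: UBA':
Mangolia Alien 03:05:12   Phase 0 started (14 seconds)
[YYYY-MM-DD 03:05:19] Standard Output for '/Task: UBA':
Mangolia Alien 03:05:19   Phase 0 ended (21 seconds)
[YYYY-MM-DD 03:05:20] Standard Output for '/Task: UBA':
Mangolia Alien 03:05:20   Phase 1 started (1 second)
[YYYY-MM-DD 03:05:40] Standard Output for '/Task: UBA':
Mangolia Alien 03:05:40   Phase 3 started (2 seconds)"

printf '%s\n' "$sample_log" | phase_report
```

This should list Phase 0 with both times and Phase 1 as "Unfinished", and nothing for Phase 3. If your logs can skip phase numbers, the loop would need to iterate over the keys of TS instead of counting up from 0.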
