processing the output of AWK

rakeshkumar · October 4, 2011, 7:56am

Hi

my input file is

<so   > < Time  >      <Pid>  <some ro><Job Name> 
111004 04554447      26817  JOB03275 MBPDVLOI
111004 04554473      26817  JOB03275 MBPDVLOI
111004 04554778      26807  JOB03276 MBPDVAWD
111004 04554779      26807  JOB03276 MBPDVAWD
111004 04554780      26817  JOB03276 MBPDVAWD
111004 04554783      26817  JOB03276 MBPDVAWD
111004 04555113      26807  JOB03277 MBPDD090
111004 04555116      26807  JOB03277 MBPDD090
111004 04555117      26817  JOB03277 MBPDD090
111004 04555159      26817  JOB03277 MBPDD090
111004 04555447      26807  JOB03278 MBPDD201
111004 04555449      26807  JOB03278 MBPDD201

The first occurrence of <Time> for a <Job Name> is that job's start time.
The fourth occurrence of <Time> for the same <Job Name> is its end time.

I want to calculate the difference between the start time and the end time for every job.
Note: every job has exactly 4 entries, where the 1st entry is the start time and the 4th is the end time. The four occurrences are not necessarily consecutive.

I have tried:

awk 'x[$2,$5]++' FS=" " file.txt

I don't know how to capture each occurrence in a variable so that I can process it in a script.

Please help me.

jayan_jay · October 4, 2011, 8:14am

$ cat testfile
#!/bin/bash
for i in `awk '{print $NF|"sort|uniq"}' infile`
do
        echo "$i -- `sed -n "/$i/,/$i/p" infile | sed -n '1p;$p' | awk '{print $2|"xargs"}' | awk '{print $2-$1}'`"
done
$ sh testfile
MBPDD090 -- 46
MBPDD201 -- 42
MBPDVAWD -- 5
MBPDVLOI -- 26
$

aigles · October 4, 2011, 8:56am

You can do something like this:

awk '
    {
        Times[$5,++Job[$5]] = $2;
    }
    END {
        for (j in Job) {
           printf "%-8s  : %s\n", j, (Job[j] >= 4 ? Times[j,4]-Times[j,1] : "?");
        }
    }
' file.txt

Output for your sample file (a "?" marks a job with fewer than four entries):

MBPDD201  : ?
MBPDD090  : 46
MBPDVLOI  : ?
MBPDVAWD  : 5
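
One thing to be aware of: awk does not define the order in which `for (j in Job)` visits the keys, which is why the jobs above are not listed in input order. If you need them in the order they first appear, here is a minimal sketch of one way to do it. The `Order` array, the counter `n` and the `job_durations` wrapper function are names made up for this illustration; the rest is the same logic as above, and it assumes the file holds only the data lines (no header).

```shell
# Sketch: same calculation, but jobs are reported in first-seen order.
# usage: job_durations file.txt
job_durations() {
    awk '
        {
            # Remember each job name the first time it appears.
            if (!($5 in Job)) Order[++n] = $5
            # Count this line for the job and store its time under (job, count).
            Times[$5, ++Job[$5]] = $2
        }
        END {
            # Walk the jobs in the order recorded above.
            for (i = 1; i <= n; i++) {
                j = Order[i]
                printf "%-8s  : %s\n", j, (Job[j] >= 4 ? Times[j,4] - Times[j,1] : "?")
            }
        }
    ' "$@"
}
```

With no file argument the function reads standard input, so it can also sit at the end of a pipeline.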

Jean-Pierre.

rakeshkumar · October 4, 2011, 4:19pm

The solution is amazing, thanks a ton aigles.

---------- Post updated at 03:19 PM ---------- Previous update was at 02:50 PM ----------

Can anyone please explain the multi-dimensional array below (if I am not wrong that it is one)? I tried, but could not work out exactly how it works:
Times[$5,++Job[$5]] = $2;

Chubler_XL · October 4, 2011, 10:17pm

What you have here are two arrays:

Job[] uses the job name as its index and counts how many lines have been seen for that job so far.
Times[] has an index of job name + line sequence number (taken from Job[]) and stores the time value (field 2).

After processing line 6 of your input file, the arrays will be as follows:

Job[MBPDVLOI]=2
Job[MBPDVAWD]=4

Times[MBPDVLOI,1]=04554447
Times[MBPDVLOI,2]=04554473
Times[MBPDVAWD,1]=04554778
Times[MBPDVAWD,2]=04554779
Times[MBPDVAWD,3]=04554780
Times[MBPDVAWD,4]=04554783
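
Strictly speaking, awk has no true multi-dimensional arrays: `Times[a,b]` is an ordinary one-dimensional array whose key is the string `a SUBSEP b`. If you want to see the state for yourself, the rough sketch below dumps both arrays once line 6 has been read. It assumes the file holds only the data lines (no header), and `show_arrays` is just a hypothetical wrapper name. The order in which the jobs are listed can vary between awk implementations.

```shell
# Sketch: print the contents of Job[] and Times[] after the 6th input line.
show_arrays() {
    awk '
        # ++Job[$5] bumps the per-job line counter first; the new count
        # then becomes the second part of the Times[] key.
        { Times[$5, ++Job[$5]] = $2 }

        NR == 6 {
            for (j in Job) {
                print "Job[" j "]=" Job[j]
                for (i = 1; i <= Job[j]; i++)
                    print "Times[" j "," i "]=" Times[j, i]
            }
            exit    # stop here, the remaining lines are not needed
        }
    ' "$@"
}

# First six data lines of the sample input from the first post.
show_arrays <<'EOF'
111004 04554447      26817  JOB03275 MBPDVLOI
111004 04554473      26817  JOB03275 MBPDVLOI
111004 04554778      26807  JOB03276 MBPDVAWD
111004 04554779      26807  JOB03276 MBPDVAWD
111004 04554780      26817  JOB03276 MBPDVAWD
111004 04554783      26817  JOB03276 MBPDVAWD
EOF
```

Changing `NR == 6` to another line number lets you watch the counters grow as more lines are read.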