Add rules for multiple lines processing

ashishagg2005 · July 17, 2014, 2:07am

Hi,

I want to process the file below in a Unix shell script and apply grouping rules based on a key column. For example:

Input file :

It has three columns, and the key column is Group.

Group  status application
G1     Complete   A1
G1     Delay        A2
G2     Complete   A3,A4
G3     Delay        A5, A6
G3     Complete   A7,A8

The output I need is a single record for each group.

The logic is: if any application in a group has status Delay, the group's status should be Delay; otherwise, if all applications in the group are Complete, the status should be Complete.

Group Status Application
G1    Delay    A2
G2    Complete A3,A4
G3    Delay     A5, A6

I am not able to get a correct solution for this. Please help me with the script.

Scrutinizer · July 17, 2014, 3:34am

A simple approach may be to keep the last line seen for each group:

awk 'NR==1{print; next}{A[$1]=$0} END{for(i in A)print A[i]}' file

The NR==1 is to print the header. If there is no header then you can use:

awk '{A[$1]=$0} END{for(i in A)print A[i]}' file

ashishagg2005 · July 17, 2014, 3:58am

Thanks,

But my question is how to add logic to check whether a group has both Complete and Delay statuses; in that case the output should pick Delay.
The second scenario is a group that has only Complete; then the status in the output should be Complete.

Scrutinizer · July 17, 2014, 4:23am

OK, try this:

awk 'NR==FNR{if($2=="Delay")D[$1]; next} !($2=="Complete" && $1 in D)' file file

Note: the file is specified twice (and so gets read twice).
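
For readers unfamiliar with the NR==FNR idiom, the one-liner above can be spelled out as follows. This is only an expanded sketch of the same logic; the wrapper name filter_delay is made up for illustration, and the data file is assumed to be whitespace-separated as in the sample.

```shell
# filter_delay is a hypothetical wrapper; pass the data file as $1.
filter_delay() {
    awk '
    # First pass: NR==FNR holds only while the first file operand is read.
    # Remember every group that has at least one Delay line.
    NR == FNR {
        if ($2 == "Delay")
            D[$1]
        next
    }
    # Second pass: print every line except Complete lines of a group in D.
    !($2 == "Complete" && $1 in D)
    ' "$1" "$1"
}
```

Calling filter_delay file should behave like the one-liner. Note that a group with several Delay lines keeps all of them, since only Complete lines are dropped.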

Don_Cragun · July 17, 2014, 5:27am

If all lines for a given group are contiguous in your input file, the following may be better than Scrutinizer's suggestion. It only reads the file once and doesn't care if there are lines with status other than "Complete" and "Delay":

awk '
NR==1 {	print
	next
}
g != $1 {
	if(NR > 2) print o
	d = 0
	g = $1
}
!d {	o = $0
	d = ($2 == "Delay")
}
END {	print o
}' file

With your sample input, this produces:

Group  status application
G1     Delay        A2
G2     Complete   A3,A4
G3     Delay        A5, A6

which duplicates the data found in your sample input, but has spacing that is very different from the output you said you wanted. (I didn't see any obvious way to duplicate the seemingly random spacing in your sample output, and I didn't bother changing the capitalization in your header line.)

If data for some groups is not all on contiguous lines, the following still just reads your input file once:

awk '
NR==1 {	print
	next
}
D[$1] != "Delay" {
	A[$1] = $0
	D[$1] = $2
}
END {	for(i in A)
		print A[i]
}' file

but the output (other than the header) is in random order.

If the output order is important and all lines for a group are not contiguous in your input files, use Scrutinizer's code instead. But note that it won't work if there is a line in your input file with a status other than "Delay" or "Complete".
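
If a single pass is still wanted when groups are not contiguous and the order matters, one possible variation is to record the order in which groups first appear. This is a sketch, not something posted in the thread: the wrapper name ordered_status and the array names order and n are assumptions for illustration, and like the script above it keeps the first Delay line of a group, or the last line seen if the group has no Delay line.

```shell
# ordered_status is a hypothetical wrapper; pass the data file as $1.
ordered_status() {
    awk '
    NR == 1 { print; next }          # pass the header through
    !($1 in A) { order[++n] = $1 }   # first time this group is seen
    D[$1] != "Delay" {               # keep updating until a Delay line is stored
        A[$1] = $0
        D[$1] = $2
    }
    END {
        # Print one line per group, in first-seen order.
        for (k = 1; k <= n; k++)
            print A[order[k]]
    }
    ' "$1"
}
```

The cost is one extra array holding the group names, which should be negligible unless the number of distinct groups is very large.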

Scrutinizer · July 17, 2014, 5:43am

Good point. If need be that could be remedied like so:

awk 'NR==FNR{if($2==s)D[$1]; next} $2==s || !($1 in D)' s=Delay file file

ashishagg2005 · July 18, 2014, 3:46am

Hi,
Thanks for the response.

But there is a slight change: one new status has been added.

Now my input looks like this:

Group  status application
G1     BAU        A1
G1     Delay        A2
G1     Delayed     A9
G2     BAU   A3 
G2     Delayed    A4
G3     Delay        A5, A6
G3     BAU   A7,A8

Expected output is

G1   Delay A2
G2   Delayed  A4
G3   Delay A5,A6

So the priorities are, in order: Delay (1), Delayed (2) and BAU (3).

If all three statuses are present (BAU, Delay and Delayed), Delay is required in the output.
BAU & Delay - Delay
BAU & Delayed - Delayed
Delay & Delayed - Delay

Don_Cragun · July 18, 2014, 4:03am

So, some of us showed you code to solve your earlier problem. Did any of our suggestions do what you wanted?

How have you tried to modify our suggestions to meet your new requirements?

Are all entries for a given group contiguous in your input file?

If you can't be bothered to give us any feedback on the suggestions we made before, why should we try to guess at what needs to be done again?

ashishagg2005 · July 18, 2014, 6:19am

Yes, I am working on adapting the earlier solution to this new requirement.

Below is the code I created to handle the three statuses, but it's not efficient.
I was looking for a way to do it with a single awk command.

awk -F ','  'NR==FNR{if($2=="Delay")D[$1] ; next} ! ($2=="Delayed" && $1 in D || $2=="BAU" && $1 in D) ' temp.dat temp.dat | awk -F ',' '{A[$1]=$0} END{for(i in A)print A[i] }' > temp1.dat

Thanks in advance!
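
One way the priority rule described above (Delay before Delayed before BAU) might be handled in a single awk pass is to give each status a numeric rank and keep the best-ranked line per group. This is a minimal sketch under assumptions, not a tested answer from the thread: the wrapper name rank_status and the array names rank, best, line and order are invented for illustration, the input is assumed to be whitespace-separated with a header line as in the sample (add -F ',' if the real file is comma-separated), and status matching is case-sensitive.

```shell
# rank_status is a hypothetical wrapper; pass the data file as $1.
rank_status() {
    awk '
    BEGIN {
        # Lower number means higher priority, as listed in the post.
        rank["Delay"] = 1; rank["Delayed"] = 2; rank["BAU"] = 3
    }
    NR == 1 { next }                      # skip the header line
    {
        r = ($2 in rank) ? rank[$2] : 99  # unknown statuses rank last
        if (!($1 in best))
            order[++n] = $1               # remember first-seen group order
        if (!($1 in best) || r < best[$1]) {
            best[$1] = r                  # better rank found for this group
            line[$1] = $0
        }
    }
    END {
        for (k = 1; k <= n; k++)
            print line[order[k]]
    }
    ' "$1"
}
```

For example, rank_status temp.dat > temp1.dat would write one line per group. When two lines in a group share the best rank, the first one seen is kept. Adding another status later should only need one more entry in the rank table.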