Hi,
I want to process the file below in a Unix shell script and collapse it to one record per group, based on a key column. For example:
Input file :
It has 3 columns, and the key column is Group.
Group status application
G1 Complete A1
G1 Delay A2
G2 Complete A3,A4
G3 Delay A5, A6
G3 Complete A7,A8
The output I need is a single record for each group.
The logic is: if any application in a group has status Delay, the status of the group should be Delay; otherwise, if all applications in the group are Complete, the status is Complete.
Group Status Application
G1 Delay A2
G2 Complete A3,A4
G3 Delay A5, A6
I am not able to get a correct solution for this. Please help me with the scripting.
A simple approach may be to keep the last line seen for each group:
awk 'NR==1{print; next}{A[$1]=$0} END{for(i in A)print A[i]}' file
The NR==1 is to print the header. If there is no header then you can use:
awk '{A[$1]=$0} END{for(i in A)print A[i]}' file
Thanks,
But my question is how to add logic to check whether a group has both Complete and Delay statuses; in that case the output should pick Delay.
The second scenario is that if a group has only Complete, then its status in the output should be Complete.
OK, try this:
awk 'NR==FNR{if($2=="Delay")D[$1]; next} !($2=="Complete" && $1 in D)' file file
Note: the file is specified twice (and so gets read twice).
If all lines for a given group are contiguous in your input file, the following may be better than Scrutinizer's suggestion. It only reads the file once and doesn't care if there are lines with status other than "Complete" and "Delay":
awk '
NR==1 { print
        next
}
g != $1 {
        if(NR > 2) print o
        d = 0
        g = $1
}
!d {    o = $0
        d = ($2 == "Delay")
}
END {   print o
}' file
With your sample input, this produces:
Group status application
G1 Delay A2
G2 Complete A3,A4
G3 Delay A5, A6
which duplicates the data found in your sample input, but has spacing that is very different from the output you said you wanted. (I didn't see any obvious way to duplicate the seemingly random spacing in your sample output, and I didn't bother changing the capitalization in your header line.)
If data for some groups is not all on contiguous lines, the following still just reads your input file once:
awk '
NR==1 { print
        next
}
D[$1] != "Delay" {
        A[$1] = $0
        D[$1] = $2
}
END {   for(i in A)
                print A[i]
}' file
but the output (other than the header) is in an unspecified order, because awk does not define the order in which for(i in A) visits array elements.
If the output order is important and the lines for a group are not all contiguous in your input files, use Scrutinizer's code instead. But note that it won't work if there is a line in your input file with a status other than "Delay" or "Complete".
Good point. If need be that could be remedied like so:
awk 'NR==FNR{if($2==s)D[$1]; next} $2==s || !($1 in D)' s=Delay file file
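For the non-contiguous case, input order can also be kept while reading the file only once, by remembering each group the first time it appears. This is only a sketch: the order array and the counter n are names made up for this example, and the sample data from the first post is written to a temporary file so the snippet runs on its own.

```shell
# Sample data from the original question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Group status application
G1 Complete A1
G1 Delay A2
G2 Complete A3,A4
G3 Delay A5, A6
G3 Complete A7,A8
EOF

result=$(awk '
NR == 1 { print; next }              # pass the header through
!($1 in A) { order[++n] = $1 }       # remember each group the first time it appears
D[$1] != "Delay" {                   # until a Delay line has been stored for this group
    A[$1] = $0                       # keep the most recent line
    D[$1] = $2
}
END { for (i = 1; i <= n; i++) print A[order[i]] }
' "$tmp")

rm -f "$tmp"
printf '%s\n' "$result"
```

Groups come out in the order of their first appearance. As with the other one-pass version, a group that has no Delay line is represented by its last line.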
Hi,
Thanks for the response.
But there is a slight change: one new status has been added.
Now my input looks like this:
Group status application
G1 BAU A1
G1 Delay A2
G1 Delayed A9
G2 BAU A3
G2 Delayed A4
G3 Delay A5, A6
G3 BAU A7,A8
Expected output is
G1 Delay A2
G2 Delayed A4
G3 Delay A5,A6
So the priorities are, in order: Delay (1), Delayed (2) and BAU (3).
If all 3 statuses are present (BAU, Delay and Delayed), then Delay is required in the output.
BAU and Delay - Delay
BAU and Delayed - Delayed
Delay and Delayed - Delay
So, some of us showed you code to solve your earlier problem. Did any of our suggestions do what you wanted?
How have you tried to modify our suggestions to meet your new requirements?
Are all entries for a given group contiguous in your input file?
If you can't be bothered to give us any feedback on the suggestions we made before, why should we try to guess at what needs to be done again?
Yes, I am working on adapting the earlier solution to this new requirement.
Below is the code I created to handle the 3 statuses, but it is not efficient.
I was looking for a way to do it with one awk.
awk -F ',' 'NR==FNR{if($2=="Delay")D[$1] ; next} ! ($2=="Delayed" && $1 in D || $2=="BAU" && $1 in D) ' temp.dat temp.dat | awk -F ',' '{A[$1]=$0} END{for(i in A)print A[i] }' > temp1.dat
Thanks in advance!
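One possible way to do this in a single awk is to give each status a numeric rank and keep, for each group, the line with the best rank seen so far. This is a hedged sketch, not a tested production script. It assumes whitespace-separated fields as in the sample posted above (add -F ',' if the real temp.dat is comma-separated), it silently skips any status that is not in the rank table, and the names rank, best, line and order are invented for the example.

```shell
# Sample data from the post above, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Group status application
G1 BAU A1
G1 Delay A2
G1 Delayed A9
G2 BAU A3
G2 Delayed A4
G3 Delay A5, A6
G3 BAU A7,A8
EOF

result=$(awk '
BEGIN { rank["Delay"] = 1; rank["Delayed"] = 2; rank["BAU"] = 3 }  # lower number wins
NR == 1 { next }                      # skip the header, as in the expected output
!($2 in rank) { next }                # assumption: ignore statuses outside the table
!($1 in best) { order[++n] = $1 }     # remember first-seen group order
!($1 in best) || rank[$2] < rank[best[$1]] {
    best[$1] = $2                     # best status seen so far for this group
    line[$1] = $0                     # and the line that carried it
}
END { for (i = 1; i <= n; i++) print line[order[i]] }
' "$tmp")

rm -f "$tmp"
printf '%s\n' "$result"
```

The file is read once, groups do not need to be contiguous, and adding another status only means adding one more entry to the rank table. Lines are printed exactly as they appear in the input, so the G3 line keeps the space in "A5, A6".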