Please consider the following file. It contains many groups, and each row is one of 3 types: T1 (Serial_number 1), T2 (Serial_number 2), and T1*T2 (all other Serial_number values).
I only want to consider groups that have both T1 and T2 present, with values that differ from each other. In the example file, Group3 and Group5 are excluded for these reasons: Group3 has identical T1 and T2 values, and Group5 has no T2 row.
Note that the data is not sorted, so the T1, T2 and T1*T2 rows are scattered through the file in no particular order.
Group Type Value Serial_number
Group1 T1 aa 1
Group1 T2 tt 2
Group1 T1*T2 at 3
Group1 T1*T2 tt 4
Group2 T1 gg 1
Group2 T2 tt 2
Group2 T1*T2 gg 3
Group2 T1*T2 tt 4
Group2 T1*T2 gt 5
Group3 T1 gg 1
Group3 T2 gg 2
Group3 T1*T2 gg 3
Group3 T1*T2 gg 5
Group4 T1 gg 1
Group4 T2 tt 2
Group4 T1*T2 gt 4
Group4 T1*T2 gg 5
Group5 T1 gg 1
Group5 T1*T2 gt 5
I want to add a column to the output, only for rows of type T1*T2, stating whether the value matches the T1 value of its group, matches the T2 value of its group, or matches neither.
For example, in Group1 the T1*T2 value at Serial_number 3 is 'at', which doesn't match the T1 value 'aa' or the T2 value 'tt', so it is 'different'.
In Group1 the T1*T2 value at Serial_number 4 is 'tt', which matches the T2 value 'tt', so it is assigned 'T2-like'.
The expected output is:
Group Type Value Serial_number Similar_To
Group1 T1*T2 at 3 different
Group1 T1*T2 tt 4 T2-like
Group2 T1*T2 gg 3 T1-like
Group2 T1*T2 tt 4 T2-like
Group2 T1*T2 gt 5 different
Group4 T1*T2 gt 4 different
Group4 T1*T2 gg 5 T1-like
This is my feeble attempt, which doesn't work.
awk
' {
if(!($1 in grp)) {
grp[$1]++
type[$1]=$2
val[$1,1]=$3 FS $4
next
}
NR != 1 {
grp[$1]++
type[$2]++
val[$3]++
a[$1,$2]=a[$1,$2]" "$3
if($3=="T1") categ="T1-like"
else if ($3=="P2") categ="T2-like"
else categ="different"
if($3="T1*T2")
for (i=1;i<length(grp);i++)
if (grp==grp[i-1])
catg1=categ
print $1 FS $2 FS $3 FS $4 FS catg1
}' infile
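For comparison, here is a minimal sketch of one way the requirement could be met. It is not a repair of the attempt above. It assumes whitespace-separated fields, a single header line, and at most one T1 row and one T2 row per group. Because the rows are unsorted, it reads the file twice: the first pass records the T1 and T2 value of every group, and the second pass labels the T1*T2 rows. The function name classify and the array names t1 and t2 are made up for illustration.

```shell
# Hypothetical helper: classify FILE
# Prints the header plus every T1*T2 row of the qualifying groups,
# with a Similar_To column appended.
classify() {
  awk '
    # Pass 1 (NR == FNR holds only while reading the first copy of the file):
    # remember the T1 and T2 value of each group.
    NR == FNR {
      if ($2 == "T1") t1[$1] = $3
      else if ($2 == "T2") t2[$1] = $3
      next
    }
    # Pass 2: extend the header line with the new column name.
    FNR == 1 { print $0, "Similar_To"; next }
    # Keep T1*T2 rows only for groups that have both T1 and T2
    # and whose T1 and T2 values differ.
    $2 == "T1*T2" && ($1 in t1) && ($1 in t2) && t1[$1] != t2[$1] {
      if ($3 == t1[$1]) tag = "T1-like"
      else if ($3 == t2[$1]) tag = "T2-like"
      else tag = "different"
      print $0, tag
    }
  ' "$1" "$1"
}
```

Called as classify infile, this should produce the expected output listed above for the sample data. Passing the file name twice is what lets the rows appear in any order; the trade-off is that the file is read two times, so it would not work on input piped from stdin.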