How to print the group with the highest value within a set

quincyjones · June 12, 2017, 10:09am

How can I print the name with the highest value within each set, but only if its group is the only unique group within that set?

input

sets    names    value    groups
j007    shot1    0.6    a
j007    shot2    0.5    b
j007    shot3    0.4    bb
j007    shot4    0.3    bc
j007    shot5    0.2    cd
j008    shot1    0.4    a
j008    shot2    0.3    ab
j009    shot1    0.14    a
j009    shot2    0.13    b
j009    shot3    0.12    bc
j010    shot1    22    a
j010    shot2    19    b
j010    shot3    5    bcd
j011    shot1    5    a
j011    shot2    2    b
j011    shot3    3    c

output

j007    shot1    0.6    a
j009    shot1    0.14    a
j010    shot1    22    a

Tried

sort -k 3,3 input | awk '$3*$3>A[$1]*A[$1]{A[$1]=$0} END{for(i in A) print i,A}'  | sort -k 1,1

rdrtx1 · June 12, 2017, 1:20pm

Clarification added.

quincyjones · June 12, 2017, 4:08pm

Thanks, but the group always has to be unique. Sorry, maybe I didn't explain it well. First, the group with the highest score, 'a', should be unique, meaning no 'ab' or 'abc' etc. in the same set. That is why j008 is not in the output. Second, there should be no other unique group within the same set. For example, j011 has 3 unique groups, so it should not be in the output. I hope that's clear. With your script, j009 is missing and j010 is selected with the wrong group.

rdrtx1 · June 12, 2017, 4:41pm

The output was changed.

quincyjones · June 12, 2017, 4:58pm

Yes that's correct. I forgot to include j009 in the output earlier. My apologies.

RudiC · June 12, 2017, 5:06pm

For my understanding, let me paraphrase your request:
In any set, look for the maximum value (Col 3). These are listed below:

sets    names   value   groups  other_groups
j007    shot1   0.6     a       b bb bc cd
j008    shot1   0.4     a       ab
j009    shot1   0.14    a       b bc
j010    shot1   22      a       b bcd
j011    shot1   5       a       b c

Now, if the group(s) of this entry show up in any of the other entries of the same set, suppress this record. If so, ALL entries EXCEPT j008 should be printed, no?
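
For reference, the first four columns of the table above could be reproduced with something like the following sketch. The function name `max_per_set` is made up for illustration. It assumes a header line, whitespace-separated columns, and that the first row wins when two rows tie for the maximum.

```shell
# Hypothetical helper: print, for each set, the row with the largest value in column 3.
max_per_set() {
    awk '
        NR == 1 { next }                      # skip the header line
        !($1 in best) || $3 + 0 > max[$1] {   # first row of a set, or a new maximum
            max[$1]  = $3 + 0                 # force a numeric comparison
            best[$1] = $0
        }
        END { for (s in best) print best[s] } # for-in order is unspecified, so sort below
    ' "$1" | sort -k1,1
}

# usage: max_per_set input
```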

Sorry, you crossposted while I was pondering. So - to be eliminated, the group has to be unique, i.e. one single letter, and this letter may not occur in any of the other, possibly multiletter, groups in the same set, nor may any other single letter group occur in that set?

quincyjones · June 13, 2017, 3:34am

@RudiC: The first part is correct. The second part is not exactly right. If you draw a Venn diagram with the letters in the groups of a specific set, you should always see 'a' as a separate group. For example, j007 has this type but j008 does not. Next, though j009, j010 and j011 have 'a' as a separate group, j011 also has 'b' and 'c' as separate groups. Therefore only j007, j009 and j010 are in the output.
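
The Venn diagram idea can be checked mechanically. The sketch below lists, per set, the single-letter groups whose letter occurs in no other row of that set. The function name `separate_groups` is invented for illustration, and it assumes that a letter repeated in a second row (even as the same single-letter group) counts as shared.

```shell
# Hypothetical helper: per set, list the single-letter groups that share
# their letter with no other row of the same set.
separate_groups() {
    awk '
        NR == 1 { next }                               # skip the header line
        {
            sets[$1]                                   # remember every set name
            n = length($4)
            for (i = 1; i <= n; i++)                   # count each letter per set
                seen[$1, substr($4, i, 1)]++
            if (n == 1) single[$1] = single[$1] " " $4 # collect single-letter groups
        }
        END {
            for (s in sets) {
                out = ""
                m = split(single[s], g, " ")
                for (i = 1; i <= m; i++)
                    if (seen[s, g[i]] == 1)            # letter seen only in its own row
                        out = out " " g[i]
                print s ":" out
            }
        }
    ' "$1" | sort
}

# usage: separate_groups input
```

On the sample input, only j011 should show more than one separate group, and j008 should show none.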

---------- Post updated at 04:18 PM ---------- Previous update was at 04:17 PM ----------

@RudiC: Update: yes your update is correct. Sorry for the confusion.

---------- Post updated 06-13-17 at 02:34 AM ---------- Previous update was 06-12-17 at 04:18 PM ----------

@rdrtx1: I think it is still not working. For example, when I ran the modified script on this input, it is not supposed to print 'j007', but it prints it with group 'a'. This should not be printed because there is another unique group ('e') in the data.

sets    names    value    groups
j007    shot1    0.6    a
j007    shot2    0.5    b
j007    shot3    0.4    bc
j007    shot4    0.3    c
j007    shot5    0.2    cd
j007    shot6    0.1    e

RudiC · June 13, 2017, 4:40am

Please rephrase verbosely and in great detail the conditions; and: what do you mean by "filter"? Eliminate? Print and eliminate others?
For me, the example in post#7 should NOT print, as more than one single-letter group exists. And why group "e" and not "c"?

quincyjones · June 13, 2017, 5:03am

yeah, sorry. done!
'e' is a single-letter group. 'c' is not a single-letter group, because there is a group that shares c in it, 'cd'.
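
Putting the conditions from this thread together, one possible awk sketch follows. It is not the script posted earlier in the thread. The function name `unique_top_group` is hypothetical, and the sketch assumes a header line, whitespace-separated columns, numeric values in column 3, and that the first row wins on a tie for the maximum. A set is printed only when its top row has a single-letter group, that letter occurs in no other row of the set, and no other single-letter group in the set is equally unshared.

```shell
# Hypothetical helper: print the top row of each set, but only when its group
# is the one and only unshared single-letter group in that set.
unique_top_group() {
    awk '
        NR == 1 { next }                                 # skip the header line
        {
            n = length($4)
            for (i = 1; i <= n; i++)                     # count each letter per set
                seen[$1, substr($4, i, 1)]++
            if (n == 1) singles[$1] = singles[$1] " " $4 # collect single-letter groups
            if (!($1 in best) || $3 + 0 > max[$1]) {     # track the row with the maximum
                max[$1] = $3 + 0; best[$1] = $0; top[$1] = $4
            }
        }
        END {
            for (s in best) {
                m = split(singles[s], g, " ")
                count = 0
                for (i = 1; i <= m; i++)                 # unshared single-letter groups
                    if (seen[s, g[i]] == 1) count++
                if (length(top[s]) == 1 && seen[s, top[s]] == 1 && count == 1)
                    print best[s]
            }
        }
    ' "$1" | sort -k1,1
}

# usage: unique_top_group input
```

With the first sample input this should select j007, j009 and j010, and with the six-row j007 sample it should print nothing, since both 'a' and 'e' are unshared there.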