Rearrange rows by group pairs

senhia83 · June 19, 2018, 4:53pm

Hello gurus,

I have two variable columns (1 and 2) and their respective groups in columns 3 and 4:

var1 var2 gr1 gr2
a b g h
c d h g
d f d h
f g h g
d r h d 
p q a b
h y h g
r t g h

I want to rearrange the rows so that all (var1 var2) pairs with similar groups are together. The similarity rule is that a (gr1 gr2) pair is the same as a (gr2 gr1) pair.

For example, the variable pair (a b) has the group pair (g h), which is equivalent to the group pair (h g). Since the variable pair (c d) has the group pair (h g), which is also equivalent to (g h), these rows can be clubbed together.

In other words, $3""$4 is treated the same as $4""$3.
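As a rough illustration of this rule, both orderings can be mapped to a single key by always putting the smaller group name first. This is only a minimal sketch: it assumes whitespace-separated fields and that the joiner "__" never occurs inside a group name, and the function name canon_key is made up for the example.

```shell
# canon_key is a hypothetical helper name, not something from the thread.
# It prints var1, var2 and an order-independent key built from columns 3 and 4.
canon_key() {
  awk '{
    # Appending "" forces a string comparison, even for numeric-looking names.
    key = (($3 "") < ($4 "")) ? ($3 "__" $4) : ($4 "__" $3)
    print $1, $2, key
  }' "$@"
}

# The pairs (g h) and (h g) end up with the same key, g__h.
printf '%s\n' 'a b g h' 'c d h g' | canon_key
```

Rows that share a key belong to the same group, so sorting or indexing on that key is enough to bring them together.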

So my desired output is

1 a b g h
1 c d g h
1 h y g h
1 r t g h
1 f g g h
2 d f d h
2 d r d h
3 p q a b

To achieve this I'm trying to put the last two columns into an array and print them in sorted order. Then I can sort by the last column and get my result, but it gives me a blank output.

awk  '{delete a;  s=x; a[$3];a[$4]; for (i=1;i<=length(a);i++)  {  s =s"__"a[i]};  print $1,$2,s}' infile | sort -k3,3 | head

Please assist. Row order doesn't matter as long as similar groups are together. Please note that this is made-up data, and the group names follow no alphabetic or numeric pattern.
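The approach described in this post (normalize the group pair, sort on it, then number the groups) might be sketched roughly as follows. It assumes whitespace-separated fields and an input without a header line; the function name group_pairs is invented for the example, and groups are numbered in sorted order of the normalized pair rather than in order of first appearance.

```shell
# group_pairs is a hypothetical helper name, not something from the thread.
group_pairs() {
  # Step 1: swap columns 3 and 4 where needed so the smaller name comes first.
  awk '{ if (($3 "") > ($4 "")) { t = $3; $3 = $4; $4 = t } print }' "$@" |
  # Step 2: bring identical pairs together (C locale for a predictable order).
  LC_ALL=C sort -k3,3 -k4,4 |
  # Step 3: bump the counter each time the pair changes and prefix it.
  awk '{ key = $3 SUBSEP $4
         if (key != prev) { n++; prev = key }
         print n, $0 }'
}

printf '%s\n' 'a b g h' 'c d h g' 'd r h d' 'p q a b' | group_pairs
```

Since the numbering follows the sort order, the counter values may differ from the ones in the desired output above, but rows with equivalent group pairs still share one number.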

RudiC · June 19, 2018, 5:37pm

Try

awk '
        {if (!(a[$3,$4]))       {if (a[$4,$3])  {X  = $4
                                                 $4 = $3
                                                 $3 = X
                                                }
                                 else            a[$3,$4]++
                                }
        }
1
' file

vgersh99 · June 19, 2018, 6:07pm

I think you have made a mistake in your desired output based on your sample input.
Here's another alternative, a bit verbose, but....
Run it as awk -f sen.awk myFile, where sen.awk is:

{ g=(($3>$4)?($4 OFS $3):($3 OFS $4)) }
{ a[g]=(g in a)?a[g] ORS (SUBSEP $1 OFS $2): (SUBSEP $1 OFS $2) }
END {
   for (i in a) {
     gsub(SUBSEP,++j OFS,a[i])
     gsub(ORS,OFS i ORS,a[i])
     printf("%s%s\n", a[i], OFS i)
   }
}

results in

1 a b g h
1 c d g h
1 f g g h
1 h y g h
1 r t g h
2 p q a b
3 d f d h
3 d r d h

Chubler_XL · June 19, 2018, 8:07pm

Another option using sort:

awk '
{
  if (!(a[$3,$4]) && a[$4,$3]) {
    X  = $4
    $4 = $3
    $3 = X
  }
  if (!(a[$3,$4])) a[$3,$4]=++grp;
  $1=a[$3,$4] FS $1
} 1 ' infile | sort -d

RudiC · June 20, 2018, 2:44am

I'm afraid I missed the leading counter. Try

awk '
NR > 1  {if (!(a[$3,$4]))       {if (a[$4,$3])  {X  = $4
                                                 $4 = $3
                                                 $3 = X
                                                }
                                 else            a[$3,$4] = ++CNT
                                }
         print a[$3,$4], $0
        }

' file | sort -g
1 a b g h
1 c d g h
1 f g g h
1 h y g h
1 r t g h
2 d f d h
2 d r d h
3 p q a b