Rearrange rows by group pairs

Hello gurus,

I have two variable columns, 1 and 2, and their respective groups in columns 3 and 4:

var1 var2 gr1 gr2
a b g h
c d h g
d f d h
f g h g
d r h d 
p q a b
h y h g
r t g h

I want to rearrange the rows so that all similarly grouped (var1 var2) pairs are together. The similarity rule is that a (gr1 gr2) pair is the same as a (gr2 gr1) pair.

For example, the variable pair (a b) has the group pair (g h), which is equivalent to the group pair (h g). Since the variable pair (c d) has the group pair (h g), which is also equivalent to (g h), these rows can be grouped together.

In other words, for columns 3 and 4, $3""$4 is the same as $4""$3.

So my desired output is

1 a b g h
1 c d g h
1 h y g h
1 r t g h
1 f g g h
2 d f d h
2 d r d h
3 p q a b

To achieve this, I'm trying to put the last two columns into an array and output them in sorted order. Then I can sort by the last column and get my result, but it gives me blank output.

awk  '{delete a;  s=x; a[$3];a[$4]; for (i=1;i<=length(a);i++)  {  s =s"__"a};  print $1,$2,s}' infile | sort -k3,3 | head

Please assist. Row order doesn't matter as long as similar groups are together. Please note this is made-up data, and the groups have no alphabetic or numeric pattern.

Try

awk '
        {if (!(a[$3,$4]))       {if (a[$4,$3])  {X  = $4
                                                 $4 = $3
                                                 $3 = X
                                                }
                                 else            a[$3,$4]++
                                }
        }
1
' file

I think you have made a mistake in your desired output based on your sample input.
Here's another alternative, a bit verbose, but....
Run it as awk -f sen.awk myFile, where sen.awk is:

{ g=(($3>$4)?($4 OFS $3):($3 OFS $4)) }
{ a[g]=(g in a)?a[g] ORS (SUBSEP $1 OFS $2): (SUBSEP $1 OFS $2) }
END {
   for (i in a) {
     gsub(SUBSEP,++j OFS,a[i])
     gsub(ORS,OFS i ORS,a[i])
     printf("%s%s\n", a[i], OFS i)
   }
}

results in

1 a b g h
1 c d g h
1 f g g h
1 h y g h
1 r t g h
2 p q a b
3 d f d h
3 d r d h
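
The key step in sen.awk is its first rule, which puts each group pair into a fixed order so that (g h) and (h g) give the same key. Here is a minimal sketch of just that step, assuming whitespace-separated input with one header line. The function name canonical_pairs is made up for illustration, and the comparison is forced to be a string comparison, which is an assumption about how the groups should be ordered.

```shell
# canonical_pairs is a hypothetical helper name, not part of the thread.
# It prints every data row with columns 3 and 4 put into a fixed order.
canonical_pairs() {
  awk 'NR > 1 {
    # Appending "" forces a string comparison; smaller name goes first.
    if (($3 "") > ($4 "")) { t = $3; $3 = $4; $4 = t }
    print
  }' "$@"
}
```

Called as canonical_pairs myFile, it leaves rows such as "a b g h" alone and rewrites "c d h g" with the pair swapped, so a later sort or array lookup sees one spelling per group.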

Another option using sort:

awk '
{
  if (!(a[$3,$4]) && a[$4,$3]) {
    X  = $4
    $4 = $3
    $3 = X
  }
  if (!(a[$3,$4])) a[$3,$4]=++grp;
  $1=a[$3,$4] FS $1
} 1 ' infile | sort -d

I'm afraid I missed the leading counter. Try

awk '
NR > 1  {if (!(a[$3,$4]))       {if (a[$4,$3])  {X  = $4
                                                 $4 = $3
                                                 $3 = X
                                                }
                                 else            a[$3,$4] = ++CNT
                                }
         print a[$3,$4], $0
        }

' file | sort -g

results in

1 a b g h
1 c d g h
1 f g g h
1 h y g h
1 r t g h
2 d f d h
2 d r d h
3 p q a b
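
If piping through sort is not wanted, the grouping can probably also be done in a single awk pass by collecting the rows of each group and printing them in the END block. This is only a sketch and not one of the solutions posted above. It assumes one header line and whitespace-separated fields, and it numbers groups in the order they first appear. The names group_rows, id, order and rows are made up for illustration.

```shell
# group_rows is a hypothetical helper name, not part of the thread.
# It keeps the rows of a group together without an external sort.
group_rows() {
  awk 'NR > 1 {
    # Order-independent key: smaller group name first (string compare).
    if (($3 "") > ($4 "")) { t = $3; $3 = $4; $4 = t }
    key = $3 SUBSEP $4
    # Number each group the first time it is seen.
    if (!(key in id)) { id[key] = ++n; order[n] = key }
    # Append the numbered row to the text collected for its group.
    rows[key] = rows[key] id[key] " " $0 "\n"
  }
  END {
    # Walk groups by number so the output order is deterministic.
    for (i = 1; i <= n; i++) printf "%s", rows[order[i]]
  }' "$@"
}
```

Usage would be group_rows file. Within a group, rows stay in input order, and the whole input is held in memory until the end.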