[Solved] Sorting a column based on another column

Hello,

I have a file as follows:

F0100010 A C     F0100040 A G    BTA-28763-no-rs     77.2692
F0100020 A G      F0100030 A T    BTA-29334-no-rs     11.4989
F0100030 A T      F0100020 A G    BTA-29515-no-rs     127.006
F0100040 A G      F0100010 A C    BTA-29644-no-rs     7.29827
F0100050 A T      F0100050 A T    BTA-32647-no-rs     70.9005

I want to sort the fourth column based on the first column to get the same order.

Thank you in advance for any help.

Please show us your expected output, based on the sample above.

What is the difference between sorting on the fourth column, and sorting 'based on' the fourth column? Do you want to sort on both columns, but group on the fourth?

In short -- what output would you expect for this input?

output:

F0100010 A C      F0100010 A C    BTA-29644-no-rs     7.29827  
F0100020 A G      F0100020 A G    BTA-29515-no-rs     127.006
F0100030 A T      F0100030 A T    BTA-29334-no-rs     11.4989
F0100040 A G      F0100040 A G    BTA-28763-no-rs     77.2692

This is what I want to have.

The difference is that if I only sort the fourth column, it will be sorted by its own values, but I want it to follow the order that already exists in the first column. I cannot sort the first column either, because it is crucial to keep its order.

Oh, I see. You don't want it sorted. You want columns 4 through n of all rows moved such that column 1 lines up with column 4.

Working on it.

$ awk 'NR==FNR { ARR[$4]=$0 ; next }; $1 in ARR { A=$1 ; B=$2; C=$3; $0=ARR[$1]; $1=A; $2=B; $3=C } 1' datafile datafile

F0100010 A C F0100010 A C BTA-29644-no-rs 7.29827
F0100020 A G F0100020 A G BTA-29515-no-rs 127.006
F0100030 A T F0100030 A T BTA-29334-no-rs 11.4989
F0100040 A G F0100040 A G BTA-28763-no-rs 77.2692
F0100050 A T F0100050 A T BTA-32647-no-rs 70.9005

$

Yes, I give it the input file twice; that is not a typo. The first pass reads all lines into memory, indexed on the fourth column. The second pass prints each line, keeping its own first three columns and taking the remaining columns from the stored line whose fourth column matches.
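
For readers who find the one-liner dense, here is the same two-pass idea spread over several lines with comments. It is only a sketch: the function name realign and the array name saved are made up for illustration, and it assumes whitespace-separated fields with a unique key in column 4.

```shell
# realign FILE: hypothetical wrapper name, not from the post above.
realign() {
    awk '
        # Pass 1: NR == FNR holds only while the first file argument is read.
        # Store every complete line under the key found in its column 4.
        NR == FNR { saved[$4] = $0; next }

        # Pass 2: when a stored line has this key in its column 4,
        # keep the current columns 1-3 and take the rest from that line.
        $1 in saved {
            a = $1; b = $2; c = $3
            $0 = saved[$1]          # recall the matching line
            $1 = a; $2 = b; $3 = c  # put the original columns 1-3 back
        }

        # Print every line, rewritten or not.
        1
    ' "$1" "$1"
}
```

Calling realign datafile should behave like the one-liner. Because fields are assigned, awk rebuilds each changed line with single spaces between columns.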

Great, thanks a lot!

A moment, that doesn't look quite right.

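[edit] I was restoring four columns when I only needed three. Putting $4 back overwrote the matched fourth column with the current line's own, which undid the realignment. Remove the $4=D from the code and it works.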

Oh yes, you are right, thank you very much!

In general, an efficient merge operation needs two input streams.
If you have only one file, the shell can open it a second time on another file descriptor.

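#!/bin/sh
# -b ignores the blanks in front of column 4 (their count varies in the sample)
sort -b -k4,4 infile |
(
# in this subshell, duplicate stdin (the sorted pipe) onto fd 3
exec 3<&0
# now the while loop reads the unsorted file on its own stdin
while read -r f1 f2 f3 junk
do
 # pair each original line with the next sorted line from fd 3
 read -r j1 j2 j3 k4 k5 k6 rest <&3
 printf "%s %s %s  %s %s %s  %s\n" "$f1" "$f2" "$f3" "$k4" "$k5" "$k6" "$rest"
done < infile
)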
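
Note that the loop above pairs lines purely by position, so it only lines up correctly when column 1 is already in sorted order and columns 1 and 4 hold the same keys. A rough pre-check could look like the sketch below; check_pairing is a hypothetical helper name, and it does not catch duplicate keys.

```shell
# check_pairing FILE: hypothetical helper, returns non-zero when the
# positional pairing would go wrong.
check_pairing() {
    # The read loop walks the file in its existing order, so column 1
    # must already be sorted (-c only checks, it does not sort).
    sort -c -k1,1 "$1" || return 1

    # Every key must appear in both column 1 and column 4.
    awk '
        { col1[$1] = 1; col4[$4] = 1 }
        END {
            bad = 0
            for (k in col1) if (!(k in col4)) { print "only in column 1: " k; bad = 1 }
            for (k in col4) if (!(k in col1)) { print "only in column 4: " k; bad = 1 }
            exit bad
        }
    ' "$1"
}
```

If the check fails, the awk approach earlier in the thread is the safer choice, since it matches on the key itself rather than on line position.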