Help with sort only column 2 data separately

Input File

Contig_1_294435nt       242231  242751
Contig_1_294435nt       242390  242782
Contig_1_294435nt       242390  242782
Contig_1_294435nt       291578  291668
Contig_2_242278nt       75910   76271
Contig_2_242278nt       76036   76316
Contig_2_242278nt       76036   76316
Contig_3_206988nt       100775  100835
Contig_3_206988nt       100775  100835
Contig_3_206988nt       156143  157111
Contig_3_206988nt       156147  157111
Contig_3_206988nt       178412  178537
Contig_3_206988nt       178729  178896
Contig_3_206988nt       179269  179510
Contig_3_206988nt       179269  179517
Contig_3_206988nt       179269  180441
Contig_3_206988nt       180033  180441
Contig_3_206988nt       180043  180441
Contig_3_206988nt       39664   39742
Contig_3_206988nt       39666   39743
Contig_3_206988nt       41610   41684
Contig_3_206988nt       41617   41684
.
.

Output File

Contig_1_294435nt       242231  242751
Contig_1_294435nt       242390  242782
Contig_1_294435nt       242390  242782
Contig_1_294435nt       291578  291668
Contig_2_242278nt       75910   76271
Contig_2_242278nt       76036   76316
Contig_2_242278nt       76036   76316
Contig_3_206988nt       39664   39742
Contig_3_206988nt       39666   39743
Contig_3_206988nt       41610   41684
Contig_3_206988nt       41617   41684
Contig_3_206988nt       100775  100835
Contig_3_206988nt       100775  100835
Contig_3_206988nt       156143  157111
Contig_3_206988nt       156147  157111
Contig_3_206988nt       178412  178537
Contig_3_206988nt       178729  178896
Contig_3_206988nt       179269  179510
Contig_3_206988nt       179269  179517
Contig_3_206988nt       179269  180441
Contig_3_206988nt       180033  180441
Contig_3_206988nt       180043  180441
.
.

I would like to sort column 2 data separately and fix the column 1.
I did try below command, but it no work:

sort -k2n Input_File
Contig_3_206988nt       39664   39742
Contig_3_206988nt       39666   39743
Contig_3_206988nt       41610   41684
Contig_3_206988nt       41617   41684
Contig_2_242278nt       75910   76271
Contig_2_242278nt       76036   76316
Contig_2_242278nt       76036   76316
Contig_3_206988nt       100775  100835
Contig_3_206988nt       100775  100835
Contig_3_206988nt       156143  157111
Contig_3_206988nt       156147  157111
Contig_3_206988nt       178412  178537
Contig_3_206988nt       178729  178896
Contig_3_206988nt       179269  179510
Contig_3_206988nt       179269  179517
Contig_3_206988nt       179269  180441
Contig_3_206988nt       180033  180441
Contig_3_206988nt       180043  180441
Contig_1_294435nt       242231  242751
Contig_1_294435nt       242390  242782
Contig_1_294435nt       242390  242782
Contig_1_294435nt       291578  291668
.
.

Above command will sort the column 2 from smallest to largest but at the same time it will change the column 1 data as well :frowning:
Which is not what I desired output.

Thanks for any advice.

---------- Post updated at 03:27 AM ---------- Previous update was at 03:14 AM ----------

I think the below command works :slight_smile:

sort -k1,1 -k2n Input_File
Contig_1_294435nt       242231  242751
Contig_1_294435nt       242390  242782
Contig_1_294435nt       242390  242782
Contig_1_294435nt       291578  291668
Contig_2_242278nt       75910   76271
Contig_2_242278nt       76036   76316
Contig_2_242278nt       76036   76316
Contig_3_206988nt       39664   39742
Contig_3_206988nt       39666   39743
Contig_3_206988nt       41610   41684
Contig_3_206988nt       41617   41684
Contig_3_206988nt       100775  100835
Contig_3_206988nt       100775  100835
Contig_3_206988nt       156143  157111
Contig_3_206988nt       156147  157111
Contig_3_206988nt       178412  178537
Contig_3_206988nt       178729  178896
Contig_3_206988nt       179269  179510
Contig_3_206988nt       179269  179517
Contig_3_206988nt       179269  180441
Contig_3_206988nt       180033  180441
Contig_3_206988nt       180043  180441
.
.

How about

sort -k1,1 -k2,2n file
1 Like

If you really want to keep column 1 as it is and sort the 2nd and 3rd columns by increasing numeric value of the 2nd column (instead of sorting column 2 values with column 1 values, you could try:

#!/bin/ksh
tmpfn=${0##*/}.$$

trap 'rm -rf "$tmpfn"' EXIT

awk -v tmpfn="$tmpfn" '
BEGIN {	OFS = "\t"
	cmd = "sort -k1n > " tmpfn
}
{	f1[++lc] = $1
	print $2, $3 | cmd
}
END {	close(cmd)
	for(i = 1; i <= lc; i++) {
		getline < tmpfn
		print f1, $1, $2
	}
	close(tmpfn)
}' Input_File

which, with your sample input (without the .... lines) produces the output:

Contig_1_294435nt	39664	39742
Contig_1_294435nt	39666	39743
Contig_1_294435nt	41610	41684
Contig_1_294435nt	41617	41684
Contig_2_242278nt	75910	76271
Contig_2_242278nt	76036	76316
Contig_2_242278nt	76036	76316
Contig_3_206988nt	100775	100835
Contig_3_206988nt	100775	100835
Contig_3_206988nt	156143	157111
Contig_3_206988nt	156147	157111
Contig_3_206988nt	178412	178537
Contig_3_206988nt	178729	178896
Contig_3_206988nt	179269	179510
Contig_3_206988nt	179269	179517
Contig_3_206988nt	179269	180441
Contig_3_206988nt	180033	180441
Contig_3_206988nt	180043	180441
Contig_3_206988nt	242231	242751
Contig_3_206988nt	242390	242782
Contig_3_206988nt	242390	242782
Contig_3_206988nt	291578	291668

Of course, a lot of error checking should be added in the END clause, but this seems to work as a proof of concept.

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

This was written and tested using a 1993+ version of the Korn shell, but will work with any POSIX-conforming shell.

1 Like