Sorting with header and mixed numerals (scientific and decimal) | awk

Assoc.txt
 CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P 
   1   rs2980319     766985    A        ADD     4154      1.024       0.1623       0.8711
   1   rs2980319     766985    A     AGECAT     4154      1.371        6.806    1.003e-11
   1   rs2980319     766985    A        EV1     4154   1.68e-30       -17.51      1.2e-68
   1   rs2980319     766985    A        EV2     4154   3.42e-12       -7.966    1.645e-15
   1   rs2980319     766985    A        EV3     4154    0.03088      -0.5361       0.5919
   1   rs2980319     766985    A        EV4     4154      17.18       0.2873       0.7739
   1  rs11260595    1028961    T        ADD     4158      1.004      0.01069       0.9915
   1  rs11260595    1028961    T     AGECAT     4158      1.364        6.671     2.54e-11
   1  rs11260595    1028961    T        EV1     4158  1.318e-30       -17.52    1.043e-68

I want to get only the lines where TEST (Col. 5) has value "ADD"; then
I want to sort P (Col.9) in ascending order, but also keep the column headers intact without getting sorted out.

Using awk, I can do the following

>awk '{if($5=="ADD") {print $0}}' OFS='\t'  Assoc.txt |sort -nrk9

This however, removes the Header Row
It also does not sort Col.9 properly, because I have values that are in scientific notation (3.457e-05) and decimals ( 0.0001029)

1st command, something like

head -1 <Assoc.txt >Assoc.srt

2nd command, like

blah blah >>Assoc.srt

Sorry about that blah blah, but going to have to think about that scientific notation.
You might be able to awk select for $5=ADD or $5=TEST, but that might create some weirdness in your later sorting. Thus, why I suggested two steps.

---------- Post updated at 01:48 PM ---------- Previous update was at 01:46 PM ----------

A couple of posts on scientific notation:

try this:

nawk 'FNR==1{print | "cat 1>&2";next}$5=="ADD" {print $0}' OFS='\t'  Assoc.txt |sort -nrk9
awk 'FNR==1{print | "cat 1>&2";next}$5=="ADD" {print $0}' OFS='\t'  Assoc.txt  |sort -gk9 >Assoc_sorted.TXT
CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P 
FL_NoOutlr :~>head Foll_LRM_Sorted.TXT
   6   rs4530903   32689867    A        ADD     4151      2.084        5.538    3.067e-08
   6   rs9268853   32537621    C        ADD     4182      1.749        5.335    9.552e-08
   6   rs9469220   32766288    C        ADD     4164     0.5633       -5.237    1.628e-07

What happens is that it prints the header on to the terminal and the rest is piped out to the file output.
Also found out that -gk9 works for scientific notation. -nrk9 did not work as expected.

nawk 'FNR==1{print >out;next}$5=="ADD" {print $0}' OFS='\t' out=Assoc_sorted.TXT  Assoc.txt |sort -gk9 >>Assoc_sorted.TXT
1 Like

Or:

awk 'NR==1{print;next} $5=="ADD"{print | "sort -grk9"}' OFS="\t" file
1 Like

Give descending order . Otherwise good solution and slightly faster.
Thank you Franklin52 and vgersh99 :):b:
I changed

sort -grk9 to sort -gk9