awk second largest, third largest value

I have two text files like this:
file1.txt:

133 10  
133 22
133 13
133 56
133 78
133 98

file2.txt:

158 38
158 67 
158 94
158 17
158  23

I'm basically trying to have awk find the second largest value in the second column of each text file and write it to its own text file. There are 27 text files. My idea is to use Perl to loop awk over each one, but I use backticks in every iteration, so it's quite slow.

Afterwards it needs to find the third largest value of each text file and write it to its own text file as well, and then the fourth largest, fifth largest, etc.

Thanks for the help guys, very much appreciated.

EDIT: The net result would look like this for the second largest values:

133  78
158  67
...
...

You can simply do a numeric sort on the second column of each file; the lines are then arranged in the required order.

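#!/bin/sh
# asort.sh: append the Nth largest line (by column 2) of each input file to N.txt

DEPTH=4                 # number of ranks to keep (0.txt .. 3.txt)
while [ $# -gt 0 ]
do
        N=0
        # numeric, descending sort on the second column only
        sort -k 2,2nr < "$1" |
        while read -r LINE
        do
                echo "$LINE" >> "$N.txt"
                N=$((N + 1))
                [ "$N" -ge "$DEPTH" ] && break
        done

        shift
done

$ ./asort.sh file*.txt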
$ cat 0.txt
133 98
158 94
$ cat 1.txt
133 78
158 67
$ cat 2.txt
133 56
158 38
$ cat 3.txt
133 22
158 23
$
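
If starting extra processes for every line is a concern with 27 files, a rough alternative is to let awk choose the output file from the line number. This is only a sketch: split_by_rank is a made-up function name, and like the script above it appends to 0.txt, 1.txt, ... in the current directory.

```shell
# split_by_rank DEPTH FILE...   (hypothetical helper, not from the thread)
# Sort each file on column 2 (numeric, descending) and append line N
# of the sorted output to (N-1).txt, for the first DEPTH lines only.
split_by_rank() {
    depth=$1
    shift
    for f in "$@"
    do
        sort -k 2,2nr "$f" |
        awk -v depth="$depth" '
            NR > depth { exit }                       # past the wanted ranks
            { print $1, $2 >> ((NR - 1) ".txt") }     # rank 0 is the maximum
        '
    done
}
```

Usage would be something like: split_by_rank 4 file*.txt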

My problem has gotten a bit stickier after sorting.

My text files actually look like:

133 800         
133 799
133 798
133 403
133 402
133 401

I'm looking to simply read the max value of 800, ignore the values that are within 5 of it, then read the second largest value (403 in this case), and write these to their own files.

Sorry for the mixup, and thanks for the help so far!

This assumes that you have already sorted the file in descending order. Based on your example data:

kent$  echo "133 800         
133 799
133 798
133 403
133 402
133 401"|awk 'NR==1{max=$2;next;}max-$2>5{print $0;exit;}'

133 403

If you want to put the line into a new file:

awk 'NR==1{max=$2;next;}max-$2>5{print $0 >> "2ndLargest.txt";exit;}' inputFile
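
Since the one-liner relies on sorted input, a sketch of running it over every file might look like the following. The function name second_max is invented for illustration; the gap of 5 and the output file name come from the posts above.

```shell
# second_max FILE...   (hypothetical helper name)
# For each file: sort on column 2 (numeric, descending), remember the
# maximum from the first line, then print the first line whose value is
# more than 5 below that maximum.
second_max() {
    for f in "$@"
    do
        sort -k 2,2nr "$f" |
        awk 'NR == 1 { max = $2; next } max - $2 > 5 { print $1, $2; exit }'
    done
}
```

It could then be called as: second_max file*.txt > 2ndLargest.txt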

Thanks for that last reply! That really did it for the 2nd max, but what can I do for a 3rd largest max with the same "max - $2 > 5" idea?
It's something like this (continued):

133 800 
133 799
133 798 
133 403 
133 402 
133 401
133 127
133 126
133 125

pseudocode: select real max;print $0;exit && select second max; print$0;exit && select thirdmax;print$0 exit >> newfile.txt

eventually to obtain: cat newfile.txt

133 800
133 403
133 127

I'm stuck here because the next statement seems to be built in to find only a second max. Can I store the second max in a variable and tell awk to 'next' past that record to find the 3rd maximum? Any tips would be much appreciated. Thanks!

This gives your 2nd and 3rd largest values:

echo "133 800
133 799
133 798
133 403
133 402
133 401"|awk 'NR==1{max=$2;next;}max-$2>5 && !second{print $0;second=$2;}$2<second{print $0;exit;}'

133 403
133 402

Output to a file is omitted.

I'm having a hard time understanding the !second{print $0;second=$2;}$2<second{print $0;exit;} part.

I want to understand it so I can tailor it to find the 3rd largest max with a difference greater than 5 (as in the second-largest logic), then the 4th largest with that same difference, the 5th largest, and so on.

Thanks for the help so far. I am in unfamiliar territory with awk here.

Hi, see this example:

100
99
98
97
---
80
79
78
77

Say max=100.

If the 2nd largest number you want is 80:

2nd=80

then the next smaller value should be the 3rd largest (79):

3rd=79

and you don't have to compare 3rd with max to check whether the difference is > 5.

Or do you want to ensure that the difference between 2nd and 3rd is also > 5?

Regarding this line:

 !second{print $0;second=$2;}$2<second{print $0;exit;}

If a variable has not been assigned a value, it evaluates to false in a boolean check (it is treated as an empty string or 0). So "!second" means "if the variable second has not been set yet, run the {block}".
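
To make the rules easier to follow, here is the same awk program spread over several lines with comments. The wrapper function second_third is just a name chosen for this sketch; it expects input on stdin that is already sorted in descending order. One caveat: a value of 0 would also count as "not set" in the !second test.

```shell
# second_third   (hypothetical wrapper; reads descending-sorted input on stdin)
second_third() {
    awk '
    # The first line of sorted input holds the maximum; remember it, move on.
    NR == 1 { max = $2; next }

    # second is still unset (empty or 0, so false), which makes !second true:
    # the first value more than 5 below max is printed as the 2nd largest.
    max - $2 > 5 && !second { print; second = $2 }

    # Once second is set, the next strictly smaller value is printed
    # as the 3rd largest and awk stops reading.
    $2 < second { print; exit }
    '
}
```

For example: sort -k 2,2nr inputFile | second_third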

Yes, I want to ensure that the difference between 2nd and 3rd is > 5, between 3rd and 4th is > 5, between 4th and 5th is > 5, and so on.

Thanks

OK.
Don't just say "etc. etc."; how many "largest" numbers do you want?
2nd to ?th,
or as many as possible?

Up to the 8 largest, with the differences still there (i.e. each one more than 5 below the previous largest).

Thanks for helping me out. I've never gone this far in awk before, but I am learning.

Here I made an example; hope it helps:

A sorted file (descending) named t:

kent$  cat t
foo 100
foo 99
foo 98
foo 97
foo 96
foo 95
foo 94
foo 93
foo 92
....
foo 5
foo 4
foo 3
foo 2
foo 1

This awk one-liner gives you the next 8 largest numbers after the maximum, each more than 5 below the previously printed one:

kent$  awk 'NR==1{max=$2;next;}max-$2>5{print $0;max=$2;c++;}c==8{exit}' t
foo 94
foo 88
foo 82
foo 76
foo 70
foo 64
foo 58
foo 52
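
For the output asked for earlier in the thread (the maximum itself followed by the gapped values, e.g. 800, 403, 127), a possible generalisation is sketched below. The function name top_gapped and its argument order are assumptions made for this example only; the gap and the count are passed in rather than hard-coded.

```shell
# top_gapped GAP COUNT FILE...   (hypothetical helper name)
# For each file: sort on column 2 (numeric, descending), print the maximum,
# then every later line whose value is more than GAP below the last printed
# value, stopping once COUNT lines have been printed for that file.
top_gapped() {
    gap=$1 count=$2
    shift 2
    for f in "$@"
    do
        sort -k 2,2nr "$f" |
        awk -v gap="$gap" -v count="$count" '
            NR == 1 || last - $2 > gap {
                print $1, $2
                last = $2                  # later gaps are measured from here
                if (++c >= count) exit
            }
        '
    done
}
```

Something like top_gapped 5 8 file*.txt > newfile.txt would then collect up to 8 values per file; files with fewer qualifying values simply produce fewer lines.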

Thanks, this is perfect! Thanks for sticking it out with me!