I'm trying to have awk find the second largest value in the second column of each text file and write it to its own text file. There are 27 text files. My idea is to use Perl to loop awk over each one, but I call awk in backticks on every iteration, so it's quite slow.
Afterwards it needs to find the third largest value in each text file and write that to its own text file as well. And then the fourth largest, fifth largest, etc.
Thanks for the help guys, very much appreciated.
EDIT: Net result would look like this for second largest values
My problem has gotten a bit stickier after sorting.
My text files actually look like:
133 800
133 799
133 798
133 403
133 402
133 401
I'm looking to simply read the max value of 800, ignore the values that are within 5 of it, then read the second largest value (403 in this case), and write these to their own files.
Sorry for the mix-up, and thanks for the help so far!
This assumes the file is already sorted in descending order on the second column. Based on your example data:
kent$ echo "133 800
133 799
133 798
133 403
133 402
133 401"|awk 'NR==1{max=$2;next;}max-$2>5{print $0;exit;}'
133 403
If you want to append the line to a new file:
awk 'NR==1{max=$2;next;}max-$2>5{print $0 >> "2ndLargest.txt";exit;}' inputFile
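Since the original question mentions 27 files and a slow Perl loop with backticks, here is a minimal sketch of running the same test over several files in a single awk call. It relies on FNR, which restarts at 1 for each input file, and uses a flag instead of exit so that awk carries on to the next file. The file names sample_a.txt and sample_b.txt and the flag name found are made up for illustration, and the inputs are assumed to be sorted descending on column 2 already.

```shell
# Hypothetical sample inputs, each already sorted descending on column 2
printf '133 800\n133 799\n133 403\n133 402\n' > sample_a.txt
printf '134 650\n134 648\n134 590\n' > sample_b.txt

awk '
FNR == 1 { max = $2; found = 0; next }          # first line of each file holds its maximum
!found && max - $2 > 5 { print; found = 1 }     # first value more than 5 below that maximum
' sample_a.txt sample_b.txt > 2ndLargest.txt
```

With the real data you would pass all 27 file names (or a glob that matches them) in place of the two sample files.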
Thanks for that last reply! That really did it for the 2nd max, but what can I do for a 3rd largest max with the same "max - $2 > 5" idea?
It's sort of like this (continued):
pseudocode: select real max; print $0; then select second max; print $0; then select third max; print $0; all appended to newfile.txt
eventually to obtain: cat newfile.txt
133 800
133 403
133 127
I'm stuck here because the "next" seems to be built in only to find a second max. Can I store the second max in a variable and use "next" on that record to find the 3rd maximum? Any tips would be much appreciated. Thanks!
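One possible way to get the chain of maxima, as a rough sketch rather than a definitive answer: keep the last printed value in a variable and print a line only when it falls more than 5 below that value. Note this compares against the last kept value instead of the overall max, which gives the same result for the second value. The variable names gap, want, last and count are made up here, the file name sample_sorted.txt is hypothetical, and the 127 and 126 rows are assumed data added so that a third group exists.

```shell
# Hypothetical sample input, sorted descending on column 2;
# the 127 and 126 rows are assumed, added to give a third group
printf '133 800\n133 799\n133 798\n133 403\n133 402\n133 401\n133 127\n133 126\n' > sample_sorted.txt

awk -v gap=5 -v want=3 '
NR == 1 || last - $2 > gap {    # first line, or more than gap below the last kept value
    print
    last = $2                   # remember this value as the new reference
    if (++count == want) exit   # stop once enough values are collected
}' sample_sorted.txt > newfile.txt
```

Changing want to 4 or 5 would collect the 4th or 5th value the same way, provided the file has that many separated groups.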
I'm having a hard time understanding the !second{print $0;second=$2;}$2<second{print $0;exit;}' line.
I want to understand it so I can tailor it to find the 3rd largest max with a difference greater than 5 (as used in the second largest logic), the 4th largest max with that same difference, the 5th largest, etc.
Thanks for the help so far. I am in unfamiliar territory with awk here.
If a variable has not been assigned a value, it evaluates to false in a boolean check. So "!second" means "if the variable second has not been set yet, then run {block}".
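A small sketch to see this behaviour in isolation, using three lines of the sample data. The labels "first:" and "next smaller:" and the output file demo.txt are only there for illustration. One caveat, as far as I know: a variable holding 0 or an empty string also tests as false, so "!second" would fire again if the stored value were 0.

```shell
printf '133 800\n133 799\n133 403\n' |
awk '!second     { print "first:", $0; second = $2 }   # runs only while second is unset
     $2 < second { print "next smaller:", $0; exit }' > demo.txt
cat demo.txt
```

On the first line second is unset, so the first block runs and stores 800. On the second line "!second" is false, 799 < 800 is true, and awk prints the line and exits.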