nua7
March 25, 2013, 12:18pm
1
Hi All,
I have a file in the format shown below. I need to pick up only those lines which have the highest value in column 4 , based on column1.
The lines marked as bold should be the output. Any help is appreciated!
Input file :
-10.5|6|L|32|MPP||R||012009|E
-10.5|6|L|43|MPF|B|R||012009|E
-10.5|6|L|44|MPF||R||012009|E
-10.5|6|L|54|MR|V|R||012009|E
-10.5|6|L|55|MR|B|R||012009|E
-10.5|6|L|56|MR||R||012009|E
-10.5|6|P|08|MPP||R|F|012009|E
-10.5|6|P|13|MPF|B|R|F|012009|E
-10.5|6|P|14|MPF||R|F|012009|E
-10.5|6|P|22|MR|V|R|F|012009|E
-10.5|6|P|23|MR|B|R|F|012009|E
-10.5|6|P|24|MR||R|F|012009|E
-10.75|6|L|16|||P||012008|E
-10.75|6|L|18|||U||012008|E
-11.3|6|L|51|MR|V|P||012009|E
-11.3|6|P|19|MR|V|P|G|012009|E
-11.4|6|L|35|MR|B|R||IN2010|E
-11.85|6|L|41|MPF|B|P||012009|E
-11.85|6|L|42|MPF||P||012009|E
-11.85|6|L|52|MR|B|P||012009|E
-11.85|6|P|11|MPF|B|P|G|012009|E
-11.85|6|P|12|MPF||P|G|012009|E
-11.85|6|P|20|MR|B|P|G|012009|E
-11.95|6|L|53|MR||P||012009|E
-11.95|6|P|21|MR||P|G|012009|E
-11|6|L|17|||R||012008|E
-12.4|6|L|22|MR|V|R||IN2012|E
-12.4|6|L|34|MR|V|R||IN2010|E
-12.4|6|L|50|MR|V|R||IN2013|E
-12.7|6|L|23|MR|B|R||IN2011|E
-13.67|6|L|16|||P||012007|E
-13.67|6|L|17|||R||012007|E
-13.67|6|L|18|||U||012007|E
-14.92|7|P|09|MPP||P|I|012009|E
-14.93|5|L|25|MPP||P||012009|E
-14.9|5|L|19|MR|B|R||IN2012|E
-14.9|6|L|22|MR|V|R||IN2011|E
-14|6|L|24|MR||R||IN2012|E
-14|6|L|51|MR||R||IN2013|E
-15.14|6|L|16|||P||012006|E
-15.14|6|L|17|||R||012006|E
-15.14|6|L|18|||U||012006|E
-15.18|6|L|33|MPP|V|U||012009|E
-15.18|6|L|34|MPP|E|U||012009|E
-15.18|6|L|35|MPP|V|X||012009|E
-15.18|6|L|36|MPP|E|X||012009|E
-15.18|6|L|45|MPF|B|U||012009|E
-15.18|6|L|46|MPF||U||012009|E
-15.18|6|L|47|MPF|B|X||012009|E
-15.18|6|L|48|MPF||X||012009|E
-15.18|6|L|57|MR|V|U||012009|E
Hi, Try this,
awk -F'\|' 'a[$1]>$4{a[$1]=$4;v[$1]=$0;}END{for(i in v){print v;}}' file
If your file is large then, use sort cmd to sort the file based on col1 and col4 and then, print the last match. Cheers, Ranga:-)
Yoda
March 25, 2013, 12:40pm
4
awk -F\| ' {
if ( $1 in A )
{
split (A[$1], R, "|")
if ( R[4] < $4 )
A[$1] = $0
}
else
A[$1] = $0
} END {
for ( i in A )
print A
} ' file | sort -nr
nua7
March 25, 2013, 12:54pm
5
Thank you all for the replies. But the problem is that your code does not eliminate unwanted entries.
I can use reverse sort command to sort the entries based on 1st and 4th column , but then I will have to print only the first occurence of the string.
sort -ur -k1,4 NPBR2.XTR > tst1
0||P|30|MR||R|S|IN2010|E
0||P|29|MR||R|K|IN2010|E
0||P|28|MR|B|R|S|IN2010|E
0||P|27|MR|B|R|K|IN2010|E
0||P|26|MR|V|R|S|IN2010|E
0||P|25|MR|V|R|K|IN2010|E
0||P|24|MR||P|T|IN2010|E
0||P|23|MR||P|L|IN2010|E
0||P|22|MR|B|P|T|IN2010|E
0||P|21|MR|B|P|L|IN2010|E
0||P|20|MR|V|P|T|IN2010|E
0||P|19|MR|V|P|L|IN2010|E
0||P|15|PDP||X|S|IN2010|E
0||P|14|PDP||X|K|IN2010|E
0||P|13|PDP||R|S|IN2010|E
0||P|12|PDP||R|K|IN2010|E
0||P|11|PDP||P|S|IN2010|E
0||P|10|PDP||P|K|IN2010|E
0|7|P|18|MR||R|H|IN2010|E
0|7|P|17|MR|B|R|H|IN2010|E
0|7|P|16|MR|V|R|H|IN2010|E
0|7|P|15|MR||P|I|IN2010|E
0|7|P|14|MR|B|P|I|IN2010|E
0|7|P|13|MR|V|P|I|IN2010|E
0|7|P|09|PDP||X|H|IN2010|E
0|7|P|08|PDP||R|H|IN2010|E
0|7|P|07|PDP||P|H|IN2010|E
0|7|L|56|MR||A||IN2013|E
rdrtx1
March 25, 2013, 2:27pm
6
try also:
awk '
!col1[$1] {col1[$1]=$1; cnt++; mx[$1]=$4}
{if ($4>=mx[$4]) {mx[$1]=$4; out[cnt-1]=$0}}
END {for (i=0; i<cnt; i++) print out}' FS="|" infile
ctsgnb
March 25, 2013, 3:18pm
7
sort -t\| +0 -1 +3 -4 -r yourfile | awk -F\| '!a[$1]++'
+m Start at the first character of the m+1th field.
-n End at the last character of the nth field (if -N omitted, assume the end of the line).
Just to clarify, if you consider using keys (-k options) and if you consider the line :
-10.5|6|L|32|MPP||R||012009|E
using -k1,4
will use the following key for sorting :
-10.5|6|L|32|MPP||R||012009|E
using -k4
will use the following key for sorting :
-10.5|6|L|32|MPP||R||012009|E
using -k4,4
will use the following key for sorting :
-10.5|6|L|32|MPP||R||012009|E
using -k1,1
will use the following key for sorting :
-10.5|6|L|32|MPP||R||012009|E
using -k1
will use the following key for sorting :
-10.5|6|L|32|MPP||R||012009|E
If no end position is specified the end of line is assumed
so using key :
sort -t\| -k1,1 -k4,4 -r yourfile | awk -F\| '!a[$1]++'
sort -t"|" -k1nr -k4nr filename | awk -F"|" '$1 != index1 { print; index1=$1}'
ctsgnb
March 25, 2013, 4:38pm
9
Won't work : give a test with the example provided and check :
you will get 2 entries whose $1 is -10.5
This is because of the key used ... check my previous post, i have meanwhile updated it.
With a single awk you could also try this which reads the input file twice to preserve the order in the input file and would have modest memory requirements:
awk -F\| 'NR==FNR{if($4>M[$1] || !($1 in M)) M[$1]=$4; next} $4==M[$1]' file file
ctsgnb:
Won't work : give a test with the example provided and check :
you will get 2 entries whose $1 is -10.5
This is because of the key used ... check my previous post, i have meanwhile updated it.
sort -t"|" -k1nr -k4nr filename | awk -F"|" '$1 != index1 { print; index1=$1}'
And this is oputput
-10.5|6|L|56|MR||R||012009|E
-10.75|6|L|18|||U||012008|E
-11|6|L|17|||R||012008|E
-11.3|6|L|51|MR|V|P||012009|E
-11.4|6|L|35|MR|B|R||IN2010|E
-11.85|6|L|52|MR|B|P||012009|E
-11.95|6|L|53|MR||P||012009|E
-12.4|6|L|50|MR|V|R||IN2013|E
-12.7|6|L|23|MR|B|R||IN2011|E
-13.67|6|L|18|||U||012007|E
-14|6|L|51|MR||R||IN2013|E
-14.9|6|L|22|MR|V|R||IN2011|E
-14.92|7|P|09|MPP||P|I|012009|E
-14.93|5|L|25|MPP||P||012009|E
-15.14|6|L|18|||U||012006|E
-15.18|6|L|57|MR|V|U||012009|E
test is done on freeBSD .
ctsgnb
March 26, 2013, 4:12am
12
The test i did with your code was on a linux ubuntu but a reliable code should work independently of the unix distrib.
1 Like
The correct way is to use -k1,1
and -k4,4
instead of -k1
and -k4
.
1 Like