Head command queries

We have a file like the one below:

AREA,COUNTRY,RANK
A,MX,1
A,MX,2
A,MX,5
A,MX,8
A,IN,7
A,IN,5
A,IN,21
B,CN,6
B,CN,2
B,CN,8
B,CN,0

We need the top 2 RANK records for each AREA,COUNTRY combination, as shown below. I know that

head -2

prints the first 2 lines of a file, but I'm not sure how to apply it to each group of specific fields. Please help me out.

A,MX,8
A,MX,5
A,IN,21
A,IN,7
B,CN,8
B,CN,6

Any attempt from your side?

Below is my code:

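# skip the header, then sort by AREA,COUNTRY and by RANK (numeric, highest first)
tail -n +2 file.txt | sort -t, -k1,2 -k3,3nr > sort.txt
PRE_SIZE=""
content="middle.txt"
rm -f "$content" final.txt
while IFS= read -r line
do
 # AREA,COUNTRY key of the current record
 dt=$(printf '%s\n' "$line" | awk -F, '{print $1,$2}')
 if [ "$PRE_SIZE" = "" ] || [ "$PRE_SIZE" = "$dt" ]
 then
   # same group: collect the record
   printf '%s\n' "$line" >> "$content"
 else
   # new group: keep the top 2 of the previous group, then start a new one
   head -n 2 "$content" >> final.txt
   printf '%s\n' "$line" > "$content"
 fi
 PRE_SIZE="$dt"
done < sort.txt
# flush the last group
head -n 2 "$content" >> final.txt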
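The loop only keeps the right records if each group's lines arrive best-first. A plain whole-line `sort -r` does not guarantee that: it compares lines as text, so with the question's data the A,IN group comes out as 7, 5, 21 instead of 21, 7, 5 (and the header line gets sorted in among the data). The small POSIX sh sketch below shows the difference on three records; `LC_ALL=C` is an assumption added only to get a reproducible collation order.

```shell
#!/bin/sh
# Three records from the question's A,IN group.
sample='A,IN,7
A,IN,5
A,IN,21'

# Whole-line reverse sort: lines are compared as text, character by
# character, so "A,IN,21" sorts below "A,IN,7" because '2' < '7'.
lexical=$(printf '%s\n' "$sample" | LC_ALL=C sort -r)

# Keyed sort: split on commas, order by fields 1-2 (AREA,COUNTRY), then
# by field 3 alone, compared numerically and in reverse (highest first).
numeric=$(printf '%s\n' "$sample" | LC_ALL=C sort -t, -k1,2 -k3,3nr)

printf 'sort -r:\n%s\n\nsort -t, -k1,2 -k3,3nr:\n%s\n' "$lexical" "$numeric"
```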

My code creates temporary files, and I suspect they may degrade performance on a huge file.
Any suggestions?
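A minimal sketch that avoids the intermediate files: `sort` streams straight into `awk` through a pipe, and `awk` keeps a counter per AREA,COUNTRY key, printing a line only while that key's count is 2 or less. The function name `top2`, the scratch file and `LC_ALL=C` are assumptions for the demo. Note that GNU `sort` may still create its own temporary files when the input is very large.

```shell
#!/bin/sh
# Top 2 RANK records per AREA,COUNTRY, with no files written by the script
# itself between the steps.
top2() {
    # keep the header line first
    head -n 1 "$1"
    # data lines only: group by fields 1-2, RANK high to low,
    # then let awk print at most 2 lines per AREA,COUNTRY key
    tail -n +2 "$1" |
        LC_ALL=C sort -t, -k1,2 -k3,3nr |
        awk -F, '++seen[$1 FS $2] <= 2'
}

# Demo on the sample data from the question, in a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
AREA,COUNTRY,RANK
A,MX,1
A,MX,2
A,MX,5
A,MX,8
A,IN,7
A,IN,5
A,IN,21
B,CN,6
B,CN,2
B,CN,8
B,CN,0
EOF

result=$(top2 "$sample")
rm -f "$sample"
printf '%s\n' "$result"
```

Compared with the `P=NR+1` trick used elsewhere in this thread, the counter is a little more direct and generalizes to "top N" by changing the `2`.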

Try

{ head -1 file; tail -n+2 file | sort -t, -k1,2 -k3nr; } | awk -F, '!T[$1,$2]++ {P=NR+1} NR<=P'
AREA,COUNTRY,RANK
A,IN,21
A,IN,7
A,MX,8
A,MX,5
B,CN,8
B,CN,6
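For anyone studying the awk part, here is the same program spread over several lines with comments; it should behave exactly like the one-liner. The variable name `top2_awk`, the scratch file and `LC_ALL=C` (for a reproducible sort order) are only for the demonstration.

```shell
#!/bin/sh
# The awk program from the one-liner, spread out with comments.
# Expects the header first, then data sorted by AREA,COUNTRY and by
# RANK from highest to lowest.
top2_awk='
    # First line of a new AREA,COUNTRY group (T[$1,$2] was still 0):
    # open a window that ends at the next line number.
    !T[$1, $2]++ { P = NR + 1 }

    # Default action (print) while inside the window. Because the input is
    # sorted, the line after a group start is either the runner-up of that
    # group or the first line of another group, which is printed anyway.
    NR <= P
'

# Demo on the sample data from the question, in a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
AREA,COUNTRY,RANK
A,MX,1
A,MX,2
A,MX,5
A,MX,8
A,IN,7
A,IN,5
A,IN,21
B,CN,6
B,CN,2
B,CN,8
B,CN,0
EOF

result=$({ head -n 1 "$sample"
           tail -n +2 "$sample" | LC_ALL=C sort -t, -k1,2 -k3nr; } |
         awk -F, "$top2_awk")
rm -f "$sample"
printf '%s\n' "$result"
```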


You can save a process and a pipe by taking advantage of the fact that, once the header has been found and printed, awk filters out any later occurrence of it:

{ head -1 file; sort -t, -k1,2 -k3nr file; } | awk -F, '!T[$1,$2]++ {P=NR+1} NR<=P'
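One caveat with this shortcut, as far as I can tell: `sort` places the header among the data lines, and if the group sorted just before it has only one record, that group's `NR<=P` window covers the header, so it is printed a second time. The sketch below reproduces this on a made-up three-record input (with `LC_ALL=C` assumed, so "A,MX" sorts before "AREA,COUNTRY" because ',' < 'R'). It also shows one way to keep a single `sort` without `tail`: read the header with the shell's `read`, then let `sort` consume the rest of the same standard input. This relies on `read` stopping right after the first line of a regular file, which bash and dash do.

```shell
#!/bin/sh
# Made-up input: group A,MX has a single record and, in the C locale,
# sorts right before the header line.
demo=$(mktemp)
printf '%s\n' 'AREA,COUNTRY,RANK' 'A,MX,3' 'B,CN,1' 'B,CN,4' > "$demo"

top2='!T[$1,$2]++ {P=NR+1} NR<=P'

# The one-sort variant from the post: the header also goes through sort.
one_sort=$({ head -n 1 "$demo"; LC_ALL=C sort -t, -k1,2 -k3nr "$demo"; } |
    awk -F, "$top2")

# Alternative: read the header in the shell, then sort the remaining lines
# of the same standard input, so the header never reaches sort.
read_then_sort=$({ IFS= read -r header; printf '%s\n' "$header"
                   LC_ALL=C sort -t, -k1,2 -k3nr; } < "$demo" |
    awk -F, "$top2")

rm -f "$demo"
printf 'header through sort:\n%s\n\nheader read first:\n%s\n' \
    "$one_sort" "$read_then_sort"
```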

Thanks for the suggestion.
I will study the awk code you provided. :)

You can achieve the same result without awk, using a (recent) bash loop:

head -1 file
tail -n+2 file | sort -t, -k1,2 -k3nr |
  while IFS=, read -r A C R X
    do [ "$TMP" != "$A,$C" ] && CNT=2           # new AREA,COUNTRY group: reset the counter
       [ $(( CNT-- )) -gt 0 ] && { echo "$A,$C,$R"   # print while the counter lasts
                                   TMP="$A,$C"
                                 }
    done

Here is another awk approach:

awk -F, '
        NR == 1 {
                print
                next
        }
        {
                idx = $1 FS $2
                if ( idx in A )
                {
                        if( A[idx] < $3 )
                        {
                                P[idx] = A[idx]
                                A[idx] = $3
                        }
                }
                else
                        A[idx] = $3
        }
        END {
                for ( k in A )
                        print k, A[k] RS k, P[k]
        }
' OFS=, file

@Yoda: you should also handle the case where $3 is less than A[idx] but greater than P[idx]; otherwise the second-highest RANK can be missed.
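Following up on that point, here is a sketch of the same one-pass, no-sort awk idea with the missing case handled (the function name `top2_unsorted` is just for illustration). A rank that does not beat the current best but beats the current second-best, or arrives while there is no second-best yet, replaces the second-best. Groups with a single record print one line instead of a line with an empty RANK, and `$3 + 0` forces a numeric comparison. As in the original, `for (k in A)` visits groups in an unspecified order, so pipe the output through `sort` if the order matters.

```shell
#!/bin/sh
# One pass, no sort: track the best (A) and second-best (P) RANK
# for each AREA,COUNTRY key. Reads the CSV on standard input.
top2_unsorted() {
    awk -F, -v OFS=, '
        NR == 1 { print; next }                 # pass the header through
        {
            idx = $1 FS $2
            r = $3 + 0                          # compare ranks as numbers
            if (!(idx in A)) {
                A[idx] = r                      # first record of the group
            } else if (r > A[idx]) {
                P[idx] = A[idx]                 # old best becomes second
                A[idx] = r
            } else if (!(idx in P) || r > P[idx]) {
                P[idx] = r                      # between best and second
            }
        }
        END {
            for (k in A) {                      # group order is unspecified
                print k, A[k]
                if (k in P) print k, P[k]       # single-record groups: 1 line
            }
        }
    '
}

# Demo on the sample data from the question.
result=$(top2_unsorted <<'EOF'
AREA,COUNTRY,RANK
A,MX,1
A,MX,2
A,MX,5
A,MX,8
A,IN,7
A,IN,5
A,IN,21
B,CN,6
B,CN,2
B,CN,8
B,CN,0
EOF
)
printf '%s\n' "$result"
```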