Rank based on column in Unix

Hello guys

I have requirement to rank a flat file based on column separated by delimeter

I/P

 
1783747|091411|1000
1783747|091411|2000
1783747|091411|2000
1783747|091411|2000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|6000
1783747|091411|7000
1783747|091411|8000
1783747|091411|9000
1783747|091411|10000
 
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|3
1783747|091411|4000|4
1783747|091411|5000|5
1783747|091411|6000|6
1783747|091411|7000|7
1783747|091411|8000|8
1783747|091411|9000|9
1783747|091411|10000|10

The rightmost column is rank in ascending order
Please let me know is it possible in unix

It can be done easily in oracle via analytic function if the data is in table
Thanks a lot

here it is:

kent$  echo "1783747|091411|1000
1783747|091411|2000
1783747|091411|2000
1783747|091411|2000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|6000
1783747|091411|7000
1783747|091411|8000
1783747|091411|9000
1783747|091411|10000"|awk -F'|' '{if($3>x){x=$3;i++};print $0"|"i}'
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|3
1783747|091411|4000|4
1783747|091411|5000|5
1783747|091411|6000|6
1783747|091411|7000|7
1783747|091411|8000|8
1783747|091411|9000|9
1783747|091411|10000|10
1 Like

Hello Prateek,

There might be no "rank" function available. But you can tweak an awk and can do it.

Something like this:

TESTBOX>awk -F"|" 'NR==1 { a=$NF; $0=$0 OFS $NF/a} NR >1 { $0=$0 OFS $NF/a}1' OFS="|" input_file

It may not be a generic solution but still a begining to explore.

1 Like

@panyam

Your $NF/a part is creative. however it will fail if the $3 is like

...
8000
8100
8200
8300
9000
...

Assuming 3rd field is the one to be sorted

sort -k3 -n -t"|" inputFile | awk -F"|" '{if(x!=$3){i++}{x=$3}}{print $0 FS i}'

--ahamed

@sk1418

Hello,

I did not remeber how the "rank" works(forgot !!!) and hence posted a generic one.

How ever thanks for your valuable post.

Regards
Ravi

Or you could use Perl -

$
$
$ cat f25
1783747|091411|2000
1783747|091411|1000
1783747|091411|2000
1783747|091411|6000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|10000
1783747|091411|7000
1783747|091411|2000
1783747|091411|7001
1783747|091411|9000
$
$
$ sort -t"|" -nk3,3 f25 | perl -F"\|" -lane '$F[3]=$F[2]>$p ? ++$i : $i; print join "|",@F; $p=$F[2]'
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|3
1783747|091411|4000|4
1783747|091411|5000|5
1783747|091411|6000|6
1783747|091411|7000|7
1783747|091411|7001|8
1783747|091411|9000|9
1783747|091411|10000|10
$
$

tyler_durden

Actually, the values of the rightmost column mimic the "DENSE_RANK" analytic function of Oracle.

If you want the values of "RANK" analytic function, then -

$
$
$ cat f25
1783747|091411|2000
1783747|091411|1000
1783747|091411|2000
1783747|091411|6000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|10000
1783747|091411|7000
1783747|091411|2000
1783747|091411|7001
1783747|091411|9000
1783747|091411|10000
1783747|091411|10000
1783747|091411|19999
$
$
$ sort -t"|" -nk3,3 f25 | perl -F"\|" -lane '++$n;
                                             if ($F[2] > $p) {++$i; $i = $n if $i < $n}
                                             $F[3] = $i; print join "|",@F; $p=$F[2]'
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|5
1783747|091411|4000|6
1783747|091411|5000|7
1783747|091411|6000|8
1783747|091411|7000|9
1783747|091411|7001|10
1783747|091411|9000|11
1783747|091411|10000|12
1783747|091411|10000|12
1783747|091411|10000|12
1783747|091411|19999|15
$
$