Hello guys
I have requirement to rank a flat file based on column separated by delimeter
I/P
1783747|091411|1000
1783747|091411|2000
1783747|091411|2000
1783747|091411|2000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|6000
1783747|091411|7000
1783747|091411|8000
1783747|091411|9000
1783747|091411|10000
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|3
1783747|091411|4000|4
1783747|091411|5000|5
1783747|091411|6000|6
1783747|091411|7000|7
1783747|091411|8000|8
1783747|091411|9000|9
1783747|091411|10000|10
The rightmost column is rank in ascending order
Please let me know is it possible in unix
It can be done easily in oracle via analytic function if the data is in table
Thanks a lot
sk1418
2
here it is:
kent$ echo "1783747|091411|1000
1783747|091411|2000
1783747|091411|2000
1783747|091411|2000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|6000
1783747|091411|7000
1783747|091411|8000
1783747|091411|9000
1783747|091411|10000"|awk -F'|' '{if($3>x){x=$3;i++};print $0"|"i}'
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|3
1783747|091411|4000|4
1783747|091411|5000|5
1783747|091411|6000|6
1783747|091411|7000|7
1783747|091411|8000|8
1783747|091411|9000|9
1783747|091411|10000|10
1 Like
panyam
3
Hello Prateek,
There might be no "rank" function available. But you can tweak an awk and can do it.
Something like this:
TESTBOX>awk -F"|" 'NR==1 { a=$NF; $0=$0 OFS $NF/a} NR >1 { $0=$0 OFS $NF/a}1' OFS="|" input_file
It may not be a generic solution but still a begining to explore.
1 Like
sk1418
4
@panyam
Your $NF/a part is creative. however it will fail if the $3 is like
...
8000
8100
8200
8300
9000
...
Assuming 3rd field is the one to be sorted
sort -k3 -n -t"|" inputFile | awk -F"|" '{if(x!=$3){i++}{x=$3}}{print $0 FS i}'
--ahamed
panyam
6
@sk1418
Hello,
I did not remeber how the "rank" works(forgot !!!) and hence posted a generic one.
How ever thanks for your valuable post.
Regards
Ravi
Or you could use Perl -
$
$
$ cat f25
1783747|091411|2000
1783747|091411|1000
1783747|091411|2000
1783747|091411|6000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|10000
1783747|091411|7000
1783747|091411|2000
1783747|091411|7001
1783747|091411|9000
$
$
$ sort -t"|" -nk3,3 f25 | perl -F"\|" -lane '$F[3]=$F[2]>$p ? ++$i : $i; print join "|",@F; $p=$F[2]'
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|3
1783747|091411|4000|4
1783747|091411|5000|5
1783747|091411|6000|6
1783747|091411|7000|7
1783747|091411|7001|8
1783747|091411|9000|9
1783747|091411|10000|10
$
$
tyler_durden
Actually, the values of the rightmost column mimic the "DENSE_RANK" analytic function of Oracle.
If you want the values of "RANK" analytic function, then -
$
$
$ cat f25
1783747|091411|2000
1783747|091411|1000
1783747|091411|2000
1783747|091411|6000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|10000
1783747|091411|7000
1783747|091411|2000
1783747|091411|7001
1783747|091411|9000
1783747|091411|10000
1783747|091411|10000
1783747|091411|19999
$
$
$ sort -t"|" -nk3,3 f25 | perl -F"\|" -lane '++$n;
if ($F[2] > $p) {++$i; $i = $n if $i < $n}
$F[3] = $i; print join "|",@F; $p=$F[2]'
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|5
1783747|091411|4000|6
1783747|091411|5000|7
1783747|091411|6000|8
1783747|091411|7000|9
1783747|091411|7001|10
1783747|091411|9000|11
1783747|091411|10000|12
1783747|091411|10000|12
1783747|091411|10000|12
1783747|091411|19999|15
$
$