Regarding sorting

vivek.bharadwaj · February 5, 2009, 8:07am

I have the following file. It's the output of a du command with certain conditions attached. I used du -ah because I need the first column to be human readable. sort -nr does not give me the order I need, and neither does sort -dr. Please help.

 cat testout

 121K   ./OMautomation/pvd
  14M   ./OMautomation/pqms
  14M   ./OMautomation
  14M   .
   5K   ./shell
   5K   ./backup
   3K   ./dir2

I need the file to be reordered as:

  14M   ./OMautomation/pqms
  14M   ./OMautomation
  14M   .
 121K   ./OMautomation/pvd
   5K   ./shell
   5K   ./backup
   3K   ./dir2
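A minimal demonstration of why sort -nr puts 121K on top, using three of the sample lines from testout (the function name show_sort_nr is hypothetical): with -n, sort reads only the leading number on each line and ignores the K/M suffix.

```shell
#!/bin/sh
# Hypothetical helper: show_sort_nr feeds three of the sample lines to
# sort -nr. With -n, sort uses only the leading number of each line (after
# blanks) and ignores the K/M suffix, so 121K counts as 121 and 14M as 14.
show_sort_nr() {
  printf '%s\n' '  14M   ./OMautomation' ' 121K   ./OMautomation/pvd' '   5K   ./shell' |
    sort -nr
}

show_sort_nr
# Prints the 121K line first, then 14M, then 5K.
```

Later GNU coreutils releases added a -h (--human-numeric-sort) option to sort that understands these suffixes; whether it is available depends on the sort installed on your system.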

zaxxon · February 5, 2009, 9:24am

sort doesn't know that a megabyte is more than a kilobyte. Even sorting by the unit letter won't help once gigabytes are involved, because the alphabetical order of the suffixes is not the same as their size order:

Size order:

G
M
K

Alphabetical order:

G
K
M

So it is probably better to produce output where every size is written out in full, with zeroes in place of the unit suffix, or else to write something yourself that converts the sizes into directly comparable numbers.
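One way to "write something yourself" is a decorate-sort-undecorate pipeline: awk computes a comparable number from the size column and puts it in front of the untouched line, sort orders on that number, and cut strips it off again. This is a sketch, not the thread's method; the function name sort_human_desc is hypothetical, and it assumes the first column is a du -h style size with a K, M or G suffix (or no suffix, meaning bytes).

```shell
#!/bin/sh
# Hypothetical helper: sort_human_desc reads "SIZE PATH" lines, where SIZE is
# a du -h style value such as 121K, 14M or 1.1G, and prints them largest first.
# The original lines are printed unchanged, so nothing has to be converted back.
# Steps: awk prepends an integer byte count and a tab (decorate), sort -nr
# orders on that leading number (LC_ALL=C keeps tie-breaking predictable),
# and cut -f2- removes the key and its tab again (undecorate).
sort_human_desc() {
  awk 'NF > 0 {
    size = $1
    unit = substr(size, length(size), 1)        # last character: K, M or G
    num  = substr(size, 1, length(size) - 1) + 0
    if (unit == "G")      bytes = num * 1024 * 1024 * 1024
    else if (unit == "M") bytes = num * 1024 * 1024
    else if (unit == "K") bytes = num * 1024
    else                  bytes = size + 0      # no suffix: assume bytes
    printf "%.0f\t%s\n", bytes, $0
  }' |
    LC_ALL=C sort -nr |
    cut -f2-
}

# Usage: sort_human_desc < testout
```

Because the key is computed by multiplying rather than by editing the text, the human-readable column never has to be rebuilt.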

vivek.bharadwaj · February 5, 2009, 9:31am

Well, I did have the idea of using sed to replace every G with six zeros and every M with three zeros, but the problem is converting them back. Suppose a file is 1000K: I don't want it to appear as 1KK after converting it back.

Is there a way to read each column into an array, check one array for the pattern (here M, G or K) and then print the entries at the corresponding index of the other arrays? I've tried it but just can't seem to get it right.
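The array idea can be sketched in portable awk: each column goes into its own array indexed by line number, a third array holds a comparable size, and the sort rearranges only a list of indexes, so the same index still picks matching entries from every column. This is an illustrative sketch; the function name sort_with_arrays is hypothetical, the hand-written insertion sort is used because plain POSIX awk has no built-in array sort, and it assumes K/M/G suffixes (or no suffix, meaning bytes).

```shell
#!/bin/sh
# Hypothetical helper: sort_with_arrays keeps each column in its own awk
# array indexed by line number. Sorting permutes an index array, so one
# index still selects matching entries from every column.
sort_with_arrays() {
  awk '
    # Convert a du -h style size (121K, 14M, 1.1G) to kilobytes.
    function to_kb(s,    unit, num) {
      unit = substr(s, length(s), 1)
      num  = substr(s, 1, length(s) - 1) + 0
      if (unit == "G") return num * 1024 * 1024
      if (unit == "M") return num * 1024
      if (unit == "K") return num
      return (s + 0) / 1024               # no suffix: assume bytes
    }
    NF >= 2 {
      n++
      size[n] = $1                        # column 1, still human readable
      p = $0
      sub(/^[ \t]*[^ \t]+[ \t]+/, "", p)  # column 2: the rest of the line
      path[n] = p
      kb[n] = to_kb($1)                   # comparable key derived from column 1
      idx[n] = n
    }
    END {
      # Stable insertion sort of the indexes, largest size first.
      for (i = 2; i <= n; i++) {
        cur = idx[i]
        j = i - 1
        while (j >= 1 && kb[idx[j]] < kb[cur]) {
          idx[j + 1] = idx[j]
          j--
        }
        idx[j + 1] = cur
      }
      for (i = 1; i <= n; i++)
        printf "%5s   %s\n", size[idx[i]], path[idx[i]]
    }'
}

# Usage: sort_with_arrays < testout
```

Since the sort is stable, lines with equal sizes keep their original relative order.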

Hope to get some answers

zaxxon · February 5, 2009, 9:36am

You could easily have awk parse the K, M and G suffixes to add the zeroes, and then check the length() of the resulting number to change it back to K, M or G.
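A sketch of that suggestion, assuming whole-number sizes and counting everything in K: one awk pass appends three or six zeroes for M or G, sort -nr orders the plain numbers, and a second pass uses length() to put the suffix back. The function names add_zeroes and to_human are hypothetical, and the factors are decimal (1000), which only approximates du -h, whose units are powers of 1024.

```shell
#!/bin/sh
# Hypothetical helpers sketching the "add zeroes, then use length()" idea.
# Everything is counted in K; M and G become three and six appended zeroes,
# i.e. decimal factors of 1000, which only approximates du -h (powers of 1024).
# Assumes whole-number sizes. Assigning to $1 makes awk rejoin the fields
# with single spaces.
add_zeroes() {
  awk 'NF > 0 {
    unit = substr($1, length($1), 1)
    num  = substr($1, 1, length($1) - 1)
    if (unit == "G")      $1 = num "000000"
    else if (unit == "M") $1 = num "000"
    else if (unit == "K") $1 = num
    print
  }'
}

# Use the number of digits to pick the suffix again. Dropping the last three
# or six digits truncates, so 1000 becomes 1M (not 1KK) and 1999 also 1M.
to_human() {
  awk 'NF > 0 {
    len = length($1)
    if (len > 6)      $1 = substr($1, 1, len - 6) "G"
    else if (len > 3) $1 = substr($1, 1, len - 3) "M"
    else              $1 = $1 "K"
    print
  }'
}

# Usage: add_zeroes < testout | sort -nr | to_human
```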

vivek.bharadwaj · February 6, 2009, 1:40am

Well, I've discovered another problem with that: suppose I have a folder of size 1.1M (indeed I do!). Then I guess I can't use this logic.
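Appending zeroes as text does break on 1.1M (it would become 1.1000), but treating the size as a number and multiplying it does not. A minimal sketch, with the hypothetical function name size_to_kb and the assumption that units are powers of 1024 as in du -h:

```shell
#!/bin/sh
# Hypothetical helper: size_to_kb converts one du -h style size to kilobytes
# by multiplying the number instead of appending zeroes, so fractional sizes
# such as 1.1M work (appending "000" would have produced 1.1000).
size_to_kb() {
  printf '%s\n' "$1" | awk '{
    unit = substr($1, length($1), 1)
    num  = substr($1, 1, length($1) - 1) + 0
    if (unit == "G")      kb = num * 1024 * 1024
    else if (unit == "M") kb = num * 1024
    else if (unit == "K") kb = num
    else                  kb = ($1 + 0) / 1024   # no suffix: assume bytes
    printf "%.1f\n", kb
  }'
}

size_to_kb 1.1M    # prints 1126.4
size_to_kb 121K    # prints 121.0
```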

zaxxon · February 6, 2009, 1:48am

I think you should first list all the possible forms of the output you want to parse. To make that easier, it helps to produce output that is consistently formatted. As we already did in your other thread by modifying the output of your du, I still suggest you produce a less human-readable form of the output and sort that. Afterwards you can still convert it back to a human-readable form, for example with awk as I described in my previous post.

vivek.bharadwaj · February 6, 2009, 1:59am

Fine then, I guess there is no easy way out of this one. I'll most probably sort the output of du -k and then, depending on the length of the number, apply a multiplying factor (for example 1/1024). Thanks for the input.
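The plan above (sort the plain du -k numbers, then scale for display) could look roughly like this. It is a sketch under assumptions: the function name kb_to_human is hypothetical, it compares the number against 1024 rather than checking its length(), and it prints one decimal place for M and G, which differs slightly from du -h formatting.

```shell
#!/bin/sh
# Hypothetical helper: kb_to_human reads lines that start with a kilobyte
# count (as printed by du -k) and rewrites only that first column as K, M or
# G, dividing by 1024 once or twice; the rest of each line is kept as is.
kb_to_human() {
  awk 'NF > 0 {
    kb = $1 + 0
    rest = $0
    sub(/^[ \t]*[^ \t]+/, "", rest)    # everything after the first column
    if (kb >= 1024 * 1024) h = sprintf("%.1fG", kb / (1024 * 1024))
    else if (kb >= 1024)   h = sprintf("%.1fM", kb / 1024)
    else                   h = sprintf("%dK", kb)
    printf "%6s%s\n", h, rest
  }'
}

# Usage: sort the plain kilobyte counts, convert only for display:
#   du -k . | sort -nr | kb_to_human
```

Sorting happens on plain integers, so the K/M/G problem never reaches sort at all.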