HELP: I need to sort a text file in an uncommon manner, can't get desired results

Hi All

I have a flat text file. Each line in it contains a "/full path/filename". The last three columns are predictable, but directory depth of each line varies.

I want to sort on the last three columns, starting from the last, then the 2nd last, then the 3rd last, in that order. The last three columns have a constant width:

/../...../.../2009/365/2300.Z

Last col(6): 24h time 0000.Z -> 2300.Z (compressed file)
2nd Last col(3): Day of Year, 001 -> 365 (366 in leap years)
3rd Last col(4): Year

I'll explain with an example:

  • desired sort key: last col, 2nd last col, 3rd last col

Example: (unsorted)

cat dirList.txt

/home/jake/quarterly/2009/307/1300.Z
/home/jake/quarterly/2009/303/1400.Z
/home/jake/bimonthly/submitted/2009/007/1800.Z
/home/jake/yearly/2009/199/2300.Z

As you can see, the number of columns varies in this example. Regardless of the number of columns, I would like to sort the lines based on the last, 2nd last & 3rd last columns. The results should look like this:

cat dirList.txt | unix | magic | commands | here

/home/jake/bimonthly/submitted/2009/007/1800.Z
/home/jake/yearly/2009/199/2300.Z
/home/jake/quarterly/2009/303/1400.Z
/home/jake/quarterly/2009/307/1300.Z

Perhaps an awk + sort command line combo will do the trick, but I'm Google'd out on searching for an answer.

Any help would be greatly appreciated.

Tks,
Jake.

This should do it:

bash-3.2$ cat dirlist.txt
/home/jake/quarterly/2009/307/1300.Z
/home/jake/quarterly/2009/303/1400.Z
/home/jake/bimonthly/submitted/2009/007/1800.Z
/home/jake/yearly/2009/199/2300.Z
bash-3.2$
bash-3.2$
bash-3.2$ perl -le 'print sort { @a=($a=~m#(\d{4})/(\d{3})/(\d{4})\.Z$#);
@b=($b=~m#(\d{4})/(\d{3})/(\d{4})\.Z$#);
$a[0].$a[1].$a[2] <=> $b[0].$b[1].$b[2] } <>' dirlist.txt
/home/jake/bimonthly/submitted/2009/007/1800.Z
/home/jake/yearly/2009/199/2300.Z
/home/jake/quarterly/2009/303/1400.Z
/home/jake/quarterly/2009/307/1300.Z

sed bonanza: :cool:

sed 's|.*\(/[^/]\+/[^/]\+/[^/]\+\)|\1 &|' dirlist.txt| sort -nt'/'|sed 's|[^ ]* ||'
/home/jake/bimonthly/submitted/2009/007/1800.Z
/home/jake/yearly/2009/199/2300.Z
/home/jake/quarterly/2009/303/1400.Z
/home/jake/quarterly/2009/307/1300.Z
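
For anyone reading along: the pipeline above is a decorate-sort-undecorate. The first sed copies the last three path components to the front of each line, sort orders the lines by that prefix, and the second sed strips the prefix again. Note that `\+` is a GNU sed extension. Here is a rough sketch of the same idea with awk and cut instead. The function name sort_chrono is made up for illustration, and it assumes every path ends in YYYY/DDD/HHMM.Z and contains no tab characters.

```shell
#!/bin/sh
# sort_chrono (hypothetical name): read paths on stdin, print them oldest first.
# Assumes each path ends in .../YYYY/DDD/HHMM.Z and contains no tab characters.
sort_chrono() {
    # Decorate: prefix each line with "YYYY/DDD/HHMM.Z<TAB>".
    awk -F/ '{ printf "%s/%s/%s\t%s\n", $(NF-2), $(NF-1), $NF, $0 }' |
    # Sort: the key is fixed width, so a byte-wise sort is chronological.
    LC_ALL=C sort |
    # Undecorate: drop the key and the tab again.
    cut -f2-
}
```

Usage would be something like `sort_chrono < dirlist.txt`.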

pludi:

Awesome, that did it. I keep forgetting the valuable magic (and complexity) of perl.

A thousand thank-you's.

J.

Scrutinizer:

The 'sed' example sorted from lowest to highest day of year within each sub-directory. I blame the poor wording of my original question, though. What I needed was a total sort of all file names in quasi-chronological order (based on the filename), oldest first, so that the oldest files end up at the top of the output regardless of subdirectory depth.

Thank you very much for your answer!

J.

Hi Jake, that is very kind of you, since you really did write it quite clearly :o. Anyway, just for completeness:

sed 's|.*\(/[^/]\+\)\(/[^/]\+\)\(/[^/]\+\)|\3\2\1 &|' dirlist.txt| sort -nt'/'|sed 's|[^ ]* ||'
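
This variant puts the time first, then the day, then the year, which is the literal reading of "last, 2nd last, 3rd last". The same ordering can be spelled out with explicit sort keys, which some may find easier to read. A minimal sketch, assuming tab-free paths that end in YYYY/DDD/HHMM.Z (sort_time_first is just an illustrative name):

```shell
#!/bin/sh
# sort_time_first (hypothetical name): order by time, then day of year, then year.
# Assumes each path ends in .../YYYY/DDD/HHMM.Z and contains no tab characters.
sort_time_first() {
    tab=$(printf '\t')
    # Decorate: emit time, day, year and the original line as tab-separated fields.
    awk -F/ -v OFS='\t' '{ print $NF, $(NF-1), $(NF-2), $0 }' |
    # Sort on field 1 (time), then field 2 (day), then field 3 (year).
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2 -k3,3 |
    # Undecorate: keep only the original line.
    cut -f4-
}
```

Call it as `sort_time_first < dirlist.txt`. With Jake's four sample lines the output happens to be in the same order as the input, because that sample is already ordered by time.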

You can do it with gawk:

WHINY_USERS=1 gawk -F/ '{a[$(NF-2),$(NF-1),$NF]=$0}END{for(i in a)print a[i]}' dirlist.txt

or

gawk -F/ '{a[$(NF-2), $(NF-1), $NF] = $0}
END { n = asorti(a, b); for (i = 1; i <= n; ++i) print a[b[i]]}' dirlist.txt
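
One thing to watch with the array approach: the index is built only from year, day and time, so if two files in different directories share all three, the later line overwrites the earlier one and only one of them is printed. A quick way to check whether that can happen in a given list, sketched in plain awk (dup_keys is a made-up name, and the YYYY/DDD/HHMM.Z ending is assumed):

```shell
#!/bin/sh
# dup_keys (hypothetical name): print each YYYY/DDD/HHMM.Z key that occurs
# on more than one input line. No output means the array approach loses nothing.
dup_keys() {
    awk -F/ '{
        key = $(NF-2) "/" $(NF-1) "/" $NF
        # Print the key once, when it is seen for the second time.
        if (seen[key]++ == 1) print key
    }'
}
```

Run it as `dup_keys < dirlist.txt`. If it prints anything, a sort-based pipeline that keeps the whole line is the safer choice.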

Excellent!

That did exactly what I needed.

And I just remember mom saying "Never bite the hand that feeds you" :wink:

Thanks for your help!

J.