sort date issue

Hi Everyone,

[root@sl ~]# cat b
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 11:32:49 AM MYT;
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;
[root@sl ~]# sort -t';' -k5 b
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 11:32:49 AM MYT;
[root@sl ~]# sort -t';' -nk5 b
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 11:32:49 AM MYT;
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;
[root@sl ~]# sort -t';' -Mk5 b
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 11:32:49 AM MYT;
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;
[root@sl ~]#

I tried a few sort options, but they all failed. The output should be:
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 11:32:49 AM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;

Please advise. Thanks.

Can you explain your expected output?
Which field is the data sorted on?
I could not work it out.

As I understand your problem, you want to sort the file on the date in the fifth field (highlighted in the original post), don't you?

Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 11:32:49 AM MYT;
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;

If this is the case, you are in trouble with the sort command as it doesn't sort on plain text dates AFAIK.

So, first you have to loop through your file and convert the date field into some sort of ISO date format like 2009-09-13 21:15:07, and then pipe it into sort.
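A minimal sketch of that convert-then-sort idea, assuming every record carries its date in field 5 in exactly the layout shown in this thread (weekday, day, month name, year, 12-hour time, AM/PM, zone). The function name sort_by_field5_date is made up for illustration. It prefixes each line with an ISO-style key, sorts on that key, and strips the key again:

```shell
#!/bin/sh
# Hypothetical helper: sort_by_field5_date [extra sort options] < input
sort_by_field5_date() {
    awk -F';' '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        for (i = 1; i <= n; i++) month[name[i]] = i   # month name -> number
    }
    {
        split($5, d, " ")       # d: weekday, day, month, year, time, AM/PM, zone
        split(d[5], t, ":")
        hour = t[1] % 12        # 12 AM -> 0; 12 PM -> 12 after the +12 below
        if (d[6] == "PM") hour += 12
        # Emit "ISO key <TAB> original line"
        printf "%04d-%02d-%02d %02d:%02d:%02d\t%s\n", d[4], month[d[3]], d[2], hour, t[2], t[3], $0
    }' | LC_ALL=C sort "$@" -t "$(printf '\t')" -k1,1 | cut -f2-
}

# Newest first: the Mon 14 Sep record is printed first.
sort_by_field5_date -r <<'EOF'
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 11:32:49 AM MYT;
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;
EOF
```

Because the key is zero-padded and ordered from year down to second, a plain text sort on it is also a chronological sort. Drop the -r for oldest first.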

Or use awk with this suggestion:

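# sortdate.awk (needs gawk for asort)
BEGIN{
	FS=OFS=";"
	months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
	n=split(months, m, " ")
	for (i=1;i<=n;i++) mm[m[i]]=i	# month name -> month number
}
{
	split($5, dte, " ")	# weekday, day, month, year, time, AM/PM, zone
	split(dte[5], time, ":")
	hour=time[1]%12	# 12 AM -> 0; 12 PM -> 12 after the +12 below
	if (dte[6]=="PM") hour+=12
	dteKey=sprintf("%04d%02d%02d%02d%02d%02d",dte[4],mm[dte[3]],dte[2],hour,time[2],time[3])
	record[dteKey$0]=$0
	sortKey[NR]=dteKey$0
}
END{
	n=asort(sortKey)	# sort the keys as strings, ascending
	for(i=1;i<=n;i++) print record[sortKey[i]]
}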
$ cat file
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 11:32:49 AM MYT;
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;
$ awk -f sortdate.awk file
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 11:32:49 AM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;

I think the below command is enough for your requirement.

sort -rk 5 f1.txt

I don't think so. Check with an input file like this:

$ cat file
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Nov 2009 11:32:49 AM MYT;
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;

$ sort -rk 5 file
Sat 14 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Sep 2009 10:31:49 PM MYT;
Sat 12 Sep 2009 10:31:49 PM MYT;a;a;a;Sun 13 Nov 2009 11:32:49 AM MYT;
Sat 13 Sep 2009 10:31:49 PM MYT;a;a;a;Mon 14 Sep 2009 10:31:49 PM MYT;

Ohh, I got lost somewhere. I forgot it has to sort by date. Thanks a lot.

Just noticed the OP requested a descending order on date. Assuming the snippet will not be used past year 3000, just change these two lines:

	record[(3*10^13-dteKey)$0]=$0
	sortKey[NR]=(3*10^13-dteKey)$0

Not tested with more data, but something like:

sort -t';' -r -k5.12,5.15 -k5.8,5.10M -k5.5,5.7 -k5.17,5

---------- Post updated 09-15-09 at 06:03 PM ---------- Previous update was 09-14-09 at 10:12 PM ----------

To sort properly, the time has to be in 24-hour format. Otherwise it won't work, because 12:59 PM is one minute earlier than 1:00 PM but sorts after it (there is a similar problem around midnight).
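As a rough illustration of that conversion, here is a small sketch. The helper name to_24h is hypothetical, and it assumes input lines of the form "hh:mm:ss AM" or "hh:mm:ss PM":

```shell
#!/bin/sh
# Hypothetical helper: convert "hh:mm:ss AM|PM" lines to 24-hour "HH:MM:SS".
to_24h() {
    awk '{
        split($1, t, ":")
        hour = t[1] % 12              # 12 -> 0, so 12 AM becomes 00
        if ($2 == "PM") hour += 12    # 12 PM becomes 12, 1 PM becomes 13
        printf "%02d:%02d:%02d\n", hour, t[2], t[3]
    }'
}

printf '%s\n' '12:59:00 PM' '01:00:00 PM' '12:00:00 AM' | to_24h
# prints:
# 12:59:00
# 13:00:00
# 00:00:00
```

After the conversion, 12:59:00 sorts before 13:00:00 as plain text, which matches the chronological order.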

Did a little more testing, the following:

sort -t';' -r -k5.12,5.15 -k5.8,5.10rM -k5.5,5.6 -k5.17,5.24 

seemed to work as verified by this test:

gawk '
BEGIN {
  srand()
  for (i = 0; i < 1e4; ++i) {
    x = 1234567890 + int(rand()*1e5)
    printf("%010d;;;;%s\n", x, strftime("%a %d %b %Y %T %p MYT", x))
  }
}' |sort -t';' -r -k5.12,5.15 -k5.8,5.10rM -k5.5,5.6 -k5.17,5.24 |sort -cr
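For anyone puzzling over the key positions in that sort command, the sketch below uses cut to pull the same character ranges out of one sample value of field 5, so you can see what each key compares. As far as I know, a key that carries its own ordering letters (here rM) ignores the global -r, which would explain why the month key repeats the r. The variable names are only for illustration.

```shell
#!/bin/sh
# Field 5 of the first sample record, used only for illustration.
field='Sun 13 Sep 2009 11:32:49 AM MYT'

# Each cut range mirrors one sort key (5.N means the Nth character of field 5).
year=$(printf '%s\n' "$field" | cut -c12-15)    # -k5.12,5.15
month=$(printf '%s\n' "$field" | cut -c8-10)    # -k5.8,5.10rM
day=$(printf '%s\n' "$field" | cut -c5-6)       # -k5.5,5.6
clock=$(printf '%s\n' "$field" | cut -c17-24)   # -k5.17,5.24

printf '%s|%s|%s|%s\n' "$year" "$month" "$day" "$clock"
# prints: 2009|Sep|13|11:32:49
```

Note that the last key covers only hh:mm:ss, so the AM/PM marker is not compared. That is why the command relies on the time being in 24-hour format.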

Interesting: I didn't know sort has a "-M" option for sorting on month names. I learned something today. And of course the OP, if (s)he is still interested, should prefer the sort command, as it is the right tool for the job. However, I was expecting sort to be much faster than my awk snippet, so I ran a comparative test on a sample file of 10000 records (nice file generation snippet, binlib). The results are:

time sort -t';' -r -k5.12,5.15 -k5.8,5.10rM -k5.5,5.6 -k5.17,5.24 /tmp/time > /dev/null
real 0m1.434s
user 0m1.404s
sys 0m0.028s

time awk -f sortdate.awk /tmp/time > /dev/null
real 0m1.769s
user 0m1.732s
sys 0m0.036s

sort is faster, but by a narrower margin than I would have expected.