How to Sort by Date and Time?

Hi,

The input file is as follows,

22.06.2012 17:58:38 CPUser: xxxxxxx, billedAfterStatus: Active
13.07.2012 08:46:15 CPUser: xxxxxxx, billedAfterStatus: Active
20.07.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.03.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.05.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active

and we need output a follows,

20.03.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.05.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
22.06.2012 17:58:38 CPUser: xxxxxxx, billedAfterStatus: Active
13.07.2012 08:46:15 CPUser: xxxxxxx, billedAfterStatus: Active
20.07.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active

which command i have to use?

Please Suggest me.

Regards,
Nanthagopal A

For above input, you can try the below command:-

sort -t'.' -k2.2 -k1.2 file
2 Likes

You can do it with the 'sort' command:

$ sort -k 1.7n,1.10n -k 1.4n,1.5n -k 1.1n,1.2n -k 2.1n,2.2n -k 2.4n,2.5n -k 2.7n,2.8n t
20.03.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.05.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
22.06.2012 17:58:38 CPUser: xxxxxxx, billedAfterStatus: Active
13.07.2012 08:46:15 CPUser: xxxxxxx, billedAfterStatus: Active
20.07.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
1 Like

I have used the above code and got the result in sorted order as below.

08.05.2012 10:52:32 User: xxxx
15.05.2012 11:30:38 User: xxxx
22.05.2012 11:56:39 User: xxxx
29.05.2012 16:33:04 User: xxxx
19.06.2012 10:51:55 User: xxxx
26.06.2012 11:04:27 User: xxxx
12.06.2012 14:40:27 User: xxxx
05.06.2012 20:46:18 User: xxxx
18.07.2012 03:13:16 User: xxxx

but, the exact output would be like this,

08.05.2012 10:52:32 User: xxxx
15.05.2012 11:30:38 User: xxxx
22.05.2012 11:56:39 User: xxxx
29.05.2012 16:33:04 User: xxxx
05.06.2012 20:46:18 User: xxxx
12.06.2012 14:40:27 User: xxxx
19.06.2012 10:51:55 User: xxxx
26.06.2012 11:04:27 User: xxxx
18.07.2012 03:13:16 User: xxxx

so, how to change the

sort -t'.' -k2.2 -k1.2 

code and also
explain the above code.

Regards,
Nanthagopal A

sort -t"."  -k3.1,3.4n -k2,2n -k1,1n temp

But this sorts for only years,months,days..let me know if you want to dig into time also..

And yeah, Spacebar's code is working great for even time..you can use that also

No, i don't need for time. Now it's working fine.
Thank you for your reply.

sort -t"."  -k3.1,3.4n -k2,2n -k1,1n

Could you please explain the above code.

Regards,
Nanthagopal A

-t "." 

this option stands for identifying delimiter and here its dot

-k3.1,3.4n

"k start,stop" stands for field to start and stop at...first am sorting on years which happens to be 3rd column..so 3.1 means third column first character until fourth character..
same applies for all the other "k" options...where second is for sorting on months and then sorting on days...

1 Like

Note that if you want to sort by date and time, with the data you have the following sort command will work:

sort -k1.7,1 -k1.4,1.5 -k1.1,1.2 -k2,2 input

Note that this sort uses the default whitespace field delimiter, instead of a period because the time stamps aren't delimited by a period. Note also that since your data has fixed-wdith date and time fields with leading zeros, you can use either an alphanumeric sort or a numeric sort for the date fields. Using an alphanumeric sort for the time stamp allows us to use one key -k2,2 instead of three keys -k2.1,2.2 -k2.4,2.5 -k2.7,2.8 to sort the time. (A numeric sort -k2n,2 would only sort on the hour since ":" is not a numeric character.)
The sort keys (in order) are from the 7th character in the 1st field to the end of the 1st field (year), from the 4th character through the 5th character in the 1st field (month), from the 1st character through the 2nd character in the 1st field (day), and from the start of the 2nd field to the end of the 2nd field (time).

Since your sample input data only had one time stamp per day, I tested the above command using the following input (which duplicates the input you provided but changes the date stamps on the 2nd occurrence of each entry to verify that differences in hours, minutes, and seconds all sort correctly:

input:
08.05.2012 10:52:32 User: xxxx
15.05.2012 11:30:38 User: xxxx
22.05.2012 11:56:39 User: xxxx
29.05.2012 16:33:04 User: xxxx
19.06.2012 10:51:55 User: xxxx
26.06.2012 11:04:27 User: xxxx
12.06.2012 14:40:27 User: xxxx
05.06.2012 20:46:18 User: xxxx
18.07.2012 03:13:16 User: xxxx
08.05.2012 11:52:32 User: xxxx
15.05.2012 10:30:38 User: xxxx
22.05.2012 11:57:39 User: xxxx
29.05.2012 16:32:04 User: xxxx
19.06.2012 10:50:55 User: xxxx
26.06.2012 11:05:27 User: xxxx
12.06.2012 14:40:28 User: xxxx
05.06.2012 20:46:17 User: xxxx
18.07.2012 01:23:45 User: xxxx

output:
08.05.2012 10:52:32 User: xxxx
08.05.2012 11:52:32 User: xxxx
15.05.2012 10:30:38 User: xxxx
15.05.2012 11:30:38 User: xxxx
22.05.2012 11:56:39 User: xxxx
22.05.2012 11:57:39 User: xxxx
29.05.2012 16:32:04 User: xxxx
29.05.2012 16:33:04 User: xxxx
05.06.2012 20:46:17 User: xxxx
05.06.2012 20:46:18 User: xxxx
12.06.2012 14:40:27 User: xxxx
12.06.2012 14:40:28 User: xxxx
19.06.2012 10:50:55 User: xxxx
19.06.2012 10:51:55 User: xxxx
26.06.2012 11:04:27 User: xxxx
26.06.2012 11:05:27 User: xxxx
18.07.2012 01:23:45 User: xxxx
18.07.2012 03:13:16 User: xxxx

With a slight overhead (about double run time) you could use dateutils dconv tool to normalise the data, sort them, and convert them back:

dconv -i '%d.%m.%Y %T' -S <<EOF | sort | dconv -f '%d.%m.%Y %T' -S
22.06.2012 17:58:38 CPUser: xxxxxxx, billedAfterStatus: Active
13.07.2012 08:46:15 CPUser: xxxxxxx, billedAfterStatus: Active
20.07.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.03.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.05.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active

gives you:

20.03.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.05.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
22.06.2012 17:58:38 CPUser: xxxxxxx, billedAfterStatus: Active
13.07.2012 08:46:15 CPUser: xxxxxxx, billedAfterStatus: Active
20.07.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active

It's slightly more robust than pure sort solutions, see Don's comment for instance.

I guess I don't know what you mean by "more robust".

The sort utility is available on any UNIX system and on any Linux system; the dconv utility is not available on many of those systems. (For example, dconv is not available on OS X.)

The sort command I provided will work for any dates from year 0001 through 9999 as long as all of the dates are in this format. I don't have access to dconv, but I'm guessing that if you add the line:

20.07.2000 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Historical

to your input file, you'll find that it ends up last on the output rather than first.
(And, yes I know that fixing it for this example only involves adding 3 characters to your command line. :rolleyes: ) However, if you happen to be using a programming environment with a 32-bit time_t, your date range is much more limited.

Hi.

The utility msort is available in many repositories. Here's an example solution using it on Don Cragun's recent demonstration data:

#!/usr/bin/env bash

# @(#) s1	Demonstrate ordering by date and time fields, msort.
# See: http://freecode.com/projects/msort

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C msort diff

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results:"
msort -q --line -n 1,1 -c date -f d.m.y -n 2,2 -c time $FILE |
tee f1

pe
if diff f1 expected-output.txt
then
  pe " Output matches."
else
  pe " Output mis-matches as noted."
fi

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.0.0-1-amd64, x86_64
Distribution        : Debian GNU/Linux wheezy/sid 
bash GNU bash 4.1.5
msort 8.53
diff (GNU diffutils) 3.2

-----
 Input data file data1:
08.05.2012 10:52:32 User: xxxx
15.05.2012 11:30:38 User: xxxx
22.05.2012 11:56:39 User: xxxx
29.05.2012 16:33:04 User: xxxx
19.06.2012 10:51:55 User: xxxx
26.06.2012 11:04:27 User: xxxx
12.06.2012 14:40:27 User: xxxx
05.06.2012 20:46:18 User: xxxx
18.07.2012 03:13:16 User: xxxx
08.05.2012 11:52:32 User: xxxx
15.05.2012 10:30:38 User: xxxx
22.05.2012 11:57:39 User: xxxx
29.05.2012 16:32:04 User: xxxx
19.06.2012 10:50:55 User: xxxx
26.06.2012 11:05:27 User: xxxx
12.06.2012 14:40:28 User: xxxx
05.06.2012 20:46:17 User: xxxx
18.07.2012 01:23:45 User: xxxx

-----
 Results:
08.05.2012 10:52:32 User: xxxx
08.05.2012 11:52:32 User: xxxx
15.05.2012 10:30:38 User: xxxx
15.05.2012 11:30:38 User: xxxx
22.05.2012 11:56:39 User: xxxx
22.05.2012 11:57:39 User: xxxx
29.05.2012 16:32:04 User: xxxx
29.05.2012 16:33:04 User: xxxx
05.06.2012 20:46:17 User: xxxx
05.06.2012 20:46:18 User: xxxx
12.06.2012 14:40:27 User: xxxx
12.06.2012 14:40:28 User: xxxx
19.06.2012 10:50:55 User: xxxx
19.06.2012 10:51:55 User: xxxx
26.06.2012 11:04:27 User: xxxx
26.06.2012 11:05:27 User: xxxx
18.07.2012 01:23:45 User: xxxx
18.07.2012 03:13:16 User: xxxx

 Output matches.

See URL mentioned in script comments if msort is not in your repository.

Best wishes ... cheers, drl

I mean that dconv is actually made to parse and operate on dates, sort is not.

True.

You guessed wrong. And I'm curious as to what 3 characters you meant? Also, dconv doesn't depend on time_t at all, so it will work exactly the same on 16-bit, 32-bit and 64-bit systems (all tested).

dconv -S -i '%d.%m.%Y %T' <<EOF | sort | dconv -f '%d.%m.%Y %T' -S
22.06.2012 17:58:38 CPUser: xxxxxxx, billedAfterStatus: Active
13.07.2012 08:46:15 CPUser: xxxxxxx, billedAfterStatus: Active
20.07.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.03.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.05.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.07.2000 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Historical
EOF
  =>
20.07.2000 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Historical
20.03.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
20.05.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active
22.06.2012 17:58:38 CPUser: xxxxxxx, billedAfterStatus: Active
13.07.2012 08:46:15 CPUser: xxxxxxx, billedAfterStatus: Active
20.07.2012 08:56:24 CPUser: xxxxxxx, billedAfterStatus: Active

OK. I'll take your word for it. My system doesn't have dconv and there are no dconv man pages in the sets of man pages provided here on unix.com.

Without access to a dconv man page I (apparently incorrectly) assumed that dconv -S -i '%d.%m.%Y %T' used the strptime() format string argument to convert the text date string into a seconds since the Epoch value, sort sorted the seconds since the Epoch values, and dconv -f '%d.%m.%Y %T' -S converted seconds since the Epoch back to the original text string format using an strftme() date format string. Since the number of seconds since the Epoch in 2012 is represented as a ten digit decimal string in the range 1325xxxxxx-1357xxxxxx and the number of seconds since the Epoch in 2000 was a nine digit string in the range 946xxxxxx-978xxxxxx, I expected the alphanumeric sort specified by sort to sort "9" after "1". If my assumption had been correct, it could have been fixed by telling sort to do a numeric sort sort -n instead of an alphanumeric sort sort .

Ah I see. No, what happens is dconv with no output format specifiers will return ISO 8601 dates (aka 2012-10-31T17:44:00) which are sortable just like that, UNLESS of course you go before the year 1000 or after the year 9999, but ISO isn't defined for those dates anyway :).

1 Like

OK. Thanks. Yes, the ISO 8601 format for a complete representation of the local date and time can be sorted as a single alphanumeric field using a single (default) key in sort.

So with the input file format being discussed in this thread, the commands:

dconv -S -i '%d.%m.%Y %T' < input | sort | dconv -f '%d.%m.%Y %T' -S

and

sort -k1.7,1 -k1.4,1.5 -k1.1,1.2 -k2,2 input

will produce identical results on systems that have the dconv utility installed.

would this work for date and time in the following format;
MM/DD/YY HH:MM

by editing your command as;

sort -t '/' -k2.2, -k1.2

Assuming that all of the year values being sorted have the same value for the first two digits (assuming that the YY is the last two digits of a four digit year) and that there are always the same number of spaces between YY and HH , you could either use:

sort -k1.7,1 -k1.1,1.2 -k1.3,1.4 -k2,2

or

sort -t '/' -k3.1,3.2 -k1,2 -k3.3,3