Multi line sorting in Linux

I have log files in the following format:

YYYY/MM/DD HH:mm:ss.msec|field2|field3| log message

The message itself can be a multi-line message containing newline characters.
For example:

2013/02/05 15:33:12.234|abc|xyz| This is first single line message.
2013/02/05 15:33:12.786|abc|xyz| This is a multiple
       line message continued
to many lines.
2013/02/05 15:33:12.413|abc|xyz| This is second single line message.
2013/02/05 15:33:12.945|abc|xyz| This is last single line message.

I would like to sort this file by timestamp in ascending order, i.e., produce output like this:

2013/02/05 15:33:12.234|abc|xyz| This is first single line message.
2013/02/05 15:33:12.413|abc|xyz| This is second single line message.
2013/02/05 15:33:12.786|abc|xyz| This is a multiple
       line message continued
to many lines.
2013/02/05 15:33:12.945|abc|xyz| This is last single line message.

Thanks in advance for looking into it and helping out.

Try something like this:

$ awk '/2013/{if(s){print s}s=$0}
!/2013/{s=s"_^_"$0}END{print s}' file3 | sort | sed 's/_\^_/\n/g'

2013/02/05 15:33:12.234|abc|xyz| This is first single line message.
2013/02/05 15:33:12.413|abc|xyz| This is second single line message.
2013/02/05 15:33:12.786|abc|xyz| This is a multiple
       line message continued
to many lines.
2013/02/05 15:33:12.945|abc|xyz| This is last single line message.


Thanks, Pamu.

I haven't tried your suggestion yet, but just to understand it, here is what I think it does:

  1. Check for the pattern 2013.
  2. If the pattern is found, print the line as is.
  3. If the pattern is not found, prefix each of those lines with _^_, then sort and replace the markers back.

Please correct me if I'm wrong.

So, to make it generic, I could use a four-digit year pattern instead of restricting it to 2013, and in fact I could use my timestamp prefix itself as the pattern. Right?

Also, does

{s=s"_^_"$0}END{print s}

take care of the newline replacement as well?

Thanks again for your time.

Hi gini32,
Reformatting Pamu's script and adding line numbers for discussion purposes:

1 awk '
2 /2013/{if(s){print s}
3         s=$0}
4 !/2013/{s=s"_^_"$0}
5 END{    print s}
6 ' file3 | sort | sed 's/_\^_/\n/g'

Note that the line numbers cannot actually appear in your awk script; they are just to make this discussion easier.
The awk program is made up of the commands on lines 2 through 5.

Line 2 selects any line that contains the string 2013 and assumes that it is the 1st line of an entry. (If 2013 could appear anywhere other than at the start of a line, it would be safer to change /2013/ to /^2013/ so the line will be selected only if 2013 appears as the 1st four characters on the line.) The first time you get here, the variable s will be an empty string and the print command will not be executed.
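To see what an anchored, year-independent pattern would select (which also bears on the question about not hard-coding 2013), here is a minimal sketch. It assumes every entry starts at column 1 with a `YYYY/MM/DD ` date followed by a space; the function name `mark_entries` is hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: label each input line as the START of a log entry
# or a CONTinuation line. Assumption: entries begin at column 1 with a
# YYYY/MM/DD date followed by a space. The ^ anchor keeps a date that
# appears later in a message from being mistaken for an entry start.
mark_entries() {
  awk '/^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9] / { print "START " $0; next }
       { print "CONT  " $0 }' "$@"
}

# Example run on the multi-line entry from the sample log.
printf '%s\n' \
  '2013/02/05 15:33:12.786|abc|xyz| This is a multiple' \
  '       line message continued' \
  'to many lines.' | mark_entries
```

Repeated `[0-9]` brackets are used instead of `[0-9]{4}` because some older awk implementations do not support interval expressions.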

Line 3 then sets s to the current input line.

Line 4 appends every line that does not contain the string 2013 to the end of the variable s, using the string _^_ (rather than a newline) as the line separator. (If you change /2013/ to /^2013/ on line 2, you need to make the same change on line 4.)
Lines 2-4 are then repeated until all lines have been read from the input file.
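To make the joining step concrete: awk reads its input one line at a time with the trailing newline already removed, so continuation lines are glued onto the entry with `_^_`, and real newlines come back only in the final stage of the pipeline. The sketch below runs just the awk stage. It is written with `next` instead of repeating the negated pattern, and uses an anchored `YYYY/MM/DD` pattern instead of /2013/ (an assumption about the log format); `join_entries` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: the awk stage of the pipeline on its own.
# Each entry (a timestamped line plus any continuation lines) is emitted
# as ONE output line, with the original line breaks replaced by "_^_".
# Assumption: the string "_^_" never occurs in the log text itself.
join_entries() {
  awk '/^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9] / { if (s != "") print s; s = $0; next }
       { s = s "_^_" $0 }
       END { if (s != "") print s }' "$@"
}

# Example: the multi-line entry is printed as a single joined line.
printf '%s\n' \
  '2013/02/05 15:33:12.786|abc|xyz| This is a multiple' \
  '       line message continued' \
  'to many lines.' \
  '2013/02/05 15:33:12.413|abc|xyz| This is second single line message.' | join_entries
```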

Line 5 runs after all input has been read and prints the last entry accumulated in the variable s.

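Line 6 specifies that the input file for the awk script is the file named file3, pipes the output from awk through sort, and then uses sed to change the _^_ line separators that were inserted by awk back into newline characters. (Interpreting \n in the replacement text as a newline is a GNU sed extension, which is available on Linux; other versions of sed may not support it.)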
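Putting the three stages together, a minimal sketch of the whole pipeline with a year-independent pattern might look like this. It assumes entries start at column 1 with `YYYY/MM/DD HH:mm:ss.msec` and that `_^_` never appears in the log; `sort_log` is a hypothetical name. As a swapped-in choice, the last stage uses awk's `gsub()` instead of sed, so it does not depend on GNU sed's handling of `\n`.

```shell
#!/bin/sh
# Hypothetical helper: sort multi-line log entries by their leading timestamp.
# Assumptions: every entry starts at column 1 with "YYYY/MM/DD HH:mm:ss.msec",
# and the marker "_^_" never occurs in the log text.
# Stage 1 (awk): join each entry onto one line, using "_^_" for line breaks.
# Stage 2 (sort): fixed-width timestamps sort correctly as plain bytes;
#   LC_ALL=C keeps the comparison byte-wise whatever the user's locale is.
# Stage 3 (awk): turn every "_^_" back into a real newline.
sort_log() {
  awk '/^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9] / { if (s != "") print s; s = $0; next }
       { s = s "_^_" $0 }
       END { if (s != "") print s }' "$@" |
    LC_ALL=C sort |
    awk '{ gsub(/_\^_/, "\n"); print }'
}

# Example: the sample log from the question.
sort_log <<'EOF'
2013/02/05 15:33:12.234|abc|xyz| This is first single line message.
2013/02/05 15:33:12.786|abc|xyz| This is a multiple
       line message continued
to many lines.
2013/02/05 15:33:12.413|abc|xyz| This is second single line message.
2013/02/05 15:33:12.945|abc|xyz| This is last single line message.
EOF
```

To sort a file instead of a here-document, pass its name, e.g. `sort_log file3`.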

Note that this script assumes that the concatenated lines won't be longer than {LINE_MAX} bytes on your system. If they are, the script may fail, because awk, sort, and sed are only guaranteed to work when the input and output files being processed are text files (which, by definition, have lines no longer than {LINE_MAX} bytes, including the terminating newline character). You can find the value of {LINE_MAX} on your system by running the command:

getconf LINE_MAX

On systems that conform to the POSIX or UNIX Standards, {LINE_MAX} must be at least 2048.
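One way to check this assumption before relying on the pipeline is to join the entries the same way the awk stage does and compare the longest result with `getconf LINE_MAX`. This is a sketch under stated assumptions: entries start with a `YYYY/MM/DD` date, and the names `longest_line_bytes` and `fits_line_max` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the size in bytes of the longest line on stdin,
# counting the terminating newline as {LINE_MAX} does. LC_ALL=C makes
# awk's length() count bytes rather than multibyte characters.
longest_line_bytes() {
  LC_ALL=C awk '{ n = length($0) + 1; if (n > max) max = n } END { print max + 0 }'
}

# Hypothetical check: join each entry onto one line the same way as the awk
# stage of the pipeline (assuming entries start with a YYYY/MM/DD date), then
# compare the longest joined line with {LINE_MAX}. Returns 0 if it fits.
fits_line_max() {
  limit=$(getconf LINE_MAX)
  longest=$(awk '/^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9] / { if (s != "") print s; s = $0; next }
                 { s = s "_^_" $0 }
                 END { if (s != "") print s }' "$@" | longest_line_bytes)
  echo "longest joined entry: $longest bytes, LINE_MAX: $limit"
  [ "$longest" -le "$limit" ]
}
```

For example, `fits_line_max file3 || echo "some entries exceed LINE_MAX" >&2` prints the two sizes and warns if any joined entry is too long.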


Thanks a ton to both Pamu and Don Cragun.
That really helped.

(GNU) sort on Linux has a -z option for sorting NUL-delimited records, which may span multiple lines:

sed 's/^2013/\x0&/' log-file |sort -z |tr -d '\0'
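The command above keys on `^2013` and relies on GNU sed's `\x0` escape. A variant sketch follows; it swaps in awk plus `tr` to insert the NUL separators and uses a year-independent pattern. Assumptions: the log contains no `\001` (Ctrl-A) bytes, since that byte serves as a temporary marker, and `sort -z` is available (GNU coreutils). `sort_log_nul` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: sort multi-line log entries as NUL-delimited records.
# 1. awk writes a \001 byte in front of every entry start except the first.
# 2. tr turns those \001 markers into NUL bytes, so each NUL ends a record.
# 3. sort -z (GNU) sorts whole NUL-terminated records, embedded newlines included.
# 4. tr -d removes the NUL terminators that sort -z writes after each record.
sort_log_nul() {
  awk 'NR > 1 && /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9] / { printf "\001" }
       { print }' "$@" |
    tr '\001' '\000' |
    LC_ALL=C sort -z |
    tr -d '\000'
}
```

Because every record still ends with its own newline, deleting the NUL terminators after sorting leaves ordinary lines, e.g. `sort_log_nul log-file`.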

Thanks, binlib.
sort -z is definitely worth knowing about.
But

tr -d '\0'

is not working for me to remove the NUL characters... but let me figure it out myself.

Thanks again.