concatenate log file lines up to timestamp

AlanC · May 28, 2009, 4:30pm

Hi,

Using sed, awk, or perl, I am trying to do something similar to

but my requirement is slightly different. What I am trying to accomplish is to reformat a logfile so that every line starts with a timestamp, and any line that does not start with a timestamp is appended to the most recent line that does. Optionally, I would like to stop appending at the first semicolon.

A simplified input would be something like this

2009-05-27 02:37:27.283 The quick
brown fox;
The quick
brown fox
2009-05-28 10:10:28.000 Mary
had a
little lamb.
2009-06-01 19:37:29.000 Jack and Jill ran up the hill;

and ideally the output would be

2009-05-27 02:37:27.283 The quick brown fox;
2009-05-28 10:10:28.000 Mary had a little lamb.
2009-06-01 19:37:29.000 Jack and Jill ran up the hill;

although this is also acceptable

2009-05-27 02:37:27.283 The quick brown fox; The quick brown fox
2009-05-28 10:10:28.000 Mary had a little lamb.
2009-06-01 19:37:29.000 Jack and Jill ran up the hill;

The log files can be up to 10MB in size and there can be a hundred lines or more between timestamps. The purpose of this is to format the file so that it can be loaded into a database.

Any suggestions/solutions would be greatly appreciated.

Thanks,
Alan

vgersh99 · May 28, 2009, 5:19pm

Something to start with; adjust the date pattern as needed.

nawk -f alan.awk myFile

alan.awk:

BEGIN {
   PATdate="^[12][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"
}
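# timestamp line: close any open record, then start a new one
$0 ~ PATdate {printf("%s%s%s", (p)?ORS:"",$0, (/;$/)?ORS:"") ;p=(/;$/)?0:1;next}
# continuation ending in a semicolon: append it with a space and close the record
p && /;$/ { p=0; printf(" %s%s", $0, ORS); next}
# any other continuation of an open record
p {printf(" %s", $0)}
# terminate a record that is still open at end of file
END { if (p) printf("%s", ORS) }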
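
For anyone without nawk, the same state machine can be sketched in Python 3. This is only a sketch under assumptions: the function name `merge_until_semicolon` and the `TIMESTAMP` pattern are made up for illustration and are not part of the awk script above. Like the awk version, it stops appending once a line ends with a semicolon.

```python
import re

# Assumed timestamp shape: YYYY-MM-DD at the start of a line (same idea as PATdate).
TIMESTAMP = re.compile(r"^[12][0-9]{3}-[0-9]{2}-[0-9]{2}")


def merge_until_semicolon(lines):
    """Yield one merged record per timestamp line.

    Continuation lines are appended with a single space until a line
    ending in ';' is seen; later lines are ignored until the next timestamp.
    """
    parts = []        # pieces of the record currently being built
    is_open = False   # True while continuation lines should still be appended
    for raw in lines:
        line = raw.rstrip("\n")
        if TIMESTAMP.match(line):
            if parts:
                yield " ".join(parts)      # flush the previous record
            parts = [line]
            is_open = not line.endswith(";")
        elif is_open:
            parts.append(line)
            if line.endswith(";"):
                is_open = False            # record is complete
    if parts:
        yield " ".join(parts)              # flush the last record
```

Because it is a generator over lines, it can be fed an open file object directly and never holds more than one record in memory. On the sample input from the first post it should give the three lines listed there as the ideal output.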

ghostdog74 · May 28, 2009, 9:10pm

If your system has Python,

#!/usr/bin/env python
fh=open("file")
s=""
f=0
for items in fh:
    items=items.strip()
    if f and  items.startswith("2009"):
        if ";" in s:
            ind=s.index(";")
            print s[:ind] #print from start till where ; is
        else:
            print s 
        s=""  
        f=0        
    if items.startswith("2009"): 
        f=1 #set flag        
        print items,
        continue
    if f and not items.startswith("2009"):
        # join up the lines that don't start with 2009
        s=s+items
fh.close() #close the file

output

# more file
2009-05-27 02:37:27.283 The quick
brown fox;
The quick
brown fox
2009-05-28 10:10:28.000 Mary
had a
little lamb.
2009-06-01 19:37:29.000 Jack and Jill ran up the hill;
adsf
sldkfdf
2009-05-28 10:10:28.000 Mary test
tester fmsd
2009-05-28 10:10:28.000

# ./test.py
2009-05-27 02:37:27.283 The quick brown fox
2009-05-28 10:10:28.000 Mary had alittle lamb.
2009-06-01 19:37:29.000 Jack and Jill ran up the hill; adsfsldkfdf
2009-05-28 10:10:28.000 Mary test tester fmsd
2009-05-28 10:10:28.000
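
The script above is Python 2, keys on the literal string "2009", and joins continuation lines without a separator (hence `alittle` in the output). A minimal Python 3 sketch of the same idea follows. The names `join_records`, `STARTS_WITH_DATE` and the `cut_at_semicolon` parameter are assumptions for illustration, and unlike the script above it keeps the semicolon when it cuts.

```python
import re

# Assumption: a record starts with a YYYY-MM-DD date rather than the literal "2009".
STARTS_WITH_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def join_records(lines, cut_at_semicolon=True):
    """Return a list of merged records, one per timestamp line."""
    records = []
    for raw in lines:
        line = raw.strip()
        if STARTS_WITH_DATE.match(line):
            records.append(line)              # start a new record
        elif records and line:
            records[-1] += " " + line         # append continuation with a space
    if cut_at_semicolon:
        # keep text up to and including the first ';' when there is one
        records = [r[: r.index(";") + 1] if ";" in r else r for r in records]
    return records
```

This version builds the whole result in memory, which should be fine for the roughly 10MB files mentioned in the first post. Passing `cut_at_semicolon=False` gives the "also acceptable" form where everything up to the next timestamp is kept.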

summer_cherry · May 29, 2009, 1:28am

sed:

sed -n '/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/ {
1 {
	h
}
1 !{
	x
	s/\n/ /g
	p
	$ {
		x
		p
	}
	$ !{
	d
	}
}
}
/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/ !{
 H
}' a.txt

perl:

undef $/;
my $str=<DATA>;
$str=~s/\n/ /g;
$str=~s/(?<=.)(?=[0-9]{4}-[0-9]{2}-[0-9]{2})/\n/g;
print $str;
__DATA__
2009-05-27 02:37:27.283 The quick
brown fox;
The quick
brown fox
2009-05-28 10:10:28.000 Mary
had a
little lamb.
2009-06-01 19:37:29.000 Jack and Jill ran up the hill;

-----Post Update-----

perl:

undef $/;
my $str=<DATA>;
$str=~s/\n/ /g;
$str=~s/(?:(?<=.))(?:(?=[0-9]{4}-[0-9]{2}-[0-9]{2}))/\n/g;
print $str;
__DATA__
2009-05-27 02:37:27.283 The quick
brown fox;
The quick
brown fox
2009-05-28 10:10:28.000 Mary
had a
little lamb.
2009-06-01 19:37:29.000 Jack and Jill ran up the hill;
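
The slurp-the-whole-file idea from the perl version can also be sketched in Python 3. This is a variation, not a line-for-line port: instead of flattening every newline and then re-inserting breaks, it splits only at newlines that are followed by a date. The names `merge_slurped` and `RECORD_BREAK` are assumptions for illustration.

```python
import re

# Split only at newlines that are immediately followed by a YYYY-MM-DD date.
RECORD_BREAK = re.compile(r"\n(?=[0-9]{4}-[0-9]{2}-[0-9]{2})")


def merge_slurped(text):
    """Merge continuation lines in a whole-file string and return the merged text."""
    records = RECORD_BREAK.split(text.strip("\n"))
    # inside each record, turn the remaining newlines into single spaces
    return "\n".join(" ".join(record.split("\n")) for record in records)
```

One difference from the perl regex: a date in the middle of a line does not start a new record here, only a date at the start of a line does. A continuation line that itself begins with a date would still be treated as a new record.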

AlanC · June 29, 2009, 11:38am

Thank you all very much. All your responses were excellent. It seems like the awk or python examples will work best for me.

Thanks,
-Alan