awk/sed to extract column based on partial match

pkabali · April 7, 2011, 1:45am

Hi
I have a log file which has outputs like the one below

[2011-04-06T17:57:15.661-07:00]  [octetstring] [NOTIFICATION] [OVD-20044] [com.octetstring.accesslog] [tid: 15]  [ecid: 0000IwhTc8C8pmP_IdH7if1Dauw_000H5q,0] [arg: 24196] [arg: 1] [arg: 0]  [arg: 9] [arg: 3712] [arg: 0] [arg: 486183328] [arg: 2147483648] conn=24,196  op=1 RESULT err=0 tag=0 nentries=9 etime=3,712 dbtime=0  mem=486,183,328/2,147,483,648

Most of the time I am only interested in the time (the first column) and a column that begins with etime, i.e. etime=someNumber. This column seems to shift around, so I can't do a simple awk based on the column number. Is there a way to search for the word etime and output the etime=Xyz?

I hope I communicated what I am trying to do. I basically need to extract the field based on a partial match.

Thanks

Franklin52 · April 7, 2011, 2:31am

For the time:

sed 's/^\[\([^]]*\)].*/\1/' file

For the etime:

sed 's/.*etime=\([^ ]*\) .*/\1/' file

Both, separated by a space:

sed 's/^\[\([^]]*\)].*etime=\([^ ]*\) .*/\1 \2/' file

pkabali · April 7, 2011, 2:53am

Hi Franklin52
Thanks for posting, but the solution doesn't seem to be working. It is not only printing entire lines, it is also printing lines without etimes.

I would require an output like

[2011-04-06T17:57:15.661-07:00] etime=3,712

or better yet

2011-04-06 17:57:15 etime=3,712

for lines with etimes, and nothing for lines without etimes.

Currently I am using

grep -etime filename | awk '{print $1 " " $32}'

But as I explained earlier, the etime field moves around, and hence I miss out on certain output.

Thanks

Franklin52 · April 7, 2011, 3:52am

The solution works fine with the line from your first post; the other lines of the file are probably formatted differently.
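
One likely reason for the extra output: without -n, sed prints every input line, whether or not the substitution matched. The following is a minimal sketch, not a tested fix for the real log. It assumes the lines of interest look like the sample in the first post (abbreviated here), and the second line without etime is made up for illustration. The function name time_and_etime is hypothetical.

```shell
# Sample line abbreviated from the first post, plus a hypothetical line without etime.
sample='[2011-04-06T17:57:15.661-07:00] [octetstring] [NOTIFICATION] conn=24,196 op=1 RESULT err=0 tag=0 nentries=9 etime=3,712 dbtime=0 mem=486,183,328/2,147,483,648'
other='[2011-04-06T17:57:16.000-07:00] [octetstring] [NOTIFICATION] conn=24,196 op=2 UNBIND'

# -n suppresses the default printing of every line; the trailing p flag
# prints only those lines where the substitution actually matched.
time_and_etime() {
  sed -n 's/^\[\([^]]*\)].*\(etime=[^ ]*\).*/\1 \2/p'
}

printf '%s\n%s\n' "$sample" "$other" | time_and_etime
```

Compared with the commands above, this version keeps the "etime=" prefix in the output and does not require a space after the etime value, so it also matches when etime is the last field on the line.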

pkabali · April 7, 2011, 4:09am

Hi Franklin
Could you please explain what you are doing with the statements? I am relatively new to sed and awk and would appreciate the help. This way I can try and figure out if the code you wrote needs to be modified in a certain way for it to work on my system or file.
Would it help if I attached a few more lines of logs?
Thanks

kurumi · April 7, 2011, 4:18am

You can use awk just fine. Since the time you want seems to contain two ":" characters, we will use that as the regex pattern to capture.

$ awk '{for(i=1;i<=NF;i++) if($i~/.*:.*:.*|etime/) {print $i} }' file
[2011-04-06T17:57:15.661-07:00]
etime=3,712

Or, if you have Ruby (1.9+):

$ ruby -ane '$F.each{|x| puts x if x.count(":")>1 or x[/etime/]}' file
[2011-04-06T17:57:15.661-07:00]
etime=3,712

Peasant · April 7, 2011, 4:35am

awk '{ if ( match($0,/etime=?[0-9]*,?[0-9]+/)) print substr($1,2,10)" "substr($1,13,8), substr($0,RSTART,RLENGTH) }' filename

This assumes that $1 will always be a string in this specific format:

[2011-04-06T17:57:15.661-07:00]
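
For readers new to awk, here is a commented sketch of the same idea. It is an illustration rather than a drop-in replacement: the regex is simplified to etime=[0-9,]+ (an assumption that the value contains only digits and commas), the sample line is abbreviated from the first post, and the function name extract_etime is hypothetical.

```shell
# Sample line abbreviated from the first post.
sample='[2011-04-06T17:57:15.661-07:00] [octetstring] [NOTIFICATION] conn=24,196 op=1 RESULT err=0 tag=0 nentries=9 etime=3,712 dbtime=0 mem=486,183,328/2,147,483,648'

extract_etime() {
  awk '
    # match() returns the position of the first match (0 if there is none)
    # and sets RSTART (start position) and RLENGTH (length of the match).
    match($0, /etime=[0-9,]+/) {
      day   = substr($1, 2, 10)   # characters 2-11 of $1: the date
      clock = substr($1, 13, 8)   # characters 13-20 of $1: hh:mm:ss
      print day, clock, substr($0, RSTART, RLENGTH)
    }'
}

printf '%s\n' "$sample" | extract_etime
```

Because the match() call is used as the pattern, lines without an etime field produce no output at all.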

Regards.

pkabali · April 7, 2011, 4:36am

Hi Kurumi
Thanks for your reply. Could you please explain your code? Also, is it possible to print the time and etime on the same line?

Does the ~/etime/ mean anything which contains an etime?
I managed to hack together some code that works (based on the fact that the time is always column 1, and on ~/etime/ from your code), but it would be really useful if you could explain the logic behind the script you created.

Here is what I ended up with:

awk '{for(i=1;i<=NF;i++) if($i~/etime/) {print $1 " " $i} }' testaccess.log

kurumi · April 7, 2011, 7:03am

Go through the columns and check each one for "etime"; if found, print column 1 and the etime column. NF means number of fields.
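
The loop described above can be written out with comments as follows. This is a sketch under two assumptions: the timestamp is always field 1, as in the sample line (abbreviated here from the first post), and the field of interest starts with "etime=". The function name time_with_etime is hypothetical.

```shell
# Sample line abbreviated from the first post.
sample='[2011-04-06T17:57:15.661-07:00] [octetstring] [NOTIFICATION] conn=24,196 op=1 RESULT err=0 tag=0 nentries=9 etime=3,712 dbtime=0 mem=486,183,328/2,147,483,648'

time_with_etime() {
  awk '{
    # NF is the number of whitespace-separated fields on the current line.
    for (i = 1; i <= NF; i++) {
      # $i ~ /regex/ is true when field i matches the regex anywhere;
      # the ^ anchor restricts it to fields that start with "etime=".
      if ($i ~ /^etime=/) {
        print $1, $i   # field 1 (the timestamp) and the matching field
        break          # stop scanning this line after the first hit
      }
    }
  }'
}

printf '%s\n' "$sample" | time_with_etime
```

Since the position of the field is found by the loop rather than hard-coded, it does not matter whether etime is field 20 or field 32 on a given line.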