I am trying to read an Oracle listener log file line by line and need to separate the lines into several fields. The field delimiter for the line happens to be an asterisk.
I have the script below to start with, but when I run it, the echo command globs the line and pulls in other information that I don't need.
I've also tried awk -F "\*" and that makes no difference besides giving the warning awk: warning: escape sequence `\*' treated as plain `*'
I also need to somehow split the line below into its respective fields, i.e. CONNECT_DATA, PROGRAM, USER, SERVER, SERVICE_NAME, HOST and PORT :(.
Here's wishing Oracle could have provided something to parse their own log. Maybe there is a program/script/utility out there that can parse log files of any format?
I will have to somehow change the timestamp to YYYYMMDD. For the time being, I need to get around the asterisk globbing first.
You do not need to escape it in awk. In awk, a single-character field separator other than a space is not treated as a regular expression but as a literal character.
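A minimal sketch of that rule, using the sample line from this thread: `-F'*'` with no backslash splits on the literal asterisk. The single quotes only stop the shell from globbing the `*` before awk sees it, and the fields keep the spaces that surround each `*`.

```shell
#!/bin/sh
# awk takes a one-character FS other than a space literally, so '*' needs
# no backslash. The quotes only keep the shell from globbing it.
sample='15-DEC-2016 10:19:24 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) * establish * test_app.x.y.z * 12666'

# printf '%s\n' "$sample" hands the line to awk unchanged (quoted, no globbing).
printf '%s\n' "$sample" | awk -F'*' '{ print NF }'          # 6
printf '%s\n' "$sample" | awk -F'*' '{ print "[" $4 "]" }'  # [ establish ]
```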
When you double-quote $line, the * characters are preserved and your awk scripts will work. Did you consider reading the variables directly with bash?
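Reading the variables directly could look roughly like this sketch (bash or ksh93), where `read` itself splits on `*`, so there is no echo, no awk and nothing for the shell to glob. `parse_log` is a hypothetical helper name, and the variable names follow the TS CS HOST RESULT SERVICE RETURNCODE order used later in the thread.

```shell
#!/bin/bash
# IFS='*' applies only to this read; -r keeps backslashes in the log literal.
# parse_log is a hypothetical helper name; it reads the log on stdin.
parse_log() {
    while IFS='*' read -r ts cs host result service returncode
    do
        # IFS holds no whitespace, so each field keeps the single space on
        # either side of the '*'; strip it from the fields printed below.
        ts=${ts% }
        result=${result# }; result=${result% }
        returncode=${returncode# }
        printf '%s|%s|%s\n' "$ts" "$result" "$returncode"
    done
}

# Usage: parse_log < x.out
```

Because `IFS='*'` is placed in front of `read`, the shell's own IFS is left alone for the rest of the script.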
The awk I have is actually gawk, see below. Didn't know that was the case.
OS is Red Hat Enterprise Linux Server release 5.11 (Tikanga)
The multiple awks are there because I am trying to assign each field to a variable that I then need to run through awk again :(. Not sure if I can use just one awk to assign multiple variables. Can I replace the multiple awks with a single awk?
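One way to get down to a single awk is to run it once over the whole file and let `read` pick up the pieces. In this sketch, `split_log` is a hypothetical helper name, the separator regex `' [*] '` assumes exactly one space on each side of every `*`, and `|` is assumed never to occur in the log.

```shell
#!/bin/bash
# One awk for the whole file instead of five awks per line.
# FS is the regex ' [*] ' (the literal '*' plus one space on each side),
# so the fields arrive trimmed. OFS='|' assumes '|' never occurs in the log.
split_log() {
    # $1 = $1 forces awk to rebuild $0 with OFS between the fields.
    awk -F' [*] ' -v OFS='|' '{ $1 = $1; print }' "$@" |
    while IFS='|' read -r ts cs host result service returncode
    do
        echo "- timestamp  = $ts"
        echo "- result     = $result"
        echo "- returncode = $returncode"
    done
}

# Usage: split_log x.out
```

With no file argument, awk reads stdin, so the function also works at the end of a pipe.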
After extracting to the timestamp variable, I will be converting that to YYYYMMDD.
For the connectstring variable, I will need to break it down further into CONNECT_DATA, PROGRAM, USER, SERVER, SERVICE_NAME, HOST and PORT. Don't know how to do that yet; for now I'm trying to get around the asterisk problem.
$ cat z.ksh
#!/bin/ksh
LOG=x.out
while read line
do
echo "- Processing --> $line"
timestamp=`echo $line | awk -F"*" '{ print $1 }'`
connectstring=`echo $line | awk -F"*" '{ print $2 }'`
result=`echo $line | awk -F"*" '{ print $3 }'`
service=`echo $line | awk -F"*" '{ print $4 }'`
returncode=`echo $line | awk -F"*" '{ print $5 }'`
echo "- timestamp = $timestamp"
echo "- connectstring = $connectstring"
echo "- result = $result"
echo "- service = $service"
echo "- returncode = $returncode"
echo
done < $LOG
###########
# THE END #
###########
$ ./z.ksh
- Processing --> 15-DEC-2016 10:19:24 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) * establish * test_app.x.y.z * 12666
- timestamp = 15-DEC-2016 10:19:24 x.out z.ksh (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) x.out z.ksh (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) x.out z.ksh establish x.out z.ksh test_app.x.y.z x.out z.ksh 12666
- connectstring =
- result =
- service =
- returncode =
$ which awk
/bin/awk
$ ls -l /bin/*awk*
lrwxrwxrwx 1 root root 4 Feb 11 2013 /bin/awk -> gawk
-rwxr-xr-x 1 root root 338744 Jun 13 2012 /bin/gawk
-rwxr-xr-x 1 root root 3089 Jun 13 2012 /bin/igawk
-rwxr-xr-x 1 root root 338760 Jun 13 2012 /bin/pgawk
$
$ awk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
Running awk -F"*" from the command line works fine, as you mentioned. It only fails when the line goes through the echo.
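The echo itself is innocent here. The shell expands the unquoted `$line` before echo ever runs: it splits the line into words, then replaces each lone `*` word with the file names in the current directory, which is where `x.out z.ksh` in the output above comes from. This sketch recreates that in a scratch directory with a shortened sample line:

```shell
#!/bin/bash
# Unquoted $line: word splitting, then each lone '*' is globbed to file names.
# Quoted "$line": the text reaches echo unchanged.
line='15-DEC-2016 10:19:24 * establish * 12666'   # shortened sample line

cd "$(mktemp -d)" && touch x.out z.ksh   # a scratch directory like the thread's
echo $line      # 15-DEC-2016 10:19:24 x.out z.ksh establish x.out z.ksh 12666
echo "$line"    # 15-DEC-2016 10:19:24 * establish * 12666
```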
result = (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))
Can you actually use awk's printf to wrap a field/column so that, for example, a 50-character field is printed as two lines of 25 characters each, both in the second column?
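printf alone does not wrap, but combining it with `substr()` in a loop does. In this sketch, `wrap_field2` is a hypothetical helper name, the input is assumed to be `*`-separated like the log, and the widths (20 for column 1, 25 for the pieces) are arbitrary.

```shell
#!/bin/bash
# Print field 1 in a 20-character column and wrap field 2 into pieces of
# w characters, each piece on its own line in the second column.
wrap_field2() {
    awk -F'*' -v w=25 '{
        label = $1; text = $2
        do {
            printf "%-20s %s\n", label, substr(text, 1, w)
            text = substr(text, w + 1)
            label = ""      # continuation lines leave column 1 blank
        } while (length(text) > 0)
    }' "$@"
}

# Usage: printf 'ID*%s\n' "$some_50_char_value" | wrap_field2
```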
Thanks a lot, I'll worry about the wrapping later on. I may not need it if I break the connectstring down further.
I'll post the updated script after I've broken down the connectstring variable. I will be using cut this time; I'll post it in case there is a better approach.
Currently, the script looks as below. Not sure if there is a better way around the case/esac thingy.
'Final' script so far as below. Thanks to everyone's input, especially RudiC.
Perhaps some guidance on 'cleaner' code, mainly on the multiple awk thingy and on the case/esac clause. Maybe there is a shorthand version of some sort that I can use.
'final' code sort of as below.
Not sure if there is a shorthand way of doing the case/esac thing that converts the date to YYYY-MM-DD.
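A possible shorthand for the month lookup: find the month's position in a 36-character string of abbreviations instead of a case/esac, using only shell built-ins. `to_ymd` is a hypothetical helper name, and it assumes upper-case English month names and a two-digit day, as in the sample log.

```shell
#!/bin/bash
# Convert "DD-MON-YYYY hh:mm:ss" to YYYY-MM-DD without case/esac.
to_ymd() {
    IFS='- ' read -r d mon y rest <<EOF
$1
EOF
    months=JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC
    before=${months%%"$mon"*}       # the text in front of the month name
    m=$(( ${#before} / 3 + 1 ))     # JAN -> 1, FEB -> 2, ... DEC -> 12
    # The day stays a string: printf %d would reject "08" as bad octal.
    printf '%s-%02d-%s\n' "$y" "$m" "$d"
}

to_ymd '15-DEC-2016 10:19:24'    # 2016-12-15
```

For YYYYMMDD, drop the hyphens from the printf format. An unknown month name yields 13, so a real script may want to check for that.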
Had to use multiple awks to further break down the connectstring. Not 'clean', but I don't know of any other way of doing it.
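Another option for the connectstring is to look each value up by its key name with parameter expansion. This avoids an awk per value and does not depend on field positions. `get_val` is a hypothetical helper name. It returns the first match and only suits leaf keys whose value contains no `)`, so it would not work for a nested key such as CONNECT_DATA.

```shell
#!/bin/bash
# Fetch a value from an Oracle "(KEY=value)" string by key name.
get_val() {
    # $1 = key, $2 = string to search
    v=${2#*"($1="}                  # drop everything up to "(KEY="
    if [ "$v" = "$2" ]; then
        return 1                    # key not found
    fi
    printf '%s\n' "${v%%)*}"        # keep the text before the next ')'
}

cs='(CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))'
get_val PROGRAM "$cs"         # JDBC Thin Client
get_val SERVICE_NAME "$cs"    # test_app.x.y.z
```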
BTW, is there any way to assign the 'original' line of text and split it with IFS="*" at the same time?
I want to preserve the original line so that if RETURNCODE is > 0, I can get the whole contents of the line. Currently, as a workaround, I just combine the values of TS CS HOST RESULT SERVICE RETURNCODE.
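One way to keep the original line is to read it whole first and then split a copy with a second `read` fed by a here document. In this sketch, `failed_lines` is a hypothetical helper name. It assumes the last field is a plain numeric return code with no trailing spaces.

```shell
#!/bin/bash
# IFS= read -r keeps the whole line exactly as it is in the file; the second
# read splits a copy on '*' and leaves $line untouched.
failed_lines() {
    while IFS= read -r line
    do
        IFS='*' read -r ts cs host result service returncode <<EOF
$line
EOF
        returncode=${returncode# }      # drop the space after the last '*'
        if [ "$returncode" -gt 0 ]; then
            printf '%s\n' "$line"       # the original, unmodified line
        fi
    done
}

# Usage: failed_lines < x.out
```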
Hi, the awk statements are indeed "not clean", but since they run inside a loop they also slow things down considerably, because each one starts a separate process.
Here is an alternative way that you could try using the shell's read statement, which should be much faster:
{
IFS=' -' read day month year timestamp
IFS='(=)' read x x x x x x program x x x x x user x x x x x x service_name x
IFS='(=)' read x x x x app_protocol x x app_host x x app_port x
} << EOF
$TS
$CS
$HOST
EOF
The repeated x variables hold the fields that we do not need.
An IFS assignment placed directly in front of the read command is local to that command; it specifies the characters that separate the fields in a given string.
A "here document" (<< EOF) emulates a file that contains the three variables TS, CS and HOST, each on a separate line.
The curly braces { .. }, together with << EOF, define a block of shell code that uses the here document as its input.
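A small sketch of the point about IFS being local to read: the prefix assignment splits on `-` for that one command, and the shell's own IFS is unchanged afterwards.

```shell
#!/bin/bash
# An IFS=... prefix on read changes IFS for that one command only.
IFS='-' read -r day month year <<EOF
15-DEC-2016
EOF
echo "$day $month $year"    # 15 DEC 2016

# The shell's own IFS was never touched, so it does not contain '-' here.
case $IFS in
    *-*) echo "IFS leaked" ;;
    *)   echo "IFS unchanged" ;;
esac
```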
---
Here is another way of writing it, using separate here documents. This may be preferable because it looks cleaner and may be a bit easier to read:
IFS=' -' read day month year timestamp << EOF
$TS
EOF
IFS='(=)' read x x x x x x program x x x x x user x x x x x x service_name x << EOF
$CS
EOF
IFS='(=)' read x x x x app_protocol x x app_host x x app_port x << EOF
$HOST
EOF