String has * as the field delimiter and I need echo/awk to escape it, how?

newbie_01 · January 6, 2017, 10:38am

Hi,

I am trying to read an Oracle listener log file line by line and need to separate the lines into several fields. The field delimiter for the line happens to be an asterisk.

I have the script below to start with but when running it, the echo command is globbing it to include other information that I don't need.

Below is a sample run of the script z.ksh

$
$ ls -altr
total 16
drwxr-xr-x 3 oracle oinstall 4096 Jan  7 03:59 ..
-rw-r--r-- 1 oracle oinstall  243 Jan  7 04:03 x.out
-rwxr--r-- 1 oracle oinstall  586 Jan  7 04:12 z.ksh
drwxr-xr-x 2 oracle oinstall 4096 Jan  7 04:12 .
$ ./z.ksh
- Processing  --> 15-DEC-2016 10:19:24 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) * establish * test_app.x.y.z * 12666
- timestamp = 15-DEC-2016 10:19:24 x.out z.ksh (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) x.out z.ksh (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) x.out z.ksh establish x.out z.ksh test_app.x.y.z x.out z.ksh 12666
- connectstring =
- result =
- service =
- returncode =

$ cat z.ksh
#!/bin/ksh

LOG=x.out

while read line
do
   echo "- Processing  --> $line"
   timestamp=`echo $line | awk -F"[*]" '{ print $1 }'`
   connectstring=`echo $line | awk -F"[*]" '{ print $2 }'`
   result=`echo $line | awk -F"[*]" '{ print $3 }'`
   service=`echo $line | awk -F"[*]" '{ print $4 }'`
   returncode=`echo $line | awk -F"[*]" '{ print $5 }'`

   echo "- timestamp = $timestamp"
   echo "- connectstring = $connectstring"
   echo "- result = $result"
   echo "- service = $service"
   echo "- returncode = $returncode"

   echo
done < $LOG

###########
# THE END #
###########

I've also tried awk -F "\*", and that makes no difference besides giving the warning awk: warning: escape sequence `\*' treated as plain `*'

I also need to somehow break the line below into its respective fields, i.e. CONNECT_DATA, PROGRAM, USER, SERVER, SERVICE_NAME, HOST and PORT :(.

(CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))

Here's wishing Oracle could have provided something to parse their own log. Maybe there is a program/script/utility out there that can parse log files of any format?

I will have to somehow change the timestamp to YYYYMMDD. For the time being, I need to be able to get around the asterisk globbing to start with.

Can't install Splunk/logstash unfortunately.

Any advice much appreciated. Thanks in advance.

vgersh99 · January 6, 2017, 10:44am

-F'*' worked for gawk. What OS are you on?
try echo "$line"
Why do you need so many awk-s? Cannot you do it with just one?
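
To illustrate the quoting point, here is a minimal sketch. It assumes a scratch directory holding the same two files as in the listing above, and a shortened sample line; the variable names are made up for the demo.

```shell
# Scratch directory so the glob has known files to match.
demo_dir=$(mktemp -d)
touch "$demo_dir/x.out" "$demo_dir/z.ksh"

line='15-DEC-2016 10:19:24 * establish * 12666'

# Unquoted: the shell replaces each lone * with the file names before echo runs.
unquoted=$(cd "$demo_dir" && echo $line)

# Quoted: the line is passed on unchanged, so awk still sees its delimiters.
quoted=$(cd "$demo_dir" && echo "$line")
third=$(printf '%s\n' "$line" | awk -F'*' '{ print $3 }')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
echo "field 3:  $third"

rm -rf "$demo_dir"
```

The unquoted form is what produced the stray x.out z.ksh in the script output; with the quotes in place awk finds all of its fields.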

Scrutinizer · January 6, 2017, 10:49am

You do not need to escape it in awk. In awk, a single-character field separator other than a space is treated as a literal character, not as a regular expression. From the POSIX specification:

An extended regular expression can be used to separate fields by assigning a string containing the expression to the built-in variable FS, either directly or as a consequence of using the -F sepstring option. The default value of the FS variable shall be a single <space>. The following describes FS behavior:

1. If FS is a null string, the behavior is unspecified.

2. If FS is a single character:

   a. If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters.

   b. Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.

3. Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.

awk:regular expressions
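
The rules quoted above can be checked with a small experiment. This is only a sketch with a made-up test string; it counts the fields awk finds for three different -F values.

```shell
s='a*b**c'

# Single character: * is taken literally, so the empty field between ** is kept.
n_literal=$(printf '%s\n' "$s" | awk -F'*' '{ print NF }')

# Bracket expression: a regular expression that matches exactly one *.
n_bracket=$(printf '%s\n' "$s" | awk -F'[*]' '{ print NF }')

# Longer FS is an extended regular expression: a run of * is one delimiter.
n_regex=$(printf '%s\n' "$s" | awk -F'[*]+' '{ print NF }')

echo "$n_literal $n_bracket $n_regex"
```

The first two forms give the same field count, which is why switching between -F"[*]" and -F"*" made no difference to the original problem.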

RudiC · January 6, 2017, 10:54am

When double quoting $line , the * chars will be preserved, and your awk scripts will work. Did you consider reading the variables immediately with bash ?

while IFS="*" read TS CS RS SV RC REST
  do    echo "- timestamp = $TS"
        echo "- connectstring = $CS"
        echo "- result = $RS"
        echo "- service = $SV"
        echo "- returncode = $RC"
  done <  $LOG
- timestamp = 15-DEC-2016 10:19:24 
- connectstring =  (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) 
- result =  (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) 
- service =  establish 
- returncode =  test_app.x.y.z

EDIT: Can't you, BTW, make ORACLE use other delimiters?

newbie_01 · January 6, 2017, 11:39am

Hi,

Thanks for your reply.

The awk I have is actually gawk, see below. Didn't know that is the case.
OS is Red Hat Enterprise Linux Server release 5.11 (Tikanga)
The multiple awk-s are because I am trying to assign each field to a variable that I may need to run through awk again :(. Not sure if I can use just one awk to assign them to multiple variables. Can I replace the multiple awks with a single awk?
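
One possible answer to the single-awk question is to let one awk process do all of the splitting and print one delimited record per log line. The sketch below is an assumption-laden illustration, not a tested parser for every listener log: parse_listener and getval are made-up names, the output separator | is an arbitrary choice, and an unknown month name would come out as 00.

```shell
parse_listener() {
    awk -F'*' '
    # Return the value of "(key=value)" found in s, or "" when the key is absent.
    function getval(s, key,    i, rest) {
        i = index(s, "(" key "=")
        if (i == 0) return ""
        rest = substr(s, i + length(key) + 2)
        return substr(rest, 1, index(rest, ")") - 1)
    }
    {
        # $1 looks like "15-DEC-2016 10:19:24 ": split it on dashes and blanks.
        split($1, ts, /[- ]/)
        # Each month name starts at position 1, 4, 7, ... of the lookup string.
        mm = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", ts[2]) + 2) / 3
        printf "%s-%02d-%s %s|%s|%s|%s|%s\n", ts[3], mm, ts[1], ts[4],
               getval($2, "PROGRAM"), getval($2, "USER"),
               getval($3, "HOST"), $6 + 0
    }' "$@"
}

sample='15-DEC-2016 10:19:24 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) * establish * test_app.x.y.z * 12666'
parsed=$(printf '%s\n' "$sample" | parse_listener)
echo "$parsed"
```

Called as parse_listener x.out, the output could then be fed to a single while IFS='|' read loop if the values are still needed as shell variables.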

After extracting to the timestamp variable, I will be converting that to YYYYMMDD.

For the connectstring variable, I will need to further break that down to CONNECT_DATA, PROGRAM, USER, SERVER, SERVICE_NAME,HOST and PORT. Don't know how to do that yet. Trying to get around the asterisk problem for the time being.

 $ cat z.ksh
#!/bin/ksh

LOG=x.out

while read line
do
   echo "- Processing  --> $line"
   timestamp=`echo $line | awk -F"*" '{ print $1 }'`
   connectstring=`echo $line | awk -F"*" '{ print $2 }'`
   result=`echo $line | awk -F"*" '{ print $3 }'`
   service=`echo $line | awk -F"*" '{ print $4 }'`
   returncode=`echo $line | awk -F"*" '{ print $5 }'`

   echo "- timestamp = $timestamp"
   echo "- connectstring = $connectstring"
   echo "- result = $result"
   echo "- service = $service"
   echo "- returncode = $returncode"

   echo
done < $LOG

###########
# THE END #
###########

 $ ./z.ksh
- Processing  --> 15-DEC-2016 10:19:24 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) * establish * test_app.x.y.z * 12666
- timestamp = 15-DEC-2016 10:19:24 x.out z.ksh (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) x.out z.ksh (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) x.out z.ksh establish x.out z.ksh test_app.x.y.z x.out z.ksh 12666
- connectstring =
- result =
- service =
- returncode =

 $ which awk
/bin/awk
 $ ls -l /bin/*awk*
lrwxrwxrwx 1 root root      4 Feb 11  2013 /bin/awk -> gawk
-rwxr-xr-x 1 root root 338744 Jun 13  2012 /bin/gawk
-rwxr-xr-x 1 root root   3089 Jun 13  2012 /bin/igawk
-rwxr-xr-x 1 root root 338760 Jun 13  2012 /bin/pgawk
 $
$ awk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

Trying awk -F"*" does work out fine from the command line as you mentioned. It is during the echo run that it is failing.

$ awk -F"*" '{ printf "%-20s %-40s %-15s %-10s %-10s \n", $1 , $2 , $3 , $4 , $5 }' x.out
15-DEC-2016 10:19:24   (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))   (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))   establish   test_app.x.y.z

$ awk -F"*" '{ print $2, $3 }' x.out
 (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))   (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))

$ export x=`head -1 x.out`
$ echo $x
15-DEC-2016 10:19:24 x.out z.ksh (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) x.out z.ksh (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) x.out z.ksh establish x.out z.ksh test_app.x.y.z x.out z.ksh 12666
$ echo $x | awk -F"*" '{ print $2 }'

$ echo $x | awk -F"*" '{ print $0 }'
15-DEC-2016 10:19:24 x.out z.ksh (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) x.out z.ksh (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) x.out z.ksh establish x.out z.ksh test_app.x.y.z x.out z.ksh 12666
$ echo $x | awk -F"*" '{ print $1 }'
15-DEC-2016 10:19:24 x.out z.ksh (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) x.out z.ksh (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) x.out z.ksh establish x.out z.ksh test_app.x.y.z x.out z.ksh 12666
$ echo $x | awk -F"*" '{ print $3 }'

Can you actually use awk to wrap a field so that the string below ...

15-DEC-2016 10:19:24   (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))   (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))   establish   test_app.x.y.z

Can be printed to be as below?

15-DEC-2016 10:19:24   (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)					establish   test_app.x.y.z
                       (USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))   
		       (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))

---------- Post updated at 11:26 AM ---------- Previous update was at 11:20 AM ----------

Hi,

Unfortunately, can't get ORACLE to use a different delimiter.

---------- Post updated at 11:39 AM ---------- Previous update was at 11:26 AM ----------

Thanks a lot RudiC, I am using the code that you posted and it works just fine at the moment.

Now I need to break down the following strings further:

connectstring = (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))
result = (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))

Can you actually use awk's printf to wrap a field/column, so that, for example, a 50-character field is printed as 2 lines of 25 characters each, both in the second column?

RudiC · January 6, 2017, 11:46am

And, I guess, you don't want to break at 25 chars but at the nearest parenthesis?

I don't know of such a function in awk ; you need to program it step by step yourself. Perhaps perl provides something similar?
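
A rough sketch of what such a step-by-step routine could look like, assuming the wish is to break in front of an opening parenthesis whenever one falls within the width, and to fall back to a hard break otherwise. wrap_field is a made-up name, and its two arguments (the text and the maximum width) are assumptions for the demo.

```shell
wrap_field() {
    # $1 = text to wrap, $2 = maximum width of each chunk
    printf '%s\n' "$1" | awk -v w="$2" '
    {
        s = $0
        while (length(s) > w) {
            # Find the right-most "(" that leaves a chunk of at most w characters.
            cut = 0
            for (p = w + 1; p > 1; p--)
                if (substr(s, p, 1) == "(") { cut = p; break }
            # No usable "(" in range: break hard at the width instead.
            if (!cut) cut = w + 1
            print substr(s, 1, cut - 1)
            s = substr(s, cut)
        }
        if (length(s) > 0) print s
    }'
}

wrapped=$(wrap_field '(ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))' 25)
echo "$wrapped"
```

The chunks would still have to be placed into columns by the caller, for example by collecting them in an array and printing them next to the other fields.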

RudiC · January 6, 2017, 12:03pm

Something along this line?

awk -F\* '
        {while (length($2) > 0) {T[++C] = substr ($2, 1, 20)
                                 $2 = substr ($2, 21)
                                }
         printf "%20s\t%20s\t%s\t%s\n", $1, T[1], $3, $4
         for (i=2; i<=C; i++) printf "%20s\t%20s\n", "", T[i]
        }
' file
15-DEC-2016 10:19:24 	 (CONNECT_DATA=(CID=	 (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) 	 establish 
                    	(PROGRAM=JDBC Thin C
                    	lient)(HOST=__jdbc__
                    	)(USER=testuser))(SE
                    	RVER=DEDICATED)(SERV
                    	ICE_NAME=test_app.x.
                    	              y.z))

newbie_01 · January 6, 2017, 12:15pm

Hi RudiC

Thanks a lot, I'll worry about the wrapping thing later on. I may not need it if I am deconstructing the connectstring further

I'll post the updated script after I've broken down the connectstring information/variable. I will be using cut this time, I'll post it in case there is a better approach to it.

Currently the script looks as below. Not sure if there is a better way to handle the case/esac thingy.

$ ./z2.ksh
- timestamp = 15-DEC-2016 10:19:24  // 2016-12-15 10:19:24
- connectstring =  (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))
- host =  (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))
- result =  establish
- service =  test_app.x.y.z
- returncode =  12666
-------------------------------------------------------------

$ cat z2.ksh
#!/bin/ksh

#15-DEC-2016 10:19:24 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) * establish * test_app.x.y.z * 12666

#LOG=y.out
LOG=x.out

while IFS="*" read TS CS HOST RESULT SERVICE RETURNCODE
do
   timestamp=`echo $TS | awk '{ print $2 }'`
   year=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $3 }'`
   day=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $1 }'`
   month=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $2 }'`

   case $month in
      "JAN" ) mm="01" ;;
      "FEB" ) mm="02" ;;
      "MAR" ) mm="03" ;;
      "APR" ) mm="04" ;;
      "MAY" ) mm="05" ;;
      "JUN" ) mm="06" ;;
      "JUL" ) mm="07" ;;
      "AUG" ) mm="08" ;;
      "SEP" ) mm="09" ;;
      "OCT" ) mm="10" ;;
      "NOV" ) mm="11" ;;
      "DEC" ) mm="12" ;;
   esac
   TS2="$year-$mm-$day $timestamp"

   echo "- timestamp = $TS // $TS2"
   echo "- connectstring = $CS"
   echo "- host = $HOST"
   echo "- result = $RESULT"
   echo "- service = $SERVICE"
   echo "- returncode = $RETURNCODE"
   echo "-------------------------------------------------------------"
   echo
done <  $LOG

###########
# THE END #
###########

$

RudiC · January 6, 2017, 12:31pm

Wrapping taken a bit further:

awk -F\* '
function splitstring(FLD, ARR,          C)
        {while (length(FLD) > 0)        {ARR[++C] = substr (FLD, 1, FL)
                                         FLD = substr (FLD, FL + 1)
                                        }
         return C
        }
NR == 1 {FMT = "%20s\t%-" FL "s\t%-" FL "s\t%s\n"
        }
        {C1 = splitstring($2, T1)
         C2 = splitstring($3, T2)
         C  = (C1>C2)?C1:C2 
         for (i=1; i<=C; i++)   {printf FMT, $1, T1[i], T2[i], $4
                                 $1 = $3 = $4 = ""
                                }
        }
' FL=25 file
15-DEC-2016 10:19:24 	 (CONNECT_DATA=(CID=(PROG	 (ADDRESS=(PROTOCOL=tcp)(	 establish 
                    	RAM=JDBC Thin Client)(HOS	HOST=60.11.22.123)(PORT=5	
                    	T=__jdbc__)(USER=testuser	5440)) 	
                    	))(SERVER=DEDICATED)(SERV		
                    	ICE_NAME=test_app.x.y.z))

newbie_01 · January 6, 2017, 12:44pm

Hi,

'final' script so far as below. Thanks to everyone's input especially RudiC

Perhaps some guidance on a 'cleaner' code, mainly on the multiple awk thingy and on the case/esac clause. Maybe there is a shorthand version of some sort that I can use

Script works as I wanted it to be nonetheless.

$ ./z2.ksh
- timestamp = 15-DEC-2016 10:19:24
  TS2 = 2016-12-15 10:19:24
- connectstring =  (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))
  program = JDBC Thin Client
  user = testuser
  service_name = test_app.x.y.z
- host =  (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))
  app_protocol = tcp
  app_host = 60.11.22.123
  app_port = 55440
- result =  establish
- service =  test_app.x.y.z
- returncode =  12666
-------------------------------------------------------------

$ cat z2.ksh
#!/bin/ksh

#15-DEC-2016 10:19:24 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) * establish * test_app.x.y.z * 12666

#LOG=y.out
LOG=x.out

while IFS="*" read TS CS HOST RESULT SERVICE RETURNCODE
do
   timestamp=`echo $TS | awk '{ print $2 }'`
   year=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $3 }'`
   day=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $1 }'`
   month=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $2 }'`

   case $month in
      "JAN" ) mm="01" ;;
      "FEB" ) mm="02" ;;
      "MAR" ) mm="03" ;;
      "APR" ) mm="04" ;;
      "MAY" ) mm="05" ;;
      "JUN" ) mm="06" ;;
      "JUL" ) mm="07" ;;
      "AUG" ) mm="08" ;;
      "SEP" ) mm="09" ;;
      "OCT" ) mm="10" ;;
      "NOV" ) mm="11" ;;
      "DEC" ) mm="12" ;;
   esac
   TS2="$year-$mm-$day $timestamp"

   program=`echo $CS | awk -F"(" '{ print $4 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   user=`echo $CS | awk -F"(" '{ print $6 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   service_name=`echo $CS | awk -F"(" '{ print $8 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`

   app_protocol=`echo $HOST | awk -F"(" '{ print $3 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   app_host=`echo $HOST | awk -F"(" '{ print $4 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   app_port=`echo $HOST | awk -F"(" '{ print $5 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   echo "- timestamp = $TS"
   echo "  TS2 = $TS2"
   echo "- connectstring = $CS"
   echo "  program = $program"
   echo "  user = $user"
   echo "  service_name = $service_name"
   echo "- host = $HOST"
   echo "  app_protocol = $app_protocol"
   echo "  app_host = $app_host"
   echo "  app_port = $app_port"
   echo "- result = $RESULT"
   echo "- service = $SERVICE"
   echo "- returncode = $RETURNCODE"
   echo "-------------------------------------------------------------"
   echo
done <  $LOG

###########
# THE END #
###########

$

RudiC · January 6, 2017, 12:53pm

Try

M=...JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC
TMP=${M%${TS:3:3}*}
echo ${TS:7:4}-$((${#TMP}/3))-${TS:0:2} ${TS:12}
2016-12-15 10:19:24

Please be aware that some serious error handling needs to be added.
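
As a sketch of what that error handling might involve, the variant below avoids the ${TS:3:3} substring syntax (which plain POSIX shells lack), zero-pads the month, and rejects input it does not recognise. to_iso is a made-up name, the checks are only a starting point, and the helper leaves its working variables (dd, mon and so on) set as globals.

```shell
# Hypothetical helper: turn "15-DEC-2016 10:19:24" into "2016-12-15 10:19:24".
to_iso() {
    months=JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC
    # Split on dashes and blanks; anything unexpected ends up in $extra.
    IFS=' -' read -r dd mon yyyy hms extra <<EOF
$1
EOF
    # Reject input that does not have exactly four parts.
    [ -n "$hms" ] && [ -z "$extra" ] || return 1
    # Reject month tokens that are not exactly three letters.
    case $mon in
        [A-Z][A-Z][A-Z]) ;;
        *) return 1 ;;
    esac
    # Text in front of the month name; its length gives the month index.
    prefix=${months%%"$mon"*}
    # Nothing removed means the token is not a month name.
    [ "$prefix" != "$months" ] || return 1
    # Reject matches that straddle two names, such as "ANF".
    [ $(( ${#prefix} % 3 )) -eq 0 ] || return 1
    printf '%s-%02d-%s %s\n' "$yyyy" $(( ${#prefix} / 3 + 1 )) "$dd" "$hms"
}

to_iso '15-DEC-2016 10:19:24'
```

A caller can test the return status, for example to_iso "$TS" || echo "bad timestamp: $TS", instead of printing a half-converted date.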

newbie_01 · January 6, 2017, 9:41pm

Hi RudiC

'final' code sort of as below.
Not sure if there is a shorthand way of doing the case/esac thing that converts the date to YYYY-MM-DD

Had to use multiple awks to further break down the connectstring. Not 'clean' but I don't know of any other way of doing it.

BTW, is there any way to keep the 'original' line of text while using IFS="*" at the same time?

I want to preserve the original line somehow so if the RETURNCODE is > 0, I want to get the whole contents of the line. Currently as a workaround, I just combine all values of TS CS HOST RESULT SERVICE RETURNCODE
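
One way this could be done is to read the whole line first and then split a copy of it with a second read. The sketch below assumes a temporary log with the line from this thread plus one made-up line with return code 0, and it assumes the last field is always numeric.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
15-DEC-2016 10:19:24 * (CONNECT_DATA=(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) * establish * test_app.x.y.z * 12666
16-DEC-2016 08:00:00 * (CONNECT_DATA=(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.124)(PORT=55441)) * establish * test_app.x.y.z * 0
EOF

failed=""
while IFS= read -r line
do
    # Split a copy of the line; $line itself stays exactly as it was read.
    IFS='*' read -r TS CS HOST RESULT SERVICE RETURNCODE <<EOF
$line
EOF
    # Remove the blanks that surround each * in the log.
    rc=$(printf '%s' "$RETURNCODE" | tr -d ' ')
    if [ "${rc:-0}" -gt 0 ]
    then
        # Keep the untouched original line for later reporting.
        failed="$failed$line
"
    fi
done < "$log"

printf '%s' "$failed"
rm -f "$log"
```

With IFS= and -r on the outer read, the line is stored without trimming or backslash processing, so it can be printed as-is whenever the return code is greater than 0.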

$ ./z2.ksh
- timestamp = 15-DEC-2016 10:19:24
  TS2 = 2016-12-15 10:19:24
- connectstring =  (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))
  program = JDBC Thin Client
  user = testuser
  service_name = test_app.x.y.z
- host =  (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))
  app_protocol = tcp
  app_host = 60.11.22.123
  app_port = 55440
- result =  establish
- service =  test_app.x.y.z
- returncode =  12666
-------------------------------------------------------------

$ cat z2.ksh
#!/bin/ksh

#15-DEC-2016 10:19:24 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z)) * (ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440)) * establish * test_app.x.y.z * 12666

#LOG=y.out
LOG=x.out

while IFS="*" read TS CS HOST RESULT SERVICE RETURNCODE
do
   timestamp=`echo $TS | awk '{ print $2 }'`
   year=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $3 }'`
   day=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $1 }'`
   month=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $2 }'`

   case $month in
      "JAN" ) mm="01" ;;
      "FEB" ) mm="02" ;;
      "MAR" ) mm="03" ;;
      "APR" ) mm="04" ;;
      "MAY" ) mm="05" ;;
      "JUN" ) mm="06" ;;
      "JUL" ) mm="07" ;;
      "AUG" ) mm="08" ;;
      "SEP" ) mm="09" ;;
      "OCT" ) mm="10" ;;
      "NOV" ) mm="11" ;;
      "DEC" ) mm="12" ;;
   esac
   TS2="$year-$mm-$day $timestamp"

   program=`echo $CS | awk -F"(" '{ print $4 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   user=`echo $CS | awk -F"(" '{ print $6 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   service_name=`echo $CS | awk -F"(" '{ print $8 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`

   app_protocol=`echo $HOST | awk -F"(" '{ print $3 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   app_host=`echo $HOST | awk -F"(" '{ print $4 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   app_port=`echo $HOST | awk -F"(" '{ print $5 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`

   echo "- timestamp = $TS"
   echo "  TS2 = $TS2"
   echo "- connectstring = $CS"
   echo "  program = $program"
   echo "  user = $user"
   echo "  service_name = $service_name"
   echo "- host = $HOST"
   echo "  app_protocol = $app_protocol"
   echo "  app_host = $app_host"
   echo "  app_port = $app_port"
   echo "- result = $RESULT"
   echo "- service = $SERVICE"
   echo "- returncode = $RETURNCODE"
   echo "-------------------------------------------------------------"
   echo
done <  $LOG

###########
# THE END #
###########

$

Scrutinizer · January 7, 2017, 2:16am

Hi, the awk statements are indeed "not clean", but since they are inside a loop they also slow things down considerably.

Here is an alternative way that you could try using the shell's read statement, which should be much faster:

{
  IFS=' -'  read day month year timestamp
  IFS='(=)' read x x x x x x program x x x x x user x x x x x service_name x
  IFS='(=)' read x x x x app_protocol x x app_host x x app_port x
} << EOF
$TS
$CS
$HOST
EOF

The repeated x variables are used to hold the pieces that we do not need.
The IFS assignment is local to the read command, and it specifies the characters that separate the fields in a given string.
A "Here document" ( << EOF ) is used to emulate a file that contains the three variables TS , CS and HOST , each on a separate line.
The curly braces { .. } together with << EOF define a block of shell code that uses the here document as input.

---
Here is another way of writing it, using separate here documents, which may be preferable since it provides a cleaner look and maybe is a bit easier to read for the human eye:

IFS=' -'  read day month year timestamp << EOF
$TS
EOF

IFS='(=)' read x x x x x x program x x x x x user x x x x x service_name x << EOF
$CS
EOF

IFS='(=)' read x x x x app_protocol x x app_host x x app_port x << EOF
$HOST
EOF
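
The positional reads above depend on the keys always appearing in the same order. If that cannot be relied on, a lookup by key name is one alternative. This is a different technique from the read approach, shown only as a sketch: get_val is a made-up helper built on plain parameter expansion, and it only handles innermost (KEY=value) pairs.

```shell
# Hypothetical helper: print the value of KEY from a "(KEY=value)" list.
get_val() {
    # $1 = string to search, $2 = key name
    case $1 in
        *"($2="*) ;;
        *) return 1 ;;          # key not present
    esac
    rest=${1#*"($2="}           # drop everything up to and including "(KEY="
    val=${rest%%\)*}            # keep the text in front of the next ")"
    printf '%s\n' "$val"
}

CS='(CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=testuser))(SERVER=DEDICATED)(SERVICE_NAME=test_app.x.y.z))'
ADDR='(ADDRESS=(PROTOCOL=tcp)(HOST=60.11.22.123)(PORT=55440))'

program=$(get_val "$CS" PROGRAM)
user=$(get_val "$CS" USER)
app_host=$(get_val "$ADDR" HOST)
app_port=$(get_val "$ADDR" PORT)
echo "$program / $user / $app_host / $app_port"
```

Each call still costs a subshell because of the command substitution, so the read version is likely to stay faster inside a long loop; the gain here is only that a missing or reordered key does not shift the other values.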

RudiC · January 7, 2017, 7:29am

Why not, then,

# M is the month lookup string from the earlier post
M=...JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC
while IFS='*-=)(' read DD MON YRTMS _ _ _ _ _ _ PRG _ _ _ _ _ USR _ _ _ _ _ _ SVC _ _ _ _ _ _ PTC _ _ HOST _ _ PRT _ _ RES SVC2 RET
   do   TMP=${M%$MON*}
        echo $DD
        echo $((${#TMP}/3))
        echo ${YRTMS%% *}
        echo ${YRTMS#* }
        echo $PRG 
        echo $USR 
        echo $SVC 
        echo $PTC 
        echo $HOST 
        echo $PRT 
        echo $RES 
        echo $SVC2 
        echo $RET
    done <  file
15
12
2016
10:19:24
JDBC Thin Client
testuser
test_app.x.y.z
tcp
60.11.22.123
55440
establish
test_app.x.y.z
12666