Separate complicated fields with awk

sdohn · January 27, 2009, 10:08am

Hello, I want to separate the fields of log output like this:

11-JUL-2008 23:14:25 * (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)(HOST=X900005199)(USER=FTET1))) * (ADDRESS=(PROTOCOL=tcp)(HOST=45.137.251.223)(PORT=2196)) * establish * WUMMER.IM.HERE.EXELLENT.COM * 0
11-JUL-2008 23:20:20 * (CONNECT_DATA=(SID=P1VPMHAM)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=133.52.24.148)(PORT=1462)) * establish * WUMMER * 0

into:

$1 = 11-JUL-2008 23:14:25
$2 = (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)
$3 = (CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)
$4 = (HOST=X900005199)
$5 = (USER=FTET1)
$6 = (ADDRESS=(PROTOCOL=tcp)
$7 = (HOST=45.137.251.223)
$8 = (PORT=2196)

I've tried playing with the field separator (FS), with mixed results:
awk -F'(*[^(]*)' '{ print $1 " " $2 " " $3 }' listener.log

Does anyone have an idea? I think I need the correct regular expression.
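A note on the FS attempt above: it targets the parentheses, but the top-level parts of each line are already separated by " * ". A multi-character FS that matches exactly that string splits each line into six coarse sections (date, CONNECT_DATA block, ADDRESS block, "establish", service name, status), which can then be broken up further. This is only a minimal sketch, assuming every line has the same " * " layout as the two samples; the function name `split_sections` is made up for illustration.

```shell
#!/bin/sh
# split_sections (hypothetical helper): read listener.log-style lines on stdin
# and print each " * "-separated section on its own numbered line.
split_sections() {
    # ' [*] ' is an ERE: a space, a literal asterisk, a space.
    # The bracket expression avoids having to escape the asterisk.
    awk -F' [*] ' '{
        for (i = 1; i <= NF; i++)
            printf "$%d = %s\n", i, $i
        print ""
    }'
}

# Demo on the second sample line from the question.
split_sections <<'EOF'
11-JUL-2008 23:20:20 * (CONNECT_DATA=(SID=P1VPMHAM)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=133.52.24.148)(PORT=1462)) * establish * WUMMER * 0
EOF
```

Here $2 still holds the whole CONNECT_DATA block and $3 the whole ADDRESS block, so a second step (for example awk's split() on the parentheses) is still needed to reach the individual values.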

joeyg · January 27, 2009, 10:37am

What I did was replace any ( with ~( so I could use the ~ as a delimiter.

> cat file149
11-JUL-2008 23:14:25 * (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)(HOST=X900005199)(USER=FTET1))) * (ADDRESS=(PROTOCOL=tcp)(HOST=45.137.251.223)(PORT=2196)) * establish * WUMMER.IM.HERE.EXELLENT.COM * 0
11-JUL-2008 23:20:20 * (CONNECT_DATA=(SID=P1VPMHAM)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=133.52.24.148)(PORT=1462)) * establish * WUMMER * 0

> sed "s/(/~(/g" <file149 >file149.a
> awk -F"~" '{print "1="$1,"\n2="$2$3,"\n3="$4$5,"\n4="$6,"\n5="$7"\n"}' file149.a
1=11-JUL-2008 23:14:25 *  
2=(CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM) 
3=(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe) 
4=(HOST=X900005199) 
5=(USER=FTET1))) * 

1=11-JUL-2008 23:20:20 *  
2=(CONNECT_DATA=(SID=P1VPMHAM) 
3=(CID=(PROGRAM=) 
4=(HOST=__jdbc__) 
5=(USER=))) *
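The same tilde trick can also be done inside a single awk program, which avoids the temporary file and reaches the ADDRESS part as well. The sketch below is only an illustration, assuming every line follows the layout of the two samples, so that the tilde-separated pieces always land at the same positions (11 pieces per line). It trims the trailing " * ..." debris and glues the pieces back into the eight fields listed in the question, printed pipe-delimited. The helper name `split_fields` is hypothetical.

```shell
#!/bin/sh
# split_fields (hypothetical helper): the "(" -> "~(" idea in one awk pass.
# Assumes each input line has the same layout as the listener.log samples.
split_fields() {
    awk '{
        line = $0
        gsub(/[(]/, "~(", line)   # same effect as: sed "s/(/~(/g"
        n = split(line, f, "~")   # f[1] .. f[n] are the tilde-separated pieces
        if (n < 11) next          # skip lines that do not match the expected layout

        sub(/ [*] $/, "", f[1])            # "11-JUL-2008 23:14:25 * " -> date and time
        sub(/[)][)]+ [*] .*$/, ")", f[7])  # "(USER=FTET1))) * "       -> "(USER=FTET1)"
        sub(/[)][)]+ [*] .*$/, ")", f[11]) # "(PORT=2196)) * ..."      -> "(PORT=2196)"

        # Reassemble the eight fields from the question, separated by "|".
        print f[1] "|" f[2] f[3] "|" f[4] f[5] "|" f[6] "|" f[7] "|" f[8] f[9] "|" f[10] "|" f[11]
    }'
}

split_fields <<'EOF'
11-JUL-2008 23:14:25 * (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)(HOST=X900005199)(USER=FTET1))) * (ADDRESS=(PROTOCOL=tcp)(HOST=45.137.251.223)(PORT=2196)) * establish * WUMMER.IM.HERE.EXELLENT.COM * 0
11-JUL-2008 23:20:20 * (CONNECT_DATA=(SID=P1VPMHAM)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=133.52.24.148)(PORT=1462)) * establish * WUMMER * 0
EOF
```

Lines with fewer pieces are skipped, but a line whose parenthesised parts are arranged differently would still be split by position, so it is worth trying this on a sample of the real log first.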

quirkasaurus · January 27, 2009, 10:44am

Forget regular expressions. That isn't going to happen.
What you should probably do is explain what you eventually want to do with
the variables. My initial questions are:
Why awk?
Why do they have to be in positions $1 through $8?
Once they're there, what do you want to do with them?
My point is that the end result is what you're after (hopefully), not
whether we can put them in positions 1 through 8 for awk to work on.
That said, here is one way to bend this nasty log file to your whims:

cat << EOF |
11-JUL-2008 23:14:25 * (CONNECT_DATA=(SERVICE_NAME=WUMMER.IM.HERE.EXELLENT.COM)(CID=(PROGRAM=D:\oracle\product\10.2.0\client_1\jdk\jre\bin\java.exe)(HOST=X900005199)(USER=FTET1))) * (ADDRESS=(PROTOCOL=tcp)(HOST=45.137.251.223)(PORT=2196)) * establish * WUMMER.IM.HERE.EXELLENT.COM * 0
11-JUL-2008 23:20:20 * (CONNECT_DATA=(SID=P1VPMHAM)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=133.52.24.148)(PORT=1462)) * establish * WUMMER * 0
EOF
###---------------------------------------
### keep date and time together: turn their space into @ (undone at the end)
###---------------------------------------
sed -e 's/ /@/' \
-e 's/)/) /g' \
|
###---------------------------------------
### convert all spaces to newlines
###---------------------------------------
tr ' ' '\012' |
###---------------------------------------
### delete blank lines, asterisk-only lines and parenthesis-only lines
###---------------------------------------
sed -e '/^$/d' \
-e '/^\*/d' \
-e '/^)$/d' \
|
###---------------------------------------
### number the lines of each record, starting again at 1 on every date line
###---------------------------------------
awk '/^[0-9][0-9]-[A-Z][A-Z][A-Z]-[0-9]/ { n = 0 } { print ++n, $0 }' |
###---------------------------------------
### grab only the 1-8 "fields"
###---------------------------------------
grep '^[1-8] ' |
###---------------------------------------
### join each record's 8 lines into one (read -r keeps the backslashes)
###---------------------------------------
while read -r num line; do
    printf '%s ' "$line"
    if [ "$num" -eq 8 ]; then
        printf '\n'
    fi
done |
###---------------------------------------
### and there they are... in positions 1-8
###---------------------------------------
awk 'BEGIN{ OFS="|"; }
{ print( $1, $2, $3, $4, $5, $6, $7, $8 ); }' |
###---------------------------------------
### oh, and turn the at sign in the date back into a space.
###---------------------------------------
sed -e 's/@/ /'

It's a complex mess, indeed.

sdohn · January 27, 2009, 11:07am

Thanks a lot, joeyg, for your solution. Now I can strip out whatever else I don't need from the lines.

brgds from User sdohn

quirkasaurus · January 27, 2009, 11:10am

I like the tilde solution too. Even better!

But I figured I'd post mine anyway; hopefully some of the ideas are useful...

sdohn · January 27, 2009, 11:11am

Thank you for your solution to this complex problem.
My reason for separating the values was to put them into a database. Now I can build a report on the data with SQL.

brgds from user sdohn

quirkasaurus · January 27, 2009, 11:13am

Cool. Then the script is useful: it converts everything to pipe-delimited output.
Just load from there.
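Before loading, a quick sanity check that every row really has eight pipe-separated fields can catch log lines with an unexpected layout, which would otherwise shift values into the wrong columns. A small sketch, assuming the pipe-delimited rows are fed in on stdin (or as file arguments) and that no value itself contains a "|"; the helper name `check_rows` is hypothetical.

```shell
#!/bin/sh
# check_rows (hypothetical helper): print every row that does not have exactly
# eight "|"-separated fields; the exit status is 1 if any such row was found.
check_rows() {
    awk -F'|' '
        NF != 8 { printf "line %d has %d fields: %s\n", NR, NF, $0; bad = 1 }
        END     { exit (bad ? 1 : 0) }
    ' "$@"
}

# Demo: the second row is one field short.
if printf '%s\n' 'a|b|c|d|e|f|g|h' 'a|b|c|d|e|f|g' | check_rows; then
    echo "all rows have 8 fields"
else
    echo "fix the rows above before loading"
fi
```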