Script to parse a file faster

My example file is as given below:

conn=1 uid=oracle
conn=2 uid=db2
conn=3 uid=oracle
conn=4 uid=hash
conn=5 uid=skher
conn=6 uid=oracle
conn=7 uid=mpalkar
conn=8 uid=anarke
conn=1 op=-1 msgId=-1 - fd=104 slot=104 LDAPS connection from 10.10.5.6 to 10.18.6.5
conn=2 op=-1 msgId=-1 - fd=104 slot=104 LDAPS connection from 10.20.35.10 to 10.18.6.5
conn=3 op=-1 msgId=-1 - fd=104 slot=104 LDAPS connection from 10.30.35.19 to 10.18.6.5
conn=4 op=-1 msgId=-1 - fd=104 slot=104 LDAPS connection from 10.40.35.11 to 10.18.6.5
conn=5 op=-1 msgId=-1 - fd=104 slot=104 LDAPS connection from 10.50.35.12 to 10.18.6.5
conn=6 op=-1 msgId=-1 - fd=104 slot=104 LDAPS connection from 10.10.35.14 to 10.18.6.5
conn=7 op=-1 msgId=-1 - fd=104 slot=104 LDAPS connection from 10.20.35.15 to 10.18.6.5
conn=8 op=-1 msgId=-1 - fd=104 slot=104 LDAPS connection from 10.20.35.16 to 10.18.6.5

I need to write a script which will grep "uid=oracle" and find the IP address the connection is initiated from
using the connection ID "conn=x"

This is a sample file which I have simplified; the actual file is several GB in size.

I tried doing this using cat and a for loop, but the script takes at least 2 days to complete.

Is there a faster way to do this using perl or awk?

I would like an output something like this:
connid=x IP=w.x.y.z

Any help would certainly be appreciated!

Here's an awk programme that will give you what I think you want:

#!/usr/bin/ksh

awk '
    /uid=oracle/ { split( $1, a, "=" ); uids[a[2]] = 1; next; }
    /connection from/ {
        split( $1, a, "=" );
        if( uids[a[2]] )
            printf( "connid=%s IP=%s\n", a[2],  $(NF-2) );
    }
' input-file-name

It may take a few minutes to chew through a file of a few GiB, but I think it would be faster than what you've experienced.
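If you want to look up users other than oracle, the same single-pass idea can be wrapped in a small shell function. This is only a sketch for the simplified sample format above: the function name conn_ips_for_uid and the awk variable want are made up for illustration, and it assumes the uid is exactly the second field of the line.

```shell
#!/bin/sh
# Sketch only: conn_ips_for_uid is a hypothetical helper name, not part of the original script.
# Usage: conn_ips_for_uid USER FILE
conn_ips_for_uid() {
    awk -v want="uid=$1" '
        # exact match on the second field, so "oracle" does not also match "oracle2"
        $2 == want { split( $1, a, "=" ); uids[a[2]] = 1; next; }
        /connection from/ {
            split( $1, a, "=" );
            if( a[2] in uids )      # "in" tests membership without creating an empty entry
                printf( "connid=%s IP=%s\n", a[2], $(NF-2) );
        }
    ' "$2"
}
```

Called as conn_ips_for_uid oracle input-file-name, it still reads the file only once, which is where the speed-up over a cat and for loop approach comes from.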

$ ruby -ane 'BEGIN{a={}}; a[$F[0]]=true if $F[1]=="uid=oracle"; print "#{$F[0]}: #{$F[9]}\n" if $F[8]=="from" && a[$F[0]]' file
[26/Aug/2011:11:24:20 +0000] conn=9978792 op=1 msgId=2 - SRCH base="ou=people,dc=abc,dc=com" scope=1 filter="(&(objectClass=shadowAccount)(uid=oracle))" attrs="uid userPassword shadowLastChange shadowMax shadowMin shadowWarning shadowInactive shadowExpire shadowFlag"
[26/Aug/2011:11:24:21 +0000] conn=9978793 op=1 msgId=2 - SRCH base="ou=people,dc=abc,dc=com" scope=1 filter="(&(objectClass=shadowAccount)(uid=oracle))" attrs="uid userPassword shadowLastChange shadowMax shadowMin shadowWarning shadowInactive shadowExpire shadowFlag"
[26/Aug/2011:11:24:22 +0000] conn=9978794 op=1 msgId=2 - SRCH base="ou=people,dc=abc,dc=com" scope=1 filter="(&(objectClass=shadowAccount)(uid=oracle))" attrs="uid userPassword shadowLastChange shadowMax shadowMin shadowWarning shadowInactive shadowExpire shadowFlag"
[26/Aug/2011:11:24:23 +0000] conn=9978795 op=1 msgId=2 - SRCH base="ou=people,dc=abc,dc=com" scope=1 filter="(&(objectClass=shadowAccount)(uid=oracle))" attrs="uid userPassword shadowLastChange shadowMax shadowMin shadowWarning shadowInactive shadowExpire shadowFlag"
[26/Aug/2011:11:24:30 +0000] conn=9978802 op=1 msgId=2 - SRCH base="ou=people,dc=abc,dc=com" scope=1 filter="(&(objectClass=shadowAccount)(uid=oracle))" attrs="uid userPassword shadowLastChange shadowMax shadowMin shadowWarning shadowInactive shadowExpire shadowFlag"
[26/Aug/2011:11:24:21 +0000] conn=9978793 op=-1 msgId=-1 - fd=559 slot=559 LDAPS connection from 10.20.13.2:30999 to 10.183.7.45
[26/Aug/2011:11:24:21 +0000] conn=9978793 op=-1 msgId=-1 - SSL 256-bit AES-256
[26/Aug/2011:11:24:21 +0000] conn=9978793 op=0 msgId=1 - BIND dn="" method=128 version=3
[26/Aug/2011:11:24:21 +0000] conn=9978793 op=0 msgId=1 - RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Aug/2011:11:24:21 +0000] conn=9978793 op=1 msgId=2 - SRCH base="ou=people,dc=abc,dc=com" scope=1 filter="(&(objectClass=shadowAccount)(uid=oracle))" attrs="uid userPassword shadowLastChange shadowMax shadowMin shadowWarning shadowInactive shadowExpire shadowFlag"
[26/Aug/2011:11:24:21 +0000] conn=9978793 op=1 msgId=2 - RESULT err=0 tag=101 nentries=1 etime=0
[26/Aug/2011:11:24:22 +0000] conn=9978793 op=2 msgId=0 - RESULT err=80 tag=120 nentries=0 etime=0
[26/Aug/2011:11:24:22 +0000] conn=9978793 op=-1 msgId=-1 - closing from 10.104.15.2:30988 - A1 - Client aborted connection -

To adapt the script you gave to the file format shown above, do I change it to this?

#!/usr/bin/ksh

awk '
    /uid=oracle/ { split( $3, a, "=" ); uids[a[2]] = 1; next; }
    /connection from/ {
        split( $3, a, "=" );
        if( uids[a[2]] )
            printf( "connid=%s IP=%s\n", a[2],  $(NF-2) );
    }
' input-file-name

Can you try the one below? Strictly speaking, going by your sample file, the msgId filter would not be required.

awk '/oracle/{a[$1]=$1;next}/msgId/{if(a[$1]){printf("%s IP=%s\n", a[$1],$(NF-2))}}' inputfile

Thanks a lot, agama. I have modified the script to suit my requirement, but could you please tell me how the code works?

I am really not able to figure it out.

Glad you were able to get something to work. Some comments that should help explain things:

#!/usr/bin/ksh

awk '
    # for each record from the input file, test to see if we should execute each block of code...
    /uid=oracle/ {              # execute this block when the string "uid=oracle" is found in the record
        split( $3, a, "=" );    # split the third field into the array a using "=" as the separator; a[2] is the id
        uids[a[2]] = 1;         # track all ids that we have seen
        next;                   # skip the remainder of the programme, read next record and start processing
    }

    /connection from/ {         # execute this block when "connection from" is in the record
        split( $3, a, "=" );    # split the connection id into array a
        if( uids[a[2]] )        # if we saw this id as an oracle id earlier, uids[id]
                                # will be non-zero and thus true, so print the info
            printf( "connid=%s IP=%s\n", a[2], $(NF-2) );
    }
' input-file-name
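One thing to watch with the commented script: it only prints when the "uid=oracle" record comes before the "connection from" record for the same id. In the real log excerpt, the connection line for conn=9978793 appears before its SRCH line, so that order cannot be relied on. A rough sketch that copes with either order follows. It assumes the timestamped format shown earlier (connection id in the third field), and the function name oracle_ips and the array names ip, pending and printed are made up for illustration. It remembers an address for every connection it sees, so on a multi-GB file it will use more memory than the original.

```shell
#!/bin/sh
# Sketch only: oracle_ips is a hypothetical helper name, not from the original script.
# Usage: oracle_ips FILE
oracle_ips() {
    awk '
        /connection from/ {
            split( $3, a, "=" );
            ip[a[2]] = $(NF-2);             # remember the client address for this connection id
            if( a[2] in pending ) {         # an oracle lookup was already seen for this id
                printf( "connid=%s IP=%s\n", a[2], ip[a[2]] );
                delete pending[a[2]];
                printed[a[2]] = 1;
            }
            next;
        }
        /uid=oracle/ {
            split( $3, a, "=" );
            if( a[2] in printed )           # report each connection only once
                next;
            if( a[2] in ip ) {              # connection record already seen, print now
                printf( "connid=%s IP=%s\n", a[2], ip[a[2]] );
                printed[a[2]] = 1;
            } else
                pending[a[2]] = 1;          # wait for the connection record to show up
        }
    ' "$1"
}
```

Run it as oracle_ips input-file-name. Note that with the real log format $(NF-2) includes the client port (for example 10.20.13.2:30999), exactly as it does in the original script.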

And not to 'backseat mod' here, but please place code-tags around any code or sample in/output. It really helps to have those kinds of information not mashed into a paragraph.

Thanks a lot, this information is really helpful.