Awk - Compare fields and increment variables

mv652 · May 22, 2009, 4:45am

Hi,

My first post to this group...

I need to parse a source file which is a capture from a network analyser.

I have two fields that need to be checked:

Field 7 represents the packet length (an integer), and
Field 4 represents a network address (e.g. 192.168.25.3)

The first check is to find two consecutive lines that have the same integer in Field 7, i.e. the same length. The original file may not always have these lines consecutive, but I am OK with ignoring those lines if it is too difficult to include them.

Then, once we have these two lines, check the text in Field 4 of each line, indicate which value comes 'first', and increment a variable for it.

What I'm after is to understand how many times address A is first compared to address B.

My expected output from the sample below would be:

"239.25.30.25 is first once" and "239.25.30.26 is first twice".

Even an output like "239.25.30.25 - 1, 239.25.30.26 - 2" would be great.

Example source:

No. Time Source Destination Protocol Info Length
1 20:44:19.525910000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 94
2 20:44:19.525932000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 94
3 20:44:19.525989000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 114
4 20:44:19.526037000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 114
13 20:44:19.693262000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 193
14 20:44:19.693295000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 193

I believe Awk should be able to take care of this, but my awk skills are not good enough to come up with something decent.

I hope someone may be able to point me in the right direction.

Thanks,
Mario

cambridge · May 22, 2009, 5:25am

awk '
    $1 ~ /^[0-9]/ {
        if (!last) { last=$12; ip=$4 }
        else
        {
            if (last==$12) ipc[ip]++
            last=0
        }
    }

    END { for (ip in ipc) print ip, ipc[ip] }
' inputfile

produces the following output from your example file:

239.25.30.25 1
239.25.30.26 2

panyam · May 22, 2009, 5:35am

Something like this you can try:

awk 'BEGIN { prev=0; count=1 } { if (prev==$NF) count++; else { count=1; prev=$NF } print $4, "-", $12 "- " count }' File.txt

Hope that the last column is sorted.

vidyadhar85 · May 22, 2009, 5:40am

I couldn't fully understand your requirement, but I tried this:

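awk '
    FNR == 1 { next }                 # skip the header line
    !(FNR % 2) { var = $NF; next }    # first line of each pair: save its length
    var == $NF { if ($4 == "239.25.30.26") v25++; else v26++ }   # same length: credit the address that came first
    END { print "239.25.30.25 - " (v25+0) "\n239.25.30.26 - " (v26+0) }
' filename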

mv652 · May 22, 2009, 5:47am

That's great, you've given me what I needed!!

Thank you so much for replying.

Best Regards,
Mario

mv652 · May 22, 2009, 6:17am

vidyadhar85,

To answer your question, I have a text file containing data from a network capture.

The data is duplicated (on purpose) and is sent to two destinations (multicast addresses). Sometimes data for one destination is received first, other times data to the other destination is first.

I'm trying to work out which destination is usually first depending on the sample I capture.

I've just seen that I get different results depending on which of the awk commands from the replies above I run, so I will probably still need to verify which one gives the most correct answer for a particular sample.

I think cambridge's script works best for me so far.

Thanks again.

Mario

cambridge · May 22, 2009, 6:26am

Note that my script only works with consecutive lines. It gets more convoluted if you want to handle other cases, as you'll need to decide how many lines should be allowed between each for it to be a valid sample.
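
A rough sketch of one way the non-consecutive case could be handled, under stated assumptions rather than as a definitive answer: remember the most recent unmatched line for each length, and pair it with a later line of the same length only if the destination differs and the packet numbers are no more than maxgap apart. The names seen_ip, seen_no, first and maxgap, and the value 5, are made up for illustration. The input is the sample from the first post.

```shell
cat > capture.txt <<'EOF'
No. Time Source Destination Protocol Info Length
1 20:44:19.525910000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 94
2 20:44:19.525932000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 94
3 20:44:19.525989000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 114
4 20:44:19.526037000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 114
13 20:44:19.693262000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 193
14 20:44:19.693295000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 193
EOF

# maxgap is a hypothetical tuning knob: how far apart two packet numbers may be
result=$(awk -v maxgap=5 '
    $1 ~ /^[0-9]/ {
        len = $12
        if ((len in seen_ip) && seen_ip[len] != $4 && $1 - seen_no[len] <= maxgap) {
            first[seen_ip[len]]++                    # the earlier line of the pair was first
            delete seen_ip[len]; delete seen_no[len] # this length is matched, forget it
        } else {
            seen_ip[len] = $4; seen_no[len] = $1     # start (or restart) waiting for a partner
        }
    }
    END { for (ip in first) print ip, first[ip] }
' capture.txt | sort)
printf '%s\n' "$result"
```

On the sample this prints the same counts as the script above (239.25.30.25 1 and 239.25.30.26 2), since every pair there is already adjacent. The final sort is only there because the order of a for-in loop over an awk array is not defined.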

mv652 · May 22, 2009, 8:04am

Understood.

One thing I may be able to do is sort the packets first by order of packet size, making sure they are still in order of time received after that.
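
As a rough sketch of that idea, assuming the standard tail and sort utilities and the whitespace-separated layout of the sample above (length in field 12, packet number in field 1): drop the header, sort numerically by length and then by packet number, and feed the result to the pairing script from earlier in the thread.

```shell
cat > capture.txt <<'EOF'
No. Time Source Destination Protocol Info Length
1 20:44:19.525910000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 94
2 20:44:19.525932000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 94
3 20:44:19.525989000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 114
4 20:44:19.526037000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 114
13 20:44:19.693262000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 193
14 20:44:19.693295000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 193
EOF

# tail drops the header; sort orders by length (field 12), then packet number (field 1)
result=$(tail -n +2 capture.txt | sort -k12,12n -k1,1n | awk '
    {
        if (!last) { last = $12; ip = $4 }     # first line of a pair
        else {
            if (last == $12) ipc[ip]++         # same length: the remembered address was first
            last = 0
        }
    }
    END { for (ip in ipc) print ip, ipc[ip] }
' | sort)
printf '%s\n' "$result"
```

One caveat with this approach: after sorting, any two packets that share a length end up next to each other, even if they were far apart in the capture, so unrelated packets could be counted as a pair.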

Thanks for confirming.

Mario

mv652 · May 26, 2009, 4:17am

I'm trying to understand better what your script does so I can adapt it to do an additional check on the first field (I want to ensure the lines being compared have the first field within 5 integers of each other).

I'd like to work it out myself, but just need a bit of help with understanding the syntax below...

So, in the part:

awk '
    $1 ~ /^[0-9]/ {
        if (!last) { last=$12; ip=$4 }

I understand you set the variables last and ip to $12 and $4 respectively, but I'm not sure what the (!last) check does.

Is it "if last exists"?

I get the 'else' to be:

        {
            if (last==$12) ipc[ip]++
            last=0
        }

"if last is the same as the 12th field, increment the ip counter"

Please excuse this 'silly' question, but I'm really trying to get my head around this and not finding it easy.

Regards,

cambridge · May 26, 2009, 5:25am

I'm setting the variable 'last' to the value in the 12th field, but only if I've not set this variable before or if the variable contains the value 0. That's what 'if (!last)' means, if the 'last' variable is blank or 0. In AWK, if you reference a variable that's never been set before, it's the same as if it were blank or 0.
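
A small demonstration of that behaviour, separate from the script itself (the variable names unset_var, zero, empty and len are made up for the example):

```shell
result=$(awk 'BEGIN {
    zero = 0; empty = ""; len = 94      # unset_var is deliberately never assigned
    if (!unset_var) print "unset_var counts as false"
    if (!zero)      print "0 counts as false"
    if (!empty)     print "empty string counts as false"
    if (!len) { print "94 counts as false" } else { print "94 counts as true" }
}')
printf '%s\n' "$result"
```

This is also why the script can reuse last=0 as a "nothing remembered" marker, on the assumption that no real packet has a length of 0.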

Well ipc is an associative array. The key is the 'ip' variable set earlier. Yes, we're maintaining counters for each unique IP address we come across where the 12th field is the same on two consecutive lines, and this is where that counter is incremented.
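
For the extra check on the first field, one possible sketch (an assumption about what is wanted, not the only way to do it) is to remember the packet number alongside the length and address, and count the pair only if the two numbers are at most 5 apart. The variable num is a name introduced here for illustration, and the sketch assumes the packet numbers increase down the file.

```shell
cat > capture.txt <<'EOF'
No. Time Source Destination Protocol Info Length
1 20:44:19.525910000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 94
2 20:44:19.525932000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 94
3 20:44:19.525989000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 114
4 20:44:19.526037000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 114
13 20:44:19.693262000 192.168.30.26 239.25.30.26 UDP Source port: dnp Destination port: 20000 193
14 20:44:19.693295000 192.168.30.25 239.25.30.25 UDP Source port: dnp Destination port: 20000 193
EOF

result=$(awk '
    $1 ~ /^[0-9]/ {
        if (!last) { last = $12; ip = $4; num = $1 }   # first line of a pair: remember it
        else {
            # count only if the lengths match and the packet numbers are at most 5 apart
            if (last == $12 && $1 - num <= 5) ipc[ip]++
            last = 0
        }
    }
    END { for (ip in ipc) print ip, ipc[ip] }
' capture.txt | sort)
printf '%s\n' "$result"
```

On the example file every pair has packet numbers that differ by 1, so the counts stay at 239.25.30.25 1 and 239.25.30.26 2.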


mv652 · May 26, 2009, 6:04am

Ok, great.

Thank you for taking the time to answer.

Regards,