Sort, uniq, or awk?

Hi again,

I have files with the following contents

datetime,ip1,port1,ip2,port2,number

How would I find out how many times ip1 field shows up a particular file? Then how would I find out how many time ip1 and port 2 shows up?

Please mind the file may contain 100k lines.

I don't understand what you're trying to do.

Are you saying that you have a comma separated file, but some lines have less than 5 commas and you want to know how many lines have at least one comma but less than 4 commas?

Post a sample of the input and the output, as we can't read your mind...

grep ip1 filename | wc -l

grep ip1 filename | grep port2 | wc -l
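As a side note, the `wc -l` step can be folded into grep itself with `-c`. A minimal sketch (the file `filename` below is throwaway test data, not the poster's actual file):

```shell
# Create a small test file (hypothetical data, just to demonstrate the counts)
cat > filename <<'EOF'
a,ip1,x,b,port2,1
a,ip1,x,b,port9,1
a,zz9,x,b,port2,1
EOF

# grep -c counts matching lines directly, no wc needed
grep -c 'ip1' filename                   # lines containing ip1
grep 'ip1' filename | grep -c 'port2'    # of those, lines also containing port2
```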

Note that neither your script above nor the script below will find out how many times "port 2" shows up; both scripts will find out how many times "port2" shows up.

Since you allow both ip1 and port2 to appear anywhere in a line (no matter which field contains them, and whether or not the field has other characters before or after "ip1" or "port2"), I don't see any way to use sort or uniq to do what you want. The following awk script should be much more efficient than running wc twice and grep three times:

awk '!/ip1/{next}
        {c++}
/port2/ {c2++}
END     {printf("%d\n%d\n", c, c2)}' filename
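For illustration, here is that script run on a tiny made-up file in which "ip1" and "port2" appear as literal strings (the file contents below are hypothetical, chosen only to exercise both counters):

```shell
# Toy input: two lines contain "ip1", one of those also contains "port2"
cat > toy <<'EOF'
a,ip1,x,b,port2,1
a,ip1,x,b,port9,1
a,ip9,x,b,port2,1
EOF

# First counter: lines matching ip1; second: of those, lines also matching port2
awk '!/ip1/{next}
        {c++}
/port2/ {c2++}
END     {printf("%d\n%d\n", c, c2)}' toy
# prints 2, then 1
```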

Making wild guesses that the requestor wants counts for every combination of field 2 and field 5 in a large comma-separated file of numbers, I arrive at

$ awk  'BEGIN {SUBSEP=":"}
         NR==1 {next}
         {IP1[$2]++;PORT2[$2,$5]++}
         END {for (i in PORT2) {x=index(i,SUBSEP); j=substr(i,1,x-1); print "IP1=" j ": " IP1[j] ", PORT2=" substr(i,x+1) ":  " PORT2[i]}}
        ' FS="," file
IP1=1: 1, PORT2=5:  1
IP1=2: 2, PORT2=5:  1
IP1=2: 2, PORT2=6:  1
IP1=3: 3, PORT2=5:  2
IP1=3: 3, PORT2=6:  1

The input file would be something like

datetime,ip1,port1,ip2,port2,number
9,1,8,7,5,4 
9,2,8,7,5,4
9,3,8,7,5,4
9,3,8,7,5,4
9,2,8,7,6,4
9,3,8,7,6,4

Thank you all for replying and I apologize for not making myself clear when asking this question. I think RudiC may have got it the closest, but I'm still not sure it's exactly what I'm looking for.

So, in my very large file, I have 300,000 lines, each with the same 6 fields. Obviously each line will contain different information in each field, but some lines will contain the same value in certain fields.

Here's my original question:
"How would I find out how many times ip1 field shows up a particular file? Then how would I find out how many time ip1 and port 2 shows up?"

To explain this better: ip1 is not a header, but simply represents an IP address in field 2. Port2 simply represents a port number in field 5, associated with the IP address in field 4. My goal is to find out how many times the same value in field 2 (ip1) shows up in this one file. THEN, I want to know how many times that same IP address shows up with the SAME port number (let's just say field 5).

I know awk could probably do this. But I was wondering if sort or uniq can recognize field values, the way awk would recognize the field ip1 as $2.

I hope that clears it up. Thanks again...
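To the sort/uniq question above: sort and uniq don't split fields themselves, but cut can extract the fields first and uniq -c can then do the counting. A minimal sketch, using the same sample file RudiC posted (recreated here so the commands are self-contained):

```shell
# Recreate RudiC's sample input (header plus six data lines)
cat > file <<'EOF'
datetime,ip1,port1,ip2,port2,number
9,1,8,7,5,4
9,2,8,7,5,4
9,3,8,7,5,4
9,3,8,7,5,4
9,2,8,7,6,4
9,3,8,7,6,4
EOF

# Count occurrences of each field-2 value (tail -n +2 skips the header)
tail -n +2 file | cut -d, -f2 | sort | uniq -c

# Count each field-2/field-5 combination
tail -n +2 file | cut -d, -f2,5 | sort | uniq -c
```

The combination counts match RudiC's awk output (e.g. "3,5" appears twice); the per-IP counts are the sums of the related combinations.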

When you figure out whether or not RudiC got what you want, let us know.

If his script didn't do what you want, please explain what he did wrong AND show us the output you want for the sample input file he used as his test case.

Or, run the script on a snippet of your real data, show input and output, and comment. The script's output shows the count for every combination of IP1 and PORT2 showing up in the file. The (repeated, equal) count of an IP1 is the sum of all related PORT2 occurrences.