Sort, uniq, or awk?

Hi again,

I have files with the following contents

datetime,ip1,port1,ip2,port2,number

How would I find out how many times ip1 field shows up a particular file? Then how would I find out how many time ip1 and port 2 shows up?

Please mind the file may contain 100k lines.

I don't understand what you're trying to do.

Are you saying that you have a comma separated file, but some lines have less than 5 commas and you want to know how many lines have at least one comma but less than 4 commas?

Post a sample of the input and the output, as we can't read your mind...

grep ip1 filename | wc -l

grep ip1 filename | grep port2 | wc -l
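As a side note, the `wc -l` step can be folded into grep itself with `-c`. A minimal sketch (the file `filename` below is throwaway test data, not the poster's actual file):

```shell
# Create a small test file (hypothetical data, just to demonstrate the counts)
cat > filename <<'EOF'
a,ip1,x,b,port2,1
a,ip1,x,b,port9,1
a,zz9,x,b,port2,1
EOF

# grep -c counts matching lines directly, no wc needed
grep -c 'ip1' filename                   # lines containing ip1
grep 'ip1' filename | grep -c 'port2'    # of those, lines also containing port2
```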

Note that neither your script above nor the script below will find out how many times "port 2" shows up; both scripts will find out how many times "port2" shows up.

Since you allow both ip1 and port2 to appear anywhere in a line (no matter which field contains them, and whether or not the field has other characters before or after "ip1" or "port2"), I don't see any way to use sort or uniq to do what you want. The following awk script should be much more efficient than running wc twice and grep three times:

awk '!/ip1/{next}
        {c++}
/port2/ {c2++}
END     {printf("%d\n%d\n", c, c2)}' filename
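For illustration, here is that script run on a tiny made-up file in which "ip1" and "port2" appear as literal strings (the file contents below are hypothetical, chosen only to exercise both counters):

```shell
# Toy input: two lines contain "ip1", one of those also contains "port2"
cat > toy <<'EOF'
a,ip1,x,b,port2,1
a,ip1,x,b,port9,1
a,ip9,x,b,port2,1
EOF

# First counter: lines matching ip1; second: of those, lines also matching port2
awk '!/ip1/{next}
        {c++}
/port2/ {c2++}
END     {printf("%d\n%d\n", c, c2)}' toy
# prints 2, then 1
```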

Making wild guesses that the requestor wants counts for every combination of field 2 and field 5 in a large comma-separated file of numbers, I arrive at

$ awk  'BEGIN {SUBSEP=":"}
         NR==1 {next}
         {IP1[$2]++;PORT2[$2,$5]++}
         END {for (i in PORT2) {x=index(i,SUBSEP); j=substr(i,1,x-1); print "IP1=" j ": " IP1[j] ", PORT2=" substr(i,x+1) ":  " PORT2[i]}}
        ' FS="," file
IP1=1: 1, PORT2=5:  1
IP1=2: 2, PORT2=5:  1
IP1=2: 2, PORT2=6:  1
IP1=3: 3, PORT2=5:  2
IP1=3: 3, PORT2=6:  1

The input file would be something like

datetime,ip1,port1,ip2,port2,number
9,1,8,7,5,4 
9,2,8,7,5,4
9,3,8,7,5,4
9,3,8,7,5,4
9,2,8,7,6,4
9,3,8,7,6,4

Thank you all for replying and I apologize for not making myself clear when asking this question. I think RudiC may have got it the closest, but I'm still not sure it's exactly what I'm looking for.

So, in my very large file, I have 300,000 lines, each with the same 6 fields. Obviously each line will contain different information in each field, but some lines will contain the same value in certain fields.

Here's my original question:
"How would I find out how many times ip1 field shows up a particular file? Then how would I find out how many time ip1 and port 2 shows up?"

To explain this better: ip1 is not a header, but simply represents an IP address in field 2. Port2 simply represents a port number in field 5, associated with the IP address in field 4. My goal is to find out how many times the same value in field 2 (ip1) shows up in this one file. THEN, I want to know how many times that same IP address shows up with the SAME port number (let's just say field 5).

I know awk could probably do this. But I was wondering if sort or uniq can recognize field values, the way awk would recognize the field ip1 as $2.

I hope that clears it up. Thanks again...
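To the sort/uniq question above: sort and uniq don't split fields themselves, but cut can extract the fields first and uniq -c can then do the counting. A minimal sketch, using the same sample file RudiC posted (recreated here so the commands are self-contained):

```shell
# Recreate RudiC's sample input (header plus six data lines)
cat > file <<'EOF'
datetime,ip1,port1,ip2,port2,number
9,1,8,7,5,4
9,2,8,7,5,4
9,3,8,7,5,4
9,3,8,7,5,4
9,2,8,7,6,4
9,3,8,7,6,4
EOF

# Count occurrences of each field-2 value (tail -n +2 skips the header)
tail -n +2 file | cut -d, -f2 | sort | uniq -c

# Count each field-2/field-5 combination
tail -n +2 file | cut -d, -f2,5 | sort | uniq -c
```

The combination counts match RudiC's awk output (e.g. "3,5" appears twice); the per-IP counts are the sums of the related combinations.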

When you figure out whether or not RudiC got what you want, let us know.

If his script didn't do what you want, please explain what he did wrong AND show us the output you want for the sample input file he used as his test case.

Or, run the script on a snippet of your real data, show input and output, and comment. The script's output shows the count for every combination of IP1 and PORT2 showing up in the file. The (repeated, equal) count of an IP1 is the sum of all related PORT2 occurrences.