Count occurrences of a string in a column using awk

Hi,
I want to count the occurrences of each string in a column and display them as in the example below:
Input:

get1 345 789 098 
get2 567 982 090 
fet4 777 610 632
get1 800 544 230
get1 600 788 451
get2 892 321 243
get1 673 111 235
fet3 789 220 278
fet4 768 222 341

output:

4 get1 345 789 098 
2 get2 567 982 090 
2 fet4 777 610 632
4 get1 800 544 230
4 get1 600 788 451
2 get2 892 321 243
4 get1 673 111 235
1 fet3 789 220 278
2 fet4 768 222 341

An awk approach that reads the input file twice (note that file is given two times on the command line):

awk 'NR==FNR{A[$1]++;next}{$0=A[$1] OFS $0}1' file file
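For anyone unfamiliar with the NR==FNR idiom, here is the same idea spread out with comments, as a sketch rather than a replacement. The temporary file and the names f, count and counted are only for illustration and stand in for file. Because the file is named twice, awk reads it twice: the first pass counts, the second pass prints. If the file name is given only once, NR==FNR is true for every line, every line hits next, and nothing is printed.

```shell
# Hypothetical temporary file holding the sample input from the question.
f=$(mktemp)
cat > "$f" <<'EOF'
get1 345 789 098
get2 567 982 090
fet4 777 610 632
get1 800 544 230
get1 600 788 451
get2 892 321 243
get1 673 111 235
fet3 789 220 278
fet4 768 222 341
EOF

counted=$(awk '
    # First pass: NR (lines read overall) equals FNR (lines read in the
    # current file) only while the first copy of the file is being read.
    NR == FNR { count[$1]++; next }
    # Second pass: prepend the count collected for the key in column 1.
    { print count[$1], $0 }
' "$f" "$f")

printf '%s\n' "$counted"
rm -f "$f"
```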

Thank you for your effort, but I'm not getting any output.

---------- Post updated at 01:49 PM ---------- Previous update was at 01:20 PM ----------

Thanks, it's working now!

Try this awk script...

awk '{x[$1]++;y[NR]=$0;z[NR]=$1}END{for(i=1;i<=NR;i++) print x[z[i]], y[i]}' file
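A single-pass script of this kind has to hold every line in memory until END, because the final count for a key is not known before the last line has been read. In exchange it also works on piped input, which cannot be read twice. A commented sketch of that approach, run on a three-line excerpt of the sample input; the array names count, line and key and the variable single_pass are chosen for illustration:

```shell
single_pass=$(printf '%s\n' 'get1 345 789 098' 'get2 567 982 090' \
                            'get1 800 544 230' |
awk '
    {
        count[$1]++      # occurrences of the key in column 1
        line[NR] = $0    # every line, kept in input order
        key[NR]  = $1    # key of each line, to look up its count later
    }
    END {
        # Replay the stored lines, each prefixed with the final count.
        for (i = 1; i <= NR; i++)
            print count[key[i]], line[i]
    }
')

printf '%s\n' "$single_pass"
```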

Please, one more thing:

For a count of 2 or less, I want to show the row with the minimum value in column 4; for a count greater than 2, I want to show the rows with the minimum and maximum values in column 4.

Input:

Count col2 col3 col4 col5
4 get1 345 789 098 
2 get2 567 982 090 
2 fet4 777 610 632
4 get1 800 544 230
4 get1 600 788 451
2 get2 892 321 243
4 get1 673 111 235
1 fet3 789 220 278
2 fet4 768 222 341

output:

4 get1 673 111 235
4 get1 345 789 098 
2 get2 892 321 243
2 fet4 768 222 341
1 fet3 789 220 278

I noticed some inconsistencies in your required output.

If you are trying to print min and max based on column 4, then you could try:

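awk '
        {
                A[$1]++                         # occurrences of each key
                if (!($1 in _min))              # first row seen for this key
                        _min[$1] = $4
                # remember the whole row holding the max / min of field 4
                rmax[$1] = _max[$1] < $4 ? $0 : rmax[$1]
                rmin[$1] = _min[$1] >= $4 ? $0 : rmin[$1]
                # then update the running max / min values themselves
                _max[$1] = _max[$1] < $4 ? $4 : _max[$1]
                _min[$1] = _min[$1] > $4 ? $4 : _min[$1]
        }
        END {
                # note: "for ( k in A )" visits the keys in no particular order
                for ( k in A )
                {
                        print A[k] OFS rmin[k]
                        if ( rmin[k] != rmax[k] )
                                print A[k] OFS rmax[k]
                }
        }
' file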

Output:

4 get1 345 789 098
4 get1 600 788 451
1 fet3 789 220 278
2 get2 567 982 090
2 get2 892 321 243
2 fet4 768 222 341
2 fet4 777 610 632

Thank you for your effort.
Please let me explain again.

For every value in column 2, I want to print:

  • two rows if column 1 is greater than 2 (the rows with the min. and max. values in column 4)
  • one row if column 1 is <= 2 (the row with the min. value in column 4)

Input:

4 get1 345 789 098 
2 get2 567 982 090 
2 fet4 777 610 632
4 get1 800 544 230
4 get1 600 788 451
2 get2 892 321 243
4 get1 673 111 235
1 fet3 789 220 278
2 fet4 768 222 341

Output:

4 get1 673 111 235
4 get1 345 789 098 
2 get2 892 321 243
2 fet4 768 222 341
1 fet3 789 220 278

OK, then replace if ( rmin[k] != rmax[k] ) with if ( A[k] > 2 ) in the code that I posted.
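With that replacement applied, the whole script would look roughly like the sketch below. Two things here are assumptions on top of the posted code: the first-row test uses $1 in _min instead of the value of _min[$1], so that a minimum of 0 is not mistaken for "unset", and the result is piped through sort only to make the order reproducible, since for ( k in A ) visits keys in no particular order. The temporary file f and the variable minmax are illustrative.

```shell
# Hypothetical temporary file holding the raw sample input (no count column).
f=$(mktemp)
cat > "$f" <<'EOF'
get1 345 789 098
get2 567 982 090
fet4 777 610 632
get1 800 544 230
get1 600 788 451
get2 892 321 243
get1 673 111 235
fet3 789 220 278
fet4 768 222 341
EOF

minmax=$(awk '
    {
        A[$1]++
        if (!($1 in _min))
            _min[$1] = $4
        rmax[$1] = _max[$1] < $4 ? $0 : rmax[$1]
        rmin[$1] = _min[$1] >= $4 ? $0 : rmin[$1]
        _max[$1] = _max[$1] < $4 ? $4 : _max[$1]
        _min[$1] = _min[$1] > $4 ? $4 : _min[$1]
    }
    END {
        for ( k in A ) {
            print A[k] OFS rmin[k]      # always print the min row
            if ( A[k] > 2 )             # max row only when count > 2
                print A[k] OFS rmax[k]
        }
    }
' "$f" | LC_ALL=C sort)

printf '%s\n' "$minmax"
rm -f "$f"
```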

Here is what I got for the input below:

get1 345 789 098
get2 567 982 090
fet4 777 610 632
get1 800 544 230
get1 600 788 451
get2 892 321 243
get1 673 111 235
fet3 789 220 278
fet4 768 222 341

Output:

4 get1 345 789 098
4 get1 600 788 451
1 fet3 789 220 278
2 get2 567 982 090
2 fet4 768 222 341
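Note that this output takes the min and max from the last field of the raw file ($4 before the count is prepended). The output requested earlier in the thread appears to select on the field before it, which is column 4 only once the count has been prepended, i.e. $3 of the raw file. A sketch under that assumption follows; the col variable, the order array used to keep keys in first-seen order, and the other names are illustrative and not from the posted script. On ties, this version keeps the first row seen.

```shell
# Hypothetical temporary file holding the raw sample input (no count column).
f=$(mktemp)
cat > "$f" <<'EOF'
get1 345 789 098
get2 567 982 090
fet4 777 610 632
get1 800 544 230
get1 600 788 451
get2 892 321 243
get1 673 111 235
fet3 789 220 278
fet4 768 222 341
EOF

by_col3=$(awk -v col=3 '
    {
        k = $1
        v = $col + 0                  # force a numeric comparison
        if (!(k in count)) {          # first time this key is seen
            order[++n] = k            # remember first-seen order
            min[k] = max[k] = v
            rmin[k] = rmax[k] = $0
        }
        count[k]++
        if (v < min[k]) { min[k] = v; rmin[k] = $0 }
        if (v > max[k]) { max[k] = v; rmax[k] = $0 }
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            print count[k], rmin[k]   # min row for every key
            if (count[k] > 2)         # max row only when count > 2
                print count[k], rmax[k]
        }
    }
' "$f")

printf '%s\n' "$by_col3"
rm -f "$f"
```

Changing -v col=3 to -v col=4 selects on the last field instead.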