How to count a pattern in a column

ahjiefreak · December 3, 2007, 3:01am

Hi,

I have another Bash problem. Assume I have an output file with two columns, the second of which contains:

abc`dgdf`sdfdg`
dfgfg```ssd`

I would like to count the "`" characters on each line (each line is identified by the first column). What command other than AWK can I use to count these characters for each line?

I tried AWK with if ($2)=="`", count[$1]=count++; however, it doesn't work.

Please advise. Thanks.

-Jason

Franklin52 · December 3, 2007, 9:58am

awk 'BEGIN {FS="`"} {print $1, NF-1}' file

Regards

ahjiefreak · December 3, 2007, 6:21pm

Hi franklin,

Thanks for your reply. It works well, but I would like to understand the command better. Since the field separator is set to "`", does that mean NF actually refers to the field separator, given that we set only one in the BEGIN block?

If so, and I have other field separators, say '&' or '%', how could I tell them apart using NF? To my understanding, NF refers to the number of fields.

Please advise. Thanks.

Rgrds,
Jason

Franklin52 · December 4, 2007, 4:46am

The default field separator of awk is a tab or spaces, which is changed here in the BEGIN block.
The field separator is used as a trick to count the character.
NF is the number of fields in each row, so if you have 5 fields there will be 4 field separators.
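
To make the NF-1 trick concrete, here is a minimal sketch run against the two sample lines from the first post. The function name count_backticks is only an illustrative wrapper, not something from the thread.

```shell
# count_backticks is a hypothetical helper name; it wraps the one-liner above.
count_backticks() {
  # With FS set to a backtick, every backtick separates two fields,
  # so a line with N backticks has N+1 fields and NF-1 is the count.
  awk 'BEGIN { FS = "`" } { print $1, NF - 1 }' "$@"
}

# Single quotes stop the shell from interpreting the backticks.
printf '%s\n' 'abc`dgdf`sdfdg`' 'dfgfg```ssd`' | count_backticks
```

One caveat with this approach: an empty line has NF equal to 0, so NF-1 prints -1 rather than 0.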

Regards

ahjiefreak · December 4, 2007, 5:33am

Hi Franklin,

I agree with what you explained. But suppose one line contains different patterns, such as "`" (the one you showed me) and "$", or even more.

In that case, how can I use NF to differentiate these characters?

For your information, my input is a line like the one below:

halo`world$$``

With the method you showed me, NF can count only one character at a time. How could we handle both patterns in this case?

Please advise. I hope you get my meaning. Thanks.

-Jason

Franklin52 · December 4, 2007, 6:39am

You can't count more than one character separately with the field separator trick; for more characters you should use another method.

Regards

vgersh99 · December 4, 2007, 6:42am

awk -F'[`$]' ' {print $1, NF-1}' file
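
As a rough illustration of what the bracket-expression separator does: it treats either character as a separator, so NF-1 is the combined count of "`" and "$", not a count per character. The wrapper name count_either is an assumption made for this sketch.

```shell
# count_either is a hypothetical helper name wrapping the one-liner above.
count_either() {
  # Inside a bracket expression both ` and $ are literal characters,
  # so each occurrence of either one splits the line.
  awk -F'[`$]' '{ print $1, NF - 1 }' "$@"
}

# Sample line from earlier in the thread: 3 backticks plus 2 dollar signs.
printf '%s\n' 'halo`world$$``' | count_either
```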

Franklin52 · December 4, 2007, 6:50am

Maybe I misunderstood the question, but he wants to count several characters separately, e.g.:

file:

Line1`abc$bn`$$

Output:

Line1: 2 "`" and 3 "$"

Regards

vgersh99 · December 4, 2007, 7:20am

I guess we'll need the OP to clarify what needs to be done with the example of the input and the desired output!

ahjiefreak · December 4, 2007, 8:23am

Hi there,

My question is actually about counting different characters in one line, say "`", "$", "%", etc.

Franklin's understanding is correct:

Inside the file:

Line1`abc$bn`$$

Output:

Line1: 2 "`" and 3 "$"

What other way can I use to implement this? Please advise. Thanks.

Rgrds,
Jason

vgersh99 · December 4, 2007, 9:02am

something along these lines:

nawk -f ah.awk myFile

ah.awk:

BEGIN {
  # If the variable 'pat' has not been assigned, give it a default value of
  # '%$`'.
  # The variable 'pat' contains the list of characters to count the frequency of.
  if (pat=="")
    pat="%$`"

  # substitute every single character in 'pat' with the string ' [char] ':
  # '%' -> ' [%] '; '$' -> ' [$] ' etc...
  gsub(/./, " [&] ", pat)

  # split the string in 'pat' on ' ' and store the results in array 'patA' (patN holds the number of entries in the patA array):
  # patA[1]='[%]'; patA[2]='[$]' etc....
  patN=split(pat, patA, " ")
}
{
  # print the line number - FNR holds the current line number within the current file
  printf("Line %d:", FNR)

  # iterate through ALL the entries ( 1 ---> patN) in the patA array.
  # For every entry in the array, substitute (gsub) the current entry, if found in the current line ($0), with
  # the empty string "" (nothing). The substitution/gsub returns the NUMBER of successful substitutions.
  # This is the number of times a particular character/string/regex appears in a given string ($0 - your record/line).
  for(i=1; i <= patN; i++)
     printf(" %d '%s' %s", gsub(patA[i], "", $0), patA[i], (i==patN) ? "\n" : "")
}

or to define your own set of chars to count:
nawk -v pat='^&*#' -f ah.awk myFile
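
The counting step relies on gsub returning the number of replacements it made. A minimal sketch of just that step, for a single character, might look like this; count_char and the awk variable ch are names invented for the illustration.

```shell
# count_char is a hypothetical helper; $1 is the single character to count.
count_char() {
  # Build the bracket expression "[c]" and delete every match from $0;
  # gsub returns how many matches it replaced, which is the count.
  awk -v ch="$1" '{ n = gsub("[" ch "]", ""); print n }'
}

# Sample line from the thread: it contains 2 backticks and 3 dollar signs.
printf '%s\n' 'Line1`abc$bn`$$' | count_char '$'
```

Characters that are special inside a bracket expression, such as ^, ] and \, would need extra care here, since a pattern like [^] may be rejected or misread by the regex engine.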

ahjiefreak · December 4, 2007, 5:34pm

Hi,

Could you please explain the context of:

nawk -v pat='^&*#' -f ah.awk myFile

Where does ah.awk come from? I assume "myFile" is the existing shell script.

And honestly, the code you showed me just stunned me. Correct me if I am wrong.

From my understanding, the first thing is to compare the blank spaces and insert the patterns into the pat array?

From the pat array, which contains [%,$,`], you then split the pat array into pat2, which holds the number of pattern elements.

Given this number of patterns, what I could not really understand is:

for(i=1; i <= patN; i++)
printf(" %d '%s' %s", gsub(patA[i], "", $0), patA[i], (i==patN) ? "\n" : "")

In this for loop, from 1 to the number of lines in the file, you are trying to somehow check the pattern? But I do not really understand what
gsub(patA[i], "", $0), patA[i], (i==patN) ? "\n" : "") is doing. Perhaps you could help shed some light on this.

Sorry, I am quite new to shell programming.

P.S.: Are there any other ideas that use just awk and a for loop, without "ah.awk"?

Rgrds,
Jason

vgersh99 · December 4, 2007, 6:53pm

-v pat='^&*#' - defines an awk variable called 'pat'. I assigned the string '^&*#' to that variable. This variable will contain ALL characters you want to get the frequency stat for.

Sorry - I forgot to show this in my post - modified the post now.
'ah.awk' is an awk script from original post - that's the meat of implementation.

no, 'myFile' is the file you need to parse and find the character frequencies per line.

Instead of commenting on your comments, I'll document the code in the original post.
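
On the earlier question of doing this with just awk and a for loop, without a separate script file, one possible sketch is below. It is not the method from ah.awk: it compares characters literally with substr instead of building regexes, so characters such as ^, & and * need no escaping. The names count_chars and chars are assumptions made for this example.

```shell
# count_chars is a hypothetical helper.
# $1: string of characters to count; remaining args: input files (stdin if none).
count_chars() {
  chars=$1; shift
  awk -v chars="$chars" '
  {
    out = "Line " FNR ":"
    # Outer loop: one pass per character we were asked to count.
    for (i = 1; i <= length(chars); i++) {
      c = substr(chars, i, 1)
      n = 0
      # Inner loop: compare each character of the line literally.
      for (j = 1; j <= length($0); j++)
        if (substr($0, j, 1) == c) n++
      out = out " " n " \"" c "\""
    }
    print out
  }' "$@"
}

# Sample line from the thread.
printf '%s\n' 'Line1`abc$bn`$$' | count_chars '`$'
```

Note that awk still processes backslash escapes in a value passed with -v, so a backslash in the character list would need to be doubled.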