How to count the number of occurrences of a character in a string in UNIX

kamesh83 · March 15, 2006, 1:11am

I have a string like echo "a|b|c". I want to count the | symbols in this string. How can I do this? Please tell me the command.

matrixmadhan · March 15, 2006, 2:02am

echo $((`echo "a|b|c" | sed 's/[^|]//g' | wc -c` - 1 ))

blowtorch · March 15, 2006, 2:17am

Similar, but a bit different:

echo $(($(echo "a|b|c"|sed 's/[a-z]//g'|wc -c)-1))

matrixmadhan · March 15, 2006, 2:54am

If the above is the solution,
the input is restricted to the characters a-z delimited by '|'.
If the input includes numbers,
the solution must strip numbers as well, and similarly for special characters.

Hence, retaining only the '|' characters would be easier to maintain.
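
To make the difference concrete, here is a minimal sketch that runs both sed variants on an input containing digits. The function names are hypothetical, chosen only for this illustration; the sed expressions are the two posted above.

```shell
# Hypothetical helper names, for illustration only.

# Keep only '|' characters, then count them (minus the trailing newline).
count_keep_pipes() {
    echo $(( $(printf '%s\n' "$1" | sed 's/[^|]//g' | wc -c) - 1 ))
}

# Delete only lowercase letters, then count whatever is left.
count_drop_letters() {
    echo $(( $(printf '%s\n' "$1" | sed 's/[a-z]//g' | wc -c) - 1 ))
}

count_keep_pipes 'a1|b2|c3'     # prints 2
count_drop_letters 'a1|b2|c3'   # prints 5, because the digits are counted too
```

The first variant needs no change when the input alphabet grows; the second has to list every character that may appear.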

kamesh83 · March 15, 2006, 8:28am

Hey madhan,

Thanks for sending the above command to count the number of times a character occurs. It's working fine.

But I found a simpler way of doing this.
Just count the number of fields separated by the | symbol and subtract 1 from the total count.

The code is:

no_fields=`echo "a|b|c " |awk -F"|" '{print NF}' `
echo $no_fields

It will return 3; subtract 1 from it to get the answer, 2.

It will work for a non-empty string of any length.

matrixmadhan · March 15, 2006, 8:46am

Subtract it directly then;
no subsequent step is needed.

echo "a|b|c " |awk -F"|" '{print NF-1}'
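
One edge case worth knowing: for an empty line awk sets NF to 0, so NF-1 yields -1 rather than 0. A possible alternative, not proposed by anyone in this thread, is to use the return value of awk's gsub(), which is the number of substitutions made. The sketch below compares the two; the function names are hypothetical.

```shell
# Hypothetical helper names, for illustration only.

# The field-count approach from the posts above.
count_nf() {
    printf '%s\n' "$1" | awk -F'|' '{ print NF - 1 }'
}

# gsub() returns the number of substitutions it made on the line.
count_gsub() {
    printf '%s\n' "$1" | awk '{ print gsub(/[|]/, "") }'
}

count_nf 'a|b|c'      # prints 2
count_gsub 'a|b|c'    # prints 2
count_nf ''           # prints -1
count_gsub ''         # prints 0
```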

Unbeliever · March 17, 2006, 7:09am

Probably the simplest way is:

echo "a|b|c" | tr -dc '|' | wc -c

The 'tr' command simply deletes any input character that is not a '|' leaving you with just the character you want to count.

Similar discussion is here

Unbeliever
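
The tr pipeline above can be wrapped in a small function so that the character to count becomes a parameter. This is only a sketch: count_char is a hypothetical name, it assumes a single-byte character that has no special meaning to tr (so not a backslash, '-' or '['), and the arithmetic expansion is there to strip the leading blanks that some wc implementations print.

```shell
# count_char is a hypothetical helper name, for illustration only.
# $1 = string to search, $2 = the single character to count.
count_char() {
    # tr -dc deletes every character that is NOT $2; wc -c counts what is left.
    echo $(( $(printf '%s' "$1" | tr -dc "$2" | wc -c) ))
}

count_char 'a|b|c' '|'            # prints 2
count_char '/usr/local/bin' '/'   # prints 3
count_char 'abc' '|'              # prints 0
```

Using printf '%s' instead of echo avoids feeding a trailing newline into the pipeline, although tr -dc would delete it anyway.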

matrixmadhan · March 17, 2006, 8:00am

But your solution uses three commands and 2 kernel data structures.

Unbeliever · March 17, 2006, 8:43am

Well, yes, it uses three very basic Unix commands, each with very low overhead, and it works no matter what the input is. Running all 4 command variations posted so far 1000 times in a row, I get the following timings on an unused machine (I basically surrounded each command with a while loop).

echo $((`echo "a|b|c" | sed 's/[^|]//g' | wc -c` - 1 )) > /dev/null
real 0m10.109s
user 0m2.880s
sys 0m12.590s

echo $(($(echo "a|b|c"|sed 's/[a-z]//g'|wc -c)-1)) > /dev/null
real 0m10.141s
user 0m2.910s
sys 0m11.950s

echo $(($(echo 'a|b|c' |awk -F"|" '{print NF}') -1)) > /dev/null
real 0m10.838s
user 0m3.340s
sys 0m8.630s

echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null
real 0m6.962s
user 0m2.770s
sys 0m8.960s

So I suppose it depends on how you define simplest.

matrixmadhan · March 17, 2006, 9:15am

Of course, yes, it depends upon how we define simplest.

I just ran the following two commands in a loop 10,000 times.
Could you please verify it?

#!/usr/bin/ksh
i=1
while [ $i -le 10000 ]
do
# run the script once per command, uncommenting one of the following lines each time
#echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null
#echo $(($(echo 'a|b|c' |awk -F"|" '{print NF}') -1)) > /dev/null
i=$(($i + 1))
done
exit 0

The following is the time taken:
**********************************************
echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null
real 6m23.95s
user 1m59.82s
sys 5m2.26s
**********************************************
echo $(($(echo 'a|b|c' |awk -F"|" '{print NF}') -1)) > /dev/null
real 6m6.93s
user 1m23.27s
sys 3m18.99s
**********************************************

Only these two commands have been considered, as an example.
When a particular sample is run for a longer time, the output may differ.

In particular, I don't find any use in just discussing the time taken by each of the commands when it is run.

It could be that sed and awk are complex programs compared to tr; hence they naturally take more time to execute.
This is just my guess, and the actual reason could be different.

Unbeliever · March 17, 2006, 9:53am

We are not *just* discussing the times taken; I added them to a thread in which a larger discussion was already taking place. My presentation of timings was in response to your apparent questioning of whether my solution was 'simple'. My personal opinion is that for trivial operations, if the simplest commands are used to perform the job, then they probably form the simplest solution. Hence my own particular solution. It does not invalidate any other solution, it just presents a different one.

Other people often prefer to use the same tool, be it perl, awk, ruby etc to do just about everything. /shrug everyone has their own way.

Perderabo · March 17, 2006, 1:39pm

Actually, I think the timings are very interesting. Sometimes I want speed and I'm willing to tolerate a bit of complexity to achieve the speed. ksh can do this using just builtin functions. I have been fiddling with several techniques. There isn't a super obvious optimal choice. But I have settled on:

echo "a|b|c" | { read x
        x=${x}X
        n=0
        while ((${#x}>1)); do
                typeset -L1 c=$x ; typeset -R$((${#x}-1)) x
                [[ $c = \| ]] && ((n=n+1))
        done ; echo $n ; }
echo $n   # superfluous echo
exit 0

This won't work right in any shell except ksh, not even pdksh. The requirement is to read the string from a pipe. A fork() will happen to provide the echo process at the start of the pipeline. But with ksh, the last command in a pipeline is executed in the context of the parent shell if it is a builtin. This allows stuff like "echo foo | read bar" to work in (only) ksh. I was a little nervous about whether I could count all of the braced statements as "a builtin". That is the reason for the second "echo $n". The parent shell sees the new value for n, proving that the entire loop was processed inside ksh. Also remember that ksh will compile a loop and execute the compiled code. Lesser shells need to re-interpret the source code on each iteration of a loop.

I don't represent this as simple, only fast. But I didn't do any timings so I don't have any numbers.
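
For shells without ksh's typeset -L and -R, a builtin-only count can still be sketched with POSIX prefix stripping. This is a different technique from the one posted above, not a port of it: it takes the string as an argument instead of reading it from a pipe, and count_char_builtin is a hypothetical name. No timings were taken for it either.

```shell
# count_char_builtin is a hypothetical helper name, for illustration only.
# $1 = string to search, $2 = the single character to count.
# Note: s, c and n are global variables, since POSIX sh has no "local".
count_char_builtin() {
    s=$1
    c=$2
    n=0
    # An empty search character would match forever, so refuse it.
    [ -n "$c" ] || return 1
    while :; do
        case $s in
            *"$c"*)
                s=${s#*"$c"}   # drop everything up to and including the first match
                n=$((n + 1))
                ;;
            *)
                break
                ;;
        esac
    done
    echo "$n"
}

count_char_builtin 'a|b|c' '|'   # prints 2
count_char_builtin 'a*b*c' '*'   # prints 2
```

Quoting "$c" inside the patterns makes the character match literally, so glob characters such as * can be counted as well. No external process is started, which is the same saving the ksh version aims for.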