Number of Vowels

Hi Guys,

It may look silly, but I am stuck on this and need some help.

How can I count the number of vowels in a text file?

Regards,
Gaurav Goel

Why do you need that? Is it schoolwork?

I hope the number of posts by me will tell you that I am not in school. :slight_smile:
Actually, one of my colleagues is stuck on a problem and asked for my help.

So I guess that, rather than arguing about whether it's schoolwork or not, we should find the solution to it.

We once had a post where an OP's colleague was stuck on a problem that looked like homework. It turned out that he was trying to solve his kid's schoolwork, and the thread was closed with no further discussion.

It's a judgement call. I am sure you made the best one in this regard.

I will leave the thread open.

Here's one way to do it.

#!/bin/bash

vowels=0

# IFS= and -r keep each line exactly as it is in the file
while IFS= read -r line
do
    length=${#line}
    # Delete the vowels, then see how much shorter the line became
    new_line=$(printf '%s' "$line" | tr -d 'aeiouAEIOU')
    new_length=${#new_line}
    vowels=$((vowels + length - new_length))
done < input.txt

echo "Vowels=$vowels"

I tried it on a few lines from the sh man pages

[/tmp]$ cat input.txt
       Bash  is  an  sh-compatible  command language interpreter that executes
       commands read from the standard input or from a file.  Bash also incor-
[/tmp]$ ./try.sh 
Vowels=42
[/tmp]$ 
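
The same length-difference idea can be sketched in Python. This is only a minimal illustration, not part of the script above; the name count_vowels_by_length is made up for this example.

```python
# Hypothetical helper: count vowels by deleting them and comparing lengths,
# the same trick the shell loop uses with tr -d.
VOWELS = "aeiouAEIOU"
_DELETE_VOWELS = str.maketrans("", "", VOWELS)  # table that drops every vowel


def count_vowels_by_length(text):
    """Return how many characters disappear when the vowels are removed."""
    stripped = text.translate(_DELETE_VOWELS)
    return len(text) - len(stripped)


if __name__ == "__main__":
    sample = "Bash is an sh-compatible command language interpreter"
    print(count_vowels_by_length(sample))
```

Because the whole text is handled as one string, there is no need to loop over lines.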

Here is one way to count the vowels in the file 1.txt:

tr '[a-z]' '[A-Z]' < 1.txt | tr -sc 'AEIOU' '[\012*]' | sort | uniq -c

An alternative in Python, if you have it.

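Script (reads input.txt):

# Read the whole file into one string
with open("input.txt") as f:
    text = f.read()

vowels = "aAeEiIoOuU"
total = 0
for v in vowels:
    count = text.count(v)  # occurrences of this one vowel
    total = total + count
    print("Vowel: ", v, "count: ", count)
print("Total vowels:", total)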

Output:

Vowel:  a count:  14
Vowel:  A count:  0
Vowel:  e count:  11
Vowel:  E count:  0
Vowel:  i count:  6
Vowel:  I count:  0
Vowel:  o count:  8
Vowel:  O count:  0
Vowel:  u count:  3
Vowel:  U count:  0
Total vowels: 42

Hey guys, thanks for your responses.
I will be trying these out.

You are right, vino, that this is a very interesting problem.
It seemed easy at first look, but once I started thinking about it, it became complex.

I am looking for a single-step solution. Charu, can you please explain the command?

Thanks and Regards,
Gaurav Goel

ok here is one more

grep "[aeiouAEIOU]" filename.txt | tr -cd 'aeiouAEIOU' | wc -c

I am not sure whether you want the total count or separate counts for 'A', 'E', 'I', 'O' and 'U'.
The one-liner that I gave reports separate counts.

First, the command

tr '[a-z]' '[A-Z]' < 1.txt

will convert the contents of 1.txt to capital letters. Then, the command

tr -sc 'AEIOU' '[\012*]'

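will replace every character that is not a vowel (-c takes the complement of 'AEIOU') with a newline (\012) and squeeze each run of newlines into a single one (-s), so that only the vowels remain, separated by newlines.

The output is then sorted so that identical lines end up next to each other, which uniq needs in order to count them.
uniq -c precedes each output line with the number of times that line occurred in the input.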
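
To make the steps above concrete, here is a rough Python imitation of the pipeline. It is a sketch, not an exact replacement: the function name tr_pipeline_counts is made up, it assumes plain ASCII text, and it ignores the empty line that tr can emit when the input starts with a non-vowel.

```python
import re
from collections import Counter


def tr_pipeline_counts(text):
    """Roughly mimic the upper-case, tokenize, sort, uniq -c pipeline."""
    upper = text.upper()                     # step 1: tr '[a-z]' '[A-Z]'
    # step 2: every run of non-vowels acts as one separator, so each
    # token is a run of adjacent vowels
    tokens = re.findall(r"[AEIOU]+", upper)
    # steps 3 and 4: sort | uniq -c, returned as (token, count) pairs
    return sorted(Counter(tokens).items())


if __name__ == "__main__":
    for token, count in tr_pipeline_counts("banana audio"):
        print(count, token)
```

Note that a token is whatever stands between two separators, so vowels that touch each other (the "AU" and "IO" in "audio") come out as one token.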

Some alternatives...

$ fold -w 1 input.txt | grep -i '[aeiou]' | sort | uniq -c
     14 a
     11 e
      6 i
      8 o
      3 u
$ tr -dc AEIOUaeiou < input.txt | fold -w 1 | sort | uniq -c
     14 a
     11 e
      6 i
      8 o
      3 u
$ fold -w 1 input.txt | grep -ic '[aeiou]'
42
$ tr -dc AEIOUaeiou < input.txt | wc -c
     42

Hi All,

Thanks for such quick and amazing responses.
Thanks a lot.
Maybe I need to explore the power of the tr command.

Thanks and Regards,
Gaurav Goel

Well, this won't work in every case.
Suppose 1.txt looks like this:

$ cat 1.txt
aaaa

The output is:

$ tr '[a-z]' '[A-Z]' < 1.txt | tr -sc 'AEIOU' '[\012*]' | sort | uniq -c
   1 AAAA

which is wrong, as there are four vowels. :mad:

There is no non-vowel character between the vowels to be replaced by a newline, so the run of repeated characters stays on one line and is counted as 1.
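
One way around this is to count characters instead of lines. The Python sketch below does that; per_vowel_counts is a made-up name and the code is only an illustration of the idea, assuming case should be ignored as in the pipeline above.

```python
from collections import Counter


def per_vowel_counts(text):
    """Count each vowel character individually, ignoring case."""
    # Look at one character at a time, so adjacent vowels are never merged
    counts = Counter(ch for ch in text.upper() if ch in "AEIOU")
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(per_vowel_counts("aaaa\n"))
```

For the 1.txt shown above this reports four occurrences of A rather than a single line.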

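awk '{
 # walk the line one character at a time
 for(i=1;i<=length($0);i++)
 if(substr($0,i,1) ~ /[aeiouAEIOU]/)
 n++
 }
END{
# n+0 prints 0 instead of an empty line when no vowel was found
print n+0
}' file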

fold is another useful command... Thanks :slight_smile:

The traditional hurdle is that in English, y is a vowel in some contexts, while not in others. (In "yet" it's pronounced as /j/ and so is not a vowel; in "fly" it is pronounced as /ai/ and is a vowel.) There are numerous textbook examples of how to solve this, which is probably another reason why the suspicion was raised that this might be homework.
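
As a rough illustration of the y problem, the Python sketch below counts y as a vowel only when no vowel letter follows it. This rule is an assumption made up for the example, not a rule from any textbook; it handles "yet" and "fly" as described above but will misjudge other words, since spelling does not reliably reflect pronunciation.

```python
PLAIN_VOWELS = set("aeiou")


def count_vowels_with_y(text):
    """Count a, e, i, o, u, plus y when no vowel letter follows it."""
    lowered = text.lower()
    total = 0
    for i, ch in enumerate(lowered):
        if ch in PLAIN_VOWELS:
            total += 1
        elif ch == "y":
            # Next character, or "" when y is the last character
            nxt = lowered[i + 1] if i + 1 < len(lowered) else ""
            if nxt not in PLAIN_VOWELS:
                total += 1  # e.g. the y in "fly"
    return total


if __name__ == "__main__":
    for word in ("yet", "fly"):
        print(word, count_vowels_with_y(word))
```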