Hi Guys,
It may look silly, but I am stuck on it and need some help.
How do I count the number of vowels in a text file?
Regards,
Gaurav Goel
Why do you need that? Is it schoolwork?
I hope the number of posts by me will tell you that I am not in school.
Actually, one of my colleagues is stuck on a problem and asked for my help.
So I suggest that, rather than arguing about whether it's schoolwork or not, we find a solution to it.
We once had a post where an OP's colleague was stuck on a problem that looked like homework. It turned out that he was trying to solve his kid's schoolwork, and the thread was closed with no further discussion.
It's a judgement call. I am sure you made the best one in this regard.
I will leave the thread open.
Here's one way to do it.
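#!/bin/bash
# Count vowels as (line length) minus (length after deleting the vowels).
vowels=0
# IFS= and -r keep each line intact; the || clause catches a last line with no newline.
while IFS= read -r line || [[ -n $line ]]
do
    length=${#line}
    new_line=$(printf '%s' "$line" | tr -d 'aeiouAEIOU')
    new_length=${#new_line}
    vowels=$((vowels + length - new_length))
done < input.txt
echo "Vowels=$vowels"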
I tried it on a few lines from the sh man pages
[/tmp]$ cat input.txt
Bash is an sh-compatible command language interpreter that executes
commands read from the standard input or from a file. Bash also incor-
[/tmp]$ ./try.sh
Vowels=42
[/tmp]$
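For anyone who wants the same idea outside the shell: the script above counts vowels as the line length minus the length after the vowels are deleted. Here is a minimal Python sketch of that subtraction. The function names are made up for illustration, and it assumes only the ten ASCII vowels matter.

```python
VOWELS = "aeiouAEIOU"

# Deletion table: plays the role of `tr -d 'aeiouAEIOU'`.
_DELETE_VOWELS = str.maketrans("", "", VOWELS)


def count_vowels_by_length(line):
    """Vowels in one line = original length minus length without vowels."""
    stripped = line.translate(_DELETE_VOWELS)
    return len(line) - len(stripped)


def count_vowels_in_lines(lines):
    """Sum the per-line differences, like the while loop in the script."""
    return sum(count_vowels_by_length(line) for line in lines)


if __name__ == "__main__":
    print(count_vowels_in_lines(["aaaa", "xyz", "Bash"]))  # prints 5
```

A one-character line such as "a" is counted too, so there is no need to skip short lines.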
Here is one way to count the vowels in the file 1.txt:
tr '[a-z]' '[A-Z]' < 1.txt | tr -sc 'AEIOU' '[\012*]' | sort | uniq -c
An alternative in Python, if you have it, reading the same input.txt:
text = open("input.txt").read()
vowels = "aAeEiIoOuU"
total = 0  # running total across all vowels
for v in vowels:
    count = text.count(v)
    total = total + count
    print("Vowel:", v, "count:", count)
print("Total vowels:", total)
Output:
Vowel: a count: 14
Vowel: A count: 0
Vowel: e count: 11
Vowel: E count: 0
Vowel: i count: 6
Vowel: I count: 0
Vowel: o count: 8
Vowel: O count: 0
Vowel: u count: 3
Vowel: U count: 0
Total vowels: 42
Hey guys, thanks for your responses.
I will be trying these out.
You are right, vino, that this is a very interesting problem.
It seemed easy at first look, but once I started thinking about it, it became complex.
I am looking for a single-step solution. Charu, can you please explain the command?
Thanks and Regards,
Gaurav Goel
ok here is one more
grep "[aeiouAEIOU]" filename.txt | tr -cd 'aeiouAEIOU' | wc -c
I am not sure whether you want the total count or the counts of 'A', 'E', 'I', 'O' and 'U' separately.
However, the one-liner that I gave will give separate counts.
First, the command
tr '[a-z]' '[A-Z]' < 1.txt
converts the contents of 1.txt to capital letters. Then the command
tr -sc 'AEIOU' '[\012*]'
replaces every character that is not a vowel with a newline (\012) and squeezes repeated newlines into one, so that only the vowels are left, separated by newlines.
This output is then sorted so that uniq can be used on it.
uniq -c
precedes each output line with the number of times that line occurred in the input.
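The result the pipeline is aiming for (upper-case everything, keep only the vowels, count each one) can also be sketched in Python. This is just an illustration, not a translation of the tr command: `tally_vowels` is a made-up name, and it looks at one character at a time.

```python
from collections import Counter


def tally_vowels(text):
    """Return a Counter of the upper-cased vowels found in text."""
    # Upper-casing first mirrors the tr '[a-z]' '[A-Z]' step.
    return Counter(ch for ch in text.upper() if ch in "AEIOU")


if __name__ == "__main__":
    tally = tally_vowels("Bash is an sh")
    # Print in the same "count item" layout that uniq -c uses.
    for vowel, n in sorted(tally.items()):
        print(n, vowel)
```

For the string above this gives 2 for A and 1 for I. Vowels that do not occur are simply absent from the tally.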
Some alternatives...
$ fold -1 input.txt|egrep -i '[aeiou]'|sort |uniq -c
14 a
11 e
6 i
8 o
3 u
$ tr -dc AEIOUaeiou < input.txt | fold -1 | sort |uniq -c
14 a
11 e
6 i
8 o
3 u
$ fold -1 input.txt|egrep -ic '[aeiou]'
42
$ tr -dc AEIOUaeiou < input.txt | wc -c
42
Hi All,
Thanks for such quick and amazing responses.
Thanks a lot.
Maybe I need to explore the power of tr command.
Thanks and Regards,
Gaurav Goel
Well, this won't work in every case.
Let's suppose 1.txt is like this:
$ cat 1.txt
aaaa
The output is:
$ tr '[a-z]' '[A-Z]' < 1.txt | tr -sc 'AEIOU' '[\012*]' | sort | uniq -c
1 AAAA
which is wrong, as there are four vowels.
There is no non-vowel character between them to be replaced with a newline,
so the consecutive vowels stay together on one line and are counted as 1.
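To make the difference concrete, here is a small Python sketch that contrasts the two ways of tokenizing. The function names are made up for illustration. The first groups adjacent vowels into one token, which is how the tr output ends up for "aaaa"; the second yields one token per vowel.

```python
import re


def vowel_runs(text):
    """Tokens made of adjacent vowels, e.g. 'aaaa' -> ['AAAA']."""
    return re.findall(r"[AEIOU]+", text.upper())


def single_vowels(text):
    """One token per vowel, e.g. 'aaaa' -> ['A', 'A', 'A', 'A']."""
    return re.findall(r"[AEIOU]", text.upper())


if __name__ == "__main__":
    print(len(vowel_runs("aaaa")))     # prints 1
    print(len(single_vowels("aaaa")))  # prints 4
```

Only the second form gives a correct per-vowel count when vowels sit next to each other, as in "queue" or "aaaa".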
awk '{
    # walk the line one character at a time
    for (i = 1; i <= length($0); i++)
        if (substr($0, i, 1) ~ /[aeiouAEIOU]/)
            n++
}
END {
    print n + 0
}' file
fold is another useful command... Thanks
The traditional hurdle is that in English, y is a vowel in some contexts, while not in others. (In "yet" it's pronounced as /j/ and so is not a vowel; in "fly" it is pronounced as /ai/ and is a vowel.) There are numerous textbook examples of how to solve this, which is probably another reason why the suspicion was raised that this might be homework.
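If you do want to take y into account, here is one very rough sketch in Python. The rule is an assumption made up for illustration, not a linguistic fact: y is treated as a consonant only when a vowel follows it directly (as in "yet"), and as a vowel otherwise (as in "fly"). A real solution would need pronunciation data.

```python
def count_vowels_with_y(text):
    """Count a, e, i, o, u, plus y when no vowel directly follows it.

    The y rule is a crude heuristic (an assumption), not a phonetic analysis.
    """
    text = text.lower()
    count = 0
    for i, ch in enumerate(text):
        if ch in "aeiou":
            count += 1
        elif ch == "y":
            next_is_vowel = i + 1 < len(text) and text[i + 1] in "aeiou"
            if not next_is_vowel:
                count += 1
    return count


if __name__ == "__main__":
    print(count_vowels_with_y("yet"))  # prints 1 (only the e)
    print(count_vowels_with_y("fly"))  # prints 1 (the y)
```

It will get some words wrong, which is exactly the hurdle described above.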