Counting the number of distinct words

I'm looking to write a sample shell script that counts the number of distinct words in a text file given as an argument.
Remark: White space characters are spaces, tabs, form feeds, and new lines.

Using ONLY these commands: tr, sort, grep, wc.

Thanks.

It sounds like you already know which commands you need to use. Where are you stuck? Have you read the man pages for those commands?

Actually yes, but I don't know how to use them to get the distinct words.

Please, if you can, just guide me on how to do it. Thank you.

Difficult to "guide" you how to do it without just telling you how to do it.... and that way you won't learn anything!

Try using tr to strip out all punctuation (see the -d option), then using tr again to convert all whitespace to newlines and all upper-case characters to lower-case. Then you can sort the output using the unique option (see the man page) so that you end up with only distinct words, one per line, and then count the number of lines produced using wc.
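
The steps described above could be sketched roughly as follows. This is only one possible reading of the advice, not a definitive answer: the function name count_distinct_words is made up for illustration, and the grep -v '^$' step is an extra assumption added to drop the empty line that appears when the file starts with whitespace.

```shell
# Hypothetical helper: print the number of distinct words in the file "$1".
count_distinct_words() {
    # 1. delete punctuation
    tr -d '[:punct:]' < "$1" |
    # 2. fold upper-case to lower-case so "Blah" and "blah" compare equal
    tr '[:upper:]' '[:lower:]' |
    # 3. turn each run of whitespace into a single newline (one word per line)
    tr -s '[:space:]' '\n' |
    # 4. assumption: drop empty lines (e.g. from leading whitespace)
    grep -v '^$' |
    # 5. keep one copy of each word, then count the remaining lines
    sort -u |
    wc -l
}
```

In a script it would be called as count_distinct_words "$1". Some wc implementations pad the number with leading spaces, which does not change its value.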

perl:

$file=shift;
open(FH,"<",$file) or die "Cannot open $file: $!";
while(<FH>){
	@arr=split(" ",$_);
	for($i=0;$i<=$#arr;$i++){
		$hash{$arr[$i]}++;
	}
}
close(FH);
for $key (keys %hash){
	print $key,"--->",$hash{$key},"\n";
}

Just pass those arguments to grep -c:

scriptname abc efg
grep -c "$1" filename   # this gives the number of lines in the file that contain abc
Do the same for the remaining arguments...

I don't think you understood the question vidyadhar85.

can you explain please :confused:

He wants to count the number of "distinct" (i.e. unique) words... e.g. if the words were "Blah yak blah blah yak rhubarb" the answer would be 3 ("blah yak rhubarb").

tr -s '[:space:]' '\n' < infile | sort -u | wc -l

The way the question is posed it looks like homework. But I cannot be sure. Don't post homework questions, please.

With that limitation (grep, tr, sort, wc) definitely looks like homework.

If you could use xargs:

xargs -a FILENAME -n 1|sort -u|wc -l

My friends, this is not homework, it's just a challenge between my friends :slight_smile:
Anyway, I tried to solve it:

grep -c | sort -u test.txt | tr -d "\t \v \f [unct:] [:upper:] "

But actually I don't know in what order to put these commands, so could anybody correct it for me?

Thanks to everyone who replied to me.

What's the grep for? It doesn't do anything without at least a regular expression to search for, and usually also a file name.
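
For illustration, here is one way grep can earn a place in such a pipeline once it is given a pattern. This is only a sketch: the function name count_unique_nonempty is hypothetical, and using grep -c in place of wc -l is an assumption about how grep might be fitted in, not something the original attempt did.

```shell
# Hypothetical helper: count distinct whitespace-separated words in "$1".
count_unique_nonempty() {
    # one word per line, runs of whitespace squeezed to a single newline
    tr -s '[:space:]' '\n' < "$1" |
    sort -u |
    # grep reads the pipe; the pattern "." matches any non-empty line,
    # and -c prints how many lines matched
    grep -c .
}
```

Here grep gets its regular expression (a single dot) and takes its input from the pipeline instead of a file name, so the empty line produced by leading whitespace is not counted.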

Jim's solution already solved your problem; did you really not try the solutions posted here?

# Change any whitespace into a newline
tr '\t\v\f ' '\n' <test.txt |
# sort, deleting any repeated occurrences of the same word
sort -u |
# count how many we have
wc -l

This is very much a staple of introductory Unix textbooks; I'd recommend that you finish the first chapter before you accept any more challenges.

Something like this:

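# $filename is assumed to hold the path of the text file
for word in $(cat "$filename")
do
# strip punctuation, then fold to lower-case
echo "$word" | tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]'
# uniq -c prints each distinct word with its number of occurrences
done | sort | uniq -c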

Just like Annihilannic suggested.

echo `cat file` | tr '[:lower:]' '[:upper:]' | tr '\t\v\f ' '\n' | sort -u | wc -l