question about wc

Hey, my friend was asking me if I knew a way to count how many different words are in a file. I told him no, not offhand, but I was thinking about it and started to wonder too. I imagine this is probably pretty simple and I'm just missing something. I keep confusing myself over how you would compare words and filter out the ones that appear twice or more. If anyone knows of a way to do this, I'd like to know.

thanks

This may sound weird, but give it a try:

1) Get all the words in a single column. For example, say your file is

a b df sd ff d

make it
a
b
df
sd
ff
.
.

2) Sort the output of step 1.
3) Use the uniq command on the output of step 2.
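
The three steps above can be sketched as a single pipeline. This is only a minimal sketch: the sample text and the `sample` and `distinct` variable names are made up for illustration, and it assumes words are separated by spaces, tabs, or newlines only.

```shell
#!/bin/sh
# Hypothetical sample input; any text file would do.
sample=$(mktemp)
printf 'a b df sd ff d\nb a df\n' > "$sample"

# Step 1: put one word per line (-s squeezes runs of separators
#         so no empty lines are produced).
# Step 2: sort, so identical words end up on adjacent lines.
# Step 3: uniq drops adjacent duplicates; wc -l counts what is left.
distinct=$(tr -s ' \t' '\n\n' < "$sample" | sort | uniq | wc -l)

# Unquoted on purpose: some wc implementations pad the number with spaces.
echo "distinct words:" $distinct
rm -f "$sample"
```

The sort step matters because uniq only compares neighbouring lines.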

Oh I see, I didn't know about the uniq command. Thanks a lot!

Post back if you get something; I'd like to see if it really works out.
Theoretically it should work fine.

sort -u filename|wc

I don't think this will work.

Works on AIX / Solaris / HP & Redhat

I know about the sort command with the -u option, and that it works.

What I meant to say is that the command you have given is not the complete solution to the question posted above. But that is just what I think; you may be right as well.

This is what I got from your command

but as per my understanding the output should have been 10 and not 12

Gaurav
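
One possible reason for a mismatch like this: sort -u removes duplicate lines, not duplicate words, so on a file with several words per line it counts something different. A small sketch with made-up sample text (the `lines` and `words` names are just for illustration):

```shell
#!/bin/sh
# Hypothetical input: 3 distinct lines, but only 4 distinct words.
sample=$(mktemp)
printf 'one two\ntwo one\none three four\n' > "$sample"

# sort -u on the raw file compares whole lines.
lines=$(sort -u "$sample" | wc -l)

# Splitting into one word per line first makes sort -u compare words.
words=$(tr -s ' ' '\n' < "$sample" | sort -u | wc -l)

echo "distinct lines:" $lines
echo "distinct words:" $words
rm -f "$sample"
```

On a file that already has one word per line, the two counts would be the same.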

OK, I was working with the first idea. Here is what I have so far. For some reason it is not working: the command runs, but I think there is a problem with the input.

#!/bin/csh

echo "Please enter a filename: "
set filename = $<

set dif = `tr -d '.:"$(),-' < $filename | tr '[A-Z]' '[a-z]' | tr ' ' '\n' | sort | uniq | wc -l`
set num = `wc -l`

echo "Thank you, your file has $num words and $dif different words."

echo " " 

Maybe someone can catch it.

One problem at first glance:

Where is the filename, mate?

I want the user to be able to input the filename, then run the command on that filename.

Got that, but in the command you have written in the script to set num, you have forgotten to mention the name of the file.

set num = `wc -l $filename`

oh yah duh!! lol

OK, this is the output I'm getting. Why is the echo statement messed up?
Please enter a filename:
testfile.txt
words.ou, your file has 239 words and 159

Here's the code:

#!/bin/csh

echo "Please enter a filename: "

set filename = $<

set num = ` wc -w $filename | awk '{ print $1 } ' `
set dif = `tr -d '.:"(),-' < $filename | tr '[A-Z]' '[a-z]' | tr ' ' '\n' | sort | uniq | wc -l`

echo "Thank you, your file has $num words and $dif different words."

echo " " 

So many tr's, pipes, combination of sort+uniq.

See if this works.

tr -cs '[:alnum:]' '[\n*]' < test.c | sort -u | wc -l
 libra% tr -cs '[:alnum:]' '[\n*]' < testfile.txt | sort -u | wc -l
     165

vs.

libra% tr -d '.:"(),-' < testfile.txt | tr '[A-Z]' '[a-z]' | tr ' ' '\n' | sort | uniq | wc -l
     159
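
The two pipelines need not agree, because they define "word" differently: the first splits on every non-alphanumeric character and keeps upper and lower case apart, while the second deletes a fixed list of punctuation, lowercases, and splits on spaces only (so a blank line is counted as a word). Here is a rough sketch on made-up sample text, plus a third variant that adds case folding to the shorter pipeline. The variable names are hypothetical, and LC_ALL=C is set only to make the sort order predictable.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# Hypothetical sample text with mixed case, punctuation and a blank line.
sample=$(mktemp)
printf "The cat's hat. the HAT\n\nend-game\n" > "$sample"

# A: split on anything that is not a letter or digit, case-sensitive.
a=$(tr -cs '[:alnum:]' '[\n*]' < "$sample" | sort -u | wc -l)

# B: the longer pipeline from the script, unchanged.
b=$(tr -d '.:"(),-' < "$sample" | tr '[A-Z]' '[a-z]' | tr ' ' '\n' | sort | uniq | wc -l)

# C: like A, but folding everything to lower case before counting.
c=$(tr -cs '[:alnum:]' '[\n*]' < "$sample" | tr '[:upper:]' '[:lower:]' | sort -u | wc -l)

echo "A:" $a "B:" $b "C:" $c
rm -f "$sample"
```

Which of these is "right" depends on what you want to count as a word, for example whether cat's is one word or two.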

What does the '[:alnum:]' do?

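'[:alnum:]' doesn't do anything by itself. Rather, it is a character class that matches letters and digits. In the command above, -c tells tr to use the complement of that class (everything that is not a letter or digit), and -s squeezes each run of translated characters into a single newline.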

See info -f coreutils --index-search='Character classes'
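
To see what the -c and -s options do with that class, here is a small sketch on a made-up input string (the `out`, `squeezed` and `unsqueezed` names are only for illustration):

```shell
#!/bin/sh
# -c: operate on the complement of [:alnum:], i.e. everything that is
#     NOT a letter or digit (spaces, punctuation, underscore, newline).
# -s: squeeze each run of resulting newlines into a single one.
out=$(printf 'foo, bar!!  baz_42\n' | tr -cs '[:alnum:]' '[\n*]')
echo "$out"

# Line counts with and without -s, to show the effect of squeezing.
squeezed=$(printf 'foo, bar!!  baz_42\n' | tr -cs '[:alnum:]' '[\n*]' | wc -l)
unsqueezed=$(printf 'foo, bar!!  baz_42\n' | tr -c '[:alnum:]' '[\n*]' | wc -l)
echo "with -s:" $squeezed "without -s:" $unsqueezed
```

Without -s, every separator character becomes its own newline, so the output contains empty lines that would be counted as a word.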