How do I identify the weird blank characters in a file?

I have a text file downloaded from the web. I want to count the unique words used in the file, and to measure how long each person speaks in a conversation by counting the words between the opening and closing quotation marks, which are not the standard ASCII quote characters. I also found that the file contains some weird blank characters that are invisible on stdout: in the example below, they are the entries with 118391 and 6380 occurrences.
Judging by the single/double quotes, I guess the file was processed on a Mac, but I am not sure. Here is the output from my Ubuntu terminal:

tr -d '[:blank:]' < infile.txt | grep -o "." | sort | uniq -c | head
      4 �
   1089 �
   1098 �
  12146 �
  12147 �
 118391 
   6380 
  12237 about
     31 alot
    154 apple
 

1) How do I identify the invisible "blank/empty" characters in the file, so that I can get rid of them and count the words?
2) How do I measure how long a person speaks in a conversation, using the opening/closing double quotation mark pairs? What I tried is:

grep "�.*�" infile.txt 

This regex is too greedy: it sometimes merges adjacent quotations on the same line into a single match.
Thanks!

Pipe the output through hexdump -C to see the actual bytes in hex.
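
To illustrate, here is a minimal sketch on a made-up sample rather than your file. The sample string and the two "invisible" characters in it (a UTF-8 no-break space and a carriage return) are assumptions, just common suspects. It uses od, which is specified by POSIX; hexdump -C shows the same bytes with an extra ASCII column.

```shell
# Hypothetical sample: "a", a UTF-8 no-break space (bytes c2 a0), "b", a carriage return (0d).
sample=$(printf 'a\302\240b\r')

# Dump every byte in hex; tr squeezes od's padding and newlines into single spaces.
dump=$(printf '%s' "$sample" | od -An -tx1 | tr -s ' \n' ' ')
echo "$dump"
```

Any byte pair such as c2 a0, or a lone 0d, that shows up where the terminal displays nothing is a candidate for the "blank" entries in your count.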

I don't think it's wise to get rid of them, because they separate (and thus define) the words. Leave them in, count them, and then eliminate the "blank" count.
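
One possible way to act on this, as a rough sketch: treat every non-alphanumeric byte as a word separator instead of deleting it. The sample text and the no-break space in it are assumptions for illustration. With LC_ALL=C this also splits words at accented letters, so it only suits plain English text.

```shell
# Hypothetical sample: two words glued together by a UTF-8 no-break space (bytes c2 a0).
nbsp=$(printf '\302\240')
text="apple${nbsp}about apple"

# Turn each run of non-alphanumeric bytes into one newline, then count each distinct word.
counts=$(printf '%s\n' "$text" | LC_ALL=C tr -cs '[:alnum:]' '\n' | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
```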
Those non-ASCII opening and closing double quotes are multibyte Unicode characters. It might be easier to convert them to ASCII quotes beforehand. The same may well hold for the "blank" characters above.
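
A minimal sketch of such a conversion, assuming the file is UTF-8 and the quotes are U+201C and U+201D (check the hex dump first, since I can't tell from here). The sample line and the no-break space are made up for illustration.

```shell
# Assumption: UTF-8 input, so each character below is the byte sequence shown.
lq=$(printf '\342\200\234')   # U+201C left double quotation mark
rq=$(printf '\342\200\235')   # U+201D right double quotation mark
nbsp=$(printf '\302\240')     # U+00A0 no-break space, one candidate for an invisible "blank"

# Hypothetical input line standing in for a line of infile.txt.
line="${lq}Hello${nbsp}there,${rq} she said."

# Replace each multibyte character with its ASCII counterpart, matching byte-wise.
ascii=$(printf '%s\n' "$line" | LC_ALL=C sed "s/$lq/\"/g; s/$rq/\"/g; s/$nbsp/ /g")
echo "$ascii"
```

For the real file, the same sed command would read from infile.txt and write to a new file.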

Once converted, this

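awk -F\" '
        {
         # An even NF means an odd number of quotes: the quotation continues
         # on the next line, so append lines until the quotes balance (or EOF).
         while (!(NF%2) && (getline X) > 0)
                $0 = $0 " " X
         # Even-numbered fields lie between quotes; gsub returns the word count.
         for (i=2; i<=NF; i+=2) print gsub (/[A-Za-z0-9]+/, "&", $i)
        }
' file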

might give you a feeling for the "speech length".
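
As for the greedy regex in the question, a possible alternative (a sketch, assuming the quotes are already converted to ASCII and each quotation sits on one line) is a pattern that cannot cross a quote character. The sample line is made up.

```shell
# Hypothetical line with two quotations, already converted to ASCII quotes.
line='"Hi there," he said. "How are you today?" she asked.'

# [^"]* cannot cross a quote character, so each quotation is matched separately.
quotes=$(printf '%s\n' "$line" | grep -o '"[^"]*"')
echo "$quotes"

# Count the whitespace-separated words in each quotation, one count per line.
lengths=$(printf '%s\n' "$quotes" | awk '{print NF}')
echo "$lengths"
```

Unlike the awk script, this does not follow a quotation across line breaks.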