If we were concerned with columns, this would usually be very easy using the wc command in the shell. However, my problem is this:
I have a file with 8 rows but 100 columns.
1. I want to find the number of different values a certain "row" takes.
2. The values are in fact strings, for example "john", "david", "steve".
3. Sometimes the value is empty. When it is, I want to ignore it completely when counting the number of different values a certain row takes.
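awk -F, '{
  # The first comma-separated field is treated as "label first_name"; keep only the label.
  cl = $1; sub(/ .*/, "", cl)
  # Strip the label from the record; awk then re-splits $0 into fields.
  sub(/[^ ]* /, "")
  # Count each non-blank value the first time it is seen.
  while (++i <= NF)
    $i ~ /^ *$/ || _[$i]++ || c++
  printf "%s has %d unique names\n", cl, c
  # Reset the counters and the seen-values array for the next row.
  i = c = 0; split("", _)
}' infile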
Attempt with Python:
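#!/usr/bin/env python3
import fileinput
import re

# Matches values that are empty or contain only whitespace.
p = re.compile(r'^\s*$')

for l in fileinput.input():
    l = l.rstrip().split(',')
    # The first field is "label first_name"; split the label off at the
    # first space. partition() also copes with an empty first value.
    cl, _, l[0] = l[0].partition(' ')
    # Collect the distinct values as dictionary keys.
    u, c = {}, 0
    for r in l:
        u[r] = 1
    # Count the distinct values that are not blank.
    for k in u:
        if not p.match(k):
            c = c + 1
    print(cl, "has", c, "unique names")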
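The same counting rule can also be written with a set, which removes the need for the regular expression. This is only a minimal Python 3 sketch. It assumes the same layout the scripts above rely on (a row label, one space, then comma-separated names). The function name count_unique_names and the sample rows are hypothetical. Unlike the scripts above, it also strips whitespace around each name, so " john" and "john" count as one value.

```python
def count_unique_names(line):
    """Return (label, count) for one row of the form 'label name1,name2,...'."""
    # Assumed layout: the label is everything before the first space.
    label, _, rest = line.rstrip("\n").partition(" ")
    # Strip surrounding whitespace, then let the set remove duplicates.
    names = {name.strip() for name in rest.split(",")}
    # Blank values are ignored entirely.
    names.discard("")
    return label, len(names)


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the original file.
    rows = [
        "row1 john,david,,john,steve",
        "row2 ,,steve, ,steve",
    ]
    for row in rows:
        label, count = count_unique_names(row)
        print(label, "has", count, "unique names")
```

To run it on a real file, the loop could read lines from fileinput.input() instead of the sample list.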