Character count per record

subrat · April 12, 2010, 7:43am

I have a flat file. How can I retrieve the character count per record for the whole file? Can anybody assist me with this?

Cheers

vgersh99 · April 12, 2010, 7:44am

What's a 'record'?
Posting a sample file and a desired output would help (using code tags).

subrat · April 12, 2010, 7:50am

e.g.

TestFile

100 NEWYORK SALES APPR
200 LA MARKETING TRRPSS
300 ROME RECEP TTLS

Desired output:-

1 - 19
2 - 20
3 - 16

(Spaces should not be included.)

devtakh · April 12, 2010, 7:55am

Try this:

awk '{gsub(" ","");print NR,"-",length($0)}' filename

cheers,
Devaraj Takhellambam

ygemici · April 12, 2010, 9:13am

You can try this:

while IFS= read -r line; do a=$(printf '%s\n' "$line" | sed 's/ //g'); echo "${#a}"; done < TestFile

clx · April 12, 2010, 9:54am

also,

while IFS= read -r line; do printf '%s' "$line" | tr -cd '[:alnum:]' | wc -c; done < file

durden_tyler · April 12, 2010, 9:58am

And a Perl solution -

$
$ cat -n testfile
     1  100 NEWYORK SALES APPR
     2  200 LA MARKETING TRRPSS
     3  300 ROME RECEP TTLS
$
$ perl -lne 's/ //g; print "$. - ",scalar length $_' testfile
1 - 19
2 - 20
3 - 16
$

tyler_durden

alister · April 12, 2010, 10:04am

As far as we know, the only thing to be excluded is a space (and linefeed). The complement of the alnum class will exclude a lot more than that.
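A small sketch of the difference, using a made-up line with punctuation (the sample text and the function names are assumptions for illustration, not part of the original file):

```shell
#!/bin/sh
# Hypothetical sample line containing punctuation (not from the original data).
line='300 ROME-EAST RECEP. TTLS'

# Delete only spaces: '-' and '.' are still counted.
count_no_spaces() {
    printf '%s' "$1" | tr -d ' ' | wc -c | tr -d ' '
}

# Keep only alphanumerics: '-' and '.' are dropped along with the spaces.
count_alnum_only() {
    printf '%s' "$1" | tr -cd '[:alnum:]' | wc -c | tr -d ' '
}

echo "spaces removed: $(count_no_spaces "$line")"
echo "alnum only:     $(count_alnum_only "$line")"
```

The trailing tr -d ' ' only strips the padding that some wc implementations put in front of the number.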

Regards,
Alister

fubaya · April 12, 2010, 11:26am

Can't test it right now, but this should work:

while read line; do line2=${line// /}; echo ${#line2}; done < file

Edit: here's a test

$ echo "123 45 678 90" | while read line ; do line=${line// /}; echo ${#line}; done
10
$

alister · April 12, 2010, 11:31am

Oh! So close to a pure sh solution, if not for that sed. We can use the shell's field splitting to get rid of the spaces for us. My attempt at a POSIX-compliant sh solution:

#!/bin/sh

oifs=$IFS
IFS=' '

i=0
while read -r s; do
    set -- $s
    s=''
    for w; do
        s="$s$w"
    done
    echo "$((i += 1)) - ${#s}"
done < "$1"

IFS=$oifs
unset -v i oifs s w

Sample run:

$ cat data
100 NEWYORK SALES APPR
200 LA MARKETING TRRPSS
300 ROME RECEP TTLS

$ ./count.sh data
1 - 19
2 - 20
3 - 16

Sourcing Tangent

If your shell's source/dot/. command assigns positional parameters, then a subshell is not necessary. Before finishing, the script will restore IFS and unset any variables it has set (although if any of them [i, oifs, s, w] existed before the script ran, they will have been stomped and unset).
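For comparison, a minimal sketch of the subshell route, where nothing in the caller gets stomped. The function name count_chars is hypothetical, and the set -f line is an addition not present in the script above:

```shell
#!/bin/sh
# count_chars is a hypothetical wrapper name. The parentheses make the
# function body run in a subshell, so changes to IFS, i, s, w and the
# positional parameters disappear when it returns.
count_chars() (
    IFS=' '
    set -f                      # assumption: disable globbing so a '*' in the data is not expanded
    i=0
    while read -r s; do
        set -- $s               # unquoted on purpose: split the line on spaces
        s=''
        for w in "$@"; do
            s="$s$w"
        done
        i=$((i + 1))
        echo "$i - ${#s}"
    done
)

i=42                            # a variable owned by the caller
printf '100 NEWYORK SALES APPR\n200 LA MARKETING TRRPSS\n' | count_chars
echo "caller's i is still $i"
```

The cost is one extra process per call, which is why sourcing with arguments is attractive where the shell supports it.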

bash 2.05b assigns the positional parameters, but its source command will stomp the current shell's positional parameters.

ksh93 also assigns the positional parameters, but its source command does not stomp the current shell's positional parameters (in this respect, it's more function-like, as the original values are still there when the sourced script has completed).

In either shell, if no argument is passed to the sourced script, the current shell's $1 is visible and used (unlike a function invocation, where it would be unset).

bash 2.05b source run:

$ set dontexist

$ echo $1
dontexist

$ . count.sh
-bash: dontexist: No such file or directory

$ . count.sh data
1 - 19
2 - 20
3 - 16

$ echo $1
300

300 is the first word in the final line, the last value assigned to $1 before the script finished.
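The mechanism can be reproduced without sourcing anything. A minimal sketch, run at the top level of a script with the default IFS:

```shell
#!/bin/sh
# 'set --' replaces the positional parameters wholesale.
set -- dontexist
echo "before: $1"

s='300 ROME RECEP TTLS'
set -- $s                 # unquoted on purpose: field splitting yields four words
echo "after:  $1 ($# parameters)"
```

After the second set, the original value of $1 is gone unless it was saved beforehand.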

ksh93 source run:

$ set dontexist

$ echo $1
dontexist

$ . count.sh
ksh: .: count.sh: cannot open [No such file or directory]

$ . ./count.sh
ksh: .: line 13: dontexist: cannot open [No such file or directory]

$ . ./count.sh data
1 - 19
2 - 20
3 - 16

$ echo $1
dontexist

Aside from the preservation of $1, mentioned earlier, note that ksh will not source a file that is not in $PATH unless the command name contains a forward slash. bash 2.05b did source it from the current directory (perhaps the current version no longer does), which isn't a good thing.

Before I finish, I just want to say that I am not disparaging the bash shell. I am simply pointing out some of the differences between the source implementations of the two shells.

Regards,
Alister