to grep and print their counts

cdfd123 · October 13, 2007, 6:39am

Suppose you have a file:
ACFCFACCACARCSHFARCVJVASTVAJFTVAJVGHBAJ

and another file:
A
C
F
R

then the output should be:
A=9
C=7
F=3
R=2

Thanks

radoulov · October 13, 2007, 9:53am

with zsh:

zsh 4.3.4% cat file
ACFCFACCACARCSHFARCVJVASTVAJFTVAJVGHBAJ
zsh 4.3.4% cat file1
A
C
F
R
zsh 4.3.4% <file1 while read;do printf "%s=%d\n" "$REPLY" "${#$(<file)//[^$REPLY]}";done
A=9
C=7
F=4
R=2

P.S. F is 4, not 3
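
For readers without zsh, the same per-letter count can be sketched with standard tools. This is only a minimal sketch, not part of the reply above: the function name count_with_tr is made up for illustration, and it assumes every line of the letters file is a single ordinary character (nothing that tr would read as a range or an escape).

```shell
#!/bin/sh
# count_with_tr DATAFILE LETTERSFILE   (hypothetical helper name)
# Assumes one plain character per line in LETTERSFILE.
count_with_tr() {
    data=$1
    letters=$2
    while IFS= read -r letter; do
        [ -n "$letter" ] || continue      # skip blank lines
        # tr -cd deletes every character NOT in the set; wc -c counts the rest.
        # The arithmetic expansion strips any padding that wc adds.
        n=$(( $(tr -cd "$letter" < "$data" | wc -c) ))
        printf '%s=%d\n' "$letter" "$n"
    done < "$letters"
}

# Demo with the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' ACFCFACCACARCSHFARCVJVASTVAJFTVAJVGHBAJ > "$tmp/file"
printf '%s\n' A C F R > "$tmp/file1"
count_with_tr "$tmp/file" "$tmp/file1"
rm -r "$tmp"
```

This rereads the data file once per letter, which is fine for a handful of letters but slow for long lists.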

matrixmadhan · October 13, 2007, 12:14pm

one more,

awk 'BEGIN { while ((getline line < "firstfile") > 0) { len = length(line); for (i = 1; i <= len; i++) arr[substr(line, i, 1)]++ } } { printf "%s=%d\n", $0, arr[$0] }' secondfile

cdfd123 · October 13, 2007, 12:23pm

Sorry, I don't follow the zsh version.
Could you do it with a shell script or with awk?
Thanks

matrixmadhan · October 13, 2007, 12:38pm

Did you try what I posted?

drl · October 13, 2007, 1:16pm

Hi.

Using standard utilities:

#!/usr/bin/env sh

# @(#) s1       Demonstrate split of line into characters, filter, sort, count.

set -o nounset
echo

debug=":"
debug="echo"

## Use local command version for the commands in this demonstration.

echo "(Versions used in this script displayed with local utility "version")"
version bash sed grep sort uniq

echo

FILE=${1-data1}

sed -e 's/\(.\)/\1\n/g' "$FILE" |
grep -f data2 |
sort |
uniq -c

exit 0

To produce:

% ./s1

(Versions used in this script displayed with local utility version)
GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
GNU sed version 4.1.2
grep (GNU grep) 2.5.1
sort (coreutils) 5.2.1
uniq (coreutils) 5.2.1

      9 A
      7 C
      4 F
      2 R
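
The sed step in the pipeline above relies on GNU sed turning \n in the replacement into a newline. A possible portable variant of the same split, filter, sort, count idea is sketched here with fold; the function name count_with_fold is hypothetical, and the sketch assumes single-byte characters.

```shell
#!/bin/sh
# count_with_fold DATAFILE LETTERSFILE   (hypothetical helper name)
count_with_fold() {
    # fold -w 1      : put every character on its own line
    # grep -F -x -f  : keep only lines that exactly equal a listed letter
    # sort | uniq -c : count each distinct letter that survived the filter
    fold -w 1 "$1" | grep -F -x -f "$2" | sort | uniq -c
}

# Demo with the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' ACFCFACCACARCSHFARCVJVASTVAJFTVAJVGHBAJ > "$tmp/data1"
printf '%s\n' A C F R > "$tmp/data2"
count_with_fold "$tmp/data1" "$tmp/data2"
rm -r "$tmp"
```

As with the original pipeline, a letter that never occurs in the data produces no output line rather than a count of 0.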

Note that "F" is indeed 4, as a local utility confirms:

% describe -c data1
    1 lines read
    1 words read
   40 chars read

   39 length of longest line
    1 occurrences
    1 line at which first seen
    1 line at which last seen

   39 length of shortest (non-zero length) line
    1 occurrences
    1 line at which first seen
    1 line at which last seen

  1:1 columns:lines.
       1 newline
       9 A
       1 B
       7 C
       4 F
       1 G
       2 H
       4 J
       2 R
       2 S
       2 T
       5 V

Best wishes ... cheers, drl

ghostdog74 · October 13, 2007, 2:44pm

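awk 'BEGIN{FS=""}                 # empty FS: one character per field (common extension, not POSIX)
 FNR==NR{                         # first file: the data
     for(i=1;i<=NF;i++){
          a[$i]++                 # tally every character
     }
     next;
 }
 { printf "%s=%d\n",$1,a[$1] }    # second file: one letter per line
' "file" "file2"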

radoulov · October 13, 2007, 3:37pm

... and another one

awk 'NR==FNR{p=$0;next}
{r=FS=$0;$0=p;printf "%s=%d\n",r,NF-1}' file1 file2

Use nawk or /usr/xpg4/bin/awk on Solaris.

Example:

% cat file1              
ACFCFACCACARCSHFARCVJVASTVAJFTVAJVGHBAJ
% cat file2              
A
C
F
R
% awk 'NR==FNR{p=$0;next}
{r=FS=$0;$0=p;printf "%s=%d\n",r,NF-1}' file1 file2
A=9
C=7
F=4
R=2
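
Why NF-1 gives the count: once the letter is used as the field separator, a string with n occurrences of it splits into n+1 fields. The sketch below isolates just that step. The name count_by_fields is made up for illustration, and it assumes the separator is a single ordinary letter (not a space or a regex metacharacter).

```shell
#!/bin/sh
# count_by_fields TEXT LETTER   (hypothetical helper name)
count_by_fields() {
    # n occurrences of the separator split the record into n+1 fields,
    # so NF - 1 is the number of occurrences.
    printf '%s\n' "$1" | awk -v sep="$2" 'BEGIN { FS = sep } { print NF - 1 }'
}

count_by_fields ACFCFA C    # fields: A, F, FA  -> prints 2
count_by_fields ACFCFA R    # no separator, one field -> prints 0
```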

radoulov · October 13, 2007, 3:43pm

Z-Shell is a shell ...

cdfd123 · October 13, 2007, 4:14pm

Thanks, matrixmadhan.
It really works.

drl · October 13, 2007, 5:16pm

Hi.

A brief improvisation on radoulov's inventive zsh script: move the read of the data file outside the loop. The difference is probably negligible for short files, but it could matter for longer ones (a manifestation of my habit of moving invariant code outside Fortran loops):

#!/bin/zsh

# @(#) s2       Demonstrate small optimization, avoid repeated reads.

FILE=${1-data1}

echo
echo " Input $FILE:"
cat $FILE

# <data2 while read;do printf "%s=%d\n" "$REPLY" "${#$(<data1)//[^$REPLY]}";done

echo
echo " Results:"
t1=$(<$FILE)
while read
do
  printf "%s=%d\n" "$REPLY" "${#${t1//[^$REPLY]}}"
done <data2

exit 0

Producing:

% ./s2

 Input data1:
ACFCFACCACARCSHFARCVJVASTVAJFTVAJVGHBAJ

 Results:
A=9
C=7
F=4
R=2
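
The nested ${#${...}} expansion is zsh-specific. A rough POSIX sh sketch of the same read-once idea follows. The name count_in_shell is hypothetical, and the counting loop is a substitute technique (stripping the string up to each match), not the zsh expansion itself.

```shell
#!/bin/sh
# count_in_shell DATAFILE LETTERSFILE   (hypothetical helper name)
count_in_shell() {
    text=$(cat "$1")                  # read the data file once, outside the loop
    while IFS= read -r letter; do
        [ -n "$letter" ] || continue  # skip blank lines
        rest=$text
        n=0
        # Remove everything up to and including the next match until none is left.
        while :; do
            case $rest in
                *"$letter"*) rest=${rest#*"$letter"}; n=$((n + 1)) ;;
                *) break ;;
            esac
        done
        printf '%s=%d\n' "$letter" "$n"
    done < "$2"
}

# Demo with the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' ACFCFACCACARCSHFARCVJVASTVAJFTVAJVGHBAJ > "$tmp/data1"
printf '%s\n' A C F R > "$tmp/data2"
count_in_shell "$tmp/data1" "$tmp/data2"
rm -r "$tmp"
```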

cheers, drl

radoulov · October 13, 2007, 6:03pm

Definitely,
just wanted to change my post
(sorry, wrote it quickly :)).

summer_cherry · October 14, 2007, 11:34pm

Hi,
Hope this one can help you.

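# Assumed here: b holds the string to scan, a holds the letters (one per line).
sed '/^$/d' b > tb    # drop empty lines
sed '/^$/d' a > ta
awk '
NR == FNR {                          # first file (tb): tally every character
    for (i = 1; i <= length($1); i++)
        arr[substr($1, i, 1)]++
    next
}
{ print $1 " = " (arr[$1] + 0) }     # second file (ta): report each letter
' tb ta
rm tb ta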