Count consecutive characters

I need to count consecutive characters in a string and produce output like this:
i/p= aaaabbcaa
o/p= a4b2c1a2

Any attempts / ideas / thoughts from your side?

I tried something like this, but I guess it will only work for the first run of identical characters (a) and not for b.

i/p=aaaabbcaa
length=${#i/p} 

for (i=0;i<$length;i++)
do
tmp=""
character=${ip:"$i"}
	if [[ $character != $tmp ]]
	then
	o/p=$character
	tmp=$character
	else
	o/p=$character$i
	fi
done

Hmmm, there are quite a few syntax errors in your script (assuming your unmentioned shell is a Bourne-type shell, e.g. bash or ksh), and obviously logic errors as well. Would you mind using an awk solution?

awk  '
        {LAST = $1
         CNT = 0
         for (i=1; i<=NF; i++)  {if ($i == LAST) CNT++
                                 else           {printf "%s%d", LAST, CNT
                                                 CNT = 1
                                                }
                                 LAST = $i
                                }
         printf "%s%d\n", LAST, CNT
        }
' FS="" file
a4b2c1a2

This assumes your awk version allows an empty field separator, which makes every single character of the input line a field of its own.
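If your awk does not accept an empty FS (POSIX leaves the behaviour of a null FS unspecified), the same run counting can be sketched with substr() walking the line one character at a time. This is a minimal sketch, not a drop-in replacement; `rle` is a hypothetical helper name chosen for the example. Because it only compares characters with ==, no regex is involved, so characters such as ? or . are counted literally.

```shell
# rle: hypothetical helper name; prints each input line run-length encoded.
# Walks the line with substr() instead of relying on FS="".
rle() {
    awk '{
        out = ""
        n = length($0)
        i = 1
        while (i <= n) {
            c = substr($0, i, 1)               # first character of the current run
            j = i
            while (j < n && substr($0, j + 1, 1) == c)
                j++                            # extend the run while the next character matches
            out = out c (j - i + 1)            # append the character and its run length
            i = j + 1                          # continue after the run
        }
        print out
    }'
}

echo aaaabbcaa | rle      # a4b2c1a2
echo 'aaa???bbb' | rle    # a3?3b3
echo 'aaa.bbb' | rle      # a3.1b3
```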

echo aaaabbcaa | fold -w 1 | uniq -c | awk '{l=l$2$1} END {print l}'
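To see how the fold | uniq -c | awk pipeline works, it can help to run it one stage at a time. A minimal sketch follows. Note that awk's default field splitting drops whitespace, so runs of spaces or tabs in the input would be lost, and the amount of leading padding in uniq -c output may differ between implementations (awk ignores it either way).

```shell
ip=aaaabbcaa

# Stages 1 and 2: fold -w 1 puts each character on its own line,
# then uniq -c collapses adjacent duplicates and prefixes each with its count
# (lines like "4 a", "2 b", "1 c", "2 a", with leading padding).
echo "$ip" | fold -w 1 | uniq -c

# Stage 3: awk appends the character ($2) and the count ($1) of every run,
# then prints the accumulated string once at END.
op=$(echo "$ip" | fold -w 1 | uniq -c | awk '{l=l$2$1} END {print l}')
echo "$op"    # a4b2c1a2
```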

Here is a bash solution.

Note that a slash (/) is not allowed in variable names, so I renamed i/p and o/p to ip and op, respectively.

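ip=aaaabbcaa
op=
found=0

for((i=0; i<${#ip}; i++))
do
    ((found++))
    character=${ip:i:1}
    nextchar=${ip:i+1:1}    # empty past the end, so the last run is flushed too
    if [[ $character != "$nextchar" ]]    # quoted: compare literally, not as a glob pattern
    then
        op=$op$character$found
        found=0
    fi
done
echo "$op"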
ip="aaaabbcaa"
op=$(echo "$ip" | awk '{while (/./) {c=substr($0, 1, 1); match($0, c "*", a); printf c RLENGTH; sub(a[0], "")}}')
echo "$op"

Another one:

echo aaaabbcaa | sed 's/\(.\)\1*/& /g' | awk '{for(i=1; i<=NF; i++) $i=substr($i,1,1) length($i)}1' OFS=
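The sed + awk pipeline above can be taken apart stage by stage as well. This sketch assumes the input contains no spaces, since sed marks the run boundaries with a space and awk splits the line back apart on whitespace.

```shell
# Stage 1: sed appends a space after every run of one repeated character.
# \(.\) captures a character and \1* matches any further copies of it.
echo aaaabbcaa | sed 's/\(.\)\1*/& /g'     # "aaaa bb c aa " (note the trailing space)

# Stage 2: awk rewrites each space-separated run as its first character
# followed by its length; OFS= rebuilds the line with nothing between fields.
echo aaaabbcaa | sed 's/\(.\)\1*/& /g' |
    awk '{for(i=1; i<=NF; i++) $i=substr($i,1,1) length($i)}1' OFS=    # a4b2c1a2
```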

Nice idea.

However, I would avoid using match() and sub(), as they interpret regex metacharacters in the input: a [ causes a fatal "Unmatched [" error, and ? or . characters cause incorrect output.

$ ip="aaa"
$ op=$(echo "$ip" | awk '{while (/./) {c=substr($0, 1, 1); match($0, c "*", a); printf c RLENGTH; sub(a[0], "")}}')
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: Unmatched [, [^, [:, [., or [=: /[*/

$ ip="aaa???bbb"
$ op=$(echo "$ip" | awk '{while (/./) {c=substr($0, 1, 1); match($0, c "*", a); printf c RLENGTH; sub(a[0], "")}}')
$ echo $op
a3?3?2?1b3

$ ip="aaa.bbb"
$ op=$(echo "$ip" | awk '{while (/./) {c=substr($0, 1, 1); match($0, c "*", a); printf c RLENGTH; sub(a[0], "")}}')
$ echo "$op"
a3.4

ip=aaaabbcaa
op=
for s in `echo "$ip" | sed -r 's/((\w)\2*)/\1 /g'`; do op=$op${s:0:1}${#s}; done
echo "$op"    # a4b2c1a2

Nice use of backreferences!
Here is a POSIX variant:

#!/bin/sh
 ip=aaaabbcaa
 op=
 for s in `echo "$ip" | sed 's/\(\(.\)\2*\)/\1 /g'`
 do
  del=${s#?}
  op=$op${s%"$del"}${#s}
 done
 echo "$op"
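The POSIX variant above gets the first character of each run without bash's ${s:0:1} by stripping everything except that character. The two expansions can be tried on their own with sample values, as in this minimal sketch. Quoting $del makes the suffix match literal, which matters when a run consists of glob characters such as * or ?.

```shell
s=aaaa
del=${s#?}           # remove the shortest prefix matching ? (one character): aaa
first=${s%"$del"}    # remove that remainder as a literal suffix: a
echo "$first${#s}"   # a4

# Quoting $del matters when the run is made of glob characters:
s='**'
del=${s#?}           # *
first=${s%"$del"}    # quoted, so * is matched literally: *
echo "$first${#s}"   # *2  (an unquoted $del would give **2)
```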

And a variant of Chubler's post #6:

#!/bin/bash
ip=aaaabbcaa
len=${#ip}
lchar=${ip:0:1}
for((i=1; i<=$len; i++))
do
  ((found++))
  char=${ip:i:1}
  if [[ $char != "$lchar" ]]
  then
    op=$op$lchar$found
    found=0
    lchar=$char
  fi
done
echo "$op"