Find position of character in multiple strings in a file

Twinklefingers · July 1, 2012, 5:11pm

Greetings.

I have a file with information like this:

AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?
AMNDHRKEEU?AMNDHREOEU?
AMNDHREU?AHRKEOEU?AMNDHRKEU?AMNDKEOEU?

What I need to extract is the position, in every line, of every occurrence of '?'

A desired output would be something like:

1:12, 24, 36, 48
2:11, 22
3:9, 18, 28, 38

I'm using the bash shell

alister · July 1, 2012, 5:46pm

One possible approach: you can use AWK with the field separator set to ?. Then, for each record, inspect the number of fields and their lengths to work out where the ? characters occur.

Analogously, you can do the same thing in the shell using IFS, read, the set builtin, a for-loop, and $#.
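
A minimal sketch of that shell approach, assuming bash (it should also run in other POSIX shells). The function name qpos and the sentinel character x are my own choices for illustration, not anything from the thread; the sentinel makes sure a ? at the very end of a line still closes a field.

```shell
# qpos FILE: print "line:pos, pos, ..." for every '?' in each line of FILE.
# (qpos is a hypothetical name chosen for this sketch.)
qpos() (
    file=$1
    set -f                          # no pathname expansion while word-splitting
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        IFS='?'
        set -- ${line}x             # split on '?'; sentinel x keeps a trailing '?' countable
        pos=0 out= i=0
        for field in "$@"; do
            i=$((i + 1))
            [ "$i" -eq "$#" ] && break      # the last field is not followed by a '?'
            pos=$((pos + ${#field} + 1))    # field length plus the '?' itself
            out="${out:+$out, }$pos"
        done
        if [ -n "$out" ]; then
            echo "$n:$out"
        fi
    done < "$file"
)
```

With the sample data saved as infile, qpos infile should give output in the requested 1:12, 24, 36, 48 style. Lines that contain no ? are skipped, and the body runs in a subshell so the IFS and set -f changes do not leak into the calling shell.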

Regards,
Alister

complex.invoke · July 1, 2012, 10:02pm

awk '{printf("%d:",NR);len=0;for(i=1;i!=NF;++i){len+=length($i)+1;printf("%d ",len)}printf("\n")}' FS="?" infile

or

awk '{printf("%d:",NR)}{for(i=1;i<=NF;++i)if($i == "?")printf("%d ",i)}{printf("\n")}' FS="" infile

Twinklefingers · July 1, 2012, 10:15pm

Thanks, huaihaizi3. The first one worked perfectly!

alister · July 2, 2012, 2:30am

Just in case: be aware that if a blank line occurs in the data, NF is 0, so the initial value of i (1) is already greater than NF and the condition i!=NF in the first command will never become false. The result is an infinite loop (or at least a loop that runs until referencing $i triggers a field size limit). Using i<NF instead should fix it.
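
For illustration, here is the first command with that one change applied, spread over several lines. Wrapping it in a function and calling it qpos_awk is only my assumption to make it easy to try; everything else follows the original one-liner, including the trailing space after each position.

```shell
# qpos_awk FILE...: hypothetical wrapper around the first awk command,
# using i<NF so that an empty record (NF == 0) ends the loop immediately.
qpos_awk() {
    awk -F'?' '{
        printf("%d:", NR)
        len = 0
        for (i = 1; i < NF; ++i) {      # i<NF instead of i!=NF
            len += length($i) + 1       # field length plus the "?" itself
            printf("%d ", len)
        }
        printf("\n")
    }' "$@"
}
```

Called as qpos_awk infile, a blank line should now just print its line number followed by a colon instead of hanging.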

Regards,
Alister

jayan_jay · July 2, 2012, 3:29am

$ a="AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?"
$ echo "$a" | fold -w 1 | grep -n "?"

Scrutinizer · July 2, 2012, 3:51am

Also to get rid of the spurious space at the end or to be able to specify an output field separator, we could use something like this:

awk -F? '{p=0; for(i=1;i<NF;i++)s=(i==1?NR ":":s OFS)(p+=length($i)+1)} p{print s}' OFS=", " infile