Greetings.
I have a file with information like this:
AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?
AMNDHRKEEU?AMNDHREOEU?
AMNDHREU?AHRKEOEU?AMNDHRKEU?AMNDKEOEU?
What I need to extract is the position, in every line, of every occurrence of '?'.
A desired output would be something like:
1:12, 24, 36, 48
2:11, 22
3:9, 18, 28, 38
I'm using the bash shell.
One possible approach: you can use AWK with the field separator set to ?. Then, for each record, inspect the number of fields and their lengths to determine where the ? characters occur.
Analogously, you can do the same thing in the shell using IFS, read, the set builtin, a for-loop, and $#.
Regards,
Alister
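For illustration, the shell approach described above (IFS, read, set, a for-loop, and $#) could be sketched roughly as follows. This is only one possible reading of that suggestion, not Alister's own code: the function name question_positions and the sentinel character x are assumptions made for this example. The sentinel is appended because the shell drops the empty field after a trailing ?, which would otherwise make the last ? on each line go uncounted.

```shell
#!/bin/bash
# Sketch: report the position of every '?' on each line read from stdin.
# question_positions is a hypothetical name. The body runs in a subshell
# (note the parentheses) so the IFS and set -f changes do not leak out.
question_positions() (
    set -f                          # no pathname expansion when the line is split
    lineno=0
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        IFS='?'
        set -- ${line}x             # sentinel "x": a trailing '?' still ends a field
        [ "$#" -gt 1 ] || continue  # a single field means the line has no '?'
        n=$(( $# - 1 ))             # number of '?' characters on this line
        pos=0 out= i=0
        for field in "$@"; do
            i=$((i + 1))
            [ "$i" -gt "$n" ] && break      # the last field follows the final '?'
            pos=$(( pos + ${#field} + 1 ))  # field length plus the '?' itself
            out="${out:+$out, }$pos"
        done
        printf '%s:%s\n' "$lineno" "$out"
    done
)
```

It would be called as question_positions < infile. Lines without any ? are skipped, and blank lines are handled without special casing.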
awk '{printf("%d:",NR);len=0;for(i=1;i!=NF;++i){len+=length($i)+1;printf("%d ",len)}printf("\n")}' FS="?" infile
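or, with an awk that treats an empty FS as one field per character (gawk does; POSIX leaves this unspecified):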
awk '{printf("%d:",NR)}{for(i=1;i<=NF;++i)if($i == "?")printf("%d ",i)}{printf("\n")}' FS="" infile
Thanks, huaihaizi3. The first one worked perfectly!
Just in case, be aware that should a blank line occur in the data, NF will be 0, so the initial value of i will already be greater than NF. Since i only grows, it will never equal NF, and the loop condition i!=NF in the first command never becomes false. The result is an infinite loop (or at least a loop that runs until a field size limit is triggered by $i). Using < instead of != should fix it.
Regards,
Alister
a="AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?"
echo "$a" | fold -w 1 | grep -n "?"
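The fold and grep pipeline above handles a single string and prints one "position:?" line per match. As a rough sketch of how it might be extended to a whole file in the requested format, the function below (fold_positions is a hypothetical name) runs the same pipeline once per input line and joins the positions with cut and paste. It assumes plain single-byte text without tabs, since fold -w 1 counts screen columns rather than characters.

```shell
#!/bin/bash
# Sketch: apply the fold | grep -n idea to every line read from stdin.
# fold_positions is a hypothetical name used only for this example.
fold_positions() {
    local n=0 line positions
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        # One character per line, keep the line numbers of the '?' matches,
        # then join them into a single comma-separated string.
        positions=$(printf '%s\n' "$line" | fold -w 1 | grep -n -F '?' | cut -d: -f1 | paste -sd, -)
        if [ -n "$positions" ]; then
            printf '%s:%s\n' "$n" "${positions//,/, }"
        fi
    done
}
```

It would be called as fold_positions < infile. It starts several processes per input line, so the awk solutions are likely to be much faster on large files.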
Also, to get rid of the spurious space at the end, or to be able to specify an output field separator, we could use something like this:
awk -F? '{p=0; for(i=1;i<NF;i++)s=(i==1?NR ":":s OFS)(p+=length($i)+1)} p{print s}' OFS=", " infile