Count and number instances of a character in sed or awk

I currently use LaTeX together with a sed script to set cloze test papers for my students. I prepend an equals sign to the front of the words I want to leave out in the finished test, =perpendicular, for example. I am able to number the blanks using a variable in LaTeX. I would like to switch from using LaTeX to using groff typesetting.

My question is this: can I label or replace the equals signs with a running count with sed or awk, or even troff/groff for that matter? If so, how might I do it?

Thanks

A small awk script to preprocess your input:

awk '
    /=/ {
        for( i = 1; i <= NF; i++ )
        {
            if( substr( $(i), 1, 1 ) == "=" )
                $(i) = "__________[" count++ "]";    # corrected
        }
    }
    { print; }
' text-file

Using your original message, I put in some equals signs:

I currently use LaTeX together with a sed script to
set cloze =test papers for my students. I currently pepend
=and equals sign to the front of the words I want to leave out
in the finished =test, =perpendicular, for example. I am
able to number the =blanks using a variable in LaTeX. I
would like to switch from using =LatTeX to using =groff typesetting.

and this is the output:

I currently use LaTeX together with a sed script to 
set cloze __________[0] papers for my students. I currently pepend 
__________[1] equals sign to the front of the words I want to leave out 
in the finished __________[2] __________[3] for example. I am 
able to number the __________[4] using a variable in LaTeX. I 
would like to switch from using __________[5] to using __________[6] typesetting.

Hope this gets you going. (I'm not a *roff expert, so it may well be possible to do this at formatting time; I'm just not sure.)

If you'd like numbering to start with 1, change count++ to ++count .
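
A quick way to see the difference between the two forms is to run both on a tiny sample. This is only a sketch: the sample line and the shortened "[n]" marker are made up for illustration, and the variable names zero_based and one_based are not from the script above.

```shell
# Compare post-increment (count++) with pre-increment (++count).
sample='a =b c =d'

# count++ uses the old value first, so numbering starts at 0.
zero_based=$(printf '%s\n' "$sample" |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^=/) $i = "[" count++ "]" } { print }')

# ++count increments first, so numbering starts at 1.
one_based=$(printf '%s\n' "$sample" |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^=/) $i = "[" ++count "]" } { print }')

printf '%s\n%s\n' "$zero_based" "$one_based"
# prints:
# a [0] c [1]
# a [1] c [2]
```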

Thank you! Thank you! Thank you. This is exactly the type of thing I was looking for. My sed skills are rudimentary and my awk not even up to that level, but this gives me somewhere to start. I don't know if *roff can handle it at formatting time either, but this may just help me avoid trying to find out -- newbie-friendly *roff documentation is not terribly common.

Thanks again!

This seems to be an excellent start, but one small hiccup: running it on the following text gives mostly the desired results:

I am =honored to be with you =today at your commencement =from =one =of =the =finest universities in the world. I never =graduated from =college. Truth be told, this is the =closest I've =ever =gotten to a =college =graduation. Today I want to tell you three stories =from =my =life. That's it. =No =big =deal. Just three stories.

Results in:

I am __________[1] to be with you __________[2] at your commencement __________[3] __________[4] __________[5] __________[6] __________[7] universities in the world. I never __________[8] from __________[9] Truth be told, this is the __________[10] I've __________[11] __________[12] to a __________[9]=graduation. Today I want to tell you three stories __________[3] __________[13] __________[14] That's it. __________[15] __________[16] __________[17] Just three stories.

The __________[18] __________[19] is __________[20] connecting the __________[21]

Things go slightly haywire around the word "graduation." What might be going on?

I think you want something like this:

nawk '/=/{for(i=1;i<=NF;i++) if ($i ~/^=/) $i=("_______[" ++count "]" FS substr($i,2))}1' myFile

@vgersh99

Does the code

($i ~/^=/)

mean that we search for an equals sign (=) at the beginning (^) of $i? And if we find it, do we prepend ________ to all the strings found in $i?

Thanks,
jaysunn

Yes, almost. There's only one string in $i. The code replaces $i with "_______[" followed by the running tally in count, then "]", then FS (a space in our case), then the original word without its leading equals sign (that is what substr($i,2) returns).
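
To make the test and the substr() call concrete, here is a small sketch on a made-up line. The field x=1 is there only to show that an equals sign in the middle of a word is ignored, because ^ anchors the match to the first character of the field.

```shell
# Show which fields match /^=/ and what substr($i, 2) returns for them.
out=$(printf '%s\n' 'x=1 =word =two' |
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^=/)                       # "=" must be the first character
                print $i " -> " substr($i, 2)    # field without its leading "="
    }')
printf '%s\n' "$out"
# prints:
# =word -> word
# =two -> two
```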

Thanks for all replies. This seems to work beautifully now:

awk '
    /=/ {
        for( i = 1; i <= NF; i++ )
        {
            if( substr( $(i), 1, 1 ) == "=" )
                $(i) = "__________[" count++ "]";    # corrected
        }
    }
    { print; }
' text-file

This text:

I am =honored to be with you =today at your
commencement =from =one =of =the =finest universities
in the world. I never =graduated from =college. Truth
be told, this is the =closest I've =ever =gotten to a
=college =graduation. Today I want to tell you three
stories =from =my =life. That's it. =No =big =deal.
Just three stories.

gives:

I am __________[0] to be with you __________[1] at your
commencement __________[2] __________[3] __________[4] __________[5] __________[6] universities
in the world. I never __________[7] from __________[8] Truth
be told, this is the __________[9] I've __________[10] __________[11] to a
__________[12] __________[13] Today I want to tell you three
stories __________[14] __________[15] __________[16] That's it. __________[17] __________[18] __________[19]
Just three stories.

Sorry for the lack of tags and poor formatting before. This is my first thread/post on your site.

Glad you got it figured out. One of the things that makes this forum great, IMHO, is that everybody jumps in to help. I don't know why I used gsub() rather than a straight assignment this morning, and didn't catch the problem that it would cause until after your post, but the others chimed in with solutions/corrections before I got back to see and fix my code.
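
For anyone wondering how gsub() produces the "__________[9]=graduation" result: the first version of the script is not shown in this thread, so the buggy one-liner below is only a guess at what it looked like. It does give the same symptoms, though. gsub() treats the word as a regular expression and rewrites every match on the whole line, so the "." in "=college." also matches the space after the later "=college", and a repeated word such as "=from" gets the same number twice. The sample line is a shortened version of the test text.

```shell
line='never =graduated from =college. Truth be told, a =college =graduation.'

# Assumed reconstruction of the buggy approach: the field text is used as a
# regex and gsub() replaces every match anywhere on the line.
buggy=$(printf '%s\n' "$line" |
    awk '{ for (i = 1; i <= NF; i++) if (substr($i, 1, 1) == "=") gsub($i, "__________[" ++count "]") } { print }')

# Straight assignment touches only the current field.
fixed=$(printf '%s\n' "$line" |
    awk '{ for (i = 1; i <= NF; i++) if (substr($i, 1, 1) == "=") $i = "__________[" ++count "]" } { print }')

printf '%s\n%s\n' "$buggy" "$fixed"
# prints:
# never __________[1] from __________[2] Truth be told, a __________[2]=graduation.
# never __________[1] from __________[2] Truth be told, a __________[3] __________[4]
```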

As for using code tags which make it easier to read data and programmes, you might have a read over this:
The UNIX and Linux Forums - BB Code List

Cheers

This also works brilliantly.

Given:

I am =honored to be with you =today at your commencement 
=from =one =of =the =finest universities in the world. I 
never =graduated from =college. Truth be told, this is the
=closest I've =ever =gotten to a =college =graduation.
Today I want to tell you three stories =from =my =life.
That's it. =No =big =deal. Just three stories.

this code

nawk '/=/{for(i=1;i<=NF;i++) if ($i ~/^=/) $i=("[" ++count "]__________" FS substr($i,2))}1' filename

gives:

I am [1]__________ honored to be with you [2]__________ today
 at your commencement [3]__________ from [4]__________ one
 [5]__________ of [6]__________ the [7]__________ finest universities 
in the world. I never [8]__________ graduated from [9]__________ college.
 Truth be told, this is the [10]__________ closest I've [11]__________ ever
 [12]__________ gotten to a [13]__________ college [14]__________
 graduation. Today I want to tell you three stories [15]__________ from
 [16]__________ my [17]__________ life. That's it. [18]__________ No
 [19]__________ big [20]__________ deal. Just three stories.

Thanks so much.

BTW, what is the purpose of the 1 after the close braces at the end?

nawk '/=/{for(i=1;i<=NF;i++) if ($i ~/^=/) $i=("[" ++count "]__________" FS substr($i,2))}1'
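
On the trailing 1: an awk program is a list of pattern { action } pairs. A pattern with no action gets the default action, which is to print the current line, and 1 is a pattern that is always true. So the 1 simply prints every line, modified or not, just like the { print; } block in the longer script. A small sketch showing that the three spellings behave the same (the variable names a, b and c are only for this demonstration):

```shell
# Three equivalent ways to print every input line.
a=$(printf 'one\ntwo\n' | awk '1')               # always-true pattern, default action
b=$(printf 'one\ntwo\n' | awk '{ print }')       # no pattern, explicit action
c=$(printf 'one\ntwo\n' | awk '1 { print $0 }')  # both written out
printf '%s\n' "$a"
# prints:
# one
# two
```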

The successful solution to the original question (how to replace a word beginning with an equals sign with a running question number and a blank for students to write their answers, so that "=college" becomes "(8)__________", for example) has led me to a secondary question:

How might I replace each letter of the word with two underscores and a space, so that "=college." becomes "(8)__ __ __ __ __ __ __."? Or, for even weaker students, so that "=college." becomes "(8)c __ __ __ __ __ __."?
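
One possible approach, offered as a sketch rather than a tested solution: build the blank one letter at a time and put any trailing punctuation back afterwards. The function name cloze and the hint variable are made up for this example, and the punctuation list [.,;:!?] is an assumption about what can follow a word in the test text. Pass 1 to keep the first letter as a hint, 0 (or nothing) for blanks only.

```shell
cloze() {
    awk -v hint="${1:-0}" '
    /=/ {
        for (i = 1; i <= NF; i++) {
            if ($i !~ /^=/) continue
            word = substr($i, 2)                 # drop the leading "="
            tail = ""
            # Split off trailing punctuation so it is kept after the blanks.
            if (match(word, /[.,;:!?]+$/)) {
                tail = substr(word, RSTART)
                word = substr(word, 1, RSTART - 1)
            }
            blanks = ""
            for (j = 1; j <= length(word); j++) {
                if (j > 1) blanks = blanks " "
                # With hint set, the first letter is shown instead of "__".
                blanks = blanks ((hint && j == 1) ? substr(word, 1, 1) : "__")
            }
            $i = "(" ++count ")" blanks tail
        }
    }
    { print }
    '
}

plain=$(printf '%s\n' 'I never went to =college. Really.' | cloze 0)
hinted=$(printf '%s\n' 'I never went to =college. Really.' | cloze 1)
printf '%s\n%s\n' "$plain" "$hinted"
# prints:
# I never went to (1)__ __ __ __ __ __ __. Really.
# I never went to (1)c __ __ __ __ __ __. Really.
```

As with the earlier scripts, assigning to a field makes awk rebuild the line with single spaces between words, so any runs of spaces or tabs on lines containing an equals sign are collapsed.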