Hi there,
First of all, this is not homework; it is a new type of exercise for practicing vocabulary with my students.
I have a file with two tab-separated columns, one entry per line: a definition and the word it defines.
What I need is to replace a number of randomly chosen letters of the defined word with underscores. The number of letters could depend on the length of the word; half of them would be fine. Any ideas?
a true � of Islam \t follower
the recent � of two CIA agents \t disappearance
The restructuring is designed to give a sharper � on key markets. \t focus
a large country house with beautiful landscaped � \t gardens
OUTPUTFILE
a true � of Islam \t fo _ _o_er
the recent � of two CIA agents \t d_ _a_ _ear_ ce
The restructuring is designed to give a sharper � on key markets. \t f c_s
a large country house with beautiful landscaped � \t _ar_e_s
Zeroth approximation: you still need to eliminate the leading spaces in $2, and no check is done to avoid replacing an already set "_" with another one, nor to avoid replacing adjacent characters. Try
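awk -F"\t" '
# split $2 into single characters T[1]..T[LEN]
{LEN = split ($2, T, "")
$2 = ""
# overwrite about LEN/2 random slots; slot 0 and repeated hits are not excluded
for (i=1; i<=LEN/2; i++) T[int(rand()*LEN)] = "_"
# reassemble the word from its characters
for (i=1; i<=LEN; i++) $2 = $2 T[i]
}
1
' file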
a true � of Islam fol__w_r
the recent � of two CIA agents di_app_a___ce
The restructuring is designed to give a sharper � on key markets. f_c_s
a large country house with beautiful landscaped � ___de_s
Extract a few of the offending lines into a small file and run the script several times - there should be varying substitutions; at least there are when I do.
As said in the beginning - it's an approximation to be refined if further conditions need to be met.
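One caveat on the varying substitutions: whether repeated runs differ depends on the awk in use. gawk's rand() returns the same sequence on every run unless srand() is called first, while mawk seeds itself at startup. A minimal sketch of the same rough approach with explicit seeding follows; the function name mask_rough and the sample line are only for illustration.

```shell
# mask_rough is a hypothetical wrapper name, not part of the original script
mask_rough() {
    awk -F"\t" '
    BEGIN {srand()}                      # seed once from the time of day
    {LEN = split ($2, T, "")
     $2 = ""
     for (i=1; i<=LEN/2; i++) T[int(rand()*LEN)] = "_"
     for (i=1; i<=LEN; i++) $2 = $2 T[i]
    }
    1
    ' "$@"
}

# illustrative input line, fed through a pipe instead of a file
printf 'a true ... of Islam\tfollower\n' | mask_rough
```

srand() without an argument seeds from the clock in whole seconds, so two runs within the same second will still produce identical output.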
I did a thorough analysis of the script and its results. There is no systematic error that ignores certain words or words of a certain length. The algorithm for selecting characters is a rudimentary one and has its flaws, as pointed out before. By defining more sophisticated rules on how to use, evaluate, and improve the random number generation and its application (no zero target, no double substitution), you could certainly stabilize the results, but to what avail?
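For what it's worth, the two rules named above (no zero target, no double substitution) can be sketched as follows. This is only one possible refinement, under a few assumptions of mine: the word is the second tab-separated field, surrounding spaces in it should be dropped, and exactly int(LEN/2) letters should be masked. The function name mask_half and the seen array are hypothetical helpers.

```shell
# mask_half is a hypothetical name; it reads files given as arguments or stdin
mask_half() {
    awk -F"\t" -v OFS="\t" '
    BEGIN {srand()}                          # seed once from the time of day
    NF >= 2 {
        gsub(/^ +| +$/, "", $2)              # drop spaces around the word
        LEN = split($2, T, "")               # T[1]..T[LEN], one character each
        split("", seen)                      # portable way to empty an array
        todo = int(LEN / 2)                  # number of letters to mask
        while (todo > 0) {
            p = int(rand() * LEN) + 1        # 1..LEN, so never index 0
            if (!(p in seen)) {              # skip positions already masked
                seen[p] = 1
                T[p] = "_"
                todo--
            }
        }
        w = ""
        for (i = 1; i <= LEN; i++) w = w T[i]
        $2 = w
    }
    1
    ' "$@"
}

# illustrative input line, fed through a pipe instead of a file
printf 'a true ... of Islam\t follower\n' | mask_half
```

Setting OFS to a tab keeps the output in the same two-column layout as the input. Adjacent underscores are still possible; avoiding them would need an extra check on T[p-1] and T[p+1].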
I'm OK to close the thread.