Hello,
I posted earlier asking for help with a script for splitting concatenated words. The script was supposed to read words from a master file and split concatenated words in the slave/input file.
Thanks to the help I received, the following script was posted, and it works very well. It marks residues (parts that match nothing in the master file) by placing a ! before the residual element.
However, the script does not pick the longest matching string when splitting, which leads to problems.
An example will help.
Given that the master file contains:
narayan
narayana
prakash
aprak
ash
In the case of narayanaprakash, I get:
narayan, aprak and ash
instead of
narayana prakash.
How do I get the script to produce the second instead of the first?
Many thanks for all the earlier help. I hope this problem of matching the longest string first can be resolved. The script is:
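# Utility to split names which are conjoined

# First file (master list): store each word as a key of a.
NR==FNR { a[$1]; next }

# Return a prefix of c that is in the master list ("" if none),
# scanning prefix lengths from the full length downwards.
function lsr(c,    p) {
    for (p = length(c); p; p--)
        if (tolower(substr(c, 1, p)) in a) break
    if (p) return substr(c, 1, p)
    return ""
}

# Second file (input): split each line into master-list words.
{
    while (length) {
        s = lsr($0)
        if (!s) printf "!"                  # open a residue
        while (!s && length) {
            # Use an explicit format so a % in the data is printed literally.
            printf "%s", substr($0, 1, 1)
            $0 = substr($0, 2)
            s = lsr($0)
            if (s) printf "! "              # close the residue
        }
        printf "%s ", s
        $0 = substr($0, length(s) + 1)
    }
    printf "\n"
}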
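
To make the behaviour I am after concrete, here is a minimal sketch of longest-match-first splitting with backtracking. It is not the script above and not a definitive fix. The names split_longest, seg and dict are hypothetical, and it assumes the master file has one word per line and is not empty. The first argument is the master file and the remaining arguments are the words to split. A word that cannot be split completely is printed with a leading !.

```shell
# Hypothetical helper: split_longest MASTER_FILE WORD...
split_longest() {
    master=$1
    shift
    printf '%s\n' "$@" | awk '
        # First input (master file): remember every word, lowercased.
        NR == FNR { dict[tolower($1)]; next }

        # Try the longest dictionary prefix of w first. If the remainder
        # cannot be split, back off to the next shorter prefix.
        # Returns the pieces joined by spaces, or "" if no full split exists.
        function seg(w,    p, head, rest) {
            for (p = length(w); p > 0; p--) {
                head = substr(w, 1, p)
                if (!(tolower(head) in dict)) continue
                if (p == length(w)) return head      # whole word matched
                rest = seg(substr(w, p + 1))
                if (rest != "") return head " " rest
            }
            return ""
        }

        # Second input (stdin): one concatenated word per line.
        {
            s = seg($1)
            if (s != "") print s
            else print "!" $1
        }
    ' "$master" -
}
```

With the five master words from the example saved in a file, calling split_longest on that file with the word narayanaprakash should print narayana prakash. The backtracking step is there for inputs where the longest first piece leaves a remainder that matches nothing.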