KSH - need to write a script to abbreviate a string

Dear Members,

I have to write a UNIX script (KSH) to create a 6-letter abbreviation from a given string. The string will contain only letters and underscores.

For example, if the string is abc_pst_xyz, the abbreviation could be 'abpsxy' or 'abcpyz', etc.

Now comes the tricky part. I have to make sure that out of the given set of strings no two strings have the same abbreviation.

Basically, I need to search a particular directory and see if such an abbreviation already exists. If it does, I have to generate a new 6-letter abbreviation from the string.

Please help me with this script. Also, help me understand the logic to generate or select random characters from a string.

Thank you very much in advance.

The following script generates all abbreviations of a given length for a string:

$ cat gen_abbrev.ksh
#!/usr/bin/ksh

echo "$1" | \
nawk -v Abbrev_Len=${2:-6} '

function SplitString(str       ,i ,slen) {
    slen = length(str);
    for (i=1; i<=slen; i++)
        string_array[i] = substr(str, i, 1);
    string_array[0] = slen;
    return slen
}

function buildAbbrev(len, abbrev   ,i ,lstr, char) {
    lstr = string_array[0];
#print "buildAbbrev len=" len, "abbrev=[" abbrev "]", "lstr=" lstr;
    if (len > 0 ) {
       for (i=1; i<=lstr; i++) {
           char = string_array[i];
           if (char) {
               string_array[i] = ""    # mark this character as used
               buildAbbrev(len-1, abbrev char);
               string_array[i] = char; # restore it for the next branch
           }
       }
    } else {
        print abbrev;
    }
    return
}

function buildAllAbbrevs(str, labbrev    ,astr) {
    string_array[0] = 0;
    if (SplitString(str) > labbrev) {
        buildAbbrev(labbrev, "");
    } else print "ERR";
}

{
    buildAllAbbrevs($0, Abbrev_Len);
    exit;
}
'

Example:

$ ./gen_abbrev.ksh abcd 3
abc
abd
acb
acd
adb
adc
bac
bad
bca
bcd
bda
bdc
cab
cad
cba
cbd
cda
cdb
dab
dac
dba
dbc
dca
dcb
$

You can use it like this (valid_abbreviation and use_abbreviation are placeholders for your own checks):

gen_abbrev.ksh "${string}" 6 |
while read abbreviation
do
     if valid_abbreviation
     then
         use_abbreviation
         break
     fi
done
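
One possible way to fill in the two placeholders, assuming the abbreviations already in use are stored as file names in a single directory. The function name pick_abbreviation and the directory layout are hypothetical and not part of the original request, so treat this as a sketch only:

```shell
# Sketch only: pick_abbreviation and the directory layout are assumptions.
# Reads candidate abbreviations on stdin, prints the first one that does not
# yet exist as a file name in directory $1, and reserves it there.
pick_abbreviation() {
    used_dir=$1
    while read -r abbreviation
    do
        if [ ! -e "$used_dir/$abbreviation" ]
        then
            : > "$used_dir/$abbreviation"   # reserve it with an empty file
            printf '%s\n' "$abbreviation"
            return 0
        fi
    done
    return 1    # every candidate was already taken
}
```

It would be called as: gen_abbrev.ksh "${string}" 6 | pick_abbreviation /path/to/abbrev_dir (the path is a placeholder). Because the generator lists candidates in a fixed order, the first free one is always chosen, so no random selection is needed to avoid collisions.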

Jean-Pierre.

Thanks for the script aigles.

However, when I am trying to execute the script with the example you have provided, I am getting the following error:

gen_abbrev.ksh abcd 3

gen_abbrev.ksh[4]: nawk:  not found

Please help me resolve the error.

Replace nawk with awk.
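
If the script has to run on machines where the interpreter may be installed under either name, one option is to pick whichever is found on the PATH. A minimal sketch, assuming a shell that supports command -v; the variable name AWK is made up for this example:

```shell
# Prefer nawk when it exists, otherwise fall back to awk.
if command -v nawk >/dev/null 2>&1
then
    AWK=nawk
else
    AWK=awk
fi

# The script would then call "$AWK" instead of a hard-coded nawk.
echo "abcd" | "$AWK" '{ print length($0) }'
```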

Jean-Pierre.

Thanks. Replacing nawk with awk worked perfectly.
Now my other question is:

In your example, we get every possible 3-letter arrangement of the characters in 'abcd'.
But if I replace 'abcd' with 'abc_d', I get strings that contain the underscore '_' as well. Whatever the parent string is (including ones with '_'), the generated abbreviation should not contain an underscore.

How do I eliminate the underscore?

Also, I understood the functions that you have created but what does the following line do:

awk -v Abbrev_Len=${2:-6}

Can you please provide me a link where I can get information about awk and how to use it in UNIX scripts?

Thank you very much!

The -v option sets the variable Abbrev_Len inside awk to the value of $2 in the shell (the script's second argument). If $2 is unset or empty, ${2:-6} expands to 6.
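
A small demonstration of both pieces, the shell default expansion and awk's -v assignment. The function name show_len is made up for this example:

```shell
# ${2:-6} expands to the second argument, or to 6 when it is unset or empty.
show_len() {
    # -v assigns the awk variable before the program starts, so it is
    # already visible in the BEGIN block.
    awk -v Abbrev_Len="${2:-6}" 'BEGIN { print "Abbrev_Len=" Abbrev_Len }'
}

show_len abcd 3     # prints Abbrev_Len=3
show_len abcd       # prints Abbrev_Len=6
```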

Replace the SplitString function with this one:

function SplitString(str       ,i ,j ,char ,slen ) {
    slen = length(str);
    for (i=1; i<=slen; i++)
        char = substr(str, i, 1);
        if (char ~ /[0-9a-zA-A]/)
            string_array[j++] = char;
    string_array[0] = j;
    return slen
}

Jean-Pierre.

Hi,

I used the alternate SplitString function, but now the script returns nothing.

function SplitString(str       ,i ,j ,char ,slen ) {
    slen = length(str);
    for (i=1; i<=slen; i++)
        char = substr(str, i, 1);
        if (char ~ /[0-9a-zA-A]/)
            string_array[j++] = char;
    string_array[0] = j;
    return slen
}

Any idea?

Sorry, the alternate version of the SplitString function was written too quickly and was not tested.
Here is a new, tested version:

function SplitString(str       ,i ,j ,char ,slen ) {
    slen = length(str);
    for (i=1; i<=slen; i++) {
        char = substr(str, i, 1);
        if (char ~ /[0-9a-zA-Z]/)
            string_array[++j] = char;
    }
    string_array[0] = j;
    return j
}
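
A quick way to check the function in isolation before putting it back into gen_abbrev.ksh. The wrapper split_demo and the printing loop exist only for this illustration:

```shell
# Feed one string to the tested SplitString and print how many characters
# were kept, followed by the kept characters joined together.
split_demo() {
    echo "$1" | awk '
    function SplitString(str       ,i ,j ,char ,slen ) {
        slen = length(str);
        for (i=1; i<=slen; i++) {
            char = substr(str, i, 1);
            if (char ~ /[0-9a-zA-Z]/)
                string_array[++j] = char;
        }
        string_array[0] = j;
        return j
    }
    {
        n = SplitString($0);
        out = "";
        for (k = 1; k <= n; k++)
            out = out string_array[k];
        print n, out;
    }'
}

split_demo abc_d    # prints: 4 abcd
```

The underscore is dropped because it does not match the character class, so buildAbbrev never sees it.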

Jean-Pierre.
