Splitting Chunked-FullNames Nightmare

I've got a problem that I'm hoping more experienced programmers have dealt with at some point in their careers and can help me with: how to split full names that were lumped together in a single field of an old database into separate, more meaningful fields.

I'd like to convert the records that fit the pattern firstname middleinitial lastname into three fields separated by colons, and skip the names that don't fit that pattern until I figure out what to do with them (any suggestions welcome).

GIVEN INPUT:

DONNIE BERG
JERRY M MAGUIRE
D A BROWN
RICHARD N STYLES & FRANK A PERRY
MITCH GARBO & BOBBI MILLS
JUDY & STONE RUFFEY
MRS K H SCHULTZ
JASPER O & SUZI M THOMPSON
DAY FRANKLIN-MIZER
BO & TYRA J SLACK
JERRY B DE TUNA
CHARLES C VICTOR III
DARREN E MC FANN
TOM E VARBLE JR
MARY W & CAROLYN SMILEY
BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
STAN N GAIL & HIDDEN VALLEY FARMS
DRIPP CREEK FARM
Y Z & S T OUTTRIM
WM AND VI JOYNER SALES
A C A SALES
ADDONIS SYNDICATE & LOWLAND MEADOW

DESIRED OUTPUT:

DONNIE: :BERG
JERRY:M:MAGUIRE
D:A:BROWN
RICHARD:N:STYLES:FRANK:A:PERRY
MITCH: :GARBO:BOBBI: :MILLS
JUDY: :RUFFEY:STONE: :RUFFEY
K:H:SCHULTZ
JASPER:O:THOMPSON:SUZI:M:THOMPSON
DAY: :FRANKLIN-MIZER
BO: :SLACK:TYRA:J:SLACK
JERRY:B:DE TUNA
CHARLES:C:VICTOR III
DARREN:E:MC FANN
TOM:E:VARBLE JR
MARY:W:SMILEY:CAROLYN: :SMILEY
BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
STAN:N:GAIL & HIDDEN VALLEY FARMS
DRIPP CREEK FARM
Y Z & S T OUTTRIM
WM AND VI JOYNER SALES
A C A SALES
ADDONIS SYNDICATE & LOWLAND MEADOW
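
As a first pass over the single-person rows above, here is a minimal sketch, assuming one name per line with words separated by blanks. The function name split_simple_names is hypothetical, and the awk is wrapped in a shell function only so it can be run on a file or a pipe. It handles FIRST LAST and FIRST M LAST and prints every other line unchanged, so titles, suffixes and "&" pairs are left for a later pass. Note that it cannot tell a two-word business name from a person.

```sh
# Hypothetical helper: split only single-person names of the form
# FIRST LAST or FIRST M LAST; every other line is printed unchanged.
split_simple_names() {
  awk '
    # Lines with "&", or with fewer than 2 or more than 3 words, are left as-is.
    /&/ || NF < 2 || NF > 3 { print; next }
    # Two words: no middle initial, so keep a blank placeholder.
    NF == 2 { print $1 ": :" $2; next }
    # Three words: accept only a one-letter middle initial.
    length($2) == 1 { print $1 ":" $2 ":" $3; next }
    # Three words with a longer middle word: not a simple name, leave as-is.
    { print }
  ' "$@"
}
```

Usage would be something like split_simple_names names.txt, where names.txt is a placeholder for the exported field.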

Try adapting the following awk program:

# Global Array Description
#
# Names["cnt"     ] = Names count in names list (input line)
# Names["invalid" ] = 0 if all valid names, 1 otherwise
# Names["list"    ] = Formatted names list
#
# Names[n,   "parts"] = Parts       of name n in list
# Names[n,   "first"] = Firstname  for name n in list
# Names[n,  "middle"] = Middlename for name n in list
# Names[n,    "last"] = Lastname   for name n in list
# Names[n,    "name"] = Formatted      name n in list
# Names[n, "invalid"] = 0 if name n is valid, 1 otherwise
#

#=======================================================================
# F U N C T I O N S . . .
#=======================================================================

#
# set_name(name) - Set name information
#

function set_name(name    ,parts, p) {

   #
   # Set name parts
   #

   parts = Names[name, "parts"]
   while (1) {

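      # Fold suffixes (JR, SR, roman numerals) and prefixes (DE, MC) into
      # the last name; parts > 1 keeps the merge from writing to index 0.
      if (parts > 1 &&
          (Names[name, parts]   ~ /^(JR|SR)$/     ||
           Names[name, parts]   ~ /^[IVX]+$/      ||
           Names[name, parts-1] ~ /^(DE|MC)$/)    ) {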
         Names[name, parts-1] = Names[name, parts-1] " " Names[name, parts];
         parts--;
         continue;
      }

      if (Names[name, 1] ~ /^(MR|MRS|MS)$/) {
         for (p=2; p<=parts; p++)
            Names[name, p-1] = Names[name, p];
         parts--;
         continue;
      }

      break;
   }
   Names[name,   "parts"] = parts;
   Names[name, "invalid"] = 0;


   #
   # Set name components
   #

   if (parts == 3) {
      if (length(Names[name, 2]) > 1) {
         Names[name, "invalid"] = 1;
      } else {
         Names[name,  "first"] = Names[name, 1];
         Names[name, "middle"] = Names[name, 2];
         Names[name,   "last"] = Names[name, 3];
      }
   } else if (parts == 2) {
      Names[name,  "first"] = Names[name, 1];
      if (length(Names[name, 2]) == 1 && name < Names["cnt"]) {
         Names[name, "middle"] = Names[name, 2];
         Names[name,   "last"] = Names[name+1, "last"];
      } else {
         Names[name, "middle"] = " ";
         Names[name,   "last"] = Names[name, 2];
      }
   } else if (parts == 1) {
      if (name < Names["cnt"]) {
         Names[name,  "first"] = Names[name, 1];
         Names[name, "middle"] = " ";
         Names[name,   "last"] = Names[name+1, "last"];
      } else
         Names[name, "invalid"] = 1;
   } else
      Names[name, "invalid"] = 1;

   Names["invalid"] += Names[name, "invalid"];

   #
   # Format name
   #

   if (Names[name, "invalid"]) {
      Names[name, "name"] = "";
      for (p=1; p<=parts; p++)
         Names[name, "name"] = Names[name, "name"] (p>1 ? " " : "") Names[name, p];
   } else {
      Names[name, "name"] = Names[name, "first"] ":" Names[name, "middle"] ":" Names[name, "last"];
   }

}

#
# split_list() - Split input names list
#

function split_list(    f ,cnt ,parts) {

   split("", Names);   # clear entries left over from the previous input line
   cnt   = 1;
   parts = 0;

   for (f=1; f<=NF; f++) {
      if ($f != "&") {
         Names[cnt, ++parts] = $f
      } else {
         Names[cnt, "parts"] = parts;
         parts = 0;
         cnt++;
      }
   }
   Names[cnt, "parts"] = parts;

   Names[    "cnt"] = cnt;
   Names["invalid"] = 0;
   Names[   "list"] = "";

}

#
# format_list() - Format names list
#

function format_list(    name ,list ,sep) {

   list = "";
   sep = (Names["invalid"] ? " & " : ":");
   for (name=1; name<=Names["cnt"]; name++) {
      list = list (name>1 ? sep : "") Names[name, "name"];
   }
   Names["list"] = list;

}

#
# analyze_list() - Analyze input names list
#

function analyze_list(    n) {
   split_list();
   for (n=Names["cnt"]; n>0; --n) {
      set_name(n);
   }
   format_list();
}

#=======================================================================
# M A I N . . .
#=======================================================================

NF {

   analyze_list();

   print "Input =" $0
   print "Output=" Names["list"];
   print ""

}
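
Since the main rule prints "Input =" / "Output=" pairs, one way to set aside the names that did not fit is a small post-filter. This is only a sketch under assumptions: the function name separate_results and the "|" delimiter are hypothetical, and a record is treated as fully split when its output contains a colon and no leftover " & ". The first argument (required) names the file that collects the original text of everything else.

```sh
# Hypothetical post-filter for the "Input =" / "Output=" pairs printed by
# the main rule. Fully split records go to stdout as "input|output";
# the original text of all other records goes to the file named by $1.
separate_results() {
  awk -v rejects="$1" '
    # Both prefixes are 7 characters long, so the text starts at column 8.
    /^Input =/ { input = substr($0, 8); next }
    /^Output=/ {
      output = substr($0, 8)
      # Assumption: a colon with no leftover " & " means every name was split.
      if (index(output, ":") > 0 && index(output, " & ") == 0)
        print input "|" output
      else
        print input > rejects
    }
  '
}
```

For example, piping the program's output through separate_results leftovers.txt would leave the unsplit names in leftovers.txt (a placeholder file name) for manual review.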

Output (with your input sample file):

Input =DONNIE BERG
Output=DONNIE: :BERG

Input =JERRY M MAGUIRE
Output=JERRY:M:MAGUIRE

Input =D A BROWN
Output=D:A:BROWN

Input =RICHARD N STYLES & FRANK A PERRY
Output=RICHARD:N:STYLES:FRANK:A:PERRY

Input =MITCH GARBO & BOBBI MILLS
Output=MITCH: :GARBO:BOBBI: :MILLS

Input =JUDY & STONE RUFFEY
Output=JUDY: :RUFFEY:STONE: :RUFFEY

Input =MRS K H SCHULTZ
Output=K:H:SCHULTZ

Input =JASPER O & SUZI M THOMPSON
Output=JASPER:O:THOMPSON:SUZI:M:THOMPSON

Input =DAY FRANKLIN-MIZER
Output=DAY: :FRANKLIN-MIZER

Input =BO & TYRA J SLACK
Output=BO: :SLACK:TYRA:J:SLACK

Input =JERRY B DE TUNA
Output=JERRY:B:DE TUNA

Input =CHARLES C VICTOR III
Output=CHARLES:C:VICTOR III

Input =DARREN E MC FANN
Output=DARREN:E:MC FANN

Input =TOM E VARBLE JR
Output=TOM:E:VARBLE JR

Input =MARY W & CAROLYN SMILEY
Output=MARY:W:SMILEY:CAROLYN: :SMILEY

Input =BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
Output=BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC

Input =STAN N GAIL & HIDDEN VALLEY FARMS
Output=STAN:N:GAIL & HIDDEN VALLEY FARMS

Input =DRIPP CREEK FARM
Output=DRIPP CREEK FARM

Input =Y Z & S T OUTTRIM
Output=Y:Z:OUTTRIM:S:T:OUTTRIM

Input =WM AND VI JOYNER SALES
Output=WM AND VI JOYNER SALES

Input =A C A SALES
Output=A C A SALES

Input =ADDONIS SYNDICATE & LOWLAND MEADOW
Output=ADDONIS: :SYNDICATE:LOWLAND: :MEADOW
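
Note that the last line came out as two people, although the desired output left it untouched: a two-word business name looks just like FIRST LAST to the program. One possible workaround is to pre-screen the input for business-like words. The sketch below is an assumption, not part of the program above: the function name flag_organizations is hypothetical, and the keyword list is taken only from the sample data and would need extending for the real file.

```sh
# Hypothetical pre-screen: tag each line ORG or PERSON so that
# organization names can be kept out of the person-name splitter.
# The keyword list is an assumption drawn from the sample data.
flag_organizations() {
  awk '
    BEGIN {
      n = split("INC FARM FARMS SALES SYNDICATE TOWNSHIP", words, " ")
      for (i = 1; i <= n; i++) is_org[words[i]] = 1
    }
    {
      tag = "PERSON"
      # One business-like word anywhere on the line is enough to tag it.
      for (f = 1; f <= NF; f++)
        if ($f in is_org) { tag = "ORG"; break }
      print tag "\t" $0
    }
  ' "$@"
}
```

Lines tagged PERSON could then be fed to the program above, with the ORG lines kept aside.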

Jean-Pierre.

Jean-Pierre, thank you so much! Your program successfully splits the bulk of the 38,000 chunked names I have to change. I can't thank you enough for this gift of code! You've turned my nightmare into nothing more than a bad dream....