Common prefix of a list of strings

CarloM · October 24, 2013, 4:08pm

Is there a simple way to find the longest common prefix of a space-separated list of strings, optionally by field?

For example, given input:

"aaa_b_cc aaa_b_cc_ddd aaa_b_cc aaa_b_cd"

with no field separator, output:

aaa_b_c

with _ field separator, output:

aaa_b

I have an awk solution which appears to work (although I haven't done much testing):

function get_common_prefix() {
        list="$1"
        sep="$2"

        # First record seeds prefix[]; each later record trims pcount to the fields that still match.
        printf '%s' "$list" | awk '(NR==1) {pcount=split($0,prefix)}
                               (NR>1) {for (i=pcount;i>0;i--) {if ($i!=prefix[i]) {pcount=i-1}}}
                                END {NF=pcount;print}' RS=' ' FS="$sep" OFS="$sep"
}

myprefix=$(get_common_prefix "$1" "$2")

printf "[%s]\n" "$myprefix"

Searching didn't come up with anything more elegant that could handle both by character and by field. So, just wondering if the forum had any better solutions.

EDIT: The above (tested on cygwin) doesn't work well with the non-gawk awk on AIX 6.1. It seems you can't modify NF in END the way I'm doing above (although printing just the first pcount fields of prefix works), and an empty field separator seems to be treated the same as whitespace (i.e. it won't split by character).
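Both portability issues in the EDIT can probably be avoided by not relying on NF or on an empty FS at all. The following is only a sketch under some assumptions: the function name get_common_prefix_portable and the awk helper chop are made up for illustration, the separator is assumed to be a single literal character (not a backslash), and the strings are assumed to contain no spaces or newlines. It splits each string explicitly (per character when the separator is empty) and rebuilds the output from the first pcount elements of prefix.

```shell
# Hypothetical helper: get_common_prefix_portable "LIST" [SEP]
get_common_prefix_portable() {
    # One string per line, so awk needs neither RS=" " nor an empty FS.
    printf '%s\n' "$1" | tr -s ' ' '\n' | awk -v sep="$2" '
        # Split s into parts[]: on sep, or per character when sep is empty.
        function chop(s, parts,    n, i) {
            if (sep != "") return split(s, parts, sep)
            n = length(s)
            for (i = 1; i <= n; i++) parts[i] = substr(s, i, 1)
            return n
        }
        NF == 0 { next }                              # skip empty lines
        !seen   { pcount = chop($0, prefix); seen = 1; next }
        {
            n = chop($0, cur)
            if (n < pcount) pcount = n                # a shorter string caps the prefix
            for (i = 1; i <= pcount; i++)
                if (cur[i] != prefix[i]) { pcount = i - 1; break }
        }
        END {
            # Rebuild the result from prefix[] instead of assigning NF.
            out = ""
            for (i = 1; i <= pcount; i++) out = out (i > 1 ? sep : "") prefix[i]
            print out
        }'
}

get_common_prefix_portable "aaa_b_cc aaa_b_cc_ddd aaa_b_cc aaa_b_cd"      # prints aaa_b_c
get_common_prefix_portable "aaa_b_cc aaa_b_cc_ddd aaa_b_cc aaa_b_cd" _    # prints aaa_b
```

Comparing from the first element forward and stopping at the first mismatch gives the same pcount as the backward loop in the original.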

RudiC · October 24, 2013, 4:47pm

Not sure if this should be considered more elegant, and it doesn't take into account field separators, but it would do the first job:

A=(aaa_b_cc aaa_b_cc_ddd aaa_b_cc aaa_b_cd)
for ((i=1;i<=${#A[0]}; i++))                            # for the entire first string
  do for ((j=0;j<${#A[@]};j++))                         # for all strings in array
       do P=${A[0]%"${A[0]:$i}"}                        # get increasing length substring of the first string
          [ "${A[$j]#"$P"}" = "${A[$j]}" ] && break 2   # test if contained in all strings; if not, no CP, break out
       done
     CP=$P                                              # keep last valid common prefix
  done
echo "$CP"
aaa_b_c
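The character-wise idea above could be extended to fields by first finding the common prefix by character and then trimming it back to a field boundary. What follows is a minimal sketch, not a tested drop-in: the function name common_prefix and its argument order (separator first, then the strings) are assumptions made for illustration. It uses only POSIX shell constructs, and the function body runs in a subshell so its variables do not leak.

```shell
# Hypothetical helper: common_prefix SEP STRING...   (SEP may be empty)
common_prefix() (
    sep=$1; shift
    p=${1-}
    # Character-wise: shorten p until every string starts with it.
    for s in "$@"; do
        while :; do
            case $s in "$p"*) break ;; esac
            p=${p%?}
        done
    done
    # Field-wise: p must end on a field boundary in every string, meaning
    # each string either equals p or continues with the separator.
    if [ -n "$sep" ]; then
        while [ -n "$p" ]; do
            ok=1
            for s in "$@"; do
                case $s in
                    "$p" | "$p$sep"*) ;;
                    *) ok=0; break ;;
                esac
            done
            [ "$ok" -eq 1 ] && break
            case $p in
                *"$sep"*) p=${p%"$sep"*} ;;   # drop the last, partial field
                *) p= ;;                      # no complete field in common
            esac
        done
    fi
    printf '%s\n' "$p"
)

common_prefix '' aaa_b_cc aaa_b_cc_ddd aaa_b_cc aaa_b_cd    # prints aaa_b_c
common_prefix _  aaa_b_cc aaa_b_cc_ddd aaa_b_cc aaa_b_cd    # prints aaa_b
```

With an array as in the post above, the call would be common_prefix _ "${A[@]}".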