awk - function to return permutations of n items out of m

cjnwl · January 9, 2019, 11:15am

Hi,

I'm trying to write an awk function that returns all possible permutations of n items chosen in a list of m items. For example, given the input "a,b,c,d,e" and 3, the function should return the following :

a a a
a a b
a a c
a b a
a b b
...
c a a
c a b
...
e e c
e e d
e e e

(125 lines = 5 ^^ 3).

I've managed to write a function with nested loops that works, but only for a predetermined depth. I can't seem to manage the recursion if I want n to be variable.

Also, an added difficulty is that I need the results in an array for further processing in the same script, not just output to the console.

Any help would be much appreciated.
Chris

Neo · January 9, 2019, 11:18am

Then. please post your function and script(s) what you wrote so we can help you.

cjnwl · January 9, 2019, 11:28am

Thanks for your prompt response. I didn't post my code as it's not really much use. I've also searched this forum, but none of the threads about combinations and/or permutations does exactly what I need.

Here's what I have so far. It's not in a function as yet, and the loops are hard-coded two deep. The variable len isn't used at all.

BEGIN {
  list = "a,b,c,d,e"
  len = 3
  sep = ","
  n = split(list,words,sep)
#  for (i in words) print i, words
  for (i=1;i<=n;i++) {
    for (j=1;j<=n;j++) {
	  print words, words[j]
	}
  }
}

Ultimately, I need to be able to access and manipulate the results in an array.

Thanks,
Chris

joker · January 9, 2019, 12:22pm

Some recursion will help...

#!/usr/bin/awk -f

function perm(list,len,current_set,      word) {
  if(len==1)
    for(word in list)
      print current_set list[word]
  else
    for(word in list)
      perm(list,len-1,current_set list[word] " ")
}

BEGIN {
  list = "a,b,c,d,e"
  sep = ","
  n = split(list,words,sep)
  perm(words,4)
}

Scrutinizer · January 9, 2019, 12:26pm

Another variation:

awk -v n=3 -v inp="a,b,c,d,e" '
  function perm(p,s,	i) {
    for(i=1;i<=n;i++)
      if(p==1)
        printf "%s%s\n",s,A
      else
        perm(p-1,s A" ")
  }

  BEGIN {
    split(inp,A,/,/)
    perm(n)
  }
'

joker · January 9, 2019, 12:46pm

@Scrutinizer:
... a small fix (your variant does only use n elements of the list, not all)

awk -v n=3 -v inp="a,b,c,d,e" '
  function perm(p,s,    i) {
   for(i=1;i<=c;i++)
     if(p==1)
       printf "%s%s\n",s,A
      else
        perm(p-1,s A" ")
  }
 
  BEGIN {
    c=split(inp,A,/,/)
    perm(n)
  }
'

vgersh99 · January 9, 2019, 1:05pm

I've modified stomp's suggestion to make it bit more flexible:

function perm(list,len,current_word,depth,      word) {

        # print "len=" len " depth=" depth " current_word=" current_word
        if(len==1) {
                for(word in list) {
                        print current_word " " list[word]
                }
        } else {
                for(word in list) {
                        perm(list,len-1,current_word " " list[word],depth+1)
                }

        }
}

BEGIN {
  list=(!list)? "a,b,c,d,e" : list
  len=(!len) ? 3 : len
  sep = ","
  split(list,words,sep)
  perm(words,len)
}

Sample invocations:

awk -f permute.awk </dev/null
awk -v list='a,b,c,d' -v len=2  -f permute.awk </dev/null
awk -v list='a,b,c,d' -v len=3  -f permute.awk </dev/null
awk -v list='a,b,c,d,e' -v len=3  -f permute.awk </dev/null

nezabudka · January 9, 2019, 1:09pm

echo {a..e}{a..e}{a..e} | xargs -n1

vgersh99 · January 9, 2019, 1:17pm

nice idea, nezabudka.

echo {a..e}{a..e}{a..e} | xargs -n1| sed 's/./& /g'

durden_tyler · January 9, 2019, 1:38pm

A very un-awk-like and more Perl-like script.

I added the sort routine because array iteration doesn't return the elements in order.
(Definitely not a good idea for large arrays.)

Note that the OP wants the results in an array for further processing in the same script.
I'm not sure how to retrieve the array elements in the order that they were inserted.
Maybe a forum member can suggest how to do so.

echo |
awk 'function generate_permutations(cs, item, n, k){
         if (k == 0){
             iter = iter + 1;
             arr[iter] = item;
             return;
         }
         for (i in cs){
             new_item = item""cs;
             generate_permutations(cs, new_item, n, k-1);
         }
     }
     {
          split("", arr, "");
          iter = 0;
          str = "a,b,c,d,e";
          n = split(str, charset, ",");
          k = 3;
          generate_permutations(charset, "", n, k);
          asort(arr);
          for (i=1; i <= iter; i++){
              printf "%s\n", arr;
          }
     }
    '

wisecracker · January 9, 2019, 1:51pm

SUPERB!

(Just one note for viewers though, this is NOT POSIX compliant...)

EDIT:
Modified to give spaces...

Last login: Wed Jan  9 19:12:39 on ttys000
AMIGA:amiga~> text=$( echo {a..e}_{a..e}_{a..e} | xargs -n1 ); echo "${text//_/ }"
a a a
a a b
a a c
a a d
. . .
e e e
AMIGA:amiga~> _

bakunin · January 9, 2019, 2:06pm

Very good idea indeed. You won't need any externals doing it like this:

for p in {a..e}{a..e}{a..e} ; do
     echo "$x"
done

This works in bash as well as ksh93 but not in ksh88 and not in Bourne shell either.

bakunin

nezabudka · January 9, 2019, 2:37pm

Thank you all for an evaluation
Probably there should replace the heavyweight utility "xargs" and wrap everything in a function
like so

#!/bin/bash
prt() {
        str=$(seq -s"{$1..$2}" $3)
        eval echo ${str//[0-9]}
}
prt a e $((3+1)) | fmt -1 | sed 's/./& /g'

Or insert a loop "for in" as suggested above.
In ksh93 works too

wisecracker · January 9, 2019, 3:07pm

nezabudka:

Thank you all for an evaluation
Probably there should replace the heavyweight utility "xargs" and wrap everything in a function
like so
#!/bin/bash
prt() {
   str=$(seq -s"{$1..$2}" $3)
   eval echo ${str//[0-9]}
}
prt a e $((3+1)) | fmt -1 | sed 's/./& /g'
Or insert a loop "for in" as suggested above.
In ksh93 works too

Too many pipes and transient commands, (utilities)...
However $((3+1)) should be $((2+1)) or $((3)) on OSX 10.14.1, default bash terminal...

Last login: Wed Jan  9 19:58:11 on ttys000
AMIGA:amiga~> cd Desktop/Code/Shell
AMIGA:amiga~/Desktop/Code/Shell> bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin18)
Copyright (C) 2007 Free Software Foundation, Inc.
AMIGA:amiga~/Desktop/Code/Shell> ./abcde.sh
a a a 
a a b 
a a c 
a a d 
a a e 
a b a 
a b b 
a b c 
. . .
e d d 
e d e 
e e a 
e e b 
e e c 
e e d 
e e e 
AMIGA:amiga~/Desktop/Code/Shell> _

bakunin · January 9, 2019, 3:18pm

Very good idea. Since the following works not only with integers but also with arbitrary sets of characters we don't even need a recursion:

perm ()
{
myset="$1"
n="$2"
buf=""
i=1

while (( i <= n )) ; do
     buf="${buf}${myset}"
     (( i++ ))
done

for x in $buf ; do echo $x ; done

return 0
}

# example calls:
perm "{1,2}" 3

perm "{a,b}" 2

bakunin

/PS: on second thoughts: this is not the permutations but the combinations of the given set, no?

wisecracker · January 9, 2019, 6:19pm

Can this be any simpler?
Single statement, longhand, OSX 10.14.1, default bash shell...

Last login: Wed Jan  9 23:14:09 on ttys000
AMIGA:amiga~> echo $'\n'{a..e}$' '{a..e}$' '{a..e}

a a a 
a a b 
a a c 
a a d 
a a e 
a b a 
a b b 
a b c 
. . . 
e d d 
e d e 
e e a 
e e b 
e e c 
e e d 
e e e
AMIGA:amiga~> _

Don_Cragun · January 9, 2019, 9:01pm

Hi wisecracker and cjnwl,
Note that if you change:

echo $'\n'{a..e}$' '{a..e}$' '{a..e}

to:

printf '%s\n' {a..e}' '{a..e}' '{a..e}

you get rid of the (probably undesired) initial empty line in the output. And to create the desired array of combinations using either a recent bash or ksh93 , one could then use:

OIFS="$IFS"
IFS=$'\n'
array=( $(printf '%s\n' {a..e}' '{a..e}' '{a..e}) )  
IFS="${OIFS}"

And, to verify that it worked, one could then run:

echo "${#array[@]} '${array[0]}' ... '${array[124]}'"

producing the output:

125 'a a a' ... 'e e e'

which shows that it is likely that the array named array contains the desired 125 elements.

nezabudka · January 10, 2019, 12:51am

wisecracker:

Can this be any simpler?
Single statement, longhand, OSX 10.14.1, default bash shell...
Last login: Wed Jan  9 23:14:09 on ttys000
AMIGA:amiga~> echo $'\n'{a..e}$' '{a..e}$' '{a..e}
AMIGA:amiga~> _

It's really very short!

eval echo "$'\n'{$var1..$var2}$' '{$var1..$var2}"

joker · January 10, 2019, 4:50am

scrutinizer:

awk -v n=3 -v inp="a,b,c,d,e" '
  function perm(p,s,    i) {
   for(i=1;i<=c;i++)
   if(p==1)
   printf "%s%s\n",s,A
   else
   perm(p-1,s A" ")
  }

  BEGIN {
   c=split(inp,A,/,/)
   perm(n)
  }
  '

What I'm writing now is important for programs with a lot more code, but I think it has benefits in small (awk) programs in this forum too.

Descriptive code

For me - and probably for others like the partly beginner level thread starters more alike - non-descriptive variable names makes it harder to read the code. I see this here in this forum a lot. I assume brevity of code is some kind of goal involved here.

Writing Code that is easily understandable with little effort is far more important to me. And of course readabilty / maintainibility on one side and efficient, clean and short code on the other side are both important and for both must be found a suitable balance. In real life, other people read my code to, or I myself after a long time again and I'd like to avoid that experience I had in the past: "Oh crap! What did I smoke when I wrote that code?"

As an awk example(Scrutinizers variant above is written much better than the first one): split(a,b,c) vs split(text,result,pattern)

This helps me even if I do not know the syntax of split, so I do not have to check the documentation for split() right away.

So I would like it to have code, I can easily understand. I like a solution more, which I may find in this forum even years after it was posted and I quickly can understand it.

Global Variables

(actually i decided to use introduces a new global variable in my fix) I think it's good design, to avoid global variables whenever possible(See http://wiki.c2.com/?GlobalVariablesAreBad\). For this forum, a solution that does this without globals avoids copy-and-paste problems: "Huu, I take this function, and put it into my code, and I like it, if it just works." and bamm! Variable collision and it crashes or just does not work. Maybe this(not-working-code with side-effects) is a nice to have too in terms of: Make sure you really understand your code, before trying to run it! Do not just copy it! ...but for I appreciate the other way.

cjnwl · January 10, 2019, 7:27am

Thanks Guys, that's exactly what I was looking for

--- Post updated at 12:27 PM ---

Note that the various shell scripts don't reply to the question, which was first of all for awk (as this is part of a much larger script that I've been given to modify), but more importantly, hard-coded the number of elements.