I'm trying to write an awk function that returns all possible permutations of n items chosen in a list of m items. For example, given the input "a,b,c,d,e" and 3, the function should return the following :
a a a
a a b
a a c
a b a
a b b
...
c a a
c a b
...
e e c
e e d
e e e
(125 lines = 5 ^^ 3).
I've managed to write a function with nested loops that works, but only for a predetermined depth. I can't seem to manage the recursion if I want n to be variable.
Also, an added difficulty is that I need the results in an array for further processing in the same script, not just output to the console.
Thanks for your prompt response. I didn't post my code as it's not really much use. I've also searched this forum, but none of the threads about combinations and/or permutations does exactly what I need.
Here's what I have so far. It's not in a function as yet, and the loops are hard-coded two deep. The variable len isn't used at all.
BEGIN {
list = "a,b,c,d,e"
len = 3
sep = ","
n = split(list,words,sep)
# for (i in words) print i, words
for (i=1;i<=n;i++) {
for (j=1;j<=n;j++) {
print words, words[j]
}
}
}
Ultimately, I need to be able to access and manipulate the results in an array.
I added the sort routine because array iteration doesn't return the elements in order.
(Definitely not a good idea for large arrays.)
Note that the OP wants the results in an array for further processing in the same script.
I'm not sure how to retrieve the array elements in the order that they were inserted.
Maybe a forum member can suggest how to do so.
echo |
awk 'function generate_permutations(cs, item, n, k){
if (k == 0){
iter = iter + 1;
arr[iter] = item;
return;
}
for (i in cs){
new_item = item""cs;
generate_permutations(cs, new_item, n, k-1);
}
}
{
split("", arr, "");
iter = 0;
str = "a,b,c,d,e";
n = split(str, charset, ",");
k = 3;
generate_permutations(charset, "", n, k);
asort(arr);
for (i=1; i <= iter; i++){
printf "%s\n", arr;
}
}
'
(Just one note for viewers though, this is NOT POSIX compliant...)
EDIT:
Modified to give spaces...
Last login: Wed Jan 9 19:12:39 on ttys000
AMIGA:amiga~> text=$( echo {a..e}_{a..e}_{a..e} | xargs -n1 ); echo "${text//_/ }"
a a a
a a b
a a c
a a d
. . .
e e e
AMIGA:amiga~> _
Too many pipes and transient commands, (utilities)...
However $((3+1)) should be $((2+1)) or $((3)) on OSX 10.14.1, default bash terminal...
Last login: Wed Jan 9 19:58:11 on ttys000
AMIGA:amiga~> cd Desktop/Code/Shell
AMIGA:amiga~/Desktop/Code/Shell> bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin18)
Copyright (C) 2007 Free Software Foundation, Inc.
AMIGA:amiga~/Desktop/Code/Shell> ./abcde.sh
a a a
a a b
a a c
a a d
a a e
a b a
a b b
a b c
. . .
e d d
e d e
e e a
e e b
e e c
e e d
e e e
AMIGA:amiga~/Desktop/Code/Shell> _
Very good idea. Since the following works not only with integers but also with arbitrary sets of characters we don't even need a recursion:
perm ()
{
myset="$1"
n="$2"
buf=""
i=1
while (( i <= n )) ; do
buf="${buf}${myset}"
(( i++ ))
done
for x in $buf ; do echo $x ; done
return 0
}
# example calls:
perm "{1,2}" 3
perm "{a,b}" 2
bakunin
/PS: on second thoughts: this is not the permutations but the combinations of the given set, no?
Can this be any simpler?
Single statement, longhand, OSX 10.14.1, default bash shell...
Last login: Wed Jan 9 23:14:09 on ttys000
AMIGA:amiga~> echo $'\n'{a..e}$' '{a..e}$' '{a..e}
a a a
a a b
a a c
a a d
a a e
a b a
a b b
a b c
. . .
e d d
e d e
e e a
e e b
e e c
e e d
e e e
AMIGA:amiga~> _
Hi wisecracker and cjnwl,
Note that if you change:
echo $'\n'{a..e}$' '{a..e}$' '{a..e}
to:
printf '%s\n' {a..e}' '{a..e}' '{a..e}
you get rid of the (probably undesired) initial empty line in the output. And to create the desired array of combinations using either a recent bash or ksh93 , one could then use:
What I'm writing now is important for programs with a lot more code, but I think it has benefits in small (awk) programs in this forum too.
Descriptive code
For me - and probably for others like the partly beginner level thread starters more alike - non-descriptive variable names makes it harder to read the code. I see this here in this forum a lot. I assume brevity of code is some kind of goal involved here.
Writing Code that is easily understandable with little effort is far more important to me. And of course readabilty / maintainibility on one side and efficient, clean and short code on the other side are both important and for both must be found a suitable balance. In real life, other people read my code to, or I myself after a long time again and I'd like to avoid that experience I had in the past: "Oh crap! What did I smoke when I wrote that code?"
As an awk example(Scrutinizers variant above is written much better than the first one): split(a,b,c) vs split(text,result,pattern)
This helps me even if I do not know the syntax of split, so I do not have to check the documentation for split() right away.
So I would like it to have code, I can easily understand. I like a solution more, which I may find in this forum even years after it was posted and I quickly can understand it.
Global Variables
(actually i decided to use introduces a new global variable in my fix) I think it's good design, to avoid global variables whenever possible(See http://wiki.c2.com/?GlobalVariablesAreBad\). For this forum, a solution that does this without globals avoids copy-and-paste problems: "Huu, I take this function, and put it into my code, and I like it, if it just works." and bamm! Variable collision and it crashes or just does not work. Maybe this(not-working-code with side-effects) is a nice to have too in terms of: Make sure you really understand your code, before trying to run it! Do not just copy it! ...but for I appreciate the other way.
Thanks Guys, that's exactly what I was looking for
--- Post updated at 12:27 PM ---
Note that the various shell scripts don't reply to the question, which was first of all for awk (as this is part of a much larger script that I've been given to modify), but more importantly, hard-coded the number of elements.