Thank you for that code, it seems to work well.
I have checked the output on some patterns with up to 5 on bits (five 1s) and it looks correct. As far as I can tell, there should be 2^n output patterns, where n is the number of on bits. Do I have that right? For my test pattern of 5 on bits, that gives 2^5 = 32 patterns, or 31 if the all-zeros pattern is left out, as it is in the listing below.
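As a quick sanity check on that count, here is a minimal sketch that counts the on bits in a pattern and prints the expected totals. The function name count_on_bits is made up for this example and is not part of the script in this thread.

```shell
#!/bin/bash
# Count the "1" characters in a pattern (the bk_ prefix contains none).
# count_on_bits is a hypothetical helper, not part of the original script.
count_on_bits() {
    local ones=${1//[!1]/}   # delete every character that is not a 1
    echo "${#ones}"
}

n=$(count_on_bits "bk_00110000000000110001000")
echo "on bits:             $n"
echo "all subsets (2^n):   $(( 1 << n ))"
echo "non-empty (2^n - 1): $(( (1 << n) - 1 ))"
```

For the test pattern this reports 5 on bits, 32 subsets in total and 31 non-empty ones, which agrees with the group sizes in the listing (1 + 5 + 10 + 10 + 5 = 31).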
Here is the output, grouped by the number of on bits to make it easier to check:
# for the input pattern
bk_00110000000000110001000
# 4 on bits (change one 1 to 0)
bk_00010000000000110001000
bk_00100000000000110001000
bk_00110000000000010001000
bk_00110000000000100001000
bk_00110000000000110000000
# 3 on bits (change 2 1s to 0)
bk_00000000000000110001000
bk_00010000000000010001000
bk_00010000000000100001000
bk_00010000000000110000000
bk_00100000000000010001000
bk_00100000000000100001000
bk_00100000000000110000000
bk_00110000000000000001000
bk_00110000000000010000000
bk_00110000000000100000000
# 2 on bits (change 3 1s to 0)
bk_00000000000000010001000
bk_00000000000000100001000
bk_00000000000000110000000
bk_00100000000000010000000
bk_00100000000000100000000
bk_00100000000000000001000
bk_00010000000000010000000
bk_00010000000000100000000
bk_00010000000000000001000
bk_00110000000000000000000
# 1 on bit (change 4 1s to 0)
bk_00000000000000000001000
bk_00000000000000010000000
bk_00000000000000100000000
bk_00010000000000000000000
bk_00100000000000000000000
As far as I can tell, this is what the output should be. Please let me know if anyone sees anything amiss.
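Checking a listing like this by eye is error-prone, so here is a rough sketch of a mechanical check. It assumes every pattern has the same length as the input and contains only 0s and 1s after the prefix. The function name is_subset is hypothetical.

```shell
#!/bin/bash
# Succeed if candidate is the input pattern with zero or more 1s changed to 0.
# is_subset is a hypothetical helper; it compares the strings character by
# character, so the bk_ prefix must simply be identical in both.
is_subset() {
    local input=$1 candidate=$2 i in_ch c_ch
    [ "${#input}" -eq "${#candidate}" ] || return 1
    for (( i = 0; i < ${#input}; i++ )); do
        in_ch=${input:i:1}
        c_ch=${candidate:i:1}
        [ "$c_ch" = "$in_ch" ] && continue
        # the only difference allowed is a 1 in the input turned into a 0
        [ "$in_ch" = 1 ] && [ "$c_ch" = 0 ] || return 1
    done
    return 0
}

input=bk_00110000000000110001000
for p in bk_00000000000000110001000 bk_01000000000000110001000; do
    if is_subset "$input" "$p"; then
        echo "$p ok"
    else
        echo "$p NOT a subset"
    fi
done
```

The first candidate passes. The second is rejected because it has a 1 in a position where the input has a 0. A pattern with the wrong number of digits is rejected by the length test.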
The next thing I need to do is add a function that checks each generated subset against a list of patterns and looks for matches.
My revised script looks like this:
#! /bin/bash
function check_against {
check_string=$1
pattern_match=0
# declare list of patterns to check against
check_list=( bk_00110000000000000001000 \
bk_00110000000000000001000 \
bk_00010000000000000001000 \
bk_00010000000000000001000 \
bk_00010000000000000001000 \
bk_00010000000000000001000 \
bk_10001010011100000101000 \
bk_00001010111100000101000 \
bk_10001110000000000101010 \
bk_10001110000000110101000 \
bk_11001110000000110101000 \
bk_11110011011100110000000 \
bk_00110000000000110000010 )
# loop through check_list and compare each element to check_string
for check_against_string in "${check_list[@]}"
do
if [ "$check_string" == "$check_against_string" ]; then
pattern_match=$((pattern_match+1))
fi
done
# if any matches were found, output match and number of matches
if [ "$pattern_match" != "0" ]; then
echo -e "$check_string\t$pattern_match"
fi
}
# input string
input_string="${1:-"bk_00110000000000110001000"}"
# capture output of awk into string
subsets_list=$(
echo "$input_string" | awk -F1 '
# Compute 2**(p-1) - 1 for p >= 1 (called with NF = number of 1s + 1)
function two_e2m1(p, i, v) {
v = 0
for(i = 1; i < p; i++)
v = 2 * v + 1
return(v)
}
NF { printf("%s is input to be processed.\n", $0)
for(i = two_e2m1(NF) - 1; i > 0; i--) {
v = i
for(j = 1; j < NF; j++) {
d[NF - j] = v % 2
v /= 2
}
for(j = 1; j < NF; j++)
printf("%s%d", $j, d[j])
print $NF
}
}'
)
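# parse subsets_list on newline into its own array; the name must differ
# from check_list, which check_against assigns as a global
IFS=$'\n' read -rd '' -a subset_array <<<"$subsets_list"
for element in "${subset_array[@]}"
do
# pass each element to check against function; keep only the first word so
# the "... is input to be processed." header line yields the input pattern
check_against "${element%% *}"
done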
This is a bit of a hack. I capture the output of Don Cragun's awk code in a string variable and then split it on newlines into an array. I then iterate through the array and pass each element to a function that compares it against the list of check patterns and counts the matches. If any matches are found, the matching pattern is printed along with the number of matches.
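One possible alternative, sketched here under some assumptions, is to tally the check list once in an associative array and then read the subsets line by line from a pipe instead of capturing them in a string. This needs bash 4 or later for declare -A. The names check_count and report_matches are made up for this example, the check list is shortened, and a printf stands in for the awk output.

```shell
#!/bin/bash
# Requires bash 4 or later for associative arrays (declare -A).
# check_count and report_matches are hypothetical names for this sketch.

# Tally how often each pattern occurs in the (shortened) check list, once.
declare -A check_count
for p in bk_00110000000000000001000 bk_00110000000000000001000 \
         bk_00010000000000000001000 bk_00110000000000110000010; do
    check_count[$p]=$(( ${check_count[$p]:-0} + 1 ))
done

# Read one subset per line on stdin; print each one found in the check
# list, followed by a tab and the number of times it occurs there.
report_matches() {
    local subset
    while IFS= read -r subset; do
        [ -n "$subset" ] || continue          # skip empty lines
        if [ -n "${check_count[$subset]}" ]; then
            printf '%s\t%s\n' "$subset" "${check_count[$subset]}"
        fi
    done
}

# Stand-in for the awk output: three subsets, two of which are in the list.
printf '%s\n' bk_00110000000000000001000 \
              bk_00000000000000000001000 \
              bk_00010000000000000001000 | report_matches
```

In the real script the awk command would be piped into report_matches in place of the printf. Each lookup is then a single array access rather than a loop over the whole check list. Note that the "is input to be processed." header line would not match any entry, so with this approach the input pattern itself is only checked if awk also prints it on a line of its own.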
This gives me the output I need as far as I can tell. Please let me know if anyone sees problems with this approach or has other suggestions.
LMHmedchem