Choose a single word from a wordlist by matching the letters it contains?

hakermania · March 13, 2010, 4:57am

Hi, I have a text file with 1500 words. Is there a script that will keep only the words made up of all of these letters:
n i o m s c t a
If you could show me the way I would be grateful!

malcomex999 · March 13, 2010, 5:15am

Can you show an example input file and the desired output?

jim_mcnamara · March 13, 2010, 5:27am

Consider regular expressions:
^[niomscta]+$ matches a line made up of one or more characters drawn only from this character class. Note that + is an extended-regex operator, so grep needs -E:

    grep -E '^[niomscta]+$' wordfile
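
As a rough illustration of what the character class does and does not check, here is a minimal sketch. The function name only_letters is made up for this example, and it assumes the letters contain no characters that are special inside a bracket expression (such as ], ^ or -).

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines on stdin that consist
# entirely of the characters given in $1 (one or more of them).
only_letters() {
    grep -E "^[$1]+\$"
}

# Sample words taken from this thread.
printf '%s\n' monastic moomoo html 121212 | only_letters niomscta
```

Both monastic and moomoo get through, because the class only restricts which letters may appear; it does not require every letter to be present.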

dennis.jacob · March 13, 2010, 8:39am

Something like this?

perl -lane 'print join " ", grep { /\b[niomscta]+\b/ } (split / /);  ' file

/home/usr1 >cat file
sdafsa fljds dfs dsf niomscta   nio msd
sad nidso scata niow nioscata
sdasda sdjf niom
miom
fdsssdsf   sct
df

/home/usr1 >perl -lane 'print join " ", grep { /\b[niomscta]+\b/ } (split / /,$_);  ' file
niomscta nio
scata nioscata
niom
miom
sct

hakermania · March 13, 2010, 10:35am

Well the wordlist has words like
html:)
121212
131313
123123
654321
8675309
666666
696969
888888
1234567
21122112
12345678
asdfjkl;
hal9000
bond007
ncc1701d
ncc1701e
ncc1701
thx1138
a12345
abcd1234

What I need is to find the word that is made from the letters "n i o m s c t a", like "monastic". I don't want words with fewer or more letters (like "maso" or "monasticasctio").

---------- Post updated at 10:35 AM ---------- Previous update was at 10:23 AM ----------

I thought something like this:
#!/bin/sh
lol=`grep '^[n]' /home/alex/Desktop/wordlist.txt`
echo $lol > /home/alex/Desktop/lol
lol1=`grep '^[i]' /home/alex/Desktop/lol`
echo $lol1 > /home/alex/Desktop/lol1
lol2=`grep '^[o]' /home/alex/Desktop/lol1`
echo $lol2 > /home/alex/Desktop/lol2
lol3=`grep '^[m]' /home/alex/Desktop/lol2`
echo $lol3 > /home/alex/Desktop/lol3
lol4=`grep '^[s]' /home/alex/Desktop/lol3`
echo $lol4 > /home/alex/Desktop/lol4
lol5=`grep '^[c]' /home/alex/Desktop/lol4`
echo $lol5 > /home/alex/Desktop/lol5
lol6=`grep '^[t]' /home/alex/Desktop/lol5`
echo $lol6 > /home/alex/Desktop/lol6
lol7=`grep '^[a]' /home/alex/Desktop/lol6`
echo "Your word is: $lol7"

I don't know if this thought is correct...Is it?

dennis.jacob · March 13, 2010, 10:40am

You can still use the code snippet provided. However, a shortened version for your requirement is below.

perl -lane 'print if (grep {/\b[niomscta]+\b/} $_);  ' file

hakermania · March 13, 2010, 10:59am

{ perl -lane 'print if (grep {/\b[niomscta]+\b/} $_); ' file } doesn't really suit me. I want to take the words from the wordlist that have 8 characters (niomscta = 8 chars) and contain ALL of the letters: n i o m s c t a
Examples:
atcsmoin
oinmatcs
matcsoin etc...
Your command { perl -lane 'print if (grep {/\b[niomscta]+\b/} $_); ' file }
has this output:

alex@lol-pc:~$ perl -lane 'print if (grep {/\b[niomscta]+\b/} $_);  ' /home/alex/Desktop/wordlist.txt
montana
monica
macintos
action
tomcat
cannon
tinman
nissan
station
samson
tattoo
cccccc
sonics
cosmos
mission
tintin
moomoo

That doesn't suit me at all. I want to do a kind of unscrambling, but not from all existing words! Only from the words in wordlist.txt!

alister · March 13, 2010, 11:31am

Nothing to see here

hakermania · March 13, 2010, 11:38am

What is this supposed to do?

alex@lol-pc:~/Desktop$ ./pip
alex@lol-pc:~/Desktop$

radoulov · March 13, 2010, 11:51am

perl -nle'BEGIN {
  @_{split / /, shift} = (1) x length($ARGV[0]);
  }
keys %_ == grep $_{$_}, split // and print;
' 'n i o m s c t a' wordlist

hakermania · March 13, 2010, 11:52am

this works perfectly!!! 10000000000000 thxes!!!!!

dennis.jacob · March 13, 2010, 11:57am

I misunderstood. Below one is the corrected one.

perl -lane 'print   if (join("",sort (split //)) eq "acimnost");' file
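
The same sorted-letters comparison can be sketched in plain shell. This is only an illustration, not part of the one-liner above; the function names signature and anagrams_of are made up for the example. It relies on fold, sort and tr being available.

```shell
#!/bin/sh
# Hypothetical helper: print the characters of $1 in sorted order,
# e.g. "monastic" becomes "acimnost".
signature() {
    printf '%s\n' "$1" | fold -w 1 | sort | tr -d '\n'
}

# Read words on stdin and print those that are an exact
# rearrangement of the letters given in $1.
anagrams_of() {
    target=$(signature "$1")
    while IFS= read -r word; do
        [ "$(signature "$word")" = "$target" ] && printf '%s\n' "$word"
    done
    return 0
}

# Small demo on made-up input.
printf '%s\n' monastic montana cccccccc atcsmoin | anagrams_of niomscta
```

Because both sides go through the same sort, the letters can be given in any order. It forks several processes per word, so it is likely to be much slower than the Perl version on a large wordlist.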

alister · March 13, 2010, 12:03pm

The following verifies that each word contains the same number of unique characters as are listed in letters. If that requirement is met, the word is filtered through tr -d. If the word's unique characters match the list, an empty string results and the word is printed.

#!/bin/sh

letters=niomscta
length=${#letters}

while read word; do
    [ $length -ne $(echo $word | sed 's/\(.\)/\1 /g' | tr -s ' ' '\n' | sort | uniq | wc -l) ] && continue
    [ -z "$(echo $word | tr -d $letters)" ] && echo $word
done < data

Determine a word's unique characters by splitting the word into one character per line, sorting, uniq'ing, reassembling into a string and then comparing to the sorted list of allowed letters.

#!/bin/sh

# letters must be sorted
letters=acimnost

while read word; do
    uniqchars=$(echo $word | sed 's/\(.\)/\1 /g;y/ /\n/' | sort | uniq | paste -s - | tr -d '\t')
    [ "$letters" = "$uniqchars" ] && echo $word
done < data

These solutions permit an allowed letter to appear more than once. To forbid repeats, remove uniq from the pipeline.

Alister

radoulov · March 13, 2010, 1:17pm

Excellent point!
This should handle it too:

perl -nle'BEGIN {
  @_{split / /, shift} = (1) x length($ARGV[0]);
  }
keys %_ <= grep $_{$_}, split // and print;
' 'n i o m s c t a' infile

---------- Post updated at 07:17 PM ---------- Previous update was at 06:17 PM ----------

Based on dennis.jacob's and alister's ideas, with the almighty Z-Shell

letters=(n i o m s c t a)
letters=${(j::)${(uo)letters}}
  
while read -r; do
  [[ ${(j::)${(ous::)REPLY}} == $letters ]] &&
    print -- $REPLY
done<wordlist

hakermania · March 13, 2010, 1:53pm

Thank you very much, but I found a bug:
Let's say the wordlist contains the word "1a2b3c".
Running your first program gives this output:
alex@lol-pc:~/Desktop$ ./ty
121212
131313
123123
21122112
abcd1234
1a2b3c
cccccc
alex@lol-pc:~/Desktop$
which contains 1a2b3c, but not only that. Can you fix this?

radoulov · March 13, 2010, 2:48pm

Yes,
based on dennis.jacob's and alister's ideas again:

perl -nle'BEGIN {
  $l = join "", sort grep !$_{$_}++ , split / /, shift;
  }
%_ = (); $w = join "", sort grep !$_{$_}++ , split //;
print if $w eq $l;
' '1 a 2 b 3 c' wordlist

If you prefer to pass the chars without spaces:

perl -nle'BEGIN {
  $l = join "", sort grep !$_{$_}++ , split //, shift;
  }
%_ = (); $w = join "", sort grep !$_{$_}++ , split //;
print if $w eq $l;
' '1a2b3c' wordlist

alister · March 13, 2010, 3:24pm

I may have lost track of the requirements, so to avoid wasting anyone's time I will state what the following code accomplishes: it reads one-word lines and compares those words against a list of allowable letters. If the word consists of nothing but those letters, possibly occurring more than once, the word is considered a match and is printed.

As simple as I could manage without using awk or perl (I have nothing but the utmost respect for those tools, but sometimes I'm a masochist ;)):

#!/bin/sh

l=acimnost  #allowed letters must be sorted
while read w; do
    [ $l = "$(echo $w | sed 's/./& /g;y/ /\n/' | sort | tr -ds \\n $l)" ] && echo $w
done < data

Parameterizing it to accept the list of letters and the wordlist file as arguments is trivial and left as an exercise to whoever wishes to do it.
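
One possible shape for that exercise is sketched below. It is an assumption about how the parameterized version might look, not the code above: the names uniq_chars and match_letters are invented, it uses fold and sort -u instead of the sed and tr pipeline, and the demo relies on mktemp, which is common but not required by POSIX.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct characters of $1, sorted.
uniq_chars() {
    printf '%s\n' "$1" | fold -w 1 | sort -u | tr -d '\n'
}

# usage: match_letters LETTERS WORDFILE
# Print each word whose set of distinct characters equals that of LETTERS.
match_letters() {
    wanted=$(uniq_chars "$1")
    while IFS= read -r w; do
        [ "$(uniq_chars "$w")" = "$wanted" ] && printf '%s\n' "$w"
    done < "$2"
    return 0
}

# Demo on a throwaway file holding words mentioned in this thread.
tmp=$(mktemp)
printf '%s\n' monastic monasticasctio maso 1a2b3c > "$tmp"
match_letters niomscta "$tmp"
rm -f "$tmp"
```

Since the wanted letters pass through the same sort as each word, they no longer have to be supplied pre-sorted. Repeated letters are still accepted, so monasticasctio is printed along with monastic.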

Cheers,
Alister

radoulov · March 13, 2010, 4:13pm

Yep,
alister's shell solution works too; the only drawback, in my opinion, is the performance (too many forks/execs per record in the pipeline).
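
To illustrate the point about forks, here is a rough sketch that runs a single awk process for the whole wordlist instead of a pipeline per word. The function name exact_letters is made up, and the check assumes the letters passed in are all distinct: under that assumption, a word of the same length that contains every letter must be an exact rearrangement.

```shell
#!/bin/sh
# usage: exact_letters LETTERS < wordlist
# Assumes LETTERS has no repeated characters.
exact_letters() {
    awk -v letters="$1" '
    length($0) == length(letters) {
        ok = 1
        # Every wanted letter must occur somewhere in the word.
        for (i = 1; i <= length(letters); i++)
            if (index($0, substr(letters, i, 1)) == 0) { ok = 0; break }
        if (ok) print
    }'
}

# Small demo using words that appear in this thread.
printf '%s\n' monastic macintos montana cccccccc | exact_letters niomscta
```

On these sample words it should print monastic and macintos, since both are rearrangements of the eight letters.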

alister · March 13, 2010, 4:36pm

Hi, radoulov:

Sir, how dare you doubt the performance of my code. I will have you know that my shell is faster than a speeding bullet. heheheh

Seriously, though, performance and efficiency were most definitely not a priority in that code's design. It was just a sh exercise for fun.

Cheers,
Alister

radoulov · March 13, 2010, 5:19pm

Sure, sorry.
If I have time later, I'll try with POSIX shell too.