I've got a large text file, cleanme, that I want to process for all combinations of words in a second file, commonwords.
So I can iterate through commonwords like so:
filearray=( `cat commonwords | tr '\n' ' '`)
for firstword in ${filearray[@]}
do
for lastword in ${filearray[@]}
do
./word_extractor_script.sh $firstword $lastword
done
done
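For reference, here is a slightly more defensive sketch of the same nested loop, assuming bash. The temporary file and the `words` and `pairs` arrays are made up for illustration, and the call to word_extractor_script.sh is replaced by a stand-in so the snippet runs on its own. Quoting the array expansion and the arguments keeps each word intact even if it contains unusual characters.

```shell
#!/bin/bash
# Build a small sample word list so the sketch is self-contained.
tmpwords=$(mktemp)
printf '%s\n' cat dog elephant > "$tmpwords"

# Read one word per line into an array (works on older bash versions too).
words=()
while IFS= read -r word; do
    words+=("$word")
done < "$tmpwords"

pairs=()
for firstword in "${words[@]}"; do
    for lastword in "${words[@]}"; do
        # Stand-in for: ./word_extractor_script.sh "$firstword" "$lastword"
        pairs+=("${firstword}_${lastword}")
    done
done

rm -f "$tmpwords"
printf '%s\n' "${pairs[@]}"
```

With three words this visits all nine ordered pairs, including pairs where both words are the same.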
However, when I call word_extractor_script.sh, I don't get the output I expect.
word_extractor_script.sh is as follows:
perl -p -e 's/^.*?$1/$2/' cleanme > CleanerExtract
sed 's/^/$1 /g' CleanerExtract > CleanerExtract1
sed -n '/$2/p' CleanerExtract1 > CleanerExtract2
sed G CleanerExtract2 > CleanerExtract3
sed -e 's/.*$1//' -e 's/$2.*//' CleanerExtract3 > CleanerExtract4
sed '/^.\{1000\}/d' CleanerExtract4 > OutputShorter_$1_$2
rm CleanerExtract
rm CleanerExtract1
rm CleanerExtract2
rm CleanerExtract3
rm CleanerExtract4
If I replace $1 and $2 in word_extractor_script.sh with actual words, everything runs fine. Am I doing something very dumb with the parameters I'm passing to the second script? Or is it something to do with the $ being a metacharacter in the regex, and this causing things to get confused?
Thanks,
James
commonwords (for this example) could be
cat
dog
elephant
and cleanme could be
There is a cat next to a dog. The dog is next to the cat. The elephant is behind the dog but under the cat which is to one side of the elephant and underneath the dog.
I'd expect OutputShorter_cat_dog to be produced as one of the results, and contain
which is to one side of the elephant and underneath the
... although instead I get nothing (unlike what happens if I run word_extractor_script.sh with the $1 and $2 replaced with cat and dog).
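To isolate the quoting question, here is a minimal sketch, assuming bash and a standard sed. The function name `demo_quoting` and the sample sentence are hypothetical and not part of word_extractor_script.sh. Inside single quotes the shell leaves $1 alone, so sed receives the literal characters $1. Inside double quotes the shell substitutes the argument before sed runs.

```shell
#!/bin/bash
# Hypothetical helper for illustration only.
demo_quoting() {
    local line='There is a cat next to a dog.'
    # Single quotes: sed is handed the literal text $1, which never occurs in the line.
    single=$(printf '%s\n' "$line" | sed 's/$1/X/')
    # Double quotes: the shell expands $1 to the first argument before sed sees it.
    double=$(printf '%s\n' "$line" | sed "s/$1/X/")
}

demo_quoting cat
printf 'single: %s\ndouble: %s\n' "$single" "$double"
```

If the two results differ, as they should here, the parameters are reaching the script but are not being expanded inside the single-quoted expressions.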