Getting confused about passing parameters to perl

JamesForeman · July 6, 2010, 10:25am

I've got a large text file, cleanme, that I want to process for all combinations of words in a second file, commonwords.

So I can iterate through commonwords like so:

filearray=( `cat commonwords | tr '\n' ' '`)

for firstword in ${filearray[@]}
do
	for lastword in ${filearray[@]}
	do
		./word_extractor_script.sh $firstword $lastword
	done
done

However, when I call word_extractor_script.sh, I don't get the output I expect.

word_extractor_script.sh is as follows:

perl -p -e 's/^.*?$1/$2/' cleanme > CleanerExtract

sed 's/^/$1 /g' CleanerExtract > CleanerExtract1

sed -n '/$2/p' CleanerExtract1 > CleanerExtract2

sed G CleanerExtract2 > CleanerExtract3

sed -e 's/.*$1//' -e 's/$2.*//' CleanerExtract3 > CleanerExtract4

sed '/^.\{1000\}/d' CleanerExtract4 > OutputShorter_$1_$2

rm CleanerExtract
rm CleanerExtract1
rm CleanerExtract2
rm CleanerExtract3
rm CleanerExtract4

If I replace $1 and $2 in word_extractor_script.sh with actual words, everything runs fine. Am I doing something very dumb with the parameters I'm passing to the second script? Or is it that $ is a metacharacter in the regex, and that is confusing things?

Thanks,

James

commonwords (for this example) could be

cat
dog
elephant

and cleanme could be

There is a cat next to a dog. The dog is next to the cat. The elephant is behind the dog but under the cat which is to one side of the elephant and underneath the dog.

I'd expect OutputShorter_cat_dog to be produced as one of the results, and contain

which is to one side of the elephant and underneath the

... although instead I get nothing (unlike what happens if I run word_extractor_script.sh with the $1 and $2 replaced with cat and dog).

durden_tyler · July 6, 2010, 10:51am

Change all single-quotes to double-quotes in your shell script, like so -

$
$
$ cat word_extractor_script.sh
perl -p -e "s/^.*?$1/$2/" cleanme > CleanerExtract
 
sed "s/^/$1 /g" CleanerExtract > CleanerExtract1
 
sed -n "/$2/p" CleanerExtract1 > CleanerExtract2
 
sed G CleanerExtract2 > CleanerExtract3
 
sed -e "s/.*$1//" -e "s/$2.*//" CleanerExtract3 > CleanerExtract4
 
sed "/^.\{1000\}/d" CleanerExtract4 > OutputShorter_$1_$2
 
rm CleanerExtract
rm CleanerExtract1
rm CleanerExtract2
rm CleanerExtract3
rm CleanerExtract4
$
$

It should work thereafter -

$
$ # show the contents of file "cleanme"
$ cat cleanme
There is a cat next to a dog. The dog is next to the cat. The elephant is behind the dog but under the cat which is to one side of the elephant and underneath the dog.
$
$ # execute the script passing "cat" and "dog" as parameters
$ ./word_extractor_script.sh cat dog
$
$ # show the contents of "OutputShorter_cat_dog"
$ cat OutputShorter_cat_dog
 which is to one side of the elephant and underneath the
$
$
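
The quoting change matters because the shell leaves $1 and $2 untouched inside single quotes, but substitutes the script's arguments inside double quotes before perl or sed ever runs. A minimal sketch of the difference, where `set --` stands in for calling the script with "cat" and "dog" and the variable names `single` and `double` are just for illustration:

```shell
#!/bin/sh
# Simulate calling the script with the arguments "cat" and "dog".
set -- cat dog

# Single quotes: the shell passes $1 and $2 through literally.
single='s/^.*?$1/$2/'

# Double quotes: the shell substitutes the positional parameters first.
double="s/^.*?$1/$2/"

printf '%s\n' "$single"   # the expression perl received in the original script
printf '%s\n' "$double"   # the expression perl receives after the quoting change
```

With the single-quoted version, perl sees a literal $1 and treats it as its own capture variable rather than as the word you passed in, which would explain the empty output.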

In any case, are you trying to do this?

$
$ perl -plne 's/^.*cat(.*?)dog.*$/$1/' cleanme
 which is to one side of the elephant and underneath the
$

You don't need so many calls to sed in such a case.

tyler_durden

JamesForeman · July 6, 2010, 11:37pm

Thanks.

That single line of perl looks more elegant than what I had: I'll try that tonight.