How to handle grepping variable data containing wildcards?

I have a lot of files with keywords and unique names. I'm using a shell script that reads a simple pattern file of comma-separated values in order to match on certain keywords. The problem is that I don't understand how to handle the wildcard values when I want to skip over the unique names.

Here is an example of the data patterns in the files:

abcd efgh ijkl
abcd efgh ijkl
abcd mnop unique-name01 efgh ijkl
abcd mnop unique-name02 efgh ijkl
abcd efgh ijkl

An example of the pattern file:

abcd efgh ijkl,LABEL001 LABEL002
mnop .* efgh ijkl,LABEL001 LABEL003

The shell script operates like this:

while IFS= read -r fi
  fn=`awk -F, '{print $NF}'`
  fp=`awk -F, '{print $1}'`

  echo "Pattern: \"$fp\""
  egrep "$fp" $source_file

done < pattern_file

The result looks like this:

Pattern: "abcd efgh ijkl"

[matches]

Pattern: "abcd mnop . .. efgh ijkl"

[no matches]

I get correct matches on the patterns that do not use wildcards, so I know the script is working in that respect.
The wildcard patterns do not work, and it looks like the wildcards themselves are getting mangled somehow. In this example, the pattern has changed from "abcd mnop .* efgh ijkl" to "abcd mnop . .. efgh ijkl".

Thanks

The script snippet looks garbled. It sounds like the shell expanded the wildcard in the pattern at some point. You need more double quotes wherever that happens :)
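To make the expansion visible, here is a minimal sketch. It assumes a scratch directory holding a single dot file named .hidden; the directory and the variable names are made up for the demo and are not from the original script.

```shell
#!/bin/sh
# Hypothetical scratch directory with one dot file in it.
demo_dir=$(mktemp -d)
: > "$demo_dir/.hidden"

fp='mnop .* efgh ijkl'

# Unquoted: the shell splits the value into words and then globs each word,
# so ".*" is replaced by matching names from the current directory
# (".hidden" here, plus "." and ".." in many shells).
unquoted=$(cd "$demo_dir" && echo $fp)

# Quoted: the value reaches echo unchanged.
quoted=$(cd "$demo_dir" && echo "$fp")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"

rm -rf "$demo_dir"
```

The same thing happens anywhere the variable is used without double quotes, including inside a command substitution such as an echo piped into awk.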

But aren't you making it too complicated? How about

while IFS=, read -r pattern rest
do
  egrep "$pattern" "$source_file"
done < p

Juha


You got me looking back at the quotes again and I found the problem where I initially strip out the fields I need for the pattern. What a dumb mistake!
Yes, I should try to make it simpler. :)
Thanks!

No need for the while loop. First remove everything that follows the comma on each line of the pattern file with awk, then pipe its output to xargs, which runs grep once per pattern (-I {} makes xargs substitute each input line for {}):

awk -F, '{print $1}' pattern_file | xargs -I {} grep -E {} "$source_file"
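Another way to avoid the loop, sketched here under the assumption that it does not matter which pattern produced a given match, is to write the first field of every line to a file and hand that file to grep with -f. The sample data comes from the first post; the temporary file names are only illustrative.

```shell
#!/bin/sh
# Temporary files for the demo (names are illustrative).
work=$(mktemp -d)
source_file=$work/source.txt
pattern_file=$work/patterns.csv
regex_file=$work/regexes.txt

printf '%s\n' \
  'abcd efgh ijkl' \
  'abcd mnop unique-name01 efgh ijkl' \
  'abcd mnop unique-name02 efgh ijkl' > "$source_file"

printf '%s\n' \
  'abcd efgh ijkl,LABEL001 LABEL002' \
  'mnop .* efgh ijkl,LABEL001 LABEL003' > "$pattern_file"

# Keep only the first comma-separated field of each line (the regex).
cut -d, -f1 "$pattern_file" > "$regex_file"

# -f reads one pattern per line; a line is printed if any pattern matches it.
matches=$(grep -E -f "$regex_file" "$source_file")
printf '%s\n' "$matches"

rm -rf "$work"
```

Because grep sees all patterns at once, the labels in the second field are lost, so this only fits cases where the labels are not needed.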

I appreciate the suggestion. I need the labels for characterizing data in the output, so I'm hanging onto those until I get into the loop where I then separate them and use them both. Also, I'm doing additional things inside the while loop that I didn't include in the post.
The main point of my post was to find out why I wasn't able to get wildcards to work properly. I wasn't expecting to find out that I had overlooked quoting the variable, so I included the other information because I thought it might be helpful for resolving the problem.
Anyway, thanks again for the suggestion! I may use it for another project.

The problem with the snippet in post #1, beyond the missing do in the while loop, is that the awk commands in the two command substitutions read from stdin and thus drain the pattern file. As the file has just two lines, read takes the first and the first awk consumes the rest, so by the time fp is assigned the input is already at EOF, fp is empty, and grep is handed an empty pattern.
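The draining effect is easy to reproduce. The sketch below uses a made-up three-line file and cat in place of awk; any command that reads stdin inside the loop behaves the same way. The second loop shows one possible fix, which is to feed the current line to the command explicitly.

```shell
#!/bin/sh
# Made-up input file for the demo.
pattern_file=$(mktemp)
printf '%s\n' 'first,A' 'second,B' 'third,C' > "$pattern_file"

# Broken: cat has no file argument, so it inherits the loop's stdin
# and reads everything that is left of the pattern file.
iterations=0
while IFS= read -r line; do
  iterations=$((iterations + 1))
  drained=$(cat)
done < "$pattern_file"
echo "broken loop ran $iterations time(s)"

# One fix: give the inner command its own input, here the current line.
fixed_iterations=0
fields=
while IFS= read -r line; do
  fixed_iterations=$((fixed_iterations + 1))
  first=$(printf '%s\n' "$line" | awk -F, '{print $1}')
  fields="$fields$first "
done < "$pattern_file"
echo "fixed loop ran $fixed_iterations time(s): $fields"

rm -f "$pattern_file"
```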

With something like

while IFS=, read -r fp fn
do
  echo "Pattern: \"$fp\""
  egrep "$fp" source_file
done < pattern_file
Pattern: "abcd efgh ijkl"
abcd efgh ijkl
abcd efgh ijkl
abcd efgh ijkl
Pattern: "mnop .* efgh ijkl"
abcd mnop unique-name01 efgh ijkl
abcd mnop unique-name02 efgh ijkl

you should come close to what you need.
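Since the labels are needed alongside the matches, a possible extension is to prefix every matching line with the labels from the same pattern line. This is only a sketch: the tab-separated output format and the temporary file names are assumptions, and the sample data is copied from the first post.

```shell
#!/bin/sh
# Temporary files for the demo (names are illustrative).
work=$(mktemp -d)
source_file=$work/source.txt
pattern_file=$work/patterns.csv

printf '%s\n' \
  'abcd efgh ijkl' \
  'abcd efgh ijkl' \
  'abcd mnop unique-name01 efgh ijkl' \
  'abcd mnop unique-name02 efgh ijkl' \
  'abcd efgh ijkl' > "$source_file"

printf '%s\n' \
  'abcd efgh ijkl,LABEL001 LABEL002' \
  'mnop .* efgh ijkl,LABEL001 LABEL003' > "$pattern_file"

# fp receives the regex, fn the labels (everything after the first comma).
labelled=$(
  while IFS=, read -r fp fn; do
    [ -n "$fp" ] || continue            # skip blank lines
    # grep reads the named file, not stdin, so the pattern file is not drained.
    grep -E -- "$fp" "$source_file" | while IFS= read -r hit; do
      printf '%s\t%s\n' "$fn" "$hit"
    done
  done < "$pattern_file"
)
printf '%s\n' "$labelled"

rm -rf "$work"
```

Quoting "$fp" keeps the .* intact, and reading both fields with a single read avoids any extra command that could consume the loop's input.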