Regex in grep to match all lines ending with a double quote (") OR a single quote (')

NanJ · August 26, 2009, 6:15am

Hi,
I've been trying to write a regex to use in egrep (in a shell script) that'll fetch the names of all the files that match a particular pattern. I expect to match the following line in a file:

Name = "abc"

The regex I'm using to match the same is:

egrep -l '(^[nN][aA][mM][eE]) *= *" *[a-zA-Z0-9_+-]* *"$' /PATH_TO_SEARCH

But I now want to find all files in which the name is quoted with either " or ',
i.e., the pattern to match could be:

Name = "abc"

OR

Name = 'abc'

I've tried various ways to include ' in the regex I've posted above, but I'm not able to get it right. I've tried using a backslash (\) and also tried things like [\'\"].
Any help to get this right is appreciated.
Thanks.

Franklin52 · August 26, 2009, 6:39am

With awk:

awk -F "\'|\"" '/Name =/{print $2}' file

NanJ · August 26, 2009, 6:54am

Thanks Franklin52,
Is there a way to do the same in grep itself? I wanted the regex to be part of the grep command. Are you suggesting that I use awk instead of grep?

Thanks
NJ

drl · August 26, 2009, 7:04am

Hi.

When you need to protect special characters on the command line, you need to use quoting. However, as you've found, if the special characters are the quote symbols themselves, you can run into trouble.

One solution for some versions of grep is to keep the pattern in a file so that it never appears on the command line. A here document can create that file from within the script, and quoting its delimiter (<<'EOF') tells the shell to leave special characters in the body alone. grep can then read the regular expression from the newly created file with -f. Here is an example:

#!/usr/bin/env bash

# @(#) s1	Demonstrate isolation of quotes in file for grep.

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) grep
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

echo
echo " Results:"
cat > my-pattern <<'EOF'
[nN][aA][mM][eE] *= *['"].*['"]
EOF
grep -f my-pattern $FILE

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
GNU grep 2.5.3

 Data file data1:
name = "double-1"
Name = "double-2"
name = 'single-1'
name = none
name = 'single-2'

 Results:
name = "double-1"
Name = "double-2"
name = 'single-1'
name = 'single-2'
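
If you would rather not leave a my-pattern file behind, the same quoted here document can feed a shell variable instead. This is only a sketch under a few assumptions: the sample files (double.conf, single.conf, bare.conf) are made up for the demonstration, mktemp -d is assumed to be available, and the pattern is the anchored one from the original question with the quote characters swapped for a bracket expression.

```shell
#!/usr/bin/env bash
# Sketch: hold the pattern in a variable instead of a pattern file.
# The quoted delimiter ('EOF') stops the shell from interpreting the quotes.
IFS= read -r pattern <<'EOF'
^[nN][aA][mM][eE] *= *['"] *[a-zA-Z0-9_+-]* *['"]$
EOF

# Hypothetical sample files, created only for this demonstration.
dir=$(mktemp -d)
printf '%s\n' 'Name = "abc"' > "$dir/double.conf"
printf '%s\n' "Name = 'abc'" > "$dir/single.conf"
printf '%s\n' 'Name = abc'   > "$dir/bare.conf"

# -E: extended regex (what egrep uses); -l: print only the matching file names.
# "--" keeps a pattern that starts with "-" from being taken as an option.
matches=$(grep -E -l -- "$pattern" "$dir"/*.conf)
printf '%s\n' "$matches"

rm -rf "$dir"
```

Note that the two bracket expressions match the opening and closing quote independently, so a mismatched pair such as "abc' is accepted as well.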

Another method is to surround the regular expression on the command line with double quotes. Inside of double quotes you may have escaped double quotes, \", and single quotes. However, you may not have escaped single quotes within a single-quoted string.
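
To make the quoting rules above concrete, here is a small sketch that builds the bracket expression ['"] three different ways and prints what the shell actually hands to a command. The variable names a, b and c are just for illustration.

```shell
#!/usr/bin/env bash
# Three spellings of the same bracket expression, ['"], after shell quoting.

# 1. Double-quoted: escape the double quote; the single quote needs nothing.
a="['\"]"

# 2. Single-quoted: close the string, add an escaped ', then reopen the string.
b='['\''"]'

# 3. Mixed: glue a single-quoted piece, a double-quoted ', and another
#    single-quoted piece together with no spaces in between.
c='['"'"'"]'

# All three lines print the same four characters: ['"]
printf '%s\n' "$a" "$b" "$c"
```

The double-quoted form is usually the easiest to read, but remember that the shell still expands $name and backticks inside double quotes, so a regex containing those needs extra care.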

See man pages for details. Good luck ... cheers, drl

NanJ · August 26, 2009, 7:28am

Writing the pattern to a file without the hassles of escaping the quotes worked perfectly fine!
Thanks a ton..

I had escaped both " and ' when I had the regex directly in the command. Wonder why it failed despite using "\"....
Had tried the following:

egrep -l "(^[nN][aA][mM][eE]) *= *[\"\'] *[a-zA-Z0-9_+-]* *[\"\']$" /FILE
egrep -l "(^[nN][aA][mM][eE]) *= *\"\|\' *[a-zA-Z0-9_+-]* *\"\|\'$" /FILE

I'd used both these options to get the OR, but they'd failed.

Thanks
NJ

drl · August 26, 2009, 9:17am

Hi.

Modifications to your regular expressions, all searches done with egrep:

#!/usr/bin/env bash

# @(#) s2	Demonstrate isolation of quotes in file for egrep.

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) egrep
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

echo
echo " Results with here-document solution:"
cat > my-pattern <<'EOF'
[nN][aA][mM][eE] *= *['"].*['"]
EOF
egrep -f my-pattern $FILE

echo
echo " Results with command-line solution 1:"
# egrep -l "(^[nN][aA][mM][eE]) *= *[\"\'] *[a-zA-Z0-9_+-]* *[\"\']$" /FILE
egrep "^[nN][aA][mM][eE] *= *[\"'] *[a-zA-Z0-9_+-]* *[\"']$" $FILE

echo
echo " Results with command-line solution 2:"
# egrep -l "(^[nN][aA][mM][eE]) *= *\"\|\' *[a-zA-Z0-9_+-]* *\"\|\'$" /FILE
egrep "^[nN][aA][mM][eE] *= *(\"|') *[a-zA-Z0-9_+-]* *(\"|')$" $FILE


exit 0

producing:

% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
egrep GNU grep 2.5.3

 Data file data1:
name = "double-1"
Name = "double-2"
name = 'single-1'
name = none
name = 'single-2'

 Results with here-document solution:
name = "double-1"
Name = "double-2"
name = 'single-1'
name = 'single-2'

 Results with command-line solution 1:
name = "double-1"
Name = "double-2"
name = 'single-1'
name = 'single-2'

 Results with command-line solution 2:
name = "double-1"
Name = "double-2"
name = 'single-1'
name = 'single-2'
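
A handy way to debug this kind of problem is to print the pattern exactly as egrep receives it. Inside double quotes the shell removes the backslash in front of " but leaves the one in front of ' alone, and in an extended regex \| is a literal pipe character rather than alternation. The sketch below shows both effects; the helper name matches and the variable names try1, try2 and fixed are only for this illustration, and the test line is taken from data1.

```shell
#!/usr/bin/env bash
# The two original attempts and the corrected alternation, quoted as posted.
try1="(^[nN][aA][mM][eE]) *= *[\"\'] *[a-zA-Z0-9_+-]* *[\"\']$"
try2="(^[nN][aA][mM][eE]) *= *\"\|\' *[a-zA-Z0-9_+-]* *\"\|\'$"
fixed="^[nN][aA][mM][eE] *= *(\"|') *[a-zA-Z0-9_+-]* *(\"|')$"

# Show what the shell passes on: \" has become ", but \' and \| are untouched.
printf '%s\n' "$try1" "$try2" "$fixed"

# Helper (hypothetical name): report whether pattern $1 matches line $2.
matches() {
    if printf '%s\n' "$2" | grep -E -q -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

line='name = "double-1"'
r_try2=$(matches "$try2" "$line")    # needs a literal | after the quote
r_fixed=$(matches "$fixed" "$line")  # real alternation inside a group
echo "try2:  $r_try2"
echo "fixed: $r_fixed"
```

In the first attempt the leftover backslash sits inside a bracket expression, where it simply becomes one more character in the set, so the quoting by itself may not be the whole story there.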

See man pages, experiment on small cases ... cheers, drl

NanJ · August 26, 2009, 9:28am

Thanks!!! That was me caught up in an endless loop of regex learning!!!

I've posted another question which you might be able to answer. Pls do check it when you find some time. (http://www.unix.com/shell-programming-scripting/117800-problem-while-using-grep-multi-level-space-separated-filepath.html)

Thanks a lot for the help again!
NJ