Combining multiple greps

Xubuntu56 · February 22, 2019, 9:24am

I'm trying to learn about regular expressions. Let's say I want to list all the files in /usr/bin beginning with "p", ending with "x", and containing an "a".
I know this works:

ls | grep ^p | grep x$ | grep a

but I'm thinking there must be a way to do it without typing grep three times. Some of my attempts:

ls | grep '^p' 'x$' 'a'
ls | grep ^p | x$ | [a]
ls | grep -E "[[ ^p ]]+[[ x$ ]]+[[ a ]]"

Ubuntu 18.04.2; Xfce 4.12.3; kernel 4.15.0-45-generic; bash 4.4.19(1); Dell Inspiron-518

joker · February 22, 2019, 9:33am

Try this...

ls | grep "^p.*a.*x$"

.* means any character (.), repeated zero or more times (*).

So the task is to create one combined regular expression that matches all criteria you want.
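To check that the single expression is equivalent to the three-grep pipeline, here is a small sketch. It feeds a made-up list of names (standing in for the output of ls in /usr/bin, not taken from a real system) through both versions and compares the results:

```shell
#!/bin/sh
# Made-up sample names standing in for the output of "ls" in /usr/bin
names='partx
pax
perl
pgrep
pmap
ptx
xargs'

# Three chained filters: starts with p, ends with x, contains an a
three=$(printf '%s\n' "$names" | grep '^p' | grep 'x$' | grep 'a')

# One combined regex: p at the start, an a somewhere after it, x at the end
one=$(printf '%s\n' "$names" | grep '^p.*a.*x$')

printf '%s\n' "$one"
[ "$three" = "$one" ] && echo "same result"
```

Since a name that starts with p and ends with x can only have its a somewhere in between, both versions select the same lines here (partx and pax).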

bakunin · February 22, 2019, 9:42am

xubuntu56 wrote:

I know this works:

ls | grep ^p | grep x$ | grep a

but I'm thinking there must be a way to do it without typing grep three times. Some of my attempts:


You are right, and there is; in fact there are two ways:

First, you can use several expressions in grep at once by using the -e switch:

grep -e "one" -e "two" /some/file

will list all lines containing "one" AND all lines containing "two" from that file. In effect it is a logical OR of these two expressions.
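A quick way to see the OR behaviour without creating /some/file is to pipe a few made-up lines into grep; this is only a sketch with invented input:

```shell
#!/bin/sh
# Made-up input lines, fed on stdin instead of /some/file
input='one only
two only
one and two
neither'

# Each -e adds another pattern; a line is printed if ANY of them matches
or_result=$(printf '%s\n' "$input" | grep -e 'one' -e 'two')
printf '%s\n' "$or_result"
```

Only the "neither" line is dropped; a line containing both words is printed once, not twice.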

The second possibility (and this is probably what you wanted) is to combine regular expressions. E.g., to get all lines containing "one" and "two", in that order, you would write:

grep 'one.*two' /some/file

The regexp means: "one", followed by anything (".*"), followed by "two". If the order of the two words should not matter you need two regexps, which you can combine with the method above:

grep -e 'one.*two' -e 'two.*one' /some/file

Search for "'one', something, then 'two'" or for "'two', something, then 'one'".
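As a sketch with made-up input, the two-order regex can be compared with the plain "chain two greps" approach; both act as an AND that ignores word order:

```shell
#!/bin/sh
# Made-up input lines mentioning "one", "two", or both, in different orders
input='one then two
two then one
only one here
only two here'

# Order-independent AND: "one ... two" OR "two ... one"
both=$(printf '%s\n' "$input" | grep -e 'one.*two' -e 'two.*one')
# The same AND written as two chained greps, for comparison
chained=$(printf '%s\n' "$input" | grep 'one' | grep 'two')

printf '%s\n' "$both"
```

One subtle difference: when the words overlap, as in "twone", the chained greps match the line but the two-order regex does not, because it needs each word to appear in full before the other starts.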

You can also use sed (the stream editor) for such (or even more complex) purposes, where grep might get a bit unwieldy. With sed -n, only the lines you explicitly print are output (by default sed prints every line after processing it, even if it passed through unchanged), therefore:

sed -n '/one/p' /some/file

basically is the same as

grep 'one' /some/file
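Here is a sketch (made-up input on stdin instead of /some/file) showing both the equivalence and why -n matters: without it, sed prints every line once and the matching lines a second time.

```shell
#!/bin/sh
# Made-up input lines
input='one
two
one two'

via_sed=$(printf '%s\n' "$input" | sed -n '/one/p')   # only explicitly printed lines
via_grep=$(printf '%s\n' "$input" | grep 'one')
doubled=$(printf '%s\n' "$input" | sed '/one/p')      # no -n: auto-print plus explicit p

[ "$via_sed" = "$via_grep" ] && echo "sed -n and grep agree"
printf '%s\n' "$doubled"
```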

But sed lets you nest rules, which grep cannot do, e.g.:

sed -n '/one/ {
             /two/ {
                   /three/p
             }
         }' /some/file

This would be similar to the example above: print only the lines containing "one", "two" and "three", in any order.
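A sketch with made-up input lines in different word orders shows that the nested sed rules behave like three chained greps, i.e. an AND of the three words regardless of order:

```shell
#!/bin/sh
# Made-up input: only the 1st and 3rd lines contain all three words
input='three two one
one two
two one three
one three'

# Nested address blocks: print only if /one/, /two/ and /three/ all match
nested=$(printf '%s\n' "$input" | sed -n '/one/{
/two/{
/three/p
}
}')

# The same AND written as three chained greps
chained=$(printf '%s\n' "$input" | grep one | grep two | grep three)

printf '%s\n' "$nested"
```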

I hope this helps.

bakunin

Xubuntu56 · February 22, 2019, 12:01pm

@stomp-- ls | grep "^p.*a.*x$" works great, with either single or double quotes. I'm used to thinking of the asterisk as a wildcard, which was confusing me at first.
@bakunin--this works great

sed -n '/^p/ {
             /x$/ {
                  /a/p
                  }
             }' catalogue

Scrutinizer · February 22, 2019, 10:31pm

Not really related to the question, but:

ls -d p*a*x

Xubuntu56 · February 23, 2019, 7:17am

@Scrutinizer--it does relate to the question, as it performs the task. It found the target file, partx, in /usr/bin.

I don't understand why, however, as the man page for ls says the -d option is to "list directories themselves, not their contents".

MadeInGermany · February 23, 2019, 8:06am

Normally ls lists the contents of a given directory.
Say there were a pax/ subdirectory beside the partx file:
ls p*a*x would list partx and the contents of pax/,
while ls -d p*a*x would list the two items partx and pax.
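The pax/ scenario can be reproduced in a throwaway directory. This is a sketch: the temporary directory comes from mktemp -d, and the file pax/inner is a made-up name used only to give the subdirectory some content.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch partx
mkdir pax
touch pax/inner            # made-up file inside the pax/ subdirectory

plain=$(ls p*a*x)          # the shell expands to "partx pax"; ls then lists pax/'s contents
with_d=$(ls -d p*a*x)      # -d: list the directory entry itself, not its contents

printf '%s\n---\n%s\n' "$plain" "$with_d"
cd / && rm -rf "$tmp"
```

Note that the shell, not ls, expands p*a*x; ls only receives the names partx and pax and decides what to print for each.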

Xubuntu56 · February 23, 2019, 8:34am

Is there a "gray area" between grepping and globbing?
A moment ago I just read that "The moral of the story is that grep never uses globbing." (see link)
I also thought I had it straight in my mind that globbing is for filenames, and grepping is for searching text within a file.
Yet my first post above asks for help grepping filenames!

Globbing and Regex: So Similar, So Different | Linux Journal

RudiC · February 23, 2019, 9:31am

In your post, you're not "grepping filenames", but grepping text that is the output of an ls command, which happens to contain file names.

As you pointed out, you need to differentiate between "grepping and globbing", which are not the same even though it sometimes may seem so. Either is a malapropism; the exact terms would be "regex matching" and "pattern matching".

"Globbing" deals with patterns and is done by the shell, mostly when dealing with directory contents. And, in one exceptional case, some recent shells can deal with regexes: in "conditional expressions" (in man bash, see the =~ operator under [[ expression ]]).
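As a rough illustration of regex matching inside the shell itself, bash's [[ string =~ regex ]] applies an extended regex without calling grep. This sketch uses made-up names; keeping the regex in a variable and leaving $re unquoted is deliberate, since quoting the right-hand side makes bash match it as a literal string.

```shell
#!/usr/bin/env bash
re='^p.*a.*x$'    # extended regex, stored in a variable

# Made-up names; in practice they could come from a glob such as /usr/bin/*
matches=$(
  for name in partx pax perl ptx; do
    if [[ $name =~ $re ]]; then     # $re must stay unquoted to act as a regex
      printf '%s\n' "$name"
    fi
  done
)
printf '%s\n' "$matches"
```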

"Grepping" deals with regexes, basic and extended, abbr. BREs and EREs. They have many subtleties, and it pays off to spend some time reading the man page.

Patterns and regexes in principle have different syntaxes. There are some overlaps, e.g. the [...] bracket expression meaning "Match any one of the enclosed characters", but also "faux amis" (false friends) like the * . It's always annoying to keep those differences in mind when dealing with either, and I have to test and experiment every single time when I switch from one to the other.
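The * "false friend" is easy to demonstrate. In this sketch (made-up strings), the shell pattern p*x means "p, anything, x", while the regex p*x means "zero or more p, then x" and, unanchored, matches any line containing an x:

```shell
#!/bin/sh
out=$(
  for s in partx xyz; do
    # Shell pattern matching (case): * means "any string"
    case $s in p*x) glob=yes ;; *) glob=no ;; esac
    # BRE: * repeats the preceding p zero or more times
    if printf '%s\n' "$s" | grep -q 'p*x'; then re=yes; else re=no; fi
    # The regex that corresponds to the pattern p*x
    if printf '%s\n' "$s" | grep -q '^p.*x$'; then anchored=yes; else anchored=no; fi
    echo "$s: pattern p*x -> $glob, regex p*x -> $re, regex ^p.*x\$ -> $anchored"
  done
)
printf '%s\n' "$out"
```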

Scrutinizer · February 24, 2019, 6:17am

With regards to pattern matching, perhaps a finer point to be made here is that (standard) globbing, or glob pattern matching, is a special form of pattern matching, where patterns are used for filename expansion.

In pattern matching, the wildcards are *, ? or a bracket expression ([...]).

In the case of globbing there are the following extra rules:

Wildcards do not match files that start with a . (dot) (those can be matched by specifying a dot as the first character in the pattern)
Wildcards do not match / (forward slash).
A forward slash cannot be used in a bracket expression (doing so turns the bracket expression into a literal string).

Globbing results in a list of files if there is a match, or the pattern itself if there is no match. The order in which the list of files is presented is governed by LC_COLLATE.
See Patterns Used for Filename Expansion
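The "pattern itself if there is no match" rule can be seen in a throwaway directory. This sketch creates made-up files a.txt and b.txt with mktemp -d and globs for .txt files (which exist) and .log files (which do not):

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a.txt b.txt                     # made-up files

# Pattern matches: the loop gets the sorted list of matching names
found=$(for f in *.txt; do printf '%s\n' "$f"; done)
# No .log files: the pattern is left as the literal string *.log
missing=$(for f in *.log; do printf '%s\n' "$f"; done)

printf 'found:\n%s\nmissing: %s\n' "$found" "$missing"
cd / && rm -rf "$tmp"
```

In scripts a common guard is [ -e "$f" ] || continue at the top of such a loop; in bash, shopt -s nullglob instead makes an unmatched pattern expand to nothing.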

Examples:

$ mkdir -p somedir/foo
$ touch a.b .a.b somedir/bar somedir/.baz somedir/"foo bar"
$ ls -d *
a.b	somedir
$ ls -d .*
.	..	.a.b
$ ls -d * .*
.	..	.a.b	a.b	somedir
$ ls -d * .[!.]*
.a.b	a.b	somedir
$ ls -d */*
somedir/bar	somedir/foo	somedir/foo bar

Compare this to regular pattern matching, where a slash is actually matched:

$ ls -d somedir/foo | while IFS= read -r line; do case $line in (*) echo "$line"; esac; done
somedir/foo

--
The bash shell and other more modern shells, like ksh93 and zsh, also support extended globbing with additional rules.
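For bash, a minimal sketch of two extended patterns, assuming the extglob option (usually off in scripts) is switched on first; the file names are made up and live in a mktemp -d directory:

```shell
#!/usr/bin/env bash
shopt -s extglob            # enable bash's extended patterns; must precede their use
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch partx pax perl ptx    # made-up file names

# !(pattern-list): names that match none of the given patterns
others=$(printf '%s\n' !(p*a*x))
# @(pattern-list): names that match one of the given patterns
alts=$(printf '%s\n' @(pax|perl))

printf 'others:\n%s\nalts:\n%s\n' "$others" "$alts"
cd / && rm -rf "$tmp"
```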