Regular Expression in Find command [KSH]

vinay4889 · July 26, 2012, 7:12am

Hello,

I am trying to use a regex with the find command in KSH. For some reason it is not working as expected.

Input:

comm_000_abc_0102.c
comm_000_abc.c
456_000_abc_1212.cpp
456_000_abc_.cpp

Expected Output:

comm_000_abc_0102.c
456_000_abc_1212.cpp

(Basically, I want to search recursively for all my backup files, which I usually create by adding MMDD before .c or .cpp.)

I tried this:

 
find . -name "*[\w]+[\d].mp" -print | egrep [\w]+[\d].mp
find . -regex "*[\w]+[\d].mp"
find . -regextype sed -regex  "*[\w]+[\d].mp"
find . * | grep -P "*[\w]+[\d].mp"
find . * | grep -E "*[\w]+[\d].mp"
find . * | grep -E "*\w+[\d].mp"
find -regextype posix-extended -regex  "*\w+[\d].mp"

None of these is working.

I got the error below for some of them.

find -regextype posix-extended -regex  "*\w+[\d].mp"
Usage: find [-H | -L] path-list [predicate-list]

Could you please help me with this?

Thanks,
Vinay

bakunin · July 26, 2012, 8:23am

You are probably confusing "regexps" with "shell regexps" (aka "file globs"). "Regexp" is what commands like sed, awk, grep, etc. use. The shell (and hence "find") uses only "file globs", which are a lot simpler and less sophisticated.

What you can do is to use "find" to pre-sort the files you are interested in and then filter this output through "grep" or a similar tool using regexps, for instance:

find /path/to/startdir -type f -name "*\.c" -print | grep "[01][0-9][0-9][0-9]\.c$"

Note the difference between "*\.c" (a file glob; the quotes keep the shell from expanding it, so "find" itself matches it against each file name) and the regexp grep works with.

You can also use "find" with an "-exec" clause to send every file name to a regexp-capable program. "Find" will use the return value (=error level) of that program to determine if it should be included in the result set (=printed) or not. This probably will result in a lot more overhead because the external program is called for every single filename instead of once for the pipeline in the above example.
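A rough sketch of that "-exec" variant follows. It is only an illustration: the temporary directory, the variable names and the sample files (borrowed from the question) are assumptions made for the demo. A small "sh -c" wrapper pipes each path into "grep -q", and "find" reaches "-print" only when grep exits with status 0.

```shell
# Hypothetical sandbox: recreate the sample files from the question.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/comm_000_abc_0102.c" "$demo_dir/comm_000_abc.c" \
      "$demo_dir/sub/456_000_abc_1212.cpp" "$demo_dir/sub/456_000_abc_.cpp"

# For every regular file, a short sh script feeds the path to grep -q.
# -print is evaluated only if that command returned exit status 0.
exec_matches=$(
  cd "$demo_dir" &&
  find . -type f \
       -exec sh -c 'printf "%s\n" "$1" | grep -Eq "_[01][0-9][0-3][0-9]\.(c|cpp)\$"' sh {} \; \
       -print | sort
)
printf '%s\n' "$exec_matches"

rm -rf "$demo_dir"
```

This starts one sh and one grep per file, which is the overhead mentioned above, so the plain pipeline is usually the better choice for large trees.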

I hope this helps.

bakunin

PS: As you seem to be a bit unsure about the syntax of "find", you might want to read this article, where I explained it in some detail.

bakunin

drl · July 26, 2012, 1:45pm

Hi.

Note that with the regex option in find, one needs to match the entire path. Matching arbitrary strings with a regex requires .* , not a stand-alone * :

       -regex pattern
              File name matches regular expression pattern.  This is a match
              on the whole path, not a search.  For example, to match a file
              named `./fubar3', you can use the regular expression `.*bar.' or
              `.*b.*3', but not `f.*r3'.

-- excerpt from man find (GNU)
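To illustrate the whole-path rule, here is a minimal sketch that assumes GNU find (the -regextype option is a GNU extension and will not work with the find that printed the usage message above). The temporary directory, variable names and sample files are made up for the demo.

```shell
# Assumption: GNU find is installed; -regextype is not available elsewhere.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/comm_000_abc_0102.c" "$demo_dir/comm_000_abc.c" \
      "$demo_dir/sub/456_000_abc_1212.cpp" "$demo_dir/sub/456_000_abc_.cpp"

# The regex has to cover the whole path (e.g. ./sub/456_000_abc_1212.cpp),
# so it starts with ".*" to absorb the leading directories and name prefix.
regex_matches=$(
  cd "$demo_dir" &&
  find . -regextype posix-extended -type f \
       -regex '.*_[01][0-9][0-3][0-9]\.(c|cpp)' | sort
)

# Same pattern without the leading ".*": it would have to match the
# entire path on its own, so no file qualifies.
unanchored_matches=$(
  cd "$demo_dir" &&
  find . -regextype posix-extended -type f \
       -regex '_[01][0-9][0-3][0-9]\.(c|cpp)'
)

printf '%s\n' "$regex_matches"
rm -rf "$demo_dir"
```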

Best wishes ... cheers, drl

Chirel · July 26, 2012, 2:04pm

Hi

find . | grep -E '.*_[0-1][0-9][0-3][0-9]\.(c|cpp)$'

Corona688 · July 26, 2012, 2:12pm

find . '(' -name '*_[0-1][0-9][0-3][0-9].c' -o -name '*_[0-1][0-9][0-3][0-9].cpp' ')'

vinay4889 · July 26, 2012, 4:58pm

Hi Everyone,

Thanks for the help. I got the expected output, and now I have more than one way to achieve it :).

I am still not clear on one thing, though. Why can't we use the regular expression shortcuts (with grep -E here)? For instance, \d for all digits and \w for all alphanumerics. Please suggest.

Thanks,
Vinay

Corona688 · July 26, 2012, 5:00pm

I don't think those work inside [].

And anyway, your expression is simple enough that there's little need.
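For what it's worth, a small sketch of the portable spelling: \d is a Perl-style shorthand that POSIX extended regexps do not define, and inside a bracket expression a backslash is just a literal character, so [\d] matches a backslash or the letter d. The POSIX classes [[:digit:]] and [[:alnum:]] cover the same ground. The variable names below are made up for the demo, and the file names come from the first post.

```shell
# Sample names from the question, one per line.
names='comm_000_abc_0102.c
comm_000_abc.c
456_000_abc_1212.cpp
456_000_abc_.cpp'

# [[:alnum:]_] plays the role of \w and [[:digit:]] the role of \d.
posix_matches=$(printf '%s\n' "$names" |
  grep -E '^[[:alnum:]_]+_[[:digit:]]{4}\.(c|cpp)$')
printf '%s\n' "$posix_matches"

# Inside brackets the backslash is literal: [\d] accepts "d" but not "7".
bracket_hits=$(printf '%s\n' 7 d | grep -E '^[\d]$')
printf '%s\n' "$bracket_hits"
```

Outside brackets, GNU grep does accept \w as an extension, but the bracketed classes should behave the same on any POSIX grep.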