find: restrict search to certain directories

Hi,

I would like to look for files in certain sub-directories in order to avoid looking into possibly big ones.
The subdirectories to search are created monthly following the convention YYYYMM.

I've tried this:

find . \( ! -name 2[0-9][0-9][0-9][0,1][0-9] -prune \) -o -type f -print

expecting to retrieve only Y and Z out of the 3 examples below

./a/X01012/X
./a/201012/Y
./a/b/201011/Z

but that command doesn't return anything.

How should I write it?

Thanks for your help

I'm not sure what you want.
However, if you only want to search inside directories named `2[0-9][0-9][0-9][0,1][0-9]`,
maybe you can use something like this:

for i in `find -name "2[0-9][0-9][0-9][0,1][0-9]" -type d`; do find $i -type f ; done

This works indeed (with a '.' after the first find though).

Maybe I'm being picky, but shouldn't a single call to find be enough?

Edit: Removed previously posted code.

You don't see any output from your original command because you're pruning the starting (.) directory.

You can exclude every directory not named 2[0-9][0-9][0-9][0,1][0-9] only for a known number of levels,
because otherwise you will also exclude the parent of the directories you're interested in.
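The effect is easy to reproduce on a throwaway tree. The sketch below rebuilds the three example paths from the question in a temporary directory (the `demo`, `pruned` and `matched` names are only illustrative), runs the original expression, and then tries one possible single-call alternative based on `-path`. It assumes a find that supports `-path` (GNU, BSD, POSIX 2008). Note that this variant still walks every directory and only filters what it prints.

```shell
#!/bin/sh
# Build a throwaway tree mirroring the three example paths from the question.
demo=$(mktemp -d)
mkdir -p "$demo/a/X01012" "$demo/a/201012" "$demo/a/b/201011"
touch "$demo/a/X01012/X" "$demo/a/201012/Y" "$demo/a/b/201011/Z"

# Original attempt: "." is not named YYYYMM, so it is pruned at once
# and find never descends below the starting directory.
pruned=$(cd "$demo" &&
  find . \( ! -name '2[0-9][0-9][0-9][01][0-9]' -prune \) -o -type f -print)

# Single-call alternative: keep only files whose path has a YYYYMM component.
matched=$(cd "$demo" &&
  find . -type f -path '*/2[0-9][0-9][0-9][01][0-9]/*' | LC_ALL=C sort)

printf 'pruned: [%s]\n' "$pruned"
printf 'matched:\n%s\n' "$matched"
```

Here `pruned` comes back empty, while `matched` holds the Y and Z files only.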

You can use something like this, but it will fail if there are white spaces or other special characters in the directory names
or if the list returned exceeds the ARG_MAX limit of your OS.

find $(find . -name '2[0-9][0-9][0-9][01][0-9]' -type d) -type f
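To illustrate the white space caveat, here is a small sketch on a temporary tree with a hypothetical directory called "my dir". The unquoted command substitution is split into two words, so the outer find is handed two paths that do not exist. Handing the names over with `-exec ... {} +` is one way to keep each directory intact. The variable names are made up for the demonstration.

```shell
#!/bin/sh
# Throwaway tree with a space in a parent directory name (hypothetical name).
demo=$(mktemp -d)
mkdir -p "$demo/my dir/201012"
touch "$demo/my dir/201012/Y"

# Unquoted $(...) is split on white space: the outer find receives
# "./my" and "dir/201012", neither of which exists, so nothing is found.
split=$(cd "$demo" &&
  find $(find . -name '2[0-9][0-9][0-9][01][0-9]' -type d) -type f 2>/dev/null) || true

# -exec ... {} + passes each directory as a single argument; the extra "sh"
# fills $0 so that "$@" holds all of the directories.
safe=$(cd "$demo" &&
  find . -name '2[0-9][0-9][0-9][01][0-9]' -type d \
    -exec sh -c 'find "$@" -type f' sh {} +)

printf 'split: [%s]\nsafe:  [%s]\n' "$split" "$safe"
```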

Thanks, Radoulov, your explanation makes sense. Would you then expect some output with the following set?

./201012/a/X01012/X
./201012/a/201012/Y
./201011/a/b/201011/Z

I don't understand what you mean, could you please elaborate further?

I mean you're probably right about the reason for the absence of output, so I figured that a starting directory matching the pattern should not be pruned, and that the command should therefore return something. I've tried it and that's not the case.

Never mind, I'll use 2 find commands as suggested by ygemici or in your last post. Thank you both for your help.

In the nested find $(find ...) command, you may run into problems if the number of arguments for the outer find exceeds your shell's limits. You can get around that by:

find . -name '2[0-9][0-9][0-9][01][0-9]' -type d -exec find {} -type f -print \;

Now, if you want to get fancy, here is a variant adapted from an example in man xargs:

find . -name '2[0-9][0-9][0-9][01][0-9]' -type d -print | xargs sh -c 'find "$@" -type f -print'

(xargs is your friend :D)

Those are not shell limits; see above (ARG_MAX is an OS limit).

I would avoid this one because of its inefficiency.

Hm,
I believe you mean:

find . -name '2[0-9][0-9][0-9][01][0-9]' -type d | 
  xargs -I{} find {} -type f

Or this:

find . -name '2[0-9][0-9][0-9][01][0-9]' -type d \
  -exec sh -c 'find "$@" -type f' - {} +

Otherwise you'll skip $0.
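A quick way to see the $0 effect without touching the file system is to let the inline script echo its arguments. This is only a sketch: the three directory names are sample input, and "sh" is an arbitrary placeholder for $0 (the `-` used above works the same way).

```shell
#!/bin/sh
# Without a placeholder, the first argument becomes $0 and is missing from "$@".
skipped=$(printf '%s\n' 201010 201011 201012 | xargs sh -c 'echo "$@"')

# With a placeholder in the $0 slot, every argument reaches "$@".
complete=$(printf '%s\n' 201010 201011 201012 | xargs sh -c 'echo "$@"' sh)

printf 'skipped:  %s\ncomplete: %s\n' "$skipped" "$complete"
```

In the first case 201010 is silently dropped, which is the kind of incomplete result to watch for when the script runs find instead of echo.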

Try:

find -name "2[0-9][0-9][0-9][0,1][0-9]" -type d -print -exec ls -1 {} \;

Or, with the output reformatted:

find -name "2[0-9][0-9][0-9][0,1][0-9]" -type d -print -exec ls -1 {} \;| sed -e :a -e 'N;s/^\(\..*\)\n\([^\.]*\)/\1 \2/;s/\(\..*\)\(\..*\)/\1\n\2/;ba'|sed '/^$/d;s/ / -> /1'

@radoulov:

You are quite correct.

But for:

I did not mean:

The -I option implies the -L1 option, so this would be functionally equivalent to:

because, as per man xargs:

-L max-lines
    Use at most max-lines nonblank input lines per command line.
    Trailing blanks cause an input line to be logically continued
    on the next input line.  Implies -x.
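The difference in the number of invocations can be checked with a harmless command in place of find. In this sketch (sample input, illustrative variable names) each invocation prints exactly one line, so counting the lines counts the commands that xargs started.

```shell
#!/bin/sh
# With -I{}, xargs starts one command per input line: three lines, three runs.
per_line=$(printf '%s\n' 201010 201011 201012 | xargs -I{} echo run {} | grep -c run)

# Without -I, xargs packs all three arguments into a single command.
batched=$(printf '%s\n' 201010 201011 201012 | xargs echo run | grep -c run)

printf 'per_line=%s batched=%s\n' "$per_line" "$batched"
```

With find as the command, the first form starts one find per directory, while the second starts one find for the whole batch, as long as the argument list fits.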

You are correct about skipping $0, so I'd like to change my answer to:

find . -name '2[0-9][0-9][0-9][01][0-9]' -type d -print | xargs sh -c 'find "$@" -type f -print' --

(xargs is still your friend)

Didn't know this quite important detail,
thanks for pointing it out!

And thanks for pointing out the "skipping of $0".

From a performance perspective, I ran some benchmark tests: M.D.Ludwig's solution seems to be the fastest.

The contenders were the two xargs variants, a while read loop, and the filter suggestion from ygemici (using egrep instead of the sed expression):

1) find . -name '2[0-9][0-9][0-9][01][0-9]' -type d | xargs -I{} find {} -type f
2) find . -name '2[0-9][0-9][0-9][01][0-9]' -type d -print | xargs sh -c 'find "$@" -type f -print' (with incomplete results)
3) find . -name 2[0-9][0-9][0-9][0,1][0-9] -type d|while read d; do find $d -type f; done
4) find . -name "2[0-9][0-9][0-9][0,1][0-9]" -type d -print -exec ls -1 {} \;|egrep -v '^\.'

xargs is the way to go, thanks for the suggestion M.D : )

Thanks for reporting your benchmark results!

Don't forget my correction: the trailing -- after the sh -c script, which fills $0 so that no directory is skipped.

And you are most welcome.
