Help on most efficient search

Hello,

We have a directory with 15 sub-directories, each containing 1.5 to 2 lakhs of files. Around 300-500 files are uploaded to each sub-directory every day.

Now I need to get the list of files received today in the most efficient way. I tried "find" with the -newer option and also "ls -ltr" with "tail", but both take a long time to list the images received today.

Please advise me on the most efficient way (it should take the least time possible) to find today's files.

TIA
Prvn

Those are the tools I would use. Perhaps you could instead rearrange the directory structure so that all new files are in a "recent" directory tree, and then move them to the "main" directory tree when you no longer want to treat them as "recent"?

Which file system are you using? NTFS in particular is a real dog when it comes to coping with large directories. You might want to investigate whether it might make sense to switch to Reiser or XFS or something.

(I just happen to have a vague idea of what a lakh is, but you'd probably better avoid regional lingo like that in an international forum.)

If these are web uploads, perhaps it would be simpler to process the web server's log file?

If you have 10000 entries in a directory, in order to find files by ctime or mtime you have to stat all 10000 of them. If these are NFS-mounted directories, it takes even longer, regardless of the remote filesystem type.
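To make the cost concrete, here is a minimal Python sketch of what any "find today's files by mtime" scan has to do: one stat per directory entry. It assumes a Python 3 interpreter is available on the box (not a given on Solaris 9) and that the files sit directly inside the sub-directories; the function names are made up for illustration.

```python
import os
import stat
from datetime import datetime


def start_of_today():
    """Return the Unix timestamp of local midnight today."""
    midnight = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight.timestamp()


def files_modified_since(top_dir, cutoff):
    """Yield regular files directly inside each sub-directory of top_dir
    whose mtime is at or after cutoff. Every entry costs one stat call."""
    with os.scandir(top_dir) as subdirs:
        for sub in subdirs:
            if not sub.is_dir(follow_symlinks=False):
                continue
            with os.scandir(sub.path) as entries:
                for entry in entries:
                    st = entry.stat(follow_symlinks=False)
                    if stat.S_ISREG(st.st_mode) and st.st_mtime >= cutoff:
                        yield entry.path
```

Calling files_modified_since(top, start_of_today()) on the parent directory lists today's files, but it still touches every one of the existing entries, so it is unlikely to be dramatically faster than find.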

Either use the log as era suggests, or alter the app that sends the files so that it writes a list of files to a text file in a central location.

Hi Era,

Thanks for your reply.

Using a "recent" directory may not suit us, as we must pick up the file list from the actual location only.

We are running Solaris 9 with UFS file system.

They are not web or FTP uploads; the files are just copied (using the "cp" command).

Thanks
Prvn

Maybe you could institute a policy to use "cp -v" and direct the output to a file?
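If the local cp turns out not to support -v, the same idea can be sketched as a small wrapper that copies a file and appends the destination to a manifest. This is only an illustration in Python 3 (assumed to be available); the function names and the tab-separated manifest format are my own invention, not anything that exists on your system.

```python
import os
import shutil
import time


def copy_and_log(src, dest_dir, manifest_path):
    """Copy src into dest_dir and append a timestamped line to the manifest."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy(src, dest)  # copies file contents and permission bits
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(manifest_path, "a") as manifest:
        manifest.write(f"{stamp}\t{dest}\n")
    return dest


def files_logged_on(manifest_path, day):
    """Return the destination paths logged on the given day (YYYY-MM-DD)."""
    paths = []
    with open(manifest_path) as manifest:
        for line in manifest:
            stamp, _, dest = line.rstrip("\n").partition("\t")
            if stamp.startswith(day):
                paths.append(dest)
    return paths
```

Getting today's list is then a read of one small text file instead of a scan of millions of directory entries. It only works if every copy goes through the wrapper.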

Another solution which I guess might not be suitable for you would be to rearchitect the whole thing to use a database instead of the bare file system.

Dunno about Solaris but on Linux you can install a daemon which monitors the file system for you, and can keep track of which files have been created recently. Maybe you could find something like this for your system.

Thanks Era.

Could you please let me know which daemons on Linux monitor directories? I will try to get the source and compile it on Solaris.

I know AIDE, but I think it would take even more time to check, as I have millions of files in the directories to monitor.

Prvn

Please explain what you mean by "lakhs" :confused:

Sorry for my regional language.

lakh (or lac) means 100 thousand (100,000).

Thanks
Prvn

shamrock: that's why I put in the wiki link anyway ...

prvnrk: it's probably not so simple as to just compile the thing on Solaris, because it depends on the availability of various system calls and other pieces of infrastructure. The one I was thinking of is related to dbus but I can't recall its name. Here's another one: Monitor Linux file system events with inotify

Googling for "solaris dnotify" mainly brings up links discussing how it's not available, but I didn't look very closely; maybe you can find something similar. Actually, "solaris inotify" brings up some rather promising hits; look at Summer of Code - Genunix and UNIX man pages : inotify (7) which says the following:

Edit: Hmm, I guess /dev/poll is not what you want. The Summer of Code link takes you to a thread where they discuss a possible future mechanism ("future" relative to 2006; we can only speculate what happened then ...) so things are not so promising. One of the messages says you have the basic kernel support but it should be exposed to userland. Here's another one for you: Nabble - Gnome - Lib - Gamin - General - Try to port gamin to Solaris

I had this problem of using lakhs instead of "K" or "M", and that turned out to be real fun for the others in a meeting. This happened a few years ago, when I was new to this industry.

We need to use international units all the time. :wink:

I don't mean to offend anybody; I just thought of sharing this.

We can kind of tackle this problem.

There are 2 approaches to do this.

1) Use a bookkeeping file which maintains a list of filenames and their status (processed or not), which indirectly indicates whether a file is new or not.

2) A much easier way: use a database table and populate it with the filename and the status.

Both amount to the same thing; which option to choose depends on the number of files you are handling and how much bookkeeping you need.
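As a rough illustration of the table approach, here is a sketch using Python 3 with its bundled sqlite3 module (both assumed to be available; the table layout and function names are hypothetical). It compares the names currently in a directory against the names already recorded, so it needs only a directory listing and no per-file stat.

```python
import os
import sqlite3
import time


def open_catalog(db_path):
    """Open (or create) the bookkeeping database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " directory TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " first_seen REAL NOT NULL,"
        " processed INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (directory, name))"
    )
    return conn


def record_new_files(conn, directory):
    """Record names not yet in the table and return them, sorted."""
    known = {row[0] for row in conn.execute(
        "SELECT name FROM files WHERE directory = ?", (directory,))}
    current = set(os.listdir(directory))  # names only, no per-file stat
    new_names = sorted(current - known)
    now = time.time()
    with conn:  # commit all inserts as one transaction
        conn.executemany(
            "INSERT INTO files (directory, name, first_seen) VALUES (?, ?, ?)",
            [(directory, name, now) for name in new_names],
        )
    return new_names


def mark_processed(conn, directory, name):
    """Flag one recorded file as processed."""
    with conn:
        conn.execute(
            "UPDATE files SET processed = 1 WHERE directory = ? AND name = ?",
            (directory, name),
        )


def unprocessed(conn):
    """Return full paths of recorded files not yet marked processed."""
    rows = conn.execute(
        "SELECT directory, name FROM files WHERE processed = 0"
        " ORDER BY directory, name")
    return [os.path.join(d, n) for d, n in rows]
```

Note that the very first run records every existing file as new, so it would have to be run once to seed the table before the results mean anything. After that, each run returns only the names that appeared since the previous run.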

Thank you Era.

I tried dnotify, inotify and FAM on Solaris but could not set them up successfully. All of these work just great on Linux.

As I could not get any info on using /dev/poll on Solaris, I have not tried it out yet.

Thanks
Prvn

Could you hack UFS to provide a suitable hook directly? Sounds tacky but it might not even be all that hard (but if this is an important production system, management approval might be a tough issue).

Management will definitely approve it on the PROD box if I achieve it on the TEST box.

But I don't know anything about hacking Solaris UFS. It would be great if you could point me to any documentation on this.

Thanks
Prvn

Never hacked on that but isn't it the same file system they use in BSD? If so, Leffler et al. should have the full scoop, and then some.

Actually the 4.4 book is probably more current.

Unix File System - Wikipedia, the free encyclopedia has some more helpful links, some of which look very promising from this perspective.