Hello All,
I was recently working on a requirement to find files whose names contain a number at or above a specific value. Here is an example.
Let's say the file names are test_40000.txt, test_40001.txt, and so on up to test_99999.txt.
The requirement was to use the find command to pick out only those files whose numeric part is 40000 or greater.
So I created these files for testing purposes (some of them with ZERO size). As we know, to speed up find with -exec we can end the command with {} \+ instead of {} \; because + passes as many file names as will fit on the command line to a single invocation of the command, rather than running the command once per file. I have seen people asking in forums how {} \+ can be faster than {} \; so here is an example of that too.
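To make the difference in invocation counts visible, here is a minimal sketch. The scratch directory and the file_N.txt names are made up for illustration and are not from my actual test; the idea is simply to let a small sh command print one line per invocation and count the lines.

```shell
# Hypothetical scratch directory with five empty files (names are illustrative only).
demo_dir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$demo_dir/file_$i.txt"; done

# {} + : the file names are batched, so sh is started once for all five files.
batched_runs=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} + | wc -l | tr -d ' ')

# {} \; : sh is started once per file, so it prints one line per file.
single_runs=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' ')

printf 'invocations with "+": %s\n' "$batched_runs"
printf 'invocations with ";": %s\n' "$single_runs"
rm -rf "$demo_dir"
```

With only five short names the + form should need a single invocation, while the ; form needs five. With a very large number of files, find may split the + form into several invocations to stay under the argument-length limit.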
Let's say I created the following files.
-rw-rw-r-- 1 singh_test singh_test 0 Jul 12 14:31 test_40003.txt
-rw-rw-r-- 1 singh_test singh_test 0 Jul 12 14:32 test_40004.txt
-rw-rw-r-- 1 singh_test singh_test 0 Jul 12 14:32 test_40005.txt
-rw-rw-r-- 1 singh_test singh_test 0 Jul 12 14:32 test_40006.txt
-rw-rw-r-- 1 singh_test singh_test 0 Jul 12 14:32 test_40007.txt
-rw-rw-r-- 1 singh_test singh_test 0 Jul 12 14:32 test_40008.txt
-rw-rw-r-- 1 singh_test singh_test 0 Jul 12 14:32 test_40009.txt
-rw-rw-r-- 1 singh_test singh_test 78 Jul 12 15:21 test_40000.txt
-rw-rw-r-- 1 singh_test singh_test 78 Jul 12 15:21 test_40001.txt
-rw-rw-r-- 1 singh_test singh_test 78 Jul 12 15:21 test_40002.txt
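For anyone who wants to reproduce a similar set of files, a rough sketch follows. The scratch directory and the "sample line" content are my assumptions here (the real files above were 78 bytes); only the file names and the empty/non-empty split mirror the listing.

```shell
# Hypothetical scratch directory; adjust to wherever you want the test files.
test_dir=$(mktemp -d)

# test_40000.txt .. test_40002.txt get some content (non-empty).
for n in 40000 40001 40002; do
    echo "sample line" > "$test_dir/test_$n.txt"
done

# test_40003.txt .. test_40009.txt are created with zero size.
for n in 40003 40004 40005 40006 40007 40008 40009; do
    : > "$test_dir/test_$n.txt"
done

# Count empty and non-empty files to confirm the setup.
empty_count=$(find "$test_dir" -type f -size 0 | wc -l | tr -d ' ')
nonempty_count=$(find "$test_dir" -type f ! -size 0 | wc -l | tr -d ' ')
echo "empty: $empty_count, non-empty: $nonempty_count"
rm -rf "$test_dir"
```

The final rm -rf is only there to keep the sketch tidy; drop it if you want to keep the files and run the find commands against them.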
I then prepared the following find command to pick up the required files.
find -type f -exec awk --re-interval 'FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++{;print FILENAME} \
END{if(FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++){print FILENAME}}' {} \+
The output of the above command is as follows.
./test_40002.txt
./test_40000.txt
./test_40001.txt
Here are some points about the above output.
I- If we look at it carefully, there is NO zero-size file in the output.
II- IMHO this is because with {} \+ find collects the file names first and then, when it comes to -exec, passes them to awk in a single shot, for example in the following style.
awk --re-interval 'FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++{;print FILENAME} END{if(FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++){print FILENAME}}' test_40000.txt test_40001.txt .......
III- So how can we confirm that the files are passed in the above fashion? I had created zero-size files whose names match the regex pattern, so they should have been picked up.
IV- But they are NOT getting picked up. Why?
V- Because awk reads no records from an empty file, so the main rule never runs for it. The only other place in an awk program where we could do the work is the END section, which runs just once, after awk has finished processing all the files.
VI- That is why it is NOT picking up those zero-size files, AND
VII- this shows that with \+ find collects the file names that satisfy its conditions and then executes the command on them in a single shot.
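The awk behaviour described in the points above can be checked on its own, without find. This is a small sketch with made-up file names (full.txt and empty.txt) in a scratch directory: one awk process is given an empty file and a non-empty file, and we look at which names the main rule sees and how many times END runs.

```shell
# Hypothetical scratch directory with one non-empty and one empty file.
work_dir=$(mktemp -d)
echo "some data" > "$work_dir/full.txt"
: > "$work_dir/empty.txt"

# The main rule runs once per record, so empty.txt (no records) never triggers it.
main_rule_names=$(awk 'FNR == 1 { print FILENAME }' "$work_dir/empty.txt" "$work_dir/full.txt")

# END runs once per awk process, not once per input file.
end_runs=$(awk 'END { print "end" }' "$work_dir/empty.txt" "$work_dir/full.txt" | wc -l | tr -d ' ')

echo "files seen by the main rule: $main_rule_names"
echo "END executed $end_runs time(s)"
rm -rf "$work_dir"
```

Only full.txt is reported by the main rule, and END runs a single time even though two files were given.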
On the other hand, when I run the following command using {} \; find passes the files one at a time and runs the -exec command once for each of them, so awk's END section runs for every file and the zero-size files are picked up SUCCESSFULLY.
Here is an example.
find -type f -exec awk --re-interval 'FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++{;print FILENAME} \
END{if(FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++){print FILENAME}}' {} \;
The output of the above command is as follows.
./test_40008.txt
./test_40005.txt
./test_40007.txt
./test_40006.txt
./test_40002.txt
./test_40004.txt
./test_40000.txt
./test_40001.txt
./test_40009.txt
./test_40003.txt
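As a side note, if only the matching names are needed, find can do the filtering by itself with a -name glob, so file size stops mattering. This is a different technique from the awk approach above, not a drop-in equivalent: the glob below matches exactly five digits (40000 to 99999), whereas the regex above also allows longer numbers. The scratch directory and the sample files are assumptions for illustration.

```shell
# Hypothetical scratch directory with a mix of empty and non-empty files.
scratch_dir=$(mktemp -d)
: > "$scratch_dir/test_40003.txt"               # empty, in range
echo "data" > "$scratch_dir/test_40000.txt"     # non-empty, in range
: > "$scratch_dir/test_39999.txt"               # empty, below the range
echo "data" > "$scratch_dir/test_99999.txt"     # non-empty, in range

# -name takes a shell glob: first digit 4-9, followed by exactly four more digits.
matched_names=$(find "$scratch_dir" -type f -name 'test_[4-9][0-9][0-9][0-9][0-9].txt' \
    | sed 's|.*/||' | sort)
echo "$matched_names"
rm -rf "$scratch_dir"
```

Here the empty test_40003.txt is listed along with the non-empty files, and test_39999.txt is left out.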
I learned this recently and wanted to share it here right away so that everyone can benefit. Please feel free to correct me or give any feedback.
Thanks,
R. Singh