A real example of using semicolon (;) versus + in the find -exec command.

Hello All,

I was recently working on a requirement where we had to find files whose names contain a number above a specific value. Here is an example.
Let's say the file names are test_40000.txt, test_40001.txt and so on up to test_99999.txt.

The requirement was to use the find command to pick only those files whose number is 40000 or greater.

So I created these files for testing (some of them with ZERO size). As we know, find with -exec usually runs faster with {} \+ at the end, because it passes as many file names as fit on one command line to the command in a single invocation, rather than starting the command once per file as {} \; does. In forums I have seen people asking how {} \+ can be faster than {} \; , so here is an example of that too.
Let's say I created the following files.

-rw-rw-r-- 1 singh_test singh_test    0 Jul 12 14:31 test_40003.txt
-rw-rw-r-- 1 singh_test singh_test    0 Jul 12 14:32 test_40004.txt
-rw-rw-r-- 1 singh_test singh_test    0 Jul 12 14:32 test_40005.txt
-rw-rw-r-- 1 singh_test singh_test    0 Jul 12 14:32 test_40006.txt
-rw-rw-r-- 1 singh_test singh_test    0 Jul 12 14:32 test_40007.txt
-rw-rw-r-- 1 singh_test singh_test    0 Jul 12 14:32 test_40008.txt
-rw-rw-r-- 1 singh_test singh_test    0 Jul 12 14:32 test_40009.txt
-rw-rw-r-- 1 singh_test singh_test   78 Jul 12 15:21 test_40000.txt
-rw-rw-r-- 1 singh_test singh_test   78 Jul 12 15:21 test_40001.txt
-rw-rw-r-- 1 singh_test singh_test   78 Jul 12 15:21 test_40002.txt

So I prepared the following find command to pick the required files.

find -type f  -exec awk --re-interval  'FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++{;print FILENAME} \
END{if(FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++){print FILENAME}}' {} \+

The output of the above command is as follows.

./test_40002.txt
./test_40000.txt
./test_40001.txt

Here are my observations on the above output.

I- Looking at it carefully, there is NO zero-size file in the output.
II- IMHO this is because with {} \+ find collects the file names first and then -exec passes them to awk in a single invocation, something like the following (as an example).

awk --re-interval  'FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++{;print FILENAME} END{if(FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++){print FILENAME}}' test_40000.txt test_40001.txt .......

III- So the point to note is how we can confirm that the files are processed in the above fashion: I had created zero-size files whose names match the regex pattern, so they should have been picked up.
IV- But they are NOT picked up. Why?
V- Because awk reads no records from an empty file, the main rule never runs for it. The only other place an awk program can act is the END section, which runs just once, after awk has finished processing all the files, so it can print at most the last file name.
VI- That is why the zero-size files are NOT picked up, AND
VII- This shows that with \+ find collects the names of the files matching its conditions first and then runs the command on them in a single invocation.
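
Point V can be checked in isolation, without find. Below is a minimal sketch, assuming a POSIX sh and any standard awk; the two file names are made up for this test and are not the ones from my listing above. It passes one empty and one non-empty file to a single awk invocation and prints the name of each file for which the main rule fires.

```shell
#!/bin/sh
# Throwaway directory holding one empty and one non-empty file
# (hypothetical names, used only for this check).
dir=$(mktemp -d)
start_dir=$PWD
cd "$dir" || exit 1
: > empty_40003.txt
printf 'some text\n' > full_40000.txt

# The main rule runs once per input record. An empty file has no
# records, so the rule never fires with FILENAME set to that file.
main_rule_out=$(awk 'FNR == 1 { print FILENAME }' empty_40003.txt full_40000.txt)
echo "main rule fired for: $main_rule_out"

# Clean up the throwaway directory.
cd "$start_dir" && rm -rf "$dir"
```

Only the non-empty file should be reported, which is the same effect seen in the {} \+ output.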

On the other hand, when I run the following command with {} \; , find runs the -exec command once for each file, so awk's END section runs once per file and the zero-size files are picked up SUCCESSFULLY.
Here is the example.

find -type f  -exec awk --re-interval  'FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++{;print FILENAME} \
END{if(FILENAME ~ /test_[4-9][0-9]{4,}.txt/ && !a[FILENAME]++){print FILENAME}}' {} \;
 

The output of the above command is as follows.

./test_40008.txt
./test_40005.txt
./test_40007.txt
./test_40006.txt
./test_40002.txt
./test_40004.txt
./test_40000.txt
./test_40001.txt
./test_40009.txt
./test_40003.txt
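
A more direct way to see the difference between the two terminators is to count how many times the command gets started. The following is only a sketch, assuming a POSIX sh and a find that supports -exec ... {} +; the five file names are created just for this test. The inline sh script prints its argument count once per invocation, so the number of output lines equals the number of invocations.

```shell
#!/bin/sh
# Throwaway directory with five empty test files (names chosen for this check).
dir=$(mktemp -d)
start_dir=$PWD
cd "$dir" || exit 1
for n in 40000 40001 40002 40003 40004; do
    : > "test_$n.txt"
done

# With '+', find appends as many names as fit to one command line,
# so for this handful of files the inline script is started once.
plus_runs=$(find . -type f -exec sh -c 'echo "$#"' sh {} + | wc -l | tr -d ' ')

# With ';', the command is started once per file found.
semi_runs=$(find . -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' ')

echo "invocations with +: $plus_runs"
echo "invocations with ;: $semi_runs"

# Clean up the throwaway directory.
cd "$start_dir" && rm -rf "$dir"
```

For these five files this should report one invocation with + and five with ; . With a very large number of files find may split the + form into several batches, but still far fewer than one per file.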
 

I learned this and thought of sharing it with all here ASAP so that everyone could benefit. Please feel free to correct me or give any feedback.

Thanks,
R. Singh

awk and sed are text file processors; they loop over the lines in a file. Using awk only to print FILENAME looks like a misuse of it.
You would need to loop over the arguments in the BEGIN section.

find . -type f  -exec awk 'BEGIN { for (i=1; i<ARGC; i++) if (ARGV[i] ~ /test_[4-9][0-9]{4}\.txt/) print ARGV[i] }' {} +

Still this looks like the wrong tool for the task.
If there is a simple search pattern for the filenames one can use a -name glob:

find . -type f -name "test_[4-9][0-9][0-9][0-9][0-9].txt"
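
Because -name tests only the file name and never opens the file, zero-size files are matched as well. A small sketch to check that, assuming a POSIX sh; the files are created only for this test, and test_39999.txt is included as a name that should not match:

```shell
#!/bin/sh
# Throwaway directory: two empty files in range, one non-empty file in range,
# and one empty file below the range (names chosen for this check).
dir=$(mktemp -d)
start_dir=$PWD
cd "$dir" || exit 1
: > test_39999.txt
: > test_40000.txt
: > test_40003.txt
printf 'data\n' > test_40001.txt

# The glob is matched against the name only, so file size plays no role.
matched=$(find . -type f -name 'test_[4-9][0-9][0-9][0-9][0-9].txt' | sort)
printf '%s\n' "$matched"

# Clean up the throwaway directory.
cd "$start_dir" && rm -rf "$dir"
```

The three names from test_40000.txt upward should be listed, empty or not, and test_39999.txt should be left out.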