Using find to output list of files with specific strings

Here is my problem: I am using the following command to list the names of files that contain a specific string, 0.01:

find ./ -name "*.txt" -exec grep -H '0.01' {} + 

It works wonders on a small sample. However, when I use it in a real scenario it produces an empty file, even though I am sure there are files with the expected string.
Am I missing something here? I am pretty sure there is a better alternative using AWK, but I could not come up with one.
Thanks in advance.

I couldn't find any example of -H in reference books and grep warned me of "illegal option" when I tried it. Perhaps just "-l" instead. Also, the "find" commands that execute something end in "\;" instead of a plus sign. Putting a backslash in front of your decimal point might be worth a look.

It would be highly surprising if grep failed in "real scenarios". Does "I am sure there are files" guarantee 100% that there are such files? Why don't you add a test file to the real scenario to check for correct operation?
@wbport: There are grep versions (including Linux and FreeBSD) that provide the -H option to print file names. And most (if not all) find commands allow the -exec action to be terminated with a ; (running the command once for every single file found) OR with a + (running the command on as many files as will fit on one command line). The unescaped dot in the regex will match any character, including a decimal point, so the missing matches will NOT be due to this.
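
To illustrate the difference between the two -exec terminators, here is a minimal sketch. The temporary directory and the file names a.txt, b.txt and c.txt are made up for the demonstration, and echo stands in for grep so that each invocation shows up as one line of output.

```shell
# Hypothetical scratch directory with three empty files.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/c.txt"

# With \; find runs the command once per file: three lines of output.
per_file=$(find "$dir" -name '*.txt' -exec echo {} \; | wc -l | tr -d ' ')

# With + find passes as many names as fit to one command: one line here.
batched=$(find "$dir" -name '*.txt' -exec echo {} + | wc -l | tr -d ' ')

echo "per file: $per_file, batched: $batched"
rm -rf "$dir"
```

Either terminator should find the same matches with grep; the + form simply starts fewer processes.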

I did. That's why I know there is something wrong with the performance of the script. In reality, I wanted to list all files where values between 0.011 and 0.019 were found; I just could not come up with a better solution.
As I said, it seems to work on a subset of files but failed miserably on the real datasets.
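
For the range requirement, one possible approach is to let awk compare the values numerically instead of matching them as text. This is only a sketch under assumptions not confirmed in the thread: that each line holds a single number in the first field, and that the range is meant to be inclusive. The directory and the file names in_range.9tmp and out_of_range.9tmp are invented for the demonstration.

```shell
# Hypothetical sample data: one value per line.
dir=$(mktemp -d)
printf '0.609114\n0.015200\n' > "$dir/in_range.9tmp"
printf '0.609114\n0.152542\n' > "$dir/out_of_range.9tmp"

# awk exits 0 as soon as one value falls inside the range, 1 otherwise.
# find only reaches -print for files where the -exec command succeeded.
matches=$(cd "$dir" && find . -name '*.9tmp' -type f \
  -exec awk '$1 >= 0.011 && $1 <= 0.019 { found = 1; exit } END { exit !found }' {} \; \
  -print)

echo "$matches"
rm -rf "$dir"
```

Because awk runs once per file here, this is slower than a single grep over many files, but it avoids treating 0.152542 as a match just because it looks similar as text.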

Post a representative sample of an input file from the real dataset that should match but doesn't.

Would this work?

find . -name '*.txt' -type f -print0 | xargs -0 perl -nle '/0.01/ and print $ARGV and close ARGV'

If you just want the names of files whose names end in .txt and whose contents include the string 0.01 (without printing the contents of lines that contain that string), I would try:

find . -name '*.txt' -exec grep -Fl 0.01 {} +

PS: Note that since I'm using grep -F (AKA fgrep) instead of grep without the -F option, we are looking for a fixed string instead of looking for a match to a basic regular expression. Therefore, we don't need to escape the <period> in the string 0.01 to keep it from matching any character as it would in a BRE match.
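
A small sketch of the difference, using two made-up files: regex_only.txt holds 0.50301, which contains the characters 0301 but never the literal text 0.01, and literal.txt holds 0.0153. The file names, values and temporary directory are assumptions for the demonstration only.

```shell
# Hypothetical sample files.
dir=$(mktemp -d)
printf '0.50301\n' > "$dir/regex_only.txt"   # "0301" fits the BRE 0.01
printf '0.0153\n'  > "$dir/literal.txt"      # contains the literal "0.01"

# BRE: the period matches any character, so both files are listed.
bre=$(cd "$dir" && find . -name '*.txt' -exec grep -l '0.01' {} + | sort)

# Fixed string: only the file with a literal "0.01" is listed.
fixed=$(cd "$dir" && find . -name '*.txt' -exec grep -Fl '0.01' {} + | sort)

printf 'BRE:\n%s\nfixed:\n%s\n' "$bre" "$fixed"
rm -rf "$dir"
```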

My script

find ./ -name "*.9tmp" -exec grep -H '0.1' {} +

outputs (subset):

 ./9.9tmp:0.609114
 ./9.9tmp:0.609114
 ./9.9tmp:0.609114
 ./91.9tmp:0.570516
 ./91.9tmp:0.570516
 ./92.9tmp:0.409131
 ./93.9tmp:0.904146
 ./93.9tmp:0.904146
 ./94.9tmp:0.609114
 ./97.9tmp:0.570516
 

My modified script

find ./ -name "*.9tmp" -exec grep -H '0.[0-1][0-9][0-9][0-9][0-9][0-9]' {} +

outputs:

 ./11.9tmp:0.179054
 ./11.9tmp:0.152542
 ./11.9tmp:0.152542
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465
 ./11.9tmp:0.152542
 ./11.9tmp:0.179054
 ./11.9tmp:0.179054
 ./11.9tmp:0.179054
 ./11.9tmp:0.179054
 ./11.9tmp:0.179054
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465
 

Aia's script

find . -name '*.txt' -type f -print0 | xargs -0 perl -nle '/0.01/ and print $ARGV and close ARGV'

outputs:

 ./11.9tmp
 ./12.9tmp
 ./16.9tmp
 ./20.9tmp
 ./22.9tmp
 ./26.9tmp
 ./28.9tmp
 ./29.9tmp
 ./30.9tmp
 ./32.9tmp
 ./36.9tmp
 ./42.9tmp
 ./46.9tmp
 ./48.9tmp
 ./53.9tmp
 ./57.9tmp
 ./61.9tmp
 ./62.9tmp
 ./63.9tmp
 ./69.9tmp
 ./72.9tmp
 ./75.9tmp
 ./83.9tmp
 ./9.9tmp
 ./91.9tmp
 ./92.9tmp
 ./93.9tmp
 ./94.9tmp
 ./97.9tmp
 

And Don's script

find . -name '*.9tmp' -exec grep -Fl 0.1 {} +

Outputs:

 /11.9tmp
 

So, my modified script and Don's output the expected/desired result. Why is it that my first script and Aia's do not produce the same result?

In post #1 you said:

I interpreted that to mean you wanted to output just the file name.
If you also want to show the first instance of the matching pattern, try the following:

find . -name '*.txt' -type f -print0 | xargs -0 perl -nle '/0\.01/ and print "$ARGV: $&" and close ARGV'

Got it!
Thanks!

Expanding a little on what I said in post #7, grep 0.1 and grep -E 0.1 use basic regular expression and extended regular expression matching, respectively, and in both cases the <period> in 0.1 matches any character. So, the RE 0.1 matches a three-character substring in each of these output lines (for example, the 091 in 0.609114):

 ./9.9tmp:0.609114
 ./9.9tmp:0.609114
 ./9.9tmp:0.609114
 ./91.9tmp:0.570516
 ./91.9tmp:0.570516
 ./92.9tmp:0.409131
 ./93.9tmp:0.904146
 ./93.9tmp:0.904146
 ./94.9tmp:0.609114
 ./97.9tmp:0.570516

but I have absolutely no explanation for why your first script:

find ./ -name "*.9tmp" -exec grep -H '0.1' {} +

did not also find:

 ./11.9tmp:0.179054
 ./11.9tmp:0.152542
 ./11.9tmp:0.152542
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465
 ./11.9tmp:0.152542
 ./11.9tmp:0.179054
 ./11.9tmp:0.179054
 ./11.9tmp:0.179054
 ./11.9tmp:0.179054
 ./11.9tmp:0.179054
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465
 ./11.9tmp:0.176465

nor why it didn't report many of the files found by Aia's perl script.

My suggestion worked because grep -F 0.1 performs a fixed string search, not a regular expression search, and in a fixed string search the <period> in 0.1 only matches a <period>.

And, using grep -l just prints the name of a file that contains a match (without displaying the matching text) and moves on to the next file instead of looking for all possible matches in a single file.
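
A short sketch of how -H and -l differ in what they print. It uses a made-up copy of 11.9tmp holding three of the values shown earlier in the thread, in a temporary directory that is an assumption for the demonstration. The -H option is not in POSIX, so this assumes a grep that supports it, such as the GNU or BSD versions.

```shell
# Hypothetical sample file with three matching lines.
dir=$(mktemp -d)
printf '0.179054\n0.152542\n0.176465\n' > "$dir/11.9tmp"

# -H prints "name:line" for every matching line: three lines here.
with_H=$(cd "$dir" && find . -name '*.9tmp' -exec grep -FH '0.1' {} + | wc -l | tr -d ' ')

# -l prints each matching file name once and stops reading that file.
with_l=$(cd "$dir" && find . -name '*.9tmp' -exec grep -Fl '0.1' {} +)

echo "lines with -H: $with_H"
echo "names with -l: $with_l"
rm -rf "$dir"
```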

Got it!

Sorry, I listed only a small subset. I was not very clear when I wrote "outputs (subset):".