Need Help with Simple Regex

evilfreakz · August 31, 2008, 4:17am

I have a question: how do I express an AND condition in a regex?

List all the files in current directory that do not contain the words use AND take.

Thx.

era · August 31, 2008, 4:54am

Assuming you want an answer rather than a theory, something like

for f in *; do
  grep use "$f" >/dev/null && continue
  grep take "$f" >/dev/null && continue
  # file contains neither if we get to here; report its file name
  echo "$f"
done

If you really do need this done purely in regular expressions, there is no simple way to express it. In theory there could be an & operator to parallel the | operator, but in practice it would be fairly useless and would also complicate the regex engine a fair bit (if I recall the gist of the research papers on this topic correctly).

Lakris · August 31, 2008, 5:32am

Off the top of my head, I would use something like

grep -L use $(grep -L take *.txt)

which means: first (in the parentheses) find all files that do not contain the word take, and then, within that list of files, find all the files that do not contain the word use.
But I'm sure there is a way to use OR in the regexp...

/Lakris

Franklin52 · August 31, 2008, 6:06am

Use egrep, search for $var1 OR $var2:

egrep "$var1|$var2" file

Invert match:

egrep -v "$var1|$var2" file

Search for $var1 AND $var2:

egrep "$var1.*$var2|$var2.*$var1" file

Invert match:

egrep -v "$var1.*$var2|$var2.*$var1" file

Regards

ghostdog74 · August 31, 2008, 6:29am

I think the OP means that the words "use" and "take" should not be in the file. So the egrep solution will not work if "use" and "take" are on separate lines.
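
To make that concrete, here is a minimal sketch. The scratch directory and the file names split.txt and same.txt are made up for the demo, and grep -E is used as the equivalent of egrep. grep matches line by line, so an alternation such as use.*take|take.*use only fires when both words sit on the same line.

```shell
# Scratch directory with two hypothetical demo files.
dir=$(mktemp -d)
printf 'we use this\nand take that\n' > "$dir/split.txt"   # words on separate lines
printf 'we use and take\n' > "$dir/same.txt"               # both words on one line

# -l prints the names of files that have at least one matching line.
matched=$(cd "$dir" && grep -El 'use.*take|take.*use' split.txt same.txt)
echo "$matched"

rm -rf "$dir"
```

Only same.txt is reported, even though split.txt also contains both words.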

Franklin52 · August 31, 2008, 6:43am

Sorry, I just woke up. I should have read the question more thoroughly.

Regards

Lakris · August 31, 2008, 8:01am

I guess I misinterpreted the OP, now here is my supersilly superuseless use of cat and pipe...

for x in *.txt;do cat $x|tr "\n" " "|egrep '(use.*take|take.*use)'&>/dev/null; [ $? == 1 ] && echo $x;done

but I think it gets the job done?

/Lakris

ghostdog74 · August 31, 2008, 8:13am

awk '/use/ { ++u}/take/{ ++u }
END {
 if ( u == 0 && t == 0 ) {
   print "file: "FILENAME " has no use or take"
 }
}' file*

era · August 31, 2008, 11:29am

lakris:

I guess I misinterpreted the OP, now here is my supersilly superuseless use of cat and pipe...
for x in *.txt;do cat $x|tr "\n" " "|egrep '(use.*take|take.*use)'&>/dev/null; [ $? == 1 ] && echo $x;done
but I think it gets the job done?

If you take out the useless(tm) stuff, it's a fairly good solution.

for x in *.txt; do
  tr "\n" " " <"$x" | egrep -v 'use.*take|take.*use' >/dev/null && echo "$x"
done

I believe I've seen egreps which couldn't handle really long lines, so this might not be entirely robust, but the idea is workable.

Lakris · August 31, 2008, 11:32am

"I'll kill that cat..."

drl · August 31, 2008, 4:41pm

Hi.

Probably intended:

awk '/use/ { ++u}/take/{ ++t }

cheers, drl

drl · August 31, 2008, 4:51pm

Hi.

Sigh. Because the OP mentioned AND explicitly, my interpretation is that either of the strings could appear, but he is interested in cases where both are not present -- i.e. search through all files, if both strings are found anywhere in a particular file, then do not list that filename, otherwise print the filename ... cheers, drl
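
A rough sketch of that interpretation, assuming plain substring matching and a grep that supports -q. The scratch directory and the file contents are invented for the demo. A file is skipped only when both strings occur somewhere in it; every other file is listed.

```shell
# Scratch directory with three hypothetical test files.
dir=$(mktemp -d)
printf 'only use here\n' > "$dir/data1"    # one string only
printf 'take\nand use\n' > "$dir/data3"    # both strings, on separate lines
printf 'neither word\n'  > "$dir/data5"    # neither string

listed=$(
  cd "$dir" || exit 1
  for f in *; do
    # Skip the file only when BOTH strings occur somewhere in it.
    if grep -q use "$f" && grep -q take "$f"; then
      continue
    fi
    echo "$f"
  done
)
echo "$listed"

rm -rf "$dir"
```

Because each grep scans the whole file on its own, it does not matter whether the two strings share a line.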

drl · August 31, 2008, 6:15pm

Hi.

The awk I use, GNU Awk 3.1.4, does not reset variables when the filename changes, so they keep whatever values they had after the most recent increment ... cheers, drl
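
One possible way around this inside a single awk run is to reset the counters by hand at each file boundary. The sketch below assumes the goal is the one in the awk script posted earlier (report files with neither string). The names verdict and prev are my own, and the demo files are invented. It relies on FNR restarting at 1 for every input file.

```shell
# Scratch directory with two hypothetical test files.
dir=$(mktemp -d)
printf 'only use here\n' > "$dir/data1"
printf 'neither word\n'  > "$dir/data5"

report=$(
  cd "$dir" || exit 1
  awk '
    # Print the verdict for one file, based on the current counters.
    function verdict(name) {
      if (u == 0 && t == 0) print "file: " name " has no use or take"
    }
    # FNR is 1 on the first line of each file: report the previous
    # file (if any), then reset the counters for the new one.
    FNR == 1 {
      if (NR > 1) verdict(prev)
      u = 0; t = 0; prev = FILENAME
    }
    /use/  { ++u }
    /take/ { ++t }
    # The last file has no following file boundary, so report it here.
    END { if (NR > 0) verdict(prev) }
  ' data1 data5
)
echo "$report"

rm -rf "$dir"
```

One caveat: an empty file produces no records, so it never triggers the FNR == 1 rule and is silently skipped.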

drl · August 31, 2008, 6:17pm

Hi.

So we have something to talk about consistently, here are the data files I have been using for testing. I think they contain all the variations of the strings appearing and not appearing.

<< data1 >>
This file has only string use, and
omits the other important string.

<< data2 >>
This file has string take,
but not the other one.

<< data3 >>
This file holds both strings take
and use.

<< data4 >>
Here use and take are on the same line.

<< data5 >>
This file will contain
neither of the two strings.

cheers, drl

ghostdog74 · August 31, 2008, 8:04pm

yes. thanks

ghostdog74 · August 31, 2008, 8:07pm

file* should be file then, i.e. awk should get one file per run. A for loop is the workaround.
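
A minimal sketch of that workaround, assuming the ++t correction from earlier in the thread. The scratch directory and file contents are invented for the demo. Each file gets its own awk process, so u and t start from zero every time.

```shell
# Scratch directory with two hypothetical test files.
dir=$(mktemp -d)
printf 'only use here\n' > "$dir/file1"
printf 'neither word\n'  > "$dir/file2"

out=$(
  cd "$dir" || exit 1
  for f in file*; do
    # One awk run per file, so the counters are fresh each time.
    awk '/use/ { ++u } /take/ { ++t }
    END {
      if (u == 0 && t == 0) {
        print "file: " FILENAME " has no use or take"
      }
    }' "$f"
  done
)
echo "$out"

rm -rf "$dir"
```

The cost is one process per file, which is usually fine for a directory of modest size.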