Need Help with Simple Regex

I have a question: how do I express AND in a regex?

List all the files in the current directory that do not contain the words use AND take.

Thx.

Assuming you want an answer rather than a theory, something like

for f in *; do
  grep use "$f" >/dev/null && continue
  grep take "$f" >/dev/null && continue
  # file contains neither if we get to here; report its file name
  echo "$f"
done

If you really do require this to be done in regular expressions exclusively, there is no simple way to specify this in regular expressions. Theoretically there could be an operator & to parallel the operator | but in practice, it is fairly useless, and also complicates the regex engine a fair bit (if I recall the gist of the research papers on this topic correctly).

Off the top of my head, I would use something like

grep -L use $(grep -L take *.txt)

which means: first (in the parentheses) find all files that do not contain the word take, and then, within that list of files, find all the files that do not contain the word use.
But I'm sure there is a way to use OR in the regexp...
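
A minimal sketch of that OR idea, assuming your grep supports -L (GNU and BSD grep do; it is not required by POSIX). The helper name list_neither is made up for illustration. Alternation inside one pattern means a file is listed only when no line matches either word:

```shell
# list_neither: hypothetical helper, not from the thread.
# Prints the names of the given files that contain neither "use" nor "take".
# Assumption: grep supports -L (list files with no matching line).
list_neither() {
  # -E enables the | alternation; -- protects file names starting with "-".
  grep -L -E 'use|take' -- "$@"
}
```

Called as list_neither *.txt, this should give the same list as the nested grep -L above, with a single pass over each file.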

/Lakris

Use egrep, search for $var1 OR $var2:

egrep "$var1|$var2" file

Invert match:

egrep -v "$var1|$var2" file

Search for $var1 AND $var2:

egrep "$var1.*$var2|$var2.*$var1" file

Invert match:

egrep -v "$var1.*$var2|$var2.*$var1" file

Regards

I think the OP means that the words "use" and "take" should not be in the file. So the egrep solution will not work if "use" and "take" are on separate lines.
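
To illustrate the point, here is a small sketch using a throwaway temp file (the file and the variable names are only for demonstration). The two words sit on different lines, so the single-line pattern finds nothing, while two separate tests over the whole file find both:

```shell
# Sketch: grep matches line by line, so 'use.*take|take.*use' cannot see
# words that sit on different lines. The temp file is purely illustrative.
tmp=$(mktemp)
printf 'This file holds both strings take\nand use.\n' > "$tmp"

# Single-line pattern: no one line holds both words.
if grep -E -q 'use.*take|take.*use' "$tmp"; then
  same_line=yes
else
  same_line=no
fi

# One test per word looks at the whole file instead.
if grep -q use "$tmp" && grep -q take "$tmp"; then
  whole_file=yes
else
  whole_file=no
fi

echo "same line: $same_line, whole file: $whole_file"
rm -f "$tmp"
```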

Sorry, I had just woken up; I should have read the question more thoroughly.

Regards

I guess I misinterpreted the OP, now here is my supersilly superuseless use of cat and pipe...

for x in *.txt;do cat $x|tr "\n" " "|egrep '(use.*take|take.*use)'&>/dev/null; [ $? == 1 ] && echo $x;done

but I think it gets the job done?

/Lakris

awk '/use/ { ++u}/take/{ ++u }
END {
 if ( u == 0 && t == 0 ) {
   print "file: "FILENAME " has no use or take"
 }
}' file*

If you take out the useless(tm) stuff, it's a fairly good solution.

for x in *.txt; do
  tr "\n" " " <"$x" | egrep -v 'use.*take|take.*use' >/dev/null && echo "$x"
done

I believe I've seen egreps which couldn't handle really long lines, so this might not be entirely robust, but the idea is workable.

"I'll kill that cat..."

Hi.

Probably intended:

awk '/use/ { ++u}/take/{ ++t }

cheers, drl

Hi.

Sigh. Because the OP mentioned AND explicitly, my interpretation is that either string may appear on its own, and he is interested in the files where the two do not both appear -- i.e. search through all files; if both strings are found anywhere in a particular file, do not list that filename; otherwise print the filename ... cheers, drl
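
A minimal sketch of that interpretation, assuming plain POSIX sh and grep -q. The function name list_unless_both is invented here for illustration:

```shell
# list_unless_both: hypothetical helper, not from the thread.
# Prints each named file unless it contains BOTH "use" and "take"
# somewhere in the file (not necessarily on the same line).
list_unless_both() {
  for f in "$@"; do
    [ -f "$f" ] || continue        # skip directories and missing names
    if grep -q use "$f" && grep -q take "$f"; then
      continue                     # both present: do not list this file
    fi
    printf '%s\n' "$f"
  done
}
```

Called as list_unless_both *, it lists files holding only one of the words as well as files holding neither, which is where it differs from the "contains neither" reading.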

Hi.

The awk I use, GNU Awk 3.1.4, does not reset variables when the filename changes, so the counters keep whatever value they had after the most recent increment ... cheers, drl
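
One possible way around that within a single awk run, sketched under the assumption that resetting the flags at the first line of each file (FNR == 1) is acceptable. The names report_neither, report and prev are invented for this example. Note that an empty file has no first line, so this sketch never reports it:

```shell
# report_neither: hypothetical wrapper, not from the thread.
# Resets the flags whenever a new input file starts and reports on the
# file that just ended. Empty files are never reported (no line 1).
report_neither() {
  awk '
  function report(name) {
    if (!u && !t) print "file: " name " has no use or take"
  }
  FNR == 1 {
    if (NR > 1) report(prev)   # a previous file has just finished
    u = t = 0                  # clean flags for the new file
    prev = FILENAME
  }
  /use/  { u = 1 }
  /take/ { t = 1 }
  END { if (NR > 0) report(prev) }   # report on the last file
  ' "$@"
}
```

Called as report_neither file*, it makes one pass over all files with a single awk process.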

Hi.

So we have something to talk about consistently, here are the data files I have been using for testing. I think they contain all the variations of the strings appearing and not appearing ... cheers, drl

<< data1 >>
This file has only string use, and
omits the other important string.

<< data2 >>
This file has string take,
but not the other one.

<< data3 >>
This file holds both strings take
and use.

<< data4 >>
Here use and take are on the same line.

<< data5 >>
This file will contain
neither of the two strings.

cheers, drl

Yes, thanks.

Then file* should be a single file. A for loop over the files is the workaround.
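
A sketch of that for-loop workaround, with the t counter corrected as drl suggested. Each file gets its own awk process, so u and t always start at zero. The wrapper name check_each is made up for illustration:

```shell
# check_each: hypothetical wrapper, not from the thread.
# Runs one awk process per file so the counters start fresh every time.
check_each() {
  for f in "$@"; do
    awk '/use/ { ++u } /take/ { ++t }
    END {
      if (u == 0 && t == 0) {
        print "file: " FILENAME " has no use or take"
      }
    }' "$f"
  done
}
```

Called as check_each file*, it trades one extra process per file for not having to track file boundaries inside awk.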