Finding non-existing words in a list of files in a directory and its sub-directories

Hi All,

I have a list of words (actually a list of database table names, separated by commas).

Now I want to find only those words that do not occur in any of the *.java files in the current directory or its sub-directories.

Sample list of words: table_one,table_ten,table_x,table_y,table_z

Files to search (in the current directory and its sub-directories): *.java

Suppose 2 words (say table_ten and table_z) do not occur in any of the *.java files in the current directory or its sub-directories. Then these 2 words should be printed as shown below.

Non-existing words are: table_ten,table_z

Please help.

Thanks in advance.

  1. What shell/OS are you using?
  2. What have you tried? (It's easier and quicker for forum members to help you from where you're stuck than to work out a solution from scratch just for you.)
  3. To help you get started, you could try a combination of find and grep. Let us know what you come up with.
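Following the find and grep hint in point 3, here is a minimal sketch of the whole task. It assumes a POSIX-style shell (bash 3.2 is fine), a find that supports -exec ... {} +, and a grep with -w (whole-word) and -F (fixed-string) matching. The function name missing_tables is hypothetical. For each table name it has find run grep over all *.java files below the given directory, and it records the name if no file matched:

```shell
#!/bin/sh
# Sketch (hypothetical function name): print the table names from a comma-separated
# list that occur in none of the *.java files below a directory.
# Usage: missing_tables DIR LIST
missing_tables() (
    dir=$1
    list=$2
    # One name per line; each name is checked with its own find + grep run.
    missing=$(printf '%s\n' "$list" | tr ',' '\n' | while IFS= read -r word; do
        [ -n "$word" ] || continue
        # grep -l prints each file containing the name as a whole word (fixed string);
        # "grep -q ." succeeds if at least one file name came out of find.
        if ! find "$dir" -type f -name '*.java' \
                -exec grep -l -w -F -- "$word" {} + | grep -q .; then
            printf '%s\n' "$word"
        fi
    done | paste -s -d, -)
    # Same output format as requested in the question.
    echo "Non-existing words are: $missing"
)
```

Usage: missing_tables . table_one,table_ten,table_x,table_y,table_z

It runs one find per table name, which should be acceptable for a short list. Drop -w if a name that is only part of a longer identifier (e.g. table_one inside table_one_hist) should also count as a hit.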

@balajesuri Thanks for your reply 🙂

I am using bash shell.

I guess this can be done in a shell script using a for loop.

A crude and inefficient way:

# Reports, file by file, which of the words each *.java file lacks
find /path/to/dir -type f -name "*.java" | while IFS= read -r file
do
    for word in table_one table_two table_three
    do
        if ! grep -q -- "$word" "$file"
        then
            echo "$file does not contain $word"
        fi
    done
done
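The loop above runs grep once for every file/word pair, and it reports per file rather than listing the words missing from all files. Below is a sketch of the per-word answer using a single grep pass and a set difference with comm. It assumes a grep that supports -o (GNU and BSD grep do; I am not sure about the AIX grep), plus mktemp, sort and comm. The function name tables_not_found is hypothetical:

```shell
#!/bin/sh
# Sketch (hypothetical function name): print, sorted and one per line, the table
# names that occur in none of the *.java files below a directory.
# Usage: tables_not_found DIR NAME...
# Assumes a grep with -o (GNU or BSD) and mktemp.
tables_not_found() (
    dir=$1
    shift
    tmp=$(mktemp -d) || exit 1
    # The wanted names, one per line and sorted, because comm needs sorted input.
    printf '%s\n' "$@" | sort -u > "$tmp/wanted"
    # One grep pass over all files: -o prints only the matched name, -h omits file
    # names, -w requires whole words, -F treats the names as fixed strings and
    # -f reads them from the wanted file.
    find "$dir" -type f -name '*.java' \
        -exec grep -ohwF -f "$tmp/wanted" {} + | sort -u > "$tmp/found"
    # comm -23 keeps only the lines unique to the first file: the missing names.
    comm -23 "$tmp/wanted" "$tmp/found"
    rm -rf "$tmp"
)
```

Usage: tables_not_found . table_one table_ten table_x table_y table_z

If no *.java files exist at all, find never runs grep, the found file stays empty and every name is reported as missing.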

Try this (untested; it assumes the table names are on a single line in a file):

awk -F, '
NR == 1         {for (n=split($0, T); n>0; n--) TBL[T[n]]=n
                 next
                }
$0 in TBL       {delete TBL[$0]
                }
END             {print "non-existing:"
                 for (t in TBL) print t
                }
' tblfile *.java
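Two things to keep in mind about the awk script above: $0 in TBL is true only when an entire line of a .java file is exactly a table name, so a line such as select * from table_one; does not count as a hit; and *.java expands only to files in the current directory, not its sub-directories. Here is a sketch of a variant that addresses both. It assumes the table names contain only letters, digits and underscores (they are used inside a regular expression) and a find that supports -exec ... {} +. The wrapper name tables_missing_awk is hypothetical:

```shell
#!/bin/sh
# Sketch (hypothetical wrapper name) of a variant of the awk script above.
# Usage: tables_missing_awk TBLFILE DIR
# Assumes table names contain only letters, digits and underscores.
tables_missing_awk() (
    tblfile=$1
    dir=$2
    # "awk 1" prints every line of every *.java file below $dir (recursively),
    # adding a final newline a file may lack, so lines of different files never
    # run together. The main awk reads the table file first, then this stream
    # via the "-" operand.
    find "$dir" -type f -name '*.java' -exec awk 1 {} + |
    awk '
    # First line of the table file: remember the comma-separated names in order.
    NR == 1 { n = split($0, T, ","); next }
    # Every Java line: mark each name that occurs as a whole word. The line is
    # padded with spaces so a name at the start or end still has a neighbour.
    {
        line = " " $0 " "
        for (i = 1; i <= n; i++)
            if (!(T[i] in seen) && line ~ ("[^A-Za-z0-9_]" T[i] "[^A-Za-z0-9_]"))
                seen[T[i]] = 1
    }
    # Print the unmarked names in the order they appear in the table file.
    END {
        print "non-existing:"
        for (i = 1; i <= n; i++)
            if (!(T[i] in seen)) print T[i]
    }
    ' "$tblfile" -
)
```

Usage: tables_missing_awk tblfile .

Unlike for (t in TBL), whose iteration order is unspecified in awk, this prints the missing names in the order of the table file.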

@RudiC Thanks for your reply 🙂

After running the script, I got the error message below.

bash-3.2$ ./test.sh
Syntax Error The source line is 1.
The error context is
<<< >>>
awk: 0602-500 Quitting The source line is 1.
bash-3.2$

What files did you supply to the script?

I created a file called tblfile and added the contents below (i.e. the comma-separated table names):

table_one,table_ten,table_x,table_y,table_z

Works for me (Linux, FreeBSD):

awk -F, '
NR == 1         {for (n=split($0, T); n>0; n--) TBL[T[n]]=n
                 next
                }
$0 in TBL       {delete TBL[$0]
                }
END             {print "non-existing:"
                 for (t in TBL) print t
                }
' tblfile *.java
non-existing:
table_one
table_y
table_z