Hopefully, RudiC's suggested awk script got you started down a workable path. Unfortunately, with an input file like:
pattern /* comment */
pattern2 /* start
continue comment
This >>> pattern <<< should never be seen.
continue comment
end */ pattern3
/* comment1 */ pattern4 /* comment 2 */
I believe that if your search pattern is pattern, RudiC's script will not find any of the four occurrences of pattern in the above file that are not in comments.
You didn't mention anything about quoted strings. If -- or /* and */ do not denote comments when they appear inside single-quoted or double-quoted strings (as in a shell script or C code), the following script won't work either. (If you need something that ignores comment delimiters found in quoted strings, maybe you can use the following as a guide on how to attack that problem; but I won't volunteer to do that for you here. A general parser like that is too much like work for me to offer to do it for free. ;))
The following script will work with any ksh, with /usr/xpg4/bin/sh, /usr/xpg6/bin/sh, or with bash (if bash is installed on your Solaris system). First copy the following into a file named NoCommentPattern.awk:
# At the start of each file, forget any match and any unclosed comment left
# over from the previous file...
FNR == 1 {
    nf = 0
    ssc = 0
}
# If we already had a match in this file, skip the rest of it...
nf > 0 {
    next
}
d { printf("===%d%d\t%s\n", nf, ssc, $0)
}
# Strip out any comments (or skip the line completely if we're in the middle
# of a multi-line comment).
{   if(ssc) {
        # An earlier line had an unclosed comment starting with "/*"...
        if(s = index($0, "*/")) {
            $0 = substr($0, s + 2)
            if(d) printf("Updated $0:\n\t%s\n", $0)
            ssc = 0
        } else next
    }
    # Search for "/*...*/" and "--" comments.
    while(match($0, "[-][-]|[/][*]")) {
        if(substr($0, RSTART, 1) == "-") {
            # Found -- comment; throw away the rest of the line...
            if(RSTART == 1) {
                if(d) printf("Comment line deleted.\n")
                next
            }
            $0 = substr($0, 1, RSTART - 1)
            if(d) printf("Updated $0:\n\t%s\n", $0)
            break
        }
        # Found start of "/*" comment; look for the end of comment...
        if(s = index(substr($0, RSTART + 2), "*/")) {
            # End found, delete comment from line and look for more.
            $0 = (RSTART > 1 ? substr($0, 1, RSTART - 1) : "") \
                substr($0, RSTART + s + 3)
            if(d) printf("Updated $0:\n\t%s\n", $0)
        } else {
            # We found the start of a "/*...*/" comment but not
            # the end. Process the part of this line before the
            # comment...
            ssc = 1
            if(RSTART == 1) {
                if(d) printf("Comment line deleted.\n")
                next
            }
            $0 = substr($0, 1, RSTART - 1)
            if(d) printf("Updated $0:\n\t%s\n", $0)
            break
        }
    }
}
# Look for pattern in current line after comments have been stripped.
index($0, P) {
    # Found it...
    print FILENAME
    nf = 1
}
and create a script (for this example, call it findpat) containing:
#!/usr/xpg4/bin/sh
pat=${1:-insurance_no}
if [ $# -gt 1 ]
then debug=1
else debug=0
fi
find . -type f -exec /usr/xpg4/bin/awk -v P="$pat" -v d="$debug" -f NoCommentPattern.awk {} +
and make it executable:
chmod +x findpat
Then the command:
./findpat
or:
./findpat "insurance_no"
will search the directory hierarchy rooted in the current directory for regular files that contain insurance_no outside of a comment, and print the names of any files that meet these conditions.
If you invoke it with two or more arguments:
./findpat "Search Pattern" debug
it will print lots of debugging information while it searches for matching files so you can see the lines it is processing and how it strips out comments before looking for the pattern. Once you understand how it works, you can make the script run a little bit faster if you strip out the debugging code.
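If you'd rather see the comment stripping on its own before wading through the debug output, here is a cut-down sketch of the same idea. It is not the script above: the function name strip_comments and the variable in_comment are made up for this example, it calls plain awk from your PATH (on Solaris you would want /usr/xpg4/bin/awk instead), and it just prints what is left of each line, with its line number, instead of searching for a pattern.

```shell
# strip_comments [FILE...]: hypothetical helper that prints the text left on
# each line after "--" and "/* ... */" comments are removed. Lines with
# nothing left are not printed. Quoted strings get no special treatment.
strip_comments() {
    awk '
    FNR == 1 { in_comment = 0 }     # a comment cannot span two files
    {
        line = $0
        out = ""
        while (line != "") {
            if (in_comment) {
                # Inside "/* ...": drop everything up to and including "*/".
                e = index(line, "*/")
                if (e == 0) { line = ""; break }
                line = substr(line, e + 2)
                in_comment = 0
            } else if (match(line, "[-][-]|[/][*]")) {
                # Keep the text in front of the comment start.
                out = out substr(line, 1, RSTART - 1)
                if (substr(line, RSTART, 1) == "-") { line = ""; break }
                line = substr(line, RSTART + 2)
                in_comment = 1
            } else {
                # No more comments on this line.
                out = out line
                line = ""
            }
        }
        sub(/^[ \t]+/, "", out)     # trim leading blanks
        sub(/[ \t]+$/, "", out)     # trim trailing blanks
        if (out != "") print FNR ": " out
    }' "$@"
}

# Usage (sample.txt is a hypothetical file holding the sample input above):
#   strip_comments sample.txt
```

Fed the seven-line sample file from the top of this post, I'd expect this to print:
1: pattern
2: pattern2
6: pattern3
7: pattern4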
Note that if you run this script in a directory other than the one where you placed the file NoCommentPattern.awk, you'll need to modify the script to use an absolute pathname to where this file is located. This script should work even if there are spaces or tabs in your search pattern, but it will not find a match that starts on one line and continues onto the next line.
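One way to avoid hard-coding that pathname is to have findpat work out its own directory at run time. The following is only a sketch: it assumes NoCommentPattern.awk is kept in the same directory as findpat, and script_dir is a helper name I made up for this example.

```shell
# script_dir PATH: hypothetical helper that prints the absolute pathname of
# the directory containing PATH. CDPATH is cleared so cd prints nothing.
script_dir() {
    (CDPATH= cd "$(dirname "$1")" && pwd)
}

# Inside findpat, assuming NoCommentPattern.awk lives next to findpat, the
# last line could then become something like:
#   awkfile=$(script_dir "$0")/NoCommentPattern.awk
#   find . -type f -exec /usr/xpg4/bin/awk -v P="$pat" -v d="$debug" -f "$awkfile" {} +
```

This only helps when findpat is invoked by a pathname (./findpat, /some/dir/findpat, or found through PATH); if you source the script instead, $0 will not point at it.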
If someone else wants to try this on a system where awk supports the nextfile statement, this script can be made a lot faster by using it when a match is found, instead of setting nf = 1, reading (and skipping) the remainder of the file, and setting nf back to zero when the 1st line of the next file is found.
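To show just that change in control flow, here is a minimal sketch that leaves the comment stripping out entirely. It assumes an awk that supports nextfile (gawk does, for example; check your own awk before relying on it), and list_files_with is a made-up name for this example.

```shell
# list_files_with PATTERN FILE...: hypothetical helper that prints the name
# of each FILE containing the fixed string PATTERN, once per file.
# Assumption: the awk found in PATH supports the nextfile statement.
list_files_with() {
    pat=$1
    shift
    awk -v P="$pat" '
    index($0, P) {
        print FILENAME
        nextfile        # stop reading this file; go on to the next one
    }' "$@"
}

# Usage (file names are placeholders):
#   list_files_with insurance_no a.sql b.sql
```

With nextfile there is no need for the nf flag at all, since awk never reads the lines after the first match.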