Hi all,
I am building a shell script as part of an auditing utility. It searches a specified directory for keywords and reports the path of each matching file and the line number where the word was found. I built a test script (shown below) that does this, but egrep apparently cannot read MS Word, Excel, and similar documents. Could someone point me in an alternate direction that would let me search these types of documents as well? (wordlist is a file that is created elsewhere with a list of words to search for, e.g. bus.)
Thanks!
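# Set HOSTNAME before writing scanit: the here-document expands ${HOSTNAME} now, not when scanit runs.
HOSTNAME=`hostname`
export HOSTNAME
cat << EOF > ${TMPDIR}/scanit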
rm -f ${TMPDIR}/strings
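# Match wordlist entries plus NNN-NN-NNNN numbers; the pattern must be quoted and passed with -e.
strings "\$1" | egrep -n -i -f ${TMPDIR}/wordlist -e '^[0-9]{3}-[0-9]{2}-[0-9]{4}\$' >> ${TMPDIR}/strings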
if [ -s ${TMPDIR}/strings ]
then
echo >> ${TMPDIR}/${HOSTNAME}.o
echo "File: \$1" >> ${TMPDIR}/${HOSTNAME}.o
file "\$1" >> ${TMPDIR}/${HOSTNAME}.o
cat ${TMPDIR}/strings >> ${TMPDIR}/${HOSTNAME}.o
fi
rm -f ${TMPDIR}/strings
EOF
HOSTNAME=`hostname`
export HOSTNAME
if [ $# -eq 0 ]
then
echo "You must specify the start of the directory tree to search" >&2
exit 1
fi
find "$1" -type f 2> ${TMPDIR}/${HOSTNAME}_find_errors | tee ${TMPDIR}/${HOSTNAME}_filelist | \
head -100 |\
sed -e "s+^+sh -x ${TMPDIR}/scanit \"+" -e 's/$/"/' > ${TMPDIR}/scanitnow
sh -x ${TMPDIR}/scanitnow 1> ${TMPDIR}/${HOSTNAME}_scan_run 2>&1
cd ${TMPDIR}
if [ -s ${HOSTNAME}.o ]
then
date "+%Y%m%d_%H:%M:%S: indicators found on ${HOSTNAME}" > ${HOSTNAME}_scan_results.csv
cat ${HOSTNAME}.o >> ${HOSTNAME}_scan_results.csv
else
date "+%Y%m%d_%H:%M:%S: No indicators found on ${HOSTNAME}" > ${HOSTNAME}_scan_results.csv
fi
zip ${HOSTNAME}_scan.zip ${HOSTNAME}_find_errors ${HOSTNAME}_filelist ${HOSTNAME}_scan_run ${HOSTNAME}_scan_results.csv
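
In case it helps anyone reproduce this, here is a minimal sketch of just the matching step, separated from the rest of the script. The function name scan_file and its two arguments are made up for illustration and are not part of the script above. It reads a plain text file directly instead of going through strings, so it only shows the keyword and number matching, not the handling of binary documents.

```shell
#!/bin/sh
# scan_file WORDLIST FILE   (hypothetical helper, for illustration only)
# Prints "FILE:LINE:TEXT" for every line of FILE that contains a word from
# WORDLIST (case-insensitive) or that consists solely of an NNN-NN-NNNN number.
scan_file() {
    # Passing /dev/null as a second file makes grep prefix each match
    # with the file name, even though only one real file is searched.
    grep -E -n -i -f "$1" -e '^[0-9]{3}-[0-9]{2}-[0-9]{4}$' -- "$2" /dev/null
}
```

Calling it as scan_file ${TMPDIR}/wordlist somefile.txt should print one line per match, with the file path and line number in front of the matching text.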