I'm trying to search all .odt files in a directory for a string in the text of the file.
I've found a bash script that works, except that it can't handle whitespace in the filenames.
#!/bin/bash
if [ $# -ne 1 ]; then
echo "Usage: searchodt searchterm"
exit 1
fi
for file in $(ls *.odt); do
unzip -ca "$file" content.xml | zgrep -ql "$1"
if [ $? -eq 0 ]; then
echo "$file"
fi
done
(Courtesy of [ubuntu] [SOLVED] Search multiple .odt files - Ubuntu Forums)
I've gone through a number of postings on this forum, but simple tricks like quotes, of any kind, don't work. Any quotes I put around
(ls *.odt)
or just
*.odt
stop it working completely.
I found this code
find /path/to/some/where/ -name "*.pdf" | awk '{print $5}'| uniq -d |while read name ; do
in a thread here
which solves the problem in that context, but I don't see how to integrate it into the script.
Try it with:
for file in *.odt; do
instead of:
for file in $(ls *.odt); do
Thank you very much, I never thought of just taking that out!
For the completeness of this thread, in case others are interested: someone on the other thread pointed out that page 2 of the referenced thread has a working script for this purpose, which also takes the search path as a command-line argument.
#!/bin/bash
if [ $# -ne 2 ]; then
echo "Usage: searchodt searchpath searchterm"
exit 1
fi
find "$1" -name "*.odt" | while IFS= read -r file
do
unzip -ca "$file" content.xml | grep -qli "$2"
if [ $? -eq 0 ]; then
echo "Found keyword in: $file"
fi
done
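The script above reads the output of find one line at a time. As a rough sketch only, assuming bash and a find that supports -print0 (GNU and BSD find both do), the same loop can be fed NUL-separated names, which also copes with newlines in filenames. The function name search_odt is made up for this example and is not part of the referenced script.

```shell
#!/bin/bash
# Sketch: search .odt files below a path for a term, using NUL-separated
# filenames. search_odt is a hypothetical name chosen for this example.
search_odt() {
    local searchpath=$1 searchterm=$2 file
    # -print0 ends each filename with a NUL byte instead of a newline.
    find "$searchpath" -name "*.odt" -print0 |
    while IFS= read -r -d '' file; do
        # unzip errors (e.g. a file that is not a zip archive) are hidden.
        if unzip -ca "$file" content.xml 2>/dev/null | grep -qi -- "$searchterm"; then
            echo "Found keyword in: $file"
        fi
    done
}
```

It would be called as: search_odt /path/to/some/where/ searchterm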
For extra completeness:
The reason why $(ls *) fails for files with spaces is that the shell's default word separators include the space character. When the shell expands the result of the substitution, it splits it at every space, so filenames containing spaces get cut into pieces.
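A small sketch of the difference, assuming a scratch directory made with mktemp and a made-up file called "my notes.odt": the glob passes the name to the loop as one word, while the output of ls is split at the space.

```shell
#!/bin/bash
# Scratch directory holding one hypothetical file with a space in its name.
tmpdir=$(mktemp -d)
touch "$tmpdir/my notes.odt"

# Glob: the shell hands each matching filename to the loop as a single word.
glob_count=0
for file in "$tmpdir"/*.odt; do
    glob_count=$((glob_count + 1))
done

# $(ls ...): the output is split on spaces, tabs and newlines.
ls_count=0
for file in $(ls "$tmpdir"/*.odt); do
    ls_count=$((ls_count + 1))
done

echo "glob: $glob_count, ls: $ls_count"   # prints "glob: 1, ls: 2"
rm -rf "$tmpdir"
```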
One solution is to remove the space character as a separator, by modifying the special IFS variable:
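IFS=$'\n'
In bash, $'\n' stands for a newline character. Note that IFS=$(echo "") does not do the same thing: command substitution strips trailing newlines, so it leaves IFS empty and switches word splitting off altogether.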
Now, only newlines define word boundaries and spaces in filenames are no longer a problem.
This way you can also iterate over the output of commands like $(grep string file), one line at a time.
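As an illustration only, assuming bash (where a newline can be written as $'\n') and a made-up list file with one name per line, iterating over grep output could look like this:

```shell
#!/bin/bash
# Hypothetical list file: one filename per line, some containing spaces.
listfile=$(mktemp)
printf '%s\n' 'my notes.odt' 'report.odt' 'old draft.txt' > "$listfile"

old_ifs=$IFS      # remember the current separators (assumes IFS is set)
IFS=$'\n'         # split on newlines only
count=0
for name in $(grep '\.odt$' "$listfile"); do
    echo "entry: $name"
    count=$((count + 1))
done
IFS=$old_ifs      # put the separators back for the rest of the script

rm -f "$listfile"
```

Restoring IFS afterwards keeps the change from affecting later commands in the same script.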