Hello, I'm working on a script that extracts the contents of a file (generally a plain .txt file containing numbers, symbols, and letters) and writes it to a .txt output file, but it is kind of all over the place. The output must not include duplicates and the content has to be readable. I've jumped around a lot while learning scripting, but I managed to get the translate (tr) part down. I'm fairly new to awk, but I've heard it can be more effective and works in a similar way. I was also wondering: am I just overcomplicating things when sort and uniq might be able to do the job?
Note: I will be running this script numerous times. Is it possible to keep updating the output file so that the content is collected cumulatively across runs?
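For the simple case, yes: sort and uniq (or just sort -u) can handle the deduplication on their own. A minimal sketch of that approach, assuming the goal is one lowercased word per line with duplicates removed (input.txt here is a made-up sample standing in for the file passed on the command line):

```shell
#!/bin/bash
# Sample input, standing in for the real file.
printf 'Hello hello World\n' > input.txt

# Normalize to lowercase, split into one word per line,
# then let sort -u drop the duplicates.
tr '[:upper:]' '[:lower:]' < input.txt |
  tr -cs '[:alpha:]' '\n' |
  sort -u > output.txt
```

Note that sort -u only deduplicates within a single run; keeping a cumulative output file duplicate-free across runs needs an extra step.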
#!/bin/bash
# Check for input file on command line.
ARGS=1
E_BADARGS=65
E_NOFILE=66
if [ $# -ne "$ARGS" ] # Is this the right way to check the argument count, or too complicated for something simple?
then
echo "Usage: $(basename "$0") filename"
exit $E_BADARGS
fi
if [ ! -f "$1" ] # Check if file exists.
then
echo "File \"$1\" does not exist."
exit $E_NOFILE
fi
# So far I have it set to translate the output by piping tr into tr. Will this work?
# Or is awk more effective? What about using | sort | uniq -c?
tr 'A-Z' 'a-z' < "$1" | tr '[:space:]' 'Z' | \
tr -cs '[:alpha:]' 'Z' | tr -s '\173-\377' 'Z' | tr 'Z' ' ' >> output.txt
# for or while loop?
exit 0
I think everything here can be done in awk using associative arrays: flag every entry the first time it is seen and suppress the printing of any later duplicate. Character conversion is also easily handled. To solve this quickly in one shot, can you give us a representative example of the file's contents and the intended output?
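To make that concrete, here is one way an awk associative array can flag entries already seen so only the first occurrence is printed; this is a sketch with made-up sample data, not the final script:

```shell
#!/bin/bash
# seen[] is an associative array keyed by the whole line ($0).
# seen[$0]++ is 0 (false) the first time a line appears, so
# !seen[$0]++ prints each distinct line exactly once.
printf 'apple\nbanana\napple\ncherry\nbanana\n' |
  awk '!seen[$0]++'
```

Unlike sort | uniq, this keeps the lines in their original order, which may matter if the output has to stay readable.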
The thing is, the input files vary. They could be logs, records, or database contents, any information converted to plain text. The script needs to be able to read all of it. One file, for example, had:
John Smith 555-5555 to 555-5555 Hello Jane Doe
Another file was an email message, so it was all text.
The output just needs to contain everything taken from the input. The catch is that it has to be done cumulatively. For example, I input one file and write it to the output file, then input another file and write it to the same output file (adding to it). That's where I'm stuck. I've read that redirection will overwrite the existing file, but I was wondering if it can be appended to instead.
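Appending is exactly what >> does: > truncates the output file on each run, while >> adds to the end of it. A minimal sketch:

```shell
#!/bin/bash
rm -f output.txt          # start fresh just for this demo
# > would overwrite on every run; >> opens the file in
# append mode, so repeated runs accumulate.
echo "from file one" >> output.txt
echo "from file two" >> output.txt
```

If the cumulative file must also stay duplicate-free, one option is to re-deduplicate it after each append, e.g. sort -u -o output.txt output.txt.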
By gibberish I mean non-printable characters that might be mixed in with the regular text.
Update:
So for the while-loop portion where it reads the list, I can use this code, correct?
while IFS= read -r line
do
    echo "${line}"
done < file.lst

where file.lst contains:

/tmp/file1.txt
/tmp/file with space.txt
That reads in a list of the files to extract content from and writes it into a txt file in /tmp?
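Close, with two tweaks: use IFS= read -r so whitespace and backslashes survive, and redirect straight from the list file (no cat needed). A sketch of the whole list-driven flow, using made-up sample paths:

```shell
#!/bin/bash
# Stand-in inputs and file list (the real paths come from your list).
printf 'hi A\n' > /tmp/file1.txt
printf 'hi B\n' > '/tmp/file with space.txt'
printf '/tmp/file1.txt\n/tmp/file with space.txt\n' > file.lst

: > /tmp/all.out          # start fresh just for this demo
# IFS= and -r keep leading whitespace and backslashes intact;
# quoting "$f" keeps paths with spaces in one piece.
while IFS= read -r f; do
  cat "$f" >> /tmp/all.out
done < file.lst
```

The per-file extraction/dedup pipeline would go where the cat is; the append (>>) is what makes the collection cumulative.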
---------- Post updated at 07:38 PM ---------- Previous update was at 04:52 PM ----------
Will this also work?
cd <input_file_directory> || exit
for file in *; do
    <exeFile with full path> "$file" "<output_file_path>/$file.out"
done

(Use the shell glob * directly rather than backticks around dir; command substitution splits on whitespace and breaks filenames containing spaces.)