find $SRC -type f -name *.emlx |
while read FILE
do
if :
then sed -n '/From/p' $FILE
fi
done > $DEST-output.txt
The loop above spits out a .txt file with several lines that look like this:
From: John Smith <jsmith@company.com>
How can I narrow that sed result to print only the email address? Ideally I would still match the "From:" line but output only the text between the <> symbols that contains an @. I'm feeding these results into an ldapsearch query, which is why I need the address alone.
Something like this should work for both the angle-bracket style (name surname <emailaddr>) and bare email addresses (emailaddr):
awk '/From:/{gsub(/[<>]/,x,$NF); print $NF}'
It is best to quote the wildcard pattern given to -name so that the shell does not expand it before find sees it. You could also use the -exec clause instead of a while loop, and end it with + so that each awk invocation receives many files at once, which is more efficient.
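As a rough sketch of that suggestion, the loop could be collapsed into a single find call along these lines. The temporary directory and the two sample .emlx files are made up purely so the snippet runs on its own; in practice SRC and DEST would be whatever the original script already sets.

```shell
# Hypothetical sample data so the sketch is self-contained.
TMP=$(mktemp -d)
SRC="$TMP/mail"
DEST="$TMP/result"
mkdir -p "$SRC"
printf 'From: John Smith <jsmith@company.com>\nSubject: hi\n' > "$SRC/1.emlx"
printf 'From: jane@company.com\nSubject: hello\n' > "$SRC/2.emlx"

# The quoted '*.emlx' reaches find unexpanded; "{} +" passes many files
# to each awk invocation instead of starting one process per file.
find "$SRC" -type f -name '*.emlx' \
  -exec awk '/From:/{gsub(/[<>]/,x,$NF); print $NF}' {} + > "$DEST-output.txt"

cat "$DEST-output.txt"
```

The order of the output lines follows the order in which find visits the files, so sort the result if the order matters.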
There are also mails whose "From:" lines end with <br/>, and HTML headers that replace < with &lt; and > with &gt;; some even put the user name in parentheses AFTER the email address. Try this to capture all of those as well:
sed -n 's/&lt;/</;s/&gt;/>/;s/ *[(>].*$//;s/^From:.*[< ]//p' file
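To see how that sed command treats each variant, here is a small sketch with the HTML entities written out as &lt; and &gt;. The sample header lines and the extract_from function name are invented for illustration only.

```shell
# Hypothetical helper wrapping the sed command; it reads header lines on stdin.
extract_from() {
  # 1) turn &lt; / &gt; back into < / >
  # 2) cut everything from the first ">" or " (" to the end of the line
  # 3) on From: lines, drop everything up to the last "<" or space, then print
  sed -n 's/&lt;/</;s/&gt;/>/;s/ *[(>].*$//;s/^From:.*[< ]//p'
}

# Made-up sample lines: plain brackets, HTML-escaped brackets with a
# trailing <br/>, name in parentheses after the address, and a non-sender line.
printf '%s\n' \
  'From: John Smith <jsmith@company.com>' \
  'From: John Smith &lt;jsmith@company.com&gt;<br/>' \
  'From: jsmith@company.com (John Smith)' \
  'Subject: not a sender line' |
  extract_from
```

With these inputs, each of the three From: lines should come out as jsmith@company.com, and the Subject: line is dropped.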