This works OK, but if any one of those extraction programs hits an error, the formatting for the rest of the document gets shifted. I tried to remedy this with an if statement that enters a placeholder for bad or missing matches, but it's not bulletproof (shown below).
Any ideas for a stronger formatting structure?
sed -n '/Date/p' "$FILE" | egrep -o -m1 "Date:.{17}"
if [ $? -eq 1 ]; then
    echo "NODATE"
fi
find "$SRC" -type f -name '*.emlx' |
while IFS= read -r FILE
do
    # Serial field, or a placeholder when egrep finds nothing
    sed -n '/Serial/p' "$FILE" | egrep -o -m1 "Serial#.{14}"
    if [ $? -eq 1 ]; then
        echo "NO SERIAL"
    fi
    # Date field, or a placeholder when egrep finds nothing
    sed -n '/Date/p' "$FILE" | egrep -o -m1 "Date:.{17}"
    if [ $? -eq 1 ]; then
        echo "NODATE"
    fi
done | awk 'ORS=NR%2?", ":RS' > ~/Desktop/output.txt
The output gets tripped up because the email has Serial# twice (even though I thought the -m1 flag only looks at the first instance), which pushes the data over by one comma.
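For what it's worth, -m1 stops grep after the first matching line, while -o still prints every match found on that line, one per output line. A line that contains Serial# more than once therefore yields extra fields. Below is a minimal sketch of a sturdier structure, assuming bash and a grep that supports -o and -m. The helper names first_match and report_row are hypothetical and not part of the script above. Each field is captured on its own, trimmed to a single line, and swapped for a placeholder when empty, so every file should contribute exactly two fields.

```shell
#!/usr/bin/env bash

# first_match PATTERN FILE PLACEHOLDER   (hypothetical helper)
# Prints exactly one line: the first match of PATTERN in FILE,
# or PLACEHOLDER when nothing matches or the file cannot be read.
first_match() {
    local pattern=$1 file=$2 placeholder=$3 match
    # -m1 stops after the first matching line; -o may still print several
    # matches from that line, so head -n 1 keeps only the first of them.
    match=$(grep -E -o -m1 -- "$pattern" "$file" 2>/dev/null | head -n 1)
    printf '%s\n' "${match:-$placeholder}"
}

# report_row FILE   (hypothetical helper)
# Prints one "serial, date" row; each field is filled exactly once,
# so a bad match in one field cannot shift the other.
report_row() {
    local file=$1
    printf '%s, %s\n' \
        "$(first_match 'Serial#.{14}' "$file" 'NO SERIAL')" \
        "$(first_match 'Date:.{17}' "$file" 'NODATE')"
}

# Possible use inside the original loop (SRC as in the script above):
#   find "$SRC" -type f -name '*.emlx' |
#   while IFS= read -r FILE; do report_row "$FILE"; done > ~/Desktop/output.txt
```

Because report_row builds the whole row with a single printf, the awk stage that pairs up lines would no longer be needed in this variant.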
The line that USUALLY comes in on the email is:
"Serial#: C02J13XXXXXX"
But the user decided to add data in when they noticed the Serial number was missing, tripping up my filter:
"Serial#: System Serial# (N0 Serial#) 2010 - 13" MB Air 1.86GHZ Intel Core 2 Duo"