Pulling Data, Then Moving to the Next File

sudo · April 24, 2014, 7:30pm

I'm scanning a list of emails. From each file I need to pull two pieces of data, then move to the next file:
Sender's Email Address
Email Date

I need these written out one email per line, with the two values separated by a ",". Like this:

Email1's Address, Email1's Date Stamp
Email2's Address, Email2's Date Stamp
....so on

Instead it stacks the results like this:
Email1's Address
Email1's Date Stamp
Email2's Address
Email2's Date Stamp
....so on

How can I achieve the results I want? Thanks!!

find "$SRC" -type f -name '*.emlx' |
while IFS= read -r FILE
    do
        awk '/^From:/ && gsub(/.*<|>.*/,x)' "$FILE"
        sed -n '/Date/p' "$FILE"

    done > ~/Desktop/output.txt

pilnet101 · April 24, 2014, 9:51pm

find "$SRC" -type f -name '*.emlx' |
while IFS= read -r FILE
    do
        awk '/^From:/ && gsub(/.*<|>.*/,x)' "$FILE"
        sed -n '/Date/p' "$FILE"

    done | awk 'ORS=NR%2?", ":RS' > ~/Desktop/output.txt
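
To see what the final awk stage does on its own, here is a minimal sketch. The function name join_pairs and the sample lines are made up for illustration. The awk program sets ORS (the output record separator) to ", " on odd-numbered lines and to a newline on even-numbered ones, so every two input lines become one output line.

```shell
# join_pairs (hypothetical name): merge every two input lines into "first, second".
join_pairs() {
    # NR % 2 is 1 on odd lines, so ORS becomes ", "; on even lines it is RS (newline).
    # The assignment's value is a non-empty string, which awk treats as true,
    # so every line is printed with the separator chosen for it.
    awk 'ORS = NR % 2 ? ", " : RS'
}

# Made-up sample input: address and date on alternating lines.
printf '%s\n' 'a@example.com' 'Date: Thu, 18 Jul 2013' \
              'b@example.com' 'Date: Fri, 19 Jul 2013' | join_pairs
```

Note that this pairing relies on each file contributing exactly two lines; an extra or missing line shifts everything after it.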

sudo · April 25, 2014, 6:47pm

This works OK, but if any one of the extraction commands runs into an error, the formatting for the rest of the document gets shifted. I tried to remedy this with an if statement that prints a placeholder for bad/no matches (shown below), but it's not bulletproof.

Any ideas for a stronger formatting structure?

sed -n '/Date/p' "$FILE" | egrep -o -m1 "Date:.{17}"
if [ $? -eq 1 ]; then
    echo "NODATE"
fi
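
One possible stronger structure is to capture each field in a variable, substitute a placeholder when it is empty, and print exactly one line per file with printf, so the layout no longer depends on how many lines each extraction prints. This is only a sketch under assumptions: the helper names first_match and emit_row, the NOADDR placeholder, and the sample email are all made up for illustration.

```shell
# first_match PATTERN FILE (hypothetical helper): print only the first match, or nothing.
first_match() {
    grep -E -o -- "$1" "$2" | head -n 1
}

# emit_row FILE (hypothetical helper): always prints exactly one "address, date" line.
emit_row() {
    # Same From: extraction as above, stopping after the first hit.
    # A From: line without <...> produces no output, so the placeholder is used.
    addr=$(awk '/^From:/ && gsub(/.*<|>.*/, "") { print; exit }' "$1")
    stamp=$(first_match 'Date:.{17}' "$1")
    # ${var:-default} falls back to the placeholder when the variable is empty.
    printf '%s, %s\n' "${addr:-NOADDR}" "${stamp:-NODATE}"
}

# Made-up sample email to try it on.
tmp=$(mktemp)
printf '%s\n' 'From: Bob <bob@example.com>' \
              'Date: Thu, 18 Jul 2013 10:00:00 -0700' > "$tmp"
emit_row "$tmp"
rm -f "$tmp"
```

In the find loop, calling emit_row "$FILE" in place of the separate extraction commands would make the trailing awk pairing step unnecessary, since each file already produces a complete line.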

pilnet101 · April 28, 2014, 2:11am

Can you provide some example data of the bad output when the data is shifted?

sudo · April 28, 2014, 11:42am

Sure, currently my code reads:

find "$SRC" -type f -name '*.emlx' |
while IFS= read -r FILE
do
     sed -n '/Serial/p' "$FILE" | egrep -o -m1 "Serial#.{14}"
     if [ $? -eq 1 ]; then
          echo "NO SERIAL"
     fi
     sed -n '/Date/p' "$FILE" | egrep -o -m1 "Date:.{17}"
     if [ $? -eq 1 ]; then
          echo "NODATE"
     fi
done | awk 'ORS=NR%2?", ":RS' > ~/Desktop/output.txt

The output gets tripped up because one email yields Serial# twice (even though I thought the -m1 option only looks at the first instance), which pushes the data over by one comma.

The line that USUALLY comes in on the email is:
"Serial#: C02J13XXXXXX"
But the user decided to add data in when they noticed the Serial number was missing, tripping up my filter:
"Serial#: System Serial# (N0 Serial#) 2010 - 13" MB Air 1.86GHZ Intel Core 2 Duo"

As you can see it shifts my data over:

Serial#: C02J13XXXXXX, Date: Thu, 18 Jul 2013
Serial#: C02J1JXXXXXX, Date: Thu, 18 Jul 2013
Serial#: System Seria, Serial#) 2010 - 13" M
Date: Fri, 19 Jul 2013, Serial#: C02HNCZXXXXX
Date: Thu, 04 Jul 2013, Serial#: C02HNCXXXXX
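
The shifted rows above look consistent with how grep combines -o and -m: -m1 stops after the first matching line, but -o still prints every match found on that line, each on its own output line. The long Serial# line contains two non-overlapping matches of "Serial#.{14}", so it contributes two lines instead of one. A minimal sketch that reproduces this and shows one possible fix (piping through head -n 1); the variable names are made up for illustration.

```shell
# The problem line quoted above.
line='Serial#: System Serial# (N0 Serial#) 2010 - 13" MB Air 1.86GHZ Intel Core 2 Duo'

# -m1 limits matching LINES, not matches per line: this prints two lines.
all_matches=$(printf '%s\n' "$line" | grep -E -o -m1 'Serial#.{14}')
printf '%s\n' "$all_matches"

# Keeping only the first printed match gives one value per file again.
first_only=$(printf '%s\n' "$line" | grep -E -o 'Serial#.{14}' | head -n 1)
printf '%s\n' "$first_only"
```

With head at the end of the pipeline, $? reports head's exit status rather than grep's, so a placeholder check would need to test whether the captured string is empty instead.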