Simple shell script to find and print data

Hi,

I have a log file containing data on emails sent. Looks a bit like this for one email:

Content-Type: text/plain;
charset="UTF-8"
Date: 12 Jun 2008 14:04:59 +0100
From: from@email.com
Subject: xcf4564xzcv
To: recip@email.co.uk
Size = 364 Jun 12 14:04 smtp_234sldfh.tmp

I need to take the subject, date, time, size and To: address and write them to an output file in the following format:

1,recip@email.co.uk,1,,,1,xcf4564xzcv,1,12 Jun 2008,14:04:59,1,,364
 

How can I do this in Shell (AIX)... sed? grep? Here is the perl script I wrote:

# Open the input log for reading and the stats file for appending.
open(READLOGFILE,  "C:\\temp\\email.log") or die("Failed to open file");
open(WRITELOGFILE, ">>C:\\temp\\emailstats.log") or die("Failed to open file");
$numRecords   = 0;
$totalSize    = 0;
$emailSize    = 0;
$foundSize    = 0;
$foundEmail   = 0;
$foundSubject = 0;
$foundDate    = 0;
while ($line = <READLOGFILE>) {
    if ($line =~ /^Subject\:\s(\S*)/) {
        $subject      = $1;
        $foundSubject = 1;
    }
    if ($line =~ /^Date\:\s(.*)\s(\d\d\:\d\d\:\d\d)\s(.*)/) {
        $date      = $1;
        $time      = $2;
        $foundDate = 1;
    }
    if ($line =~ /^To\:\s(\S*\@\S*)/) {
        $to         = $1;
        $foundEmail = 1;
    }
    if ($line =~ /^Size\s\=\s(\d*)/) {
        $emailSize = $1;
        $foundSize = 1;
    }

    # Once all four fields have been seen, write the record and reset the flags.
    if (($foundSubject + $foundEmail + $foundSize + $foundDate) == 4) {
        print WRITELOGFILE "1,".$to.",1,,,1,".$subject.",1,".$date.",".$time.",1,,".$emailSize."\n";
        $numRecords   = $numRecords + 1;
        $totalSize    = $totalSize + $emailSize;
        $emailSize    = 0;
        $foundSize    = 0;
        $foundEmail   = 0;
        $foundSubject = 0;
        $foundDate    = 0;
    }
}
close(READLOGFILE);
close(WRITELOGFILE);
print "\nProcessing Complete";


Assuming every header appears in every record, you can call 'grep' once per field, writing one file for each, and then use 'paste' to merge the files line by line, e.g.,

grep "^Subject" file.txt > subjects.out
grep "^To" file.txt > tos.out
paste -d"\t" subjects.out tos.out

On the other hand, if there's no guarantee that each header exists in every record, you have to do basically what your Perl script already does: read through the file line by line, set variables as you find each field, and echo the output once a record is complete, e.g.,

SUBJECT=
FOUND_SUBJECT=0
while read -r line
do
  # Non-empty only when the current line is a Subject header
  SUBJECT=`echo "$line" | grep "^Subject"`
  if [ -n "$SUBJECT" ]
  then
    FOUND_SUBJECT=1
  fi
  # etc
  . . .
  echo ...
  # reset vars
done < file.txt
echo "Processing complete"
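
To fill in the elided parts, here is a minimal sketch of the same loop, assuming the log always looks like the sample in the question. The function name parse_email_log is made up for illustration. Like the Perl script, it keeps only the first word of the subject and prints a record once all four fields have been seen.

```shell
#!/bin/sh
# parse_email_log is a hypothetical helper name, not from the thread.
# It reads the log on stdin and prints one CSV line per email on stdout.
parse_email_log() {
  subject= to= date= time= size=
  # read splits each line on whitespace: key is the first word,
  # f2..f5 the next four, and rest swallows whatever is left.
  while read -r key f2 f3 f4 f5 rest
  do
    case $key in
      Subject:) subject=$f2 ;;
      To:)      to=$f2 ;;
      Date:)    date="$f2 $f3 $f4"; time=$f5 ;;
      Size)     size=$f3 ;;   # line looks like "Size = 364 ..."
    esac
    # Emit the record once all four fields are present, then reset.
    if [ -n "$subject" ] && [ -n "$to" ] && [ -n "$date" ] && [ -n "$size" ]
    then
      echo "1,$to,1,,,1,$subject,1,$date,$time,1,,$size"
      subject= to= date= time= size=
    fi
  done
}

# Demo with the record from the question.
parse_email_log <<'EOF'
Content-Type: text/plain;
charset="UTF-8"
Date: 12 Jun 2008 14:04:59 +0100
From: from@email.com
Subject: xcf4564xzcv
To: recip@email.co.uk
Size = 364 Jun 12 14:04 smtp_234sldfh.tmp
EOF
```

Against a real file this would be called as parse_email_log < email.log > emailstats.txt.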

Hi,

Thanks for your post, but could you make this idiot-proof for me, please? I have never used UNIX in my life.

  1. How would I get 12 Jun 2008 from the line:
Date: 12 Jun 2008 14:04:59 +0100

Would AIX/UNIX accept:

/^Date\:\s(.*)\s(\d\d\:\d\d\:\d\d)\s(.*)/

Thanks
Terry

I'm not sure about grep on AIX. Check for a -P option. On RHEL 5, this puts grep into Perl mode, which would presumably allow you to keep your regexes intact.

Otherwise, it's very similar. Use quotes to shell-proof your patterns. Drop the slash delimiters.

Since it's a log, you can probably just grep on the keyword at the start of the line, e.g., grep "^Date: ", or be stricter and require the two-digit day as well with grep "^Date: [0-9][0-9] ", etc.
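
As a rough illustration of the translation: POSIX grep and sed do not define the Perl shorthands \d and \s, so the Date pattern from the question has to be spelled with [0-9] and a literal space. The sketch below uses sed, and the function name extract_date_time is invented for the example.

```shell
#!/bin/sh
# extract_date_time is a hypothetical helper name.
# It is a POSIX basic-regex version of the Perl pattern
#   /^Date\:\s(.*)\s(\d\d\:\d\d\:\d\d)\s(.*)/
# where \( \) are capture groups and \1 \2 refer back to them.
extract_date_time() {
  sed -n 's/^Date: \(.*\) \([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\) .*/\1,\2/p'
}

# -n plus the trailing p means only lines that matched are printed.
printf '%s\n' 'From: from@email.com' 'Date: 12 Jun 2008 14:04:59 +0100' | extract_date_time
```

For the sample line this prints 12 Jun 2008,14:04:59.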

To get positional vars out of a line you can use awk, I think like this:
echo "$var" | awk '{ print $1, $3 }'
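
Applied to the Date and Size lines from the question, the awk field approach might look like the sketch below. The variable names are illustrative only. awk splits each line on whitespace, so on the Date line $1 is "Date:", $2 to $4 are the date and $5 is the time.

```shell
#!/bin/sh
line='Date: 12 Jun 2008 14:04:59 +0100'
# Join fields 2-4 with spaces, then a comma, then the time field.
date_time=$(printf '%s\n' "$line" | awk '{ print $2 " " $3 " " $4 "," $5 }')
echo "$date_time"

size_line='Size = 364 Jun 12 14:04 smtp_234sldfh.tmp'
# "Size" is $1 and "=" is $2, so the byte count is $3.
size=$(printf '%s\n' "$size_line" | awk '{ print $3 }')
echo "$size"
```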

Truth be told, though, if you can stick with Perl, it's much easier, since it was made for this kind of task.

---------- Post updated at 08:19 AM ---------- Previous update was at 08:17 AM ----------

Also, if you're really new to UNIX, you may not know about the man pages. Type 'man awk' or 'man grep' at the command line. And while you're at it, type 'which perl'. If you get a path to perl, use it.

Nah, the client doesn't allow use of Perl... It's not one of their strategic software packages, so it hasn't got security clearance! Good chance to learn some shell though.

Thanks for your help, I'll get testing. I'm going to try these:

grep "^To" email.log > recipients.out|awk '{print $2}'
grep /^Subject\:\s(\S*)/ email.log > subjects.out

Does the syntax look alright?

Any ideas how I can paste all the outputs together to get it to look like this:

1,recip@email.co.uk,1,,,1,xcf4564xzcv,1,12 Jun 2008,14:04:59,1,,364

---------- Post updated at 03:12 PM ---------- Previous update was at 01:35 PM ----------

Resolution for anyone who needs it in future...

# Build one column file per field; each awk adds its share of the fixed "1," and "," filler
grep "^To" email.log      | awk '{print "1,"$2}'            > recipients.out
grep "^Subject" email.log | awk '{print ",1,,,1,"$2}'       > subjects.out
grep "^Date" email.log    | awk '{print ",1,"$2" "$3" "$4}' > date.out
grep "^Date" email.log    | awk '{print ","$5}'             > time.out
grep "^Size" email.log    | awk '{print ",1,,"$3}'          > size.out
# -d '\0' joins the columns with no delimiter (paste defaults to a tab between files)
paste -d '\0' recipients.out subjects.out date.out time.out size.out > emailstats.txt
rm recipients.out subjects.out date.out time.out size.out
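
The resolution above reads the log five times and relies on every record containing every header, otherwise the pasted columns drift out of step. As an alternative sketch, a single awk pass can collect the fields and print each record when it reaches the Size line. This assumes Size is always the last line of a record, as in the sample, and the function name email_csv is made up for the example.

```shell
#!/bin/sh
# email_csv is a hypothetical helper name.
# It reads the named file(s), or stdin if none are given.
email_csv() {
  awk '
    /^To: /      { to = $2 }
    /^Subject: / { subject = $2 }
    /^Date: /    { date = $2 " " $3 " " $4; time = $5 }
    /^Size = /   {
      # Size closes the record: print it, then clear the fields so a
      # missing header in the next record shows up as an empty column.
      print "1," to ",1,,,1," subject ",1," date "," time ",1,," $3
      to = subject = date = time = ""
    }
  ' "$@"
}

# Demo with the record from the question.
email_csv <<'EOF'
Content-Type: text/plain;
charset="UTF-8"
Date: 12 Jun 2008 14:04:59 +0100
From: from@email.com
Subject: xcf4564xzcv
To: recip@email.co.uk
Size = 364 Jun 12 14:04 smtp_234sldfh.tmp
EOF
```

With a real log this would be run as email_csv email.log > emailstats.txt, with no temporary files to clean up.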