Working with individual blocks of text using awk

Hi,

I am working with CVS log data and have some data as follows.

RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointListener.java,v
head: 1.14
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 15;    selected revisions: 2
description:
----------------------------
revision 1.14
date: 2006-06-12 15:42:24 -0500;  author: darin;  state: Exp;  lines: +2 -2;
copyright updates
----------------------------
revision 1.13
date: 2006-05-16 09:34:00 -0500;  author: darin;  state: Exp;  lines: +1 -1;
javadoc spelling errors
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManager.java,v
head: 1.36
branch:
locks: strict
access list:
keyword substitution: o
total revisions: 38;    selected revisions: 4
description:
----------------------------
revision 1.31
date: 2007-03-26 20:47:29 -0500;  author: darin;  state: Exp;  lines: +1 -1;  commitid: 61604608779a4567;
update copyrights
----------------------------
revision 1.30
date: 2007-01-17 09:01:45 -0600;  author: darin;  state: Exp;  lines: +3 -2;  commitid: 614345ae3a564567;
javadoc settings and fixes
----------------------------
revision 1.29
date: 2006-06-12 15:42:24 -0500;  author: darin;  state: Exp;  lines: +2 -2;
copyright updates
----------------------------
revision 1.28
date: 2006-05-16 09:34:00 -0500;  author: darin;  state: Exp;  lines: +1 -1;
javadoc spelling errors
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManagerListener.java,v
head: 1.6
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 6;    selected revisions: 1
description:
----------------------------
revision 1.4
date: 2005-02-23 23:58:22 -0600;  author: darins;  state: Exp;  lines: +1 -1;
CPL --> EPL

Each block starts with a line beginning "RCS file:" and ends with a line of ====== characters.
I used the following command:

awk '/.java,v/,/====/'

and it extracted all the blocks for the Java files, i.e. each block from its "RCS file:" line through the closing ====== line. This is good.

However, I also want to extract some more information from each block and store it.
For example, for each block I want to count how many revisions there are, how many distinct authors worked on that file, how many lines were added/deleted in total, etc.

Can anyone help me extract this information from each block and store it in a tab-separated file? It is OK even if I do not get the individual revision numbers, author names, etc. I just want the counts (number of revisions, sum of lines added, and so on).

Any help to get started with working on these individual blocks would be useful. Do I have to use some kind of for loop to process each block?

Thanks,
Sandeep

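awk -F\; '
# Remember the "RCS file:" line that opens each block
/\.java,v/ {file=$0}
# With ";" as separator, $1 is "total revisions: N"
/^total revisions:/ {revisions=$1}
# Revision line: $2 is the author field, $4 is "lines: +A -D".
# Only the last author seen is kept; int("-D") is negative, so del is a negative sum.
/author:/ {author=$2;split($4,a," ");add+=int(a[2]);del+=int(a[3])}
# A line of ====== closes the block: print the summary and reset the counters
/======/ {print file RS revisions RS author RS "Add lines: "add RS "Delete lines: " del;
          revisions=author=add=del=""
         }
# Keep the last line read so END can tell whether the final block was already printed
{t=$0}
END{ if (t!~/===/) print file RS revisions RS author RS "Add lines: "add RS "Delete lines: " del;}
' infile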

RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointListener.java,v
total revisions: 15
  author: darin
Add lines: 3
Delete lines: -3
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManager.java,v
total revisions: 38
  author: darin
Add lines: 7
Delete lines: -6
RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManagerListener.java,v
total revisions: 6
  author: darins
Add lines: 1
Delete lines: -1
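
If you want one tab-separated row per file, with a count of distinct authors rather than the last author name, something along these lines might work. It is only a sketch: the function name cvs_summary and the column names are made up for illustration, and it assumes every listed revision has a "date: ...;  author: ...;" line like the ones in the sample. It counts the revisions actually listed in the log (the "date:" lines) and reports deleted lines as a positive number.

```shell
# cvs_summary is a hypothetical helper: it reads a "cvs log" dump from the
# named file(s) or from stdin and prints one tab-separated row per RCS file.
cvs_summary() {
  awk '
    # Print the row for the block collected so far, then reset all counters.
    function emit(   n, a) {
      if (file == "") return
      n = 0
      for (a in seen) n++            # number of distinct authors
      printf "%s\t%d\t%d\t%d\t%d\t%d\n", file, total, listed, n, add, del
      file = ""; total = listed = add = del = 0
      split("", seen)                # portable way to empty an array
    }
    BEGIN { print "file\ttotal_revs\tlisted_revs\tauthors\tadded\tdeleted" }
    # A new block starts: flush the previous one in case it had no ====== line.
    /^RCS file: / { emit(); file = $0; sub(/^RCS file: /, "", file); next }
    # "total revisions: 15;" -> the third field is "15;", which converts to 15.
    /^total revisions: / { total = $3 + 0; next }
    # One "date: ...;  author: ...;" line per listed revision.
    /^date: .*author: / {
      listed++
      for (i = 1; i < NF; i++) {
        if ($i == "author:") { a = $(i + 1); sub(/;$/, "", a); seen[a] = 1 }
        if ($i == "lines:")  { add += $(i + 1) + 0; del -= $(i + 2) + 0 }
      }
      next
    }
    /^=====/ { emit() }
    END { emit() }
  ' "$@"
}
```

Run it as "cvs_summary infile > summary.tsv". There is no explicit loop over blocks: awk already reads the file line by line, so the script just accumulates values until it sees the next "RCS file:" line, a ====== line, or the end of input.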

Here's a Perl script for the problem. Change the value of $delim to "\t" for tab-delimited output.

$
$ # display the content of the data file "f7"
$ cat -n f7
     1  RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointListener.java,v
     2  head: 1.14
     3  branch:
     4  locks: strict
     5  access list:
     6  keyword substitution: o
     7  total revisions: 15;    selected revisions: 2
     8  description:
     9  ----------------------------
    10  revision 1.14
    11  date: 2006-06-12 15:42:24 -0500;  author: darin;  state: Exp;  lines: +2 -2;
    12  copyright updates
    13  ----------------------------
    14  revision 1.13
    15  date: 2006-05-16 09:34:00 -0500;  author: darin;  state: Exp;  lines: +1 -1;
    16  javadoc spelling errors
    17  =============================================================================
    18  RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManager.java,v
    19  head: 1.36
    20  branch:
    21  locks: strict
    22  access list:
    23  keyword substitution: o
    24  total revisions: 38;    selected revisions: 4
    25  description:
    26  ----------------------------
    27  revision 1.31
    28  date: 2007-03-26 20:47:29 -0500;  author: darin;  state: Exp;  lines: +1 -1;  commitid: 61604608779a4567;
    29  update copyrights
    30  ----------------------------
    31  revision 1.30
    32  date: 2007-01-17 09:01:45 -0600;  author: inigo;  state: Exp;  lines: +3 -2;  commitid: 614345ae3a564567;
    33  javadoc settings and fixes
    34  ----------------------------
    35  revision 1.29
    36  date: 2006-06-12 15:42:24 -0500;  author: montoya;  state: Exp;  lines: +2 -2;
    37  copyright updates
    38  ----------------------------
    39  revision 1.28
    40  date: 2006-05-16 09:34:00 -0500;  author: darin;  state: Exp;  lines: +1 -1;
    41  javadoc spelling errors
    42  =============================================================================
    43  RCS file: /cvsroot/eclipse/org.eclipse.debug.core/core/org/eclipse/debug/core/IBreakpointManagerListener.java,v
    44  head: 1.6
    45  branch:
    46  locks: strict
    47  access list:
    48  keyword substitution: kv
    49  total revisions: 6;    selected revisions: 1
    50  description:
    51  ----------------------------
    52  revision 1.4
    53  date: 2005-02-23 23:58:22 -0600;  author: darins;  state: Exp;  lines: +1 -1;
    54  CPL --> EPL
    55  =============================================================================
$
$
$ # run the Perl script that processes the file "f7"
$
$ perl -lne 'BEGIN {
               $delim = "|";
               print join $delim, ("File", "Total Revisions", "Authors", "Lines Added", "Lines Deleted")
             }
             if (/^RCS file:.*\/(.*?),.*$/) {
               $file = $1;
             } elsif (/^total revisions: (\d+);.*$/) {
               $revcount = $1;
             } elsif (/^.*author: (\w+);.*lines: \+(\d+) -(\d+).*$/) {
               $authors{$1}++;
               $added += $2;
               $deleted += $3;
             } elsif (/^==+$/) {
               print join $delim, ($file, $revcount, join(",", keys %authors), $added, $deleted);
               $file = "";
               $revcount = "";
               %authors = ();
               $added = 0;
               $deleted = 0;
             }
            ' f7
File|Total Revisions|Authors|Lines Added|Lines Deleted
IBreakpointListener.java|15|darin|3|3
IBreakpointManager.java|38|inigo,darin,montoya|7|6
IBreakpointManagerListener.java|6|darins|1|1
$
$

HTH,
tyler_durden

Thanks rdcwayx and Tyler for the replies.

Although I am not very well versed in awk and Perl, I understand the code that you have given. I will work on these and try to get the other things I want using what you have written.

Thanks again. Really appreciate the help.

Sandeep