[awk] Compare a number in a string with a list

Hi,

I have a program written in awk and I want to extend it to do another task.

The input to my program is a list of CVS log reports for a repository. For each file, the log has several fields, one of which is the comment field. I want to check whether a comment (a free-text field) contains a number and, if it does, whether that number appears in a separate list of numbers (read from another file).

For example,
If the list of numbers is

100
101
102
103

And I have the following log data

RCS file: /cvsroot/eclipse/org.eclipse.jdt.apt.core/src/org/eclipse/jdt/apt/core/internal/util/SourcePositionImpl.java,v
head: 1.10
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 12;    selected revisions: 1
description:
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.jdt.apt.core/src/org/eclipse/jdt/apt/core/internal/util/TypesUtil.java,v
head: 1.13
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 15;    selected revisions: 2
description:
----------------------------
revision 1.13
date: 2008-01-01 20:28:39 -0600;  author: wharley;  state: Exp;  lines: +1 -16;  commitid: 537d477af6d64567;
Bug 100 - partial fix.  
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.jdt.apt.core/src/org/eclipse/jdt/apt/core/internal/util/Visitors.java,v
head: 1.7
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 8;    selected revisions: 1
description:
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.jdt.apt.core/src/org/eclipse/jdt/apt/core/util/AptPreferenceConstants.java,v
head: 1.16
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 17;    selected revisions: 4
description:
----------------------------
revision 1.16
date: 2008-01-29 16:55:45 -0600;  author: wharley;  state: Exp;  lines: +1 -1;  commitid: 69f6479faef04567;
This is also some sample text. bug 101, followed by some more text. 
----------------------------
revision 1.14
date: 2007-10-15 15:46:44 -0500;  author: wharley;  state: Exp;  lines: +12 -1;  commitid: 724c4713d1b24567;
This is some sample text. Bug 102: some text
=============================================================================

After the word "description", there can be zero, one, or more commit entries. From each commit entry I want to extract the number, which can be anywhere in the 3rd or 4th line after the separator line "----------". I then want to compare that number with the list above and, if it is in the list, take some action.

Can anyone help me do this? If needed, I can also post the awk code I already have.

Thanks,
Sandeep

What output do you expect/want given the input files you posted?

Hi Radoulov,

I actually want to increment a counter every time I get a match. The final count is what I need. So every time I see a number in a commit message, I compare it with the list. If the number is present, count++, and at the end I output count.

Thanks,
Sandeep

OK,
could you please specify the pattern? What exactly do the lines you're interested in look like?
Where in your sample above is the commit data?

For the current file I have, the output looks like this,

Filename            Selected Revisions    Number of Authors    Total Lines Added
DebugElement.java   0
DebugEvent.java     2                     1                    3
DebugPlugin.java    16                    3                    158
....

and so on.
Now I need another column named Number of Bugfixes, whose value is the count I compute for each file.

So for each file (each record in awk separated by ======), I have to get a count.

Thanks

This sounds different from the original requirement ...
What does the number list have to do with it?

The data that I showed in the previous post is the output format I have. I need to add another column to it called "number of bugfixes". To get that number, I keep a count.

I also have a list of bugs (but it is not classified at file level). I will use this list for comparison.

To find the number of bugs per file, I increment the count every time a number in a commit message is also present in the list of bugs.

Pseudocode would be:
for each file
do
    initialize count to 0
    for each commit message
        find whether the commit message has a number in it
        if it does, compare the number with the numbers in the list
        if there is a match, increment count
    assign numberOfFixes = count
done

The code I already have computes the other per-file data I showed in this way, and I want to do this task in a similar way.

Thanks

You could start with something like this.

Given your sample files, the following code:

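awk 'END {
  # report the per-file counts (for-in order is unspecified)
  for (F in f)
    print F, "bugs fixed:", f[F]
  }
NR == FNR {
  # first file (numlist): remember each listed number
  nl[$1]; next
  }
/^RCS file/ {
  # second file (data): take the file name from the RCS path,
  # i.e. the piece just before the trailing ",v"
  # (dynamic regex string, so the / needs no escaping)
  n = split($NF, t, "[/,]")
  fn = t[n - 1]
  }
# a line mentioning "bug N" with N in the list counts for the current file
match($0, /[Bb][Uu][Gg] [0-9][0-9]*/) && (substr($0, RSTART + 4, RLENGTH - 4) in nl) {
  f[fn]++
  }' numlist data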

produces:

AptPreferenceConstants.java bugs fixed: 2
TypesUtil.java bugs fixed: 1
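
Note that match() only reports the first "bug N" on a line, so a commit message that mentions several bugs on one line would be counted once at most. If that matters for your data, a minimal sketch along these lines could consume the line one reference at a time. The sample file contents, the Foo.java name, and the temp directory are made up for illustration; only the "bug N" pattern and the numlist/data file roles come from the thread.

```shell
# Hypothetical sample inputs, written to a temp directory for illustration.
tmp=$(mktemp -d)
printf '100\n101\n102\n103\n' > "$tmp/numlist"
cat > "$tmp/data" <<'EOF'
RCS file: /cvsroot/demo/src/Foo.java,v
description:
----------------------------
revision 1.2
date: 2008-01-01 20:28:39 -0600;  author: someone;  state: Exp;
Bug 100 and bug 103 fixed together, bug 999 untouched.
=============================================================================
EOF

result=$(awk '
NR == FNR { nl[$1]; next }            # first file: remember listed numbers
/^RCS file/ {                         # new file section: extract the file name
  n = split($NF, t, "[/,]")
  fn = t[n - 1]
}
{
  line = $0
  # Consume the line one "bug N" reference at a time.
  while (match(line, /[Bb][Uu][Gg] [0-9]+/)) {
    num = substr(line, RSTART + 4, RLENGTH - 4)
    if (num in nl) f[fn]++
    line = substr(line, RSTART + RLENGTH)   # continue after this reference
  }
}
END { for (F in f) print F, "bugs fixed:", f[F] }
' "$tmp/numlist" "$tmp/data")

echo "$result"
rm -rf "$tmp"
```

With this made-up input, bugs 100 and 103 are in the list and 999 is not, so Foo.java gets a count of 2.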

You should try to integrate this code in your main script.
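
For the extra "Number of Bugfixes" column, you will probably also want a row for files that have no matching commits, and the rows in input order rather than in for-in order. A rough sketch of one way to do that follows. The array names order and fixes, the counter nfiles, the tab-separated output, and the sample A.java/B.java data are all assumptions for illustration, not taken from your script.

```shell
# Hypothetical sample inputs, written to a temp directory for illustration.
tmp=$(mktemp -d)
printf '100\n101\n' > "$tmp/numlist"
cat > "$tmp/data" <<'EOF'
RCS file: /cvsroot/demo/src/A.java,v
description:
----------------------------
revision 1.2
date: 2008-01-01 20:28:39 -0600;  author: someone;  state: Exp;
Bug 100 - partial fix.
=============================================================================
RCS file: /cvsroot/demo/src/B.java,v
description:
=============================================================================
EOF

result=$(awk '
NR == FNR { nl[$1]; next }            # first file: remember listed numbers
/^RCS file/ {
  n = split($NF, t, "[/,]")
  fn = t[n - 1]
  order[++nfiles] = fn                # remember the order files appear in
  fixes[fn] = 0                       # files without matches still get a row
}
match($0, /[Bb][Uu][Gg] [0-9]+/) && (substr($0, RSTART + 4, RLENGTH - 4) in nl) {
  fixes[fn]++
}
END {
  # one tab-separated row per file: name, number of bugfixes
  for (i = 1; i <= nfiles; i++)
    print order[i] "\t" fixes[order[i]]
}
' "$tmp/numlist" "$tmp/data")

echo "$result"
rm -rf "$tmp"
```

In your main script the fixes[fn] value would be printed alongside the columns you already compute for each file, instead of in a separate END loop.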

Thanks Radoulov,
I will try to check this with the data I have.

Sandeep