[awk] Compare a number in a string with a list

sandeepk1611 · February 21, 2011, 1:33pm

Hi,

I have a program written in awk and I want to extend it to do another task.

My program processes a list of CVS log reports for a repository. For each file, I have some fields, one of which is the comment field. I want to know how I can check whether a comment (which is a free-text field) contains a number and, if it does, whether that number exists in another list of numbers (taken as input from another file).

For example,
If the list of numbers is

And I have the following log data

RCS file: /cvsroot/eclipse/org.eclipse.jdt.apt.core/src/org/eclipse/jdt/apt/core/internal/util/SourcePositionImpl.java,v
head: 1.10
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 12;    selected revisions: 1
description:
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.jdt.apt.core/src/org/eclipse/jdt/apt/core/internal/util/TypesUtil.java,v
head: 1.13
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 15;    selected revisions: 2
description:
----------------------------
revision 1.13
date: 2008-01-01 20:28:39 -0600;  author: wharley;  state: Exp;  lines: +1 -16;  commitid: 537d477af6d64567;
Bug 100 - partial fix.  
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.jdt.apt.core/src/org/eclipse/jdt/apt/core/internal/util/Visitors.java,v
head: 1.7
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 8;    selected revisions: 1
description:
=============================================================================
RCS file: /cvsroot/eclipse/org.eclipse.jdt.apt.core/src/org/eclipse/jdt/apt/core/util/AptPreferenceConstants.java,v
head: 1.16
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 17;    selected revisions: 4
description:
----------------------------
revision 1.16
date: 2008-01-29 16:55:45 -0600;  author: wharley;  state: Exp;  lines: +1 -1;  commitid: 69f6479faef04567;
This is also some sample text. bug 101, followed by some more text. 
----------------------------
revision 1.14
date: 2007-10-15 15:46:44 -0500;  author: wharley;  state: Exp;  lines: +12 -1;  commitid: 724c4713d1b24567;
This is some sample text. Bug 102: some text
=============================================================================

After the word "description", there can be zero, one, or more commit entries. From each commit entry, I want to extract the number, which can be anywhere in the 3rd or 4th line after the separator line "----------". I then want to compare that number with the list above and, if it is in the list, take some action.

Can anyone help me with this? If needed, I can also send the awk code I already have.

Thanks,
Sandeep

radoulov · February 21, 2011, 3:08pm

What output do you expect/want given the input files you posted?

sandeepk1611 · February 21, 2011, 3:11pm

Hi Radoulov,

I actually want to increment a counter every time I get a match. The final count value is what I need. So every time I see a number in a commit message, I will compare it with the list. If the number is present, count++, and at the end I output count.

Thanks,
Sandeep

radoulov · February 21, 2011, 3:19pm

OK,
could you please specify the pattern? What exactly do the lines you're interested in look like?
Where in your sample above is the commit data?

sandeepk1611 · February 21, 2011, 3:24pm

For the current file I have, the output looks like this,

Filename            Selected Revisions    Number of Authors    Total Lines Added
DebugElement.java   0
DebugEvent.java     2                     1                    3
DebugPlugin.java    16                    3                    158
....

and so on.
Now I need another column named Number of Bugfixes, and the number that I count for each file will be the value in it.

So for each file (each record in awk separated by ======), I have to get a count.

Thanks

radoulov · February 21, 2011, 3:32pm

This sounds different from the original requirement ...
What does the number list have to do with it?

sandeepk1611 · February 21, 2011, 3:41pm

The data that I showed in the previous post is the output format I have. I need to add another column to it called "number of bugfixes". To get that number, I keep a count.

I also have a list of bugs (but it is not classified at file level). I will use this list for comparison.

To find the number of bugs per file, I increment the count every time I see that a number in the commit message is also present in the list of

Pseudo code would be:
1) for each file,
   do
     * initialize count to 0
     * for each commit message,
         - find if the commit message has a number in it.
         - if it does, compare the number with the numbers in the list.
         - if there is a match, increment count.
     * assign numberOfFixes = count
   done
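
The pseudo code above could be sketched in awk roughly as follows. This is only a minimal sketch under some assumptions: the sample files, the variable names (nl, fn, count, pos) and the rule that a commit message starts on the 3rd line after a "----" separator are illustrative guesses, not the real script or data. It takes the first number on each message line and prints one count per file, including files with no matches.

```shell
#!/bin/sh
# Hypothetical sample inputs (made up for illustration).
tmp=$(mktemp -d)
printf '%s\n' 100 101 102 > "$tmp/numlist"
cat > "$tmp/data" <<'EOF'
RCS file: /cvsroot/demo/src/A.java,v
description:
=============================================================================
RCS file: /cvsroot/demo/src/B.java,v
description:
----------------------------
revision 1.2
date: 2008-01-29 16:55:45 -0600;  author: x;  state: Exp;
Some text. bug 101, more text.
----------------------------
revision 1.1
date: 2007-10-15 15:46:44 -0500;  author: x;  state: Exp;
Bug 999: not in the list
=============================================================================
EOF

out=$(awk '
NR == FNR { nl[$1]; next }                  # first file: the list of numbers
/^RCS file/ {                               # new file record: count = 0
  n = split($NF, t, /[\/,]/); fn = t[n - 1]; count = 0; pos = 0
  next
}
/^-+$/ { pos = 1; next }                    # commit separator line
/^=+$/ { print fn, count; pos = 0; next }   # end of record: numberOfFixes
pos > 0 { pos++ }                           # lines seen since the separator
# Message lines (3rd line after the separator onward) with a listed number.
pos > 3 && match($0, /[0-9][0-9]*/) && (substr($0, RSTART, RLENGTH) in nl) {
  count++
}' "$tmp/numlist" "$tmp/data")
printf '%s\n' "$out"
rm -rf "$tmp"
```

With these made-up inputs it prints "A.java 0" and "B.java 1". A message that spans several lines, each containing a listed number, would be counted once per line, so that part may need adjusting for real data.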

In the code I already have, I compute the other data I showed for each file in this way, and I want to do this task in a similar way.

Thanks

radoulov · February 21, 2011, 4:42pm

You could start with something like this.

Given your sample files, the following code:

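awk '
# First file (numlist): remember every bug number as an array key.
NR == FNR { nl[$1]; next }

# Start of a file record: take the file name from the RCS path.
/^RCS file/ {
  n = split($NF, t, /[\/,]/)   # ".../TypesUtil.java,v" -> ..., "TypesUtil.java", "v"
  fn = t[n - 1]
}

# A line containing "bug <number>" (any case) whose number is in the list.
match($0, /[Bb][Uu][Gg] [0-9][0-9]*/) && (substr($0, RSTART + 4, RLENGTH - 4) in nl) {
  f[fn]++
}

# Print the per-file counts (the order of for-in is unspecified).
END {
  for (F in f)
    print F, "bugs fixed:", f[F]
}' numlist data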

produces:

AptPreferenceConstants.java bugs fixed: 2
TypesUtil.java bugs fixed: 1
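
Note that match() reports only the leftmost match, so the code above counts at most one bug reference per line. If a commit message can mention several bugs on one line, a loop over the rest of the line is one possible way to pick up all of them. The sample line and the variable name rest below are made up for illustration:

```shell
#!/bin/sh
# Hypothetical commit message line with several bug references.
line='Fixes bug 101 and Bug 102; see also bug 7.'

out=$(printf '%s\n' "$line" | awk '{
  rest = $0
  while (match(rest, /[Bb][Uu][Gg] [0-9][0-9]*/)) {
    # RSTART is where the match begins, RLENGTH is how long it is;
    # skipping 4 characters drops the "bug " prefix.
    print substr(rest, RSTART + 4, RLENGTH - 4)
    rest = substr(rest, RSTART + RLENGTH)   # continue after this match
  }
}')
printf '%s\n' "$out"
```

This prints 101, 102 and 7 on separate lines. In the counting script, the print would be replaced by the lookup in the number list and the increment.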

You should try to integrate this code in your main script.
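
If integrating is awkward, one alternative is to save the counts to a file and append them to the existing report in a second pass. The main script was not posted, so everything here is an assumption: the file names counts and report, the report layout (a header line, then one row per file with the file name in the first field) and the numbers themselves are all hypothetical.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical per-file counts, in the format printed by the code above.
cat > "$tmp/counts" <<'EOF'
DebugEvent.java bugs fixed: 1
DebugPlugin.java bugs fixed: 4
EOF
# Hypothetical existing report: header line, then one row per file.
cat > "$tmp/report" <<'EOF'
Filename Selected_Revisions
DebugElement.java 0
DebugEvent.java 2
DebugPlugin.java 16
EOF

out=$(awk '
NR == FNR { bugs[$1] = $NF; next }                   # counts: file name -> count
FNR == 1  { print $0, "Number_of_Bugfixes"; next }   # extend the header line
{ print $0, (($1 in bugs) ? bugs[$1] : 0) }          # files without a count get 0
' "$tmp/counts" "$tmp/report")
printf '%s\n' "$out"
rm -rf "$tmp"
```

This keys on the bare file name, so two files with the same name in different directories would share a count; keying on the full RCS path would avoid that.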

sandeepk1611 · February 22, 2011, 5:41pm

Thanks Radoulov,
I will try to check this with the data I have.

Sandeep