Counting a consecutive number in column 2

Hi,
I have a input file which contains following data

0 1
0 2
0 3
0 4
0 8 
0 9
0 11
1 1
1 2
1 6
1 7
1 8
1 9
2 1
2 11
2 12 
2 13

I would like to count a consecutive number in column 2 which are grouped by column 1 as well as obtain the maximum number of consecutive no. in each group.

So, results I want to get would be

0 4
0 2
0 1
1 2
1 4
2 1
2 3

As for maximum thing,

0 4
1 4
2 3

Thanks in advance

are you sure this isn't a homework?

That is a part of result which represents the stacking interaction in DNA. Column 1 represents time, Column 2 is residue in DNA participating stacking interaction. I just want to know the how many stacks are formed in a consecutive manner.
Since raw data is too complex to post here, I just omitted raw data and gave the example in order to explain more efficiently.

Try something like:

awk '$1>p || $2!=q+1{if(NR>1)print p,c; c=0} {p=$1; q=$2; c++} END{print p,c}' file
1 Like

Thanks. It really works.

Also, I would appreciate it a lot if you reply me how to obtain maximum value in the group.

The result I would like to get is following

0 4
1 4
2 3

Try something like:

awk '$1>p || $2!=q+1{if(c>m)m=c; c=0} $1>p{if(NR>1)print p,m; m=0} {p=$1; q=$2; c++} END{if(c>m)m=c; print p,m}' file
1 Like

Sincerely appreciate it !

You're welcome. I added a modification in the END section. This all only works if column 1 is grouped..

Sorry for bothering you. I would like to know how to get average of the number grouped by column 1.

If I have following data,

0 1
0 2
0 3
0 4
0 5
1 1
1 2
1 3

The result I would like obtain would be

0 3
1 2

Thanks in advance

Try:

awk '$1>p {if(NR>1)print p, t/n; t=n=0} {p=$1; n++; t+=$2} END{print p,t/n}'  file

--edit--
I added if(NR>1) in all suggestions

1 Like

Hi,
I am working with very similar data and Scrutinizer's answers have been very helpful. However, I was wondering how one would alter the output a bit. For example, if I had Ryan Kim's data in his post 'One more question':

0 1 
0 2 
0 3 
0 4 
0 5 
1 1 
1 2 
1 3

What if I wanted to print out all of the actual lines that correspond to a series of lines with at least n consecutive values in column 2? For example, if I had n=4 (or 5) and the above data, I would want to extract and print the following lines of data:

0 1 
0 2 
0 3 
0 4 
0 5 

If I had n=3, I would extract all of the lines from the original dataset.

I altered Scrutinizer's awk solution slightly to allow filtering the series based on the number of lines with consecutive values in column two:

awk '$1>p || $2!=q+1{if(NR>1)print p,c; c=0} {p=$1; q=$2; c++} END{if (c>4) {print p,c}} file

But, I can't figure out how to print the actual series of lines with the consecutive values in them. Any possible advice/explanations would be greatly appreciated!

Thank you very much in advance for any help.

Something like this?

awk '
  $1>p || $2!=q+1 {
    if(NR>1 && c>=n) print s
    c=0
    s=x
    p=$1
  }
  {
    s=s ORS $0
    q=$2
    c++
  }
  END{
    if(c>=n)print s
  }
' n=3 file

Hi Scrutinizer,
thanks very much, that works perfectly!