Count words/lines between two tags using awk

Is there an efficient awk command that can count the number of lines that occur between two tags?
For instance, consider the following text:

<s>
Hi PP -
my VBD -
name DT -
is NN -
. SENT .
</s>
<s>
Her PP -
name VBD -
is DT -
the NN -
same WRT -
. SENT -
</s>

I would like to find out that between the first pair of <s> and </s> tags there are 5 words (one per line, so 5 lines), and that between the second pair of <s> and </s> tags there are 6 words (6 lines).

The desired output would be as follows:

5
6

How can I write an awk command that uses my XML tags as markers to count this information and produce the desired output?

Any attempts from your side?

Be that as it may, this one is not too daunting. Try:

awk '/<s>/ {ST=NR} /<\/s>/{print NR-ST-1}' file
5
6

Run awk -f ow.awk myFile, where ow.awk is:

$0 ~ "<s>",$0~"</s>" {c++}
$0 ~ "</s>" {print c-2;c=0}

Hello owwow14,

The following may also help.

awk '/<s>/{A=0; while((getline) > 0 && $0 !~ /<\/s>/) A++; print A}' Input_file

Output will be as follows.

5
6

Thanks,
R. Singh

$ awk -F"\n" -v RS="</s>\n" ' { print NR,NF-2 } ' file
1 5
2 6
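
For comparison, here is a rough sketch of a flag-based variant. It is not from any of the posts above: the function name count_between is made up for illustration, and it assumes each tag sits on a line of its own, as in the sample. It avoids a multi-character RS, whose behaviour POSIX leaves unspecified, and it skips blank lines inside a block, so it may be a safer starting point if the input is less tidy than the example.

```shell
# count_between: hypothetical helper name, not part of the original thread.
# Prints one count per <s>...</s> block found in the file passed as $1.
count_between() {
    awk '
        /<s>/    { inside = 1; n = 0; next }                 # opening tag: start a fresh count
        /<\/s>/  { if (inside) print n; inside = 0; next }   # closing tag: report and reset
        inside && NF { n++ }                                 # count non-blank lines inside a block
    ' "$1"
}
```

Calling count_between file on the sample text from the question prints 5 and 6 on separate lines. A closing tag with no matching opening tag is ignored rather than producing a bogus number.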