owwow14
Is there an efficient awk command that can count the number of lines that occur between two tags?
For instance, consider the following text:
<s>
Hi PP -
my VBD -
name DT -
is NN -
. SENT .
</s>
<s>
Her PP -
name VBD -
is DT -
the NN -
same WRT -
. SENT -
</s>
I want to count the lines between each pair of tags: between the first <s> and </s> there are 5 words (5 lines), and between the second pair there are 6 words (6 lines).
The desired output would be as follows:
5
6
How can I write an awk script that uses my XML tags as markers to count these lines and produce the desired output?
RudiC
Any attempts from your side?
Be that as it may... this one is not too daunting. Try:
awk '/<s>/ {ST=NR} /<\/s>/{print NR-ST-1}' file
5
6
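The one-liner above works because NR - ST - 1 is exactly the number of records strictly between the <s> line and the </s> line. A minimal sketch of the same idea wrapped in a shell function; the name count_between_tags is hypothetical, and the sketch assumes each tag sits alone on its own line, which is why the patterns are anchored with ^ and $ (so a word line that merely contains the text <s> is not taken for a tag).

```shell
# count_between_tags (hypothetical name): for each <s> ... </s> block,
# print how many lines lie strictly between the two tags.
# Assumes every tag is alone on its own line; reads files or stdin.
count_between_tags() {
    awk '
        /^<s>$/   { start = NR }             # remember the line number of the opening tag
        /^<\/s>$/ { print NR - start - 1 }   # lines strictly between the two tags
    ' "$@"
}
```

Usage: count_between_tags file prints 5 and 6 for the sample above, and an empty block (<s> immediately followed by </s>) prints 0.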
awk -f ow.awk myFile
where ow.awk is:
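# Range pattern: true from a <s> line through the next </s> line, inclusive
$0 ~ "<s>", $0 ~ "</s>" { c++ }
# c includes both tag lines, so subtract 2 before printing, then reset
$0 ~ "</s>" { print c-2; c=0 }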
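The range pattern in ow.awk also matches the <s> and </s> lines themselves, which is why the script subtracts 2. A minimal sketch of an alternative that uses an explicit flag instead, so the tag lines are never counted and no correction is needed; the function name count_with_flag and the variables inside and c are illustrative, and the sketch again assumes each tag sits alone on its own line.

```shell
# count_with_flag (hypothetical name): flag-based variant of ow.awk.
# Rule order matters: the closing tag is handled before the counting rule
# and the opening tag after it, so neither tag line is ever counted.
count_with_flag() {
    awk '
        /^<\/s>$/ { print c; inside = 0 }   # end of block: report the count, leave the block
        inside    { c++ }                   # count every line while inside a block
        /^<s>$/   { inside = 1; c = 0 }     # start of block: reset the counter
    ' "$@"
}
```

Because awk tries the rules in order for each line, moving the /^<s>$/ rule above the counting rule would make the opening tag count as a word.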
Hello owwow14,
The following may also help you with this.
awk '/<s>/{A=0; while ((getline) > 0 && $0 !~ /<\/s>/) A++; print A}' Input_file
The output will be as follows:
5
6
Thanks,
R. Singh
anbu23
$ awk -F"\n" -v RS="</s>\n" ' { print NR,NF-2 } ' file
1 5
2 6
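In the answer above, RS="</s>\n" makes each block one record and -F"\n" makes each line one field, so NF counts the <s> line, the word lines, and one empty field after the record's final newline; hence NF-2. A multi-character RS is not POSIX (POSIX leaves it unspecified; GNU awk and mawk treat it as a regular expression), so this sketch assumes such an awk. The function name count_by_record is hypothetical; it prints only the counts, as in the desired output, and counts word fields explicitly instead of subtracting a fixed 2.

```shell
# count_by_record (hypothetical name): record-oriented variant of the
# RS="</s>\n" answer. Assumes an awk that treats a multi-character RS as a
# regular expression (GNU awk, mawk); POSIX leaves this unspecified.
count_by_record() {
    awk -v RS='</s>\n' -F'\n' '
        {
            n = 0; opened = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "<s>")
                    opened = 1                  # the opening tag is not a word
                else if ($i != "" && $i != "</s>")
                    n++                         # count non-empty, non-tag lines
            }
            if (opened) print n                 # skip records without <s>, e.g. trailing blank lines
        }
    ' "$@"
}
```

With the two-block sample, count_by_record file prints 5 and 6. Unlike NF-2, the explicit count is not thrown off by a blank line between blocks, which would add an extra empty field to the following record.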