I have a large dataset with the following structure:
C 0001 Carbon [C]
D SAR001 methane [CH3]
D SAR002 ethane
D SAR003 propane
D SAR004 butane
D SAR005 pentane
C 0002 Hydrogen [H]
C 0003 Nitrogen [N]
C 0004 Oxygen [O]
D SAR011 ozone
D SAR012 super oxide
C 0005 Sulphur
D SAR013 Hydrogen Sulphide [H2S]
D SAR014 Sulphuric acid
.
.
.
In this dataset, lines starting with C are headings and lines starting with D are the components of the heading above them. I want to count the number of components under each heading and would like output like this:
0001 5
0002 0
0003 0
0004 2
0005 2
.
.
.
The pseudocode could be:
for each line matching ^C:
    count the following lines matching ^D, up to the next ^C line
    print [$2 of the ^C line] and [count of ^D lines]
    continue with the next ^C line
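The pseudocode above can be sketched in Python as a single streaming pass. This is a minimal sketch, assuming every line begins with a one-letter tag (C or D) followed by whitespace, and that the heading ID is the second whitespace-separated field. Because it reads one line at a time, it does not need to hold a large file in memory. The function name `count_components` and the inline `SAMPLE` data (copied from the lines above) are illustrative only.

```python
from typing import Iterable, Iterator, Optional, Tuple


def count_components(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (heading ID, number of D lines) for every C heading, in input order."""
    heading: Optional[str] = None  # ID of the current C heading ($2 of the C line)
    count = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "C":
            if heading is not None:
                yield heading, count  # flush the previous heading before starting a new one
            heading, count = fields[1], 0
        elif fields[0] == "D" and heading is not None:
            count += 1  # D line belongs to the most recent C heading
    if heading is not None:
        yield heading, count  # flush the last heading at end of input


# Hypothetical sample input, taken from the example dataset above.
SAMPLE = """\
C 0001 Carbon [C]
D SAR001 methane [CH3]
D SAR002 ethane
D SAR003 propane
D SAR004 butane
D SAR005 pentane
C 0002 Hydrogen [H]
C 0003 Nitrogen [N]
C 0004 Oxygen [O]
D SAR011 ozone
D SAR012 super oxide
C 0005 Sulphur
D SAR013 Hydrogen Sulphide [H2S]
D SAR014 Sulphuric acid
"""

if __name__ == "__main__":
    for heading_id, n in count_components(SAMPLE.splitlines()):
        print(heading_id, n)
```

On the sample this prints `0001 5`, `0002 0`, `0003 0`, `0004 2` and `0005 2`, matching the desired output. For a real file, pass the open file object instead of `SAMPLE.splitlines()`, e.g. `with open("data.txt") as f:` (the file name is hypothetical). D lines that appear before the first C heading are ignored, since they have no heading to belong to.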