Jotne
March 20, 2013, 1:35pm
1
I have a small problem with my code.
data.html
<TD class="statuscol2">c</TD>
<TD class="statuscol3">18</TD>
<TD class="statuscol4"><SPAN TITLE="#04">test4</SPAN></TD>
<TD class="statuscol5">OFF</TD>
awk '/col2/ {
    for (i = 1; i <= 4; i++) {
        gsub(/^[ \t]+|<[^>]*>/, "")
        printf "%s,", $0
        getline
    }
    print ""
}' data.html
c,18,test4,OFF,
This works fine, but sometimes a line contains more than one data field, like this:
data.html
<TD class="statuscol2">c</TD>
<TD class="statuscol3">18</TD>
<TD class="statuscol4"><SPAN TITLE="#04">test4</SPAN></TD>
<TD class="statuscol5">OFF</TD><br>id8<br>
This outputs c,18,test4,OFFid8,
How do I get only the first hit on each line, so that the output is c,18,test4,OFF,
Yoda
March 20, 2013, 1:58pm
2
For extracting all tag data:
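awk -F'[<>]' '{
    # With < and > as separators, the text between tags lands in the odd
    # fields (for lines that start with a tag), so check fields 3, 5, 7, ...
    for (i = 3; i <= NF; i += 2)
        if ($i != "")
            printf "%s,", $i
}
END {
    printf "\n"
}' data.html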
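To see why the loop above starts at field 3 and steps by 2, it can help to print how awk splits one line when the separator is `[<>]`. This is only an illustrative sketch; the variable names `line` and `fields` are made up for the example, and the sample line is the one from the question.

```shell
# Show how awk splits one sample line when every < or > is a field separator.
line='<TD class="statuscol5">OFF</TD><br>id8<br>'

# Print each field with its index so empty fields are visible too.
fields=$(printf '%s\n' "$line" |
    awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

For this line the text ends up in field 3 (OFF) and field 7 (id8), while the even fields hold the tag names and field 5 is empty because two tags touch each other.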
For extracting just first tag data:
awk -F'[<>]' '{
    for (i = 3; i <= NF; i += 2) {
        # f marks that this line has already produced a value
        if ($i != "" && f == 0) {
            f = 1
            printf "%s,", $i
        }
    }
    f = 0    # reset for the next line
}
END {
    printf "\n"
}' data.html
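A different way to get only the first hit per line, not the method above but a swapped-in technique, is awk's `match()` function, which records the position of the leftmost match in `RSTART` and `RLENGTH`. The sketch below assumes that a value is always the first run of text between a `>` and a `<`, and that no `>` appears inside an attribute value; the function name `first_text` is hypothetical.

```shell
# first_text: for each input line, print the first run of text that sits
# between a ">" and a "<", followed by a comma; end with one newline.
first_text() {
    awk '
        match($0, />[^<]+</) {
            # RSTART points at ">", so skip it and drop the trailing "<"
            printf "%s,", substr($0, RSTART + 1, RLENGTH - 2)
        }
        END { print "" }
    '
}

printf '%s\n' '<TD class="statuscol5">OFF</TD><br>id8<br>' | first_text
```

Because `match()` stops at the leftmost match, the trailing `id8` is never looked at, so no flag variable is needed.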
Jotne
March 20, 2013, 2:08pm
3
It worked for the example, but not for the whole data set.
This is a repetitive task that will produce many output lines.
I search for col2 as a trigger to start. Then I need, for example, only the next 10 lines.
Here line 5 contains extra data that I do not need.
<TR class="c">
<TD class="statuscol1">no</TD>
<TD class="statuscol2">c</TD>
<TD class="statuscol3">17</TD>
<TD class="statuscol4"><SPAN TITLE="#104">status</SPAN></TD>
<TD class="statuscol5"><a href="#" class="tooltip">ON<span>host<br>made<br></span></a></TD>
<TD class="statuscol6">ON</TD>
<TD class="statuscol7">3342</TD>
<TD class="statuscol8">37397</TD>
<TD class="statuscol9"><SPAN TITLE="">intra</SPAN></TD>
<TD class="statuscol10">20.03.13 11:01:48</TD>
<TD class="statuscol11">07:08:13</TD>
<TD class="statuscol12">073D</TD>
<TD class="statuscol13">Status42</TD>
<TD class="statuscol14">by local</TD>
<TD class="statuscol15"><SPAN CLASS="idlesec_normal">00:00:05</SPAN></TD>
<TD class="statuscol16">OK</TD>
</TR>
Example of the output I want:
c,17,status,ON,ON,3342,37397,intra,20.03.13 11:01:48,
I get:
c,17,status,ONhostmade,ON,3342,37397,intra,20.03.13 11:01:48,
Yoda
March 20, 2013, 2:42pm
4
My 2nd suggestion should work with some minor changes:
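awk -F'[<>]' '
/col2/ {
    cf = 1                      # trigger: start collecting at the col2 line
}
cf == 1 && c <= 8 {
    ++c                         # count the lines collected so far
    for (i = 3; i <= NF; i += 2) {
        if ($i != "" && f == 0) {
            f = 1               # only the first non-empty value per line
            printf "%s,", $i
        }
    }
    f = 0
}
c == 9 {
    printf "\n"                 # nine lines collected: end the row and reset
    cf = 0
    c = 0
}' data.html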
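The script above hard-codes nine lines per row. If the number of lines to collect after the col2 trigger varies, one possible variation is to pass it in with `-v` and use `break` instead of the flag. This is a sketch under the same assumptions as above (one value per line, text in the odd fields); the names `first_fields`, `n` and `left` are made up for the example.

```shell
# first_fields N FILE: for each line matching col2, print the first non-empty
# text chunk of that line and of the following N-1 lines, comma-terminated,
# as one output row.
first_fields() {
    awk -F'[<>]' -v n="$1" '
        /col2/ && left == 0 { left = n }       # trigger: start a new row
        left > 0 {
            for (i = 3; i <= NF; i += 2)       # odd fields hold the text
                if ($i != "") { printf "%s,", $i; break }
            if (--left == 0) print ""          # row complete: end the line
        }
    ' "$2"
}
```

Called as `first_fields 9 data.html` on the sample row from post 3, this should give the same row as the script above: c,17,status,ON,ON,3342,37397,intra,20.03.13 11:01:48,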