HTML to csv

ganga.dharan · January 22, 2008, 9:03am

Hi! Could you please let me know how an HTML file can be converted to CSV? I am looking for a script that could do that. Please find the example below:

<HTML><BODY><TABLE>
<TR><TD>Parent CR</TD><TD>ChildCR</TD><TD>Title</TD><TD>Description</TD></TR>
</TABLE></BODY></HTML>
<HTML><BODY><TABLE>
<TR><TD>10048</TD><TD>14950</TD><TD>CR 10048 QA Issue</TD><TD>The AutoSett xml message generated got rejected in dBCRis. </TD></TR>
<TR><TD>10048</TD><TD>15144</TD><TD>CR 10048 QA Issue</TD><TD>In the below message

The CSV should not contain the HTML headers after the transformation. Thanks in advance!

dennis.jacob · January 23, 2008, 12:35am

Try this:

sed -n '/<TR/p' filename | sed 's/\(<TR><TD>\)\(.*\)\(<\/TD><TD>\)\(.*\)\(<\/TD><TD>\)\(.*\)\(<\/TD><TD>\)\(.*\)\(<\/TD><\/TR>\)/\2,\4,\6,\8/'

Input:

Output:

ganga.dharan · January 23, 2008, 6:15am

Thanks, Jacob. It works with the example you showed, but when I tried it with my HTML, it doesn't work. It would be helpful if you could look at the attachment, which is an image of the CSV after transformation.
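
A line-based sed pattern can only match a row whose <TR> ... </TR> sits on one line with exactly four cells. Judging only from the sample in the question (the attachment is not visible here), one possible cause of the failure is a cell whose text runs over several lines, or text that itself contains commas. The sketch below is a different technique from the sed command above: it uses Python's standard html.parser and csv modules. The names TableToRows and html_to_csv are made up for illustration, and the sample string is the complete rows from the question.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of every <TD>/<TH> cell, grouped by <TR> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self.row = None   # cells of the row being read, or None
        self.cell = None  # text pieces of the cell being read, or None

    def _end_cell(self):
        if self.cell is not None:
            # Join the pieces, then collapse newlines and runs of spaces.
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <TR> arrives as "tr".
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. newlines between tags) is ignored.
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag == "tr" and self.row is not None:
            self._end_cell()  # tolerate a missing </TD> before </TR>
            self.rows.append(self.row)
            self.row = None


def html_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableToRows()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    # csv.writer quotes any cell that contains a comma or a quote.
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


sample = """<HTML><BODY><TABLE>
<TR><TD>Parent CR</TD><TD>ChildCR</TD><TD>Title</TD><TD>Description</TD></TR>
</TABLE></BODY></HTML>
<HTML><BODY><TABLE>
<TR><TD>10048</TD><TD>14950</TD><TD>CR 10048 QA Issue</TD><TD>The AutoSett xml message generated got rejected in dBCRis. </TD></TR>
</TABLE></BODY></HTML>"""

print(html_to_csv(sample), end="")
```

For the sample this prints one CSV line per <TR>, with the <HTML>, <BODY> and <TABLE> wrappers dropped. To convert a real file, read it with open(...).read() and pass the text to html_to_csv. Nested tables are not handled in this sketch.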

varungupta · January 24, 2008, 5:50pm

Hey,

Are you viewing the comma-separated values in the Unix file itself, or are you opening them in an Excel file?
If it's an Excel file, then let me know the logic for that.

Thanks