How to remove only the HTML tags inside a file?

Hi All,

I have the following example file, and I want to remove only the HTML tags from it, keeping the text.

Input file:

<html>
<head>
<title>Software Solutions Inc., </title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor=white leftmargin="0" topmargin="0" marginwidth="00" marginheight="0" class=NormalFont>
<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b> Iswar Ramamoorthy</b></font></TD> </TR>
<tr>
<td align=center><b>Date</b></td>
<td align=center><b>Total Hours</b></td>
<td align=center><b>Total IN Time</b></td>
<td align=center><b>Total Break Hours</b></td>
</tr>

</table>


	<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
			<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Aman Jain</b></font></TD> </TR>
			<tr>
				<td align=center><b>Date</b></td>
				<td align=center><b>Total Hours</b></td>
				<td align=center><b>Total IN Time</b></td>
				<td align=center><b>Total Break Hours</b></td>
			</tr>


</table>


	<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
			<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Anilkumar Kaandukuri</b></font></TD> </TR>
			<tr>
				<td align=center><b>Date</b></td>
				<td align=center><b>Total Hours</b></td>
				<td align=center><b>Total IN Time</b></td>
				<td align=center><b>Total Break Hours</b></td>
			</tr>


			<tr class=normalfont >
				<td align=center>11/16/2007</td>
				<td align=center>1:16:0</td>
				<td align=center>01:16</td>
				<td align=center>0</td>
			</tr>

</table>


	<table ID="Table2" Bordercolor=black border=2 cellspacing=2 cellpadding=2>
			<TR><TD colspan=4 align=left bgcolor="yellow"><font color=blue ><b>Arun  Sivaraman</b></font></TD> </TR>
			<tr>
				<td align=center><b>Date</b></td>
				<td align=center><b>Total Hours</b></td>
				<td align=center><b>Total IN Time</b></td>
				<td align=center><b>Total Break Hours</b></td>
			</tr>

My expected result:

Software Solutions Inc

Iswar Ramamoorthy

Date
Total Hours
Total IN Time
Total Break Hours

Aman Jain

Date
Total Hours
Total IN Time
Total Break Hours

Anilkumar Kaandukuri

Date
Total Hours
Total IN Time
Total Break Hours

11/16/2007
1:16:0
01:16
0

(and so on for the remaining tables)

sed -n '/^$/!{s/<[^>]*>//g;p;}' filename

Or, with slightly different output:

lynx --dump filename

(the file must have an .htm or .html extension)

Or use html2text.

All of the commands work well.

sed -n '/^$/!{s/<[^>]*>//g;p;}' filename

Could you please explain the sed command above?

Thanks,
Thangaraju.