Conversion of XHTML table data into CSV format using the lynx -dump utility

Hi Unix Gurus,

I tried to convert the attached XHTML table into a CSV file using a Unix shell command (lynx -dump filename) and got the results below:

 
Title ID Owner Priority Estimate Project Change Date Changed By
Complexity Create Date Created By Detail Estimate Total De tail
Estimate Done Feature Group Reference Source Split From Split From ID
Sprint Sprint State Status Story Status Team To Do Old ID Story Team
RR - Bug in m_tgt_fact_org_hierarchy mapping D-04980 Geraldraj S
elvaraj 3 - Could Have 5.00 2012 BI Project 05/23/2012 08:32 Sujith
Mukundan 10/21/2010 14:11 Karthik Iyengar 0.00 0.00 1.3 Ratings
Reporting Custom er Sprint 10 (10 -23 May) Closed Accepte d
RR New User - Password Reset B-38882 Geraldraj Selvaraj 1 - Must Have
3.00 2012 BI Project 02/01/2012 12:49 Administrator 08/12/2011 11:48
Rakesh Sinha 26.00 26.00 6.00 1.3 Ratings Reporting EU Compliance
Request - Counts Generation -ETL Analysis B-38881 Sprint 2 (19 Jan- 1
Feb) C losed Accepted

But I want the output as follows:

Row 1:

Title,,ID,Owner,Priority,Estimate,Project,Change Date,Changed By,Complexity,Create Date,Created By,Detail Estimate,Total Detail Estimate,Done,Feature Group,Reference,Source,Split From,Split From ID,Sprint,Sprint State,Status,Story Status,Team,To Do,Old ID,Story Team
 

Row 2:

RR - Bug in m_tgt_fact_org_hierarchy mapping,,D-04980,Geraldraj Selvaraj,3 - Could Have,5.00,2012 BI Project,23/05/2012,Sujith Mukundan,,21/10/2010,Karthik Iyengar,0.00,0.00,,1.3 Ratings Reporting,,Customer,,,Sprint 10 (10 -23 May),Closed,Accepted,,,,,
 

Row 3:

RR New User - Password Reset,,B-38882,Geraldraj Selvaraj,1 - Must Have,3.00,2012 BI Project,01/02/2012,Administrator,,12/08/2011,Rakesh Sinha,26.00,26.00,6.00,1.3 Ratings Reporting,,,EU Compliance Request - Counts Generation -ETL Analysis,B-38881,Sprint 2 (19 Jan- 1 Feb),Closed,Accepted,,,,,

Could you please advise me whether the dump utility has any option to convert the HTML table content into the expected format above? If not, please suggest any other method (a Unix script) to do this.

I would highly appreciate your help on this.

Well, can you cat the file?
You should also be able to use awk, sed, grep, and other tools.
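
For what it's worth, here is a rough sketch of the awk route. I have not run it against v4.html, so treat it as a starting point only. It assumes a simple table (no nested tables, no '<' or '>' inside attribute values), and the function name html_table_to_csv is just something I made up for the example. It joins the file into one string, splits it on </tr> to get rows, splits each row on </td> or </th> to get cells, strips the remaining tags, and quotes any cell that contains a comma or a double quote.

```shell
# Sketch only: assumes a simple table with no nested tables and no
# "<" or ">" characters inside attribute values. Entities such as
# &amp; or &nbsp; are left undecoded.
html_table_to_csv() {
    # Reads HTML on stdin, writes one CSV line per table row on stdout.
    awk '
    { doc = doc " " $0 }                        # join the file into one string
    END {
        nrows = split(doc, rows, /<\/[Tt][Rr]>/)
        for (r = 1; r <= nrows; r++) {
            row = rows[r]
            sub(/.*<[Tt][Rr][^>]*>/, "", row)   # drop everything up to the row start
            ncells = split(row, cells, /<\/[Tt][DdHh]>/)
            if (ncells < 2) continue            # no closed cell in this chunk
            line = ""
            for (c = 1; c < ncells; c++) {      # last piece follows the final cell
                cell = cells[c]
                gsub(/<[^>]*>/, "", cell)       # strip remaining tags
                gsub(/[ \t\r]+/, " ", cell)     # collapse whitespace
                sub(/^ /, "", cell)
                sub(/ $/, "", cell)
                if (cell ~ /[",]/) {            # quote cells that need it
                    gsub(/"/, "\"\"", cell)
                    cell = "\"" cell "\""
                }
                if (c > 1) line = line ","
                line = line cell
            }
            print line
        }
    }'
}
```

You would call it as html_table_to_csv < v4.html > v4.csv. Empty cells come out as empty fields, so the column count stays the same on every row. With more than 50k rows a proper HTML parser may be the safer choice, since this builds the whole document in memory as one string.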

Hi joeygm

Please refer to the attached file (v4.html) in the original mail.

HTML is just a type of text file, so you should be able to run cat or other Unix commands on it. That answers your question about methods other than dump.

So, what are you trying to extract?
header line?
RR line?
RR - New line?

Always just three items?

joeyg,

Could you please download the v4.html file to your PC and open it in Notepad? That will show you the same content that "cat" would.

I am trying to extract the attribute values for all the lines and the number of lines will be more than 50k.

I understand all that, but am trying to get you to explain what you are trying to extract. By attribute, do you mean any line with:

<td class=

Why not dump it with a width of 1300 or more columns (check how many you need), like this:

lynx -dump -width=1300 filename.xhtml >filename.txt

It looks like the dump has fixed-width columns, which can easily be imported as a CSV file. If that does not work, reply here and we can work out an awk command to replace the blanks with commas.
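
In case it helps, replacing the blanks with commas in awk could look roughly like the sketch below. It rests on an assumption I cannot confirm from the thread: that the wide dump separates columns with two or more blanks, while the values themselves contain only single blanks and no commas. If lynx puts a single blank between cells, or if a cell is empty, this will not give the right columns. The function name dump_to_csv is made up for the example.

```shell
# Sketch only: assumes columns in the dump are separated by two or more
# blanks, and that no cell value contains a double blank or a comma.
dump_to_csv() {
    # Reads "lynx -dump" style text on stdin, writes comma-separated lines.
    awk '
    NF == 0 { next }                 # skip blank lines
    {
        sub(/^[ \t]+/, "")           # trim leading blanks
        sub(/[ \t]+$/, "")           # trim trailing blanks
        gsub(/  +/, ",")             # each run of 2+ blanks becomes one comma
        print
    }'
}

# Hypothetical usage (filename.xhtml stands for your input file):
# lynx -dump -width=1300 filename.xhtml | dump_to_csv > filename.csv
```

Check the number of fields per line afterwards (for example with awk -F, '{ print NF }' filename.csv | sort -u) to see whether every row ended up with the same number of columns.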