Parse excel file with html on each cell

<DIV><P>Pré-condição aceder ao ecrã Home do MRS.</P></DIV><DIV><P>OK.</P></DIV><DIV><P>Seleccionar Pesquisa de Recepção Directa.</P></DIV><DIV><P>Confirmar que abriu ecrã de Recepção Directa.</P></DIV><DIV>

I don't see any columns in the output below?

Your input doesn't look like an Excel file, what is it?

Dear Corona688,

Thank you for your reply...

This is an extraction from Microsoft MTM... the extraction is written to an Excel file, and one of the cells contains this:

<DIV><P>Pré-condição aceder ao ecrã Home do MRS.</P></DIV><DIV><P>OK.</P></DIV><DIV><P>Seleccionar Pesquisa de Recepção Directa.</P></DIV><DIV><P>Confirmar que abriu ecrã de Recepção Directa.</P></DIV><DIV>

I was wondering how I could turn it into this....
Column A1 (Excel)

Pré-condição aceder ao ecrã Home do MRS.

Column B1 (Excel)

OK.

Column A2 (Excel)

Seleccionar Pesquisa de Recepção Directa.

Column B2 (Excel)

Confirmar que abriu ecrã de Recepção Directa.

Best regards,
Rui Oliveira

You can't generate an Excel-format file very easily, but you could build a delimited file. The most common kind is a Comma Separated Values file, which on Windows is associated with the .csv extension and which Excel will read in. Each record in the file is a row and each column is separated by a comma, like this:-

Cell A1,Cell B1,Cell C1,Cell D1
Cell A2,Cell B2,Cell C2,Cell D2
Cell A3,Cell B3,Cell C3,Cell D3

If you can work out how to split your input, you can build output records to suit.
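One thing to watch when building the records: a cell that itself contains a comma or a double quote has to be quoted, or Excel will split it in the wrong place. Below is a minimal sketch of a helper for that, assuming a POSIX shell; the name csv_row is made up for this example and is not a standard command.

```shell
# csv_row (hypothetical helper): print its arguments as one CSV record.
# Every field is wrapped in double quotes and embedded double quotes are
# doubled, so a comma inside a cell does not start a new column.
csv_row() {
    sep=''
    for field in "$@"; do
        escaped=$(printf '%s' "$field" | sed 's/"/""/g')
        printf '%s"%s"' "$sep" "$escaped"
        sep=','
    done
    printf '\n'
}

csv_row 'Cell A1' 'Cell B1, with a comma'
# prints: "Cell A1","Cell B1, with a comma"
```

Quoting every field is more than strictly needed, but it keeps the helper short and Excel reads the result the same way.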

  • What have you tried so far?
  • What output/errors do you get?
  • What OS and version are you using?
  • What are your preferred tools? (C, shell, perl, awk, etc.)
  • What logical process have you considered? (to help steer us to follow what you are trying to achieve)

Most importantly, what have you tried so far?

There are probably many ways to achieve most tasks, so giving us an idea of your style and thoughts will help us guide you to an answer most suitable to you so you can adjust it to suit your needs in future.

We're all here to learn and getting the relevant information will help us all.


Hello Robin, thanks for your quick reply.

Well, I haven't tried much yet because I was struggling to find the best way to do it... I simply used

sed 's/<[^>]*>//g' file > newfile

to remove the tags, but I was trying to see whether it is really hard to produce the layout I described in my post above....

I'll try to do what you suggested... this is the best answer I can give for now! :)

So, what do you have in newfile? I'd expect you've just got the tags stripped out, with no way left to split the text up, unless the full stops are consistent and can be used.

Is the original input always going to look like this:-
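An alternative to relying on the full stops is to keep the cell boundaries while removing the tags. The sketch below is only one way to do it and assumes every cell ends with a closing P tag, as in the sample; split_cells is a name invented for this example. It turns each closing P tag into a line break before dropping the remaining tags, so each cell ends up on its own line.

```shell
# split_cells (hypothetical helper): one cell per output line.
# Assumes each cell is closed by </P>, as in the sample input.
split_cells() {
    awk '{
        gsub(/<\/[Pp]>/, "\n")    # end of paragraph becomes a line break
        gsub(/<[^>]*>/, "")       # drop every remaining tag
        n = split($0, parts, "\n")
        for (i = 1; i <= n; i++)
            if (parts[i] != "") print parts[i]   # skip empty leftovers
    }' "$@"
}

printf '%s\n' '<DIV><P>Cell A1.</P></DIV><DIV><P>Cell B1.</P></DIV><DIV>' | split_cells
# prints:
# Cell A1.
# Cell B1.
```

It reads a file named on the command line or standard input, so `split_cells file > newfile` would fit the same place as the sed command.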

<DIV><P>Cell A1.</P></DIV><DIV><P>Cell B1.</P></DIV><DIV><P>Cell A2.</P></DIV><DIV><P>Cell B2.</P></DIV><DIV><P>Cell A3.</P></DIV><DIV><P>Cell B3.</P></DIV><DIV><P>Cell A4.</P></DIV><DIV><P>Cell B4.</P></DIV>

If that is certain, then maybe we're better working from that.
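If the cells really do alternate A, B, A, B like that, pairing them up is the easy part once each cell is on a line of its own. A rough sketch, assuming one cell per input line and no commas inside the cells (pair_rows is just a name picked for the example):

```shell
# pair_rows (hypothetical helper): join every two input lines into one
# comma-separated row, first line to column A, second to column B.
# Assumes one cell per line and no commas inside a cell.
pair_rows() {
    paste -d ',' - -
}

printf '%s\n' 'Cell A1.' 'Cell B1.' 'Cell A2.' 'Cell B2.' | pair_rows
# prints:
# Cell A1.,Cell B1.
# Cell A2.,Cell B2.
```

With an odd number of cells the last row simply has an empty column B.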


Yes, there is always a full stop at the end of each sentence....

newfile gives me something like this...