Get HTML table

Hi all,

I have an HTML file that contains several tables. I need to extract the data from one of them, the table with the id "orderList". Is there an easy way to do this without using loops?

Thanks

Hello, valigula, and welcome to the forum.

There is almost no chance anyone will be able to help you unless you show us what the html looks like and what it is that you want to extract from it. In short, share a sample of the input data (your html) and a sample of the desired output (how you want it to look afterwards). Make sure to put each sample between code tags so that formatting is not lost.

Regards,
Alister

Thanks for your reply

Here is a short example of the HTML code:

<html>
<body>
    <table>
        <thead>
            <tr>
                <th>number</th>
                <th>product type</th>
                <th>service activation date</th>
                <th>cease date</th>
                <th>reference</th>
                <th>operator</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td><span id="test-number"><span>01214413277</span></a></td>
                <td><span id="test-productType">Product Name</span></td>
                <td><span id="test-serviceActivationDate">11/12/09</span></td>
                <td><span id="test-ceaseDate"></span></td>
                <td><span id="test-reference">123456789</span></td>
            </tr>
        </tbody>
    </table>
</div>
  <div>
  <table id="orderList">
    <thead>
      <tr>
        <th>order number</th>
        <th>order type</th>
        <th>product type</th>
        <th>status</th>
        <th>status date</th>
      </tr>
    </thead>
    <tbody>
      <tr class="odd">
        <td><span id="orderLink">24904093</a></td>
        <td><span id="orderType">Provide</span></td>
        <td><span id="productType">Product Name</span></td>
        <td><span id="status"></span></td>
        <td><span id="statusDate">15/12/09</span></td>
      </tr>
    </tbody>
  </table>
</body>
</html>

What I actually need is to find out whether the status column in the orderList table is completed (Y); in any other case, including an empty status, the result should be N. For the sample above, the output would be:

24904093, N

Try this code:

awk -F"[><]" '/orderLink/ { f=1; _ord=$5; } f && /status/ { $5=$5?$5:"N";f=0; print _ord","$5}' file

It worked for the sample input. Please check it against your complete HTML and let us know how it goes.
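The reason `$5` holds the cell value is the field separator: `-F"[><]"` makes every `<` and every `>` a separator. The small demonstration below (a sketch, not part of the original answer) runs one cell line from the sample through that separator and prints each field, so you can see where the value lands.

```shell
#!/bin/sh
# One cell line from the sample, leading spaces included.
line='        <td><span id="orderLink">24904093</a></td>'

# With -F'[><]' every < and > is a field separator, so the text between
# the opening <span ...> tag and the closing tag lands in field 5.
printf '%s\n' "$line" |
    awk -F'[><]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
```

This prints nine fields: the leading spaces, `td`, an empty field, `span id="orderLink"`, then `$5=[24904093]`, followed by the pieces of the closing tags. Because the split happens at the tag brackets, it does not matter that the sample closes the span with `</a>`.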

input & output:

/home/usr2 >cat file
<html>
<body>
    <table>
        <thead>
            <tr>
                <th>number</th>
                <th>product type</th>
                <th>service activation date</th>
                <th>cease date</th>
                <th>reference</th>
                <th>operator</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td><span id="test-number"><span>01214413277</span></a></td>
                <td><span id="test-productType">Product Name</span></td>
                <td><span id="test-serviceActivationDate">11/12/09</span></td>
                <td><span id="test-ceaseDate"></span></td>
                <td><span id="test-reference">123456789</span></td>
            </tr>
        </tbody>
    </table>
</div>
  <div>
  <table id="orderList">
    <thead>
      <tr>
        <th>order number</th>
        <th>order type</th>
        <th>product type</th>
        <th>status</th>
        <th>status date</th>
      </tr>
    </thead>
    <tbody>
      <tr class="odd">
        <td><span id="orderLink">24904093</a></td>
        <td><span id="orderType">Provide</span></td>
        <td><span id="productType">Product Name</span></td>
        <td><span id="status"></span></td>
        <td><span id="statusDate">15/12/09</span></td>
      </tr>
      <tr class="odd">
        <td><span id="orderLink">904093</a></td>
        <td><span id="orderType">Provide</span></td>
        <td><span id="productType">Product Name</span></td>
        <td><span id="status">Y</span></td>
        <td><span id="statusDate">15/12/09</span></td>
      </tr>
    </tbody>
  </table>
</body>
</html>
/home/usr1 >awk -F"[><]" '/orderLink/ { f=1; _ord=$5; } f && /status/ { $5=$5?$5:"N";f=0; print _ord","$5}' file
24904093,N
904093,Y
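Note that the one-liner prints whatever text is in the status cell, so a value such as `Completed` would come out as-is rather than as N (and a status of `0` would be treated as empty). If you need the strict rule from the question (exactly Y gives Y, anything else gives N), a variant along these lines may help. It is a sketch under the same assumptions as the original: one `<td>` cell per line, as in the sample, and a table tagged `id="orderList"`. The function name `order_status` is hypothetical.

```shell
#!/bin/sh
# order_status (hypothetical name): print "order,Y" when the status cell is
# exactly Y and "order,N" for anything else, including an empty cell.
# Assumes one <td>...</td> cell per line, as in the sample; not a real HTML parser.
# Reads the files given as arguments, or standard input if there are none.
order_status() {
    awk -F'[><]' '
        /<table id="orderList">/ { in_tbl = 1; next }     # entering the wanted table
        in_tbl && /<\/table>/    { in_tbl = 0; next }     # leaving it
        in_tbl && /id="orderLink"/ { ord = $5; f = 1 }    # remember the order number
        in_tbl && f && /id="status"/ {                    # exact id, so statusDate is skipped
            print ord "," ($5 == "Y" ? "Y" : "N")
            f = 0
        }
    ' "$@"
}
```

Usage: `order_status file`. On the two-row file above this should print `24904093,N` and `904093,Y`, the same as the one-liner; the difference only shows up for statuses other than empty or Y. Matching `id="status"` including the closing quote is what keeps the `statusDate` cell from being mistaken for the status cell.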

I tested it on a couple of cases (with a few modifications) and it is just awesome!

I will test it fully and give you feedback on how it went.

Thanks

---------- Post updated at 11:43 AM ---------- Previous update was at 07:06 AM ----------

It works perfectly. I tested it with over 200 cases and it works fine!

Thanks