Hi all,
<html>
<body>
<div>
<table id="orderList">
<thead>
<tr>
<th>order number</th>
<th>order type</th>
<th>product type</th>
<th>status</th>
<th>status date</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><span id="orderLink">24978900</a></td>
<td><span id="orderType">Provide</span></td>
<td><span id="productType">Prod1</span></td>
<td><span id="status">Complete</span></td>
<td><span id="statusDate">18/12/09</span></td>
<td><span id="bucket"></span></td>
</tr><tr class="even">
<td><span id="orderLink">27004805</a></td>
<td><span id="orderType">Cease</span></td>
<td><span id="productType"></span></td>
<td><span id="status">Rejected</span></td>
<td><span id="statusDate">17/02/10</span></td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
The desired result is:
24978900
That is, the last order whose status is "Complete". Order numbers are sequential, so newer numbers are always at the bottom.
Thanks
Here's one way to do it with Perl -
$
$ cat -n f5
1 <html>
2 <body>
3 <div>
4 <table id="orderList">
5 <thead>
6 <tr>
7 <th>order number</th>
8 <th>order type</th>
9 <th>product type</th>
10 <th>status</th>
11 <th>status date</th>
12 </tr>
13 </thead>
14 <tbody>
15 <tr class="odd">
16 <td><span id="orderLink">24978900</a></td>
17 <td><span id="orderType">Provide</span></td>
18 <td><span id="productType">Prod1</span></td>
19 <td><span id="status">Complete</span></td>
20 <td><span id="statusDate">18/12/09</span></td>
21 <td><span id="bucket"></span></td>
22 </tr><tr class="even">
23 <td><span id="orderLink">27004805</a></td>
24 <td><span id="orderType">Cease</span></td>
25 <td><span id="productType"></span></td>
26 <td><span id="status">Rejected</span></td>
27 <td><span id="statusDate">17/02/10</span></td>
28 </tr>
29 </tbody>
30 </table>
31
32 </div>
33 </body>
34 </html>
$
$ perl -lne 'BEGIN{undef $/}while (/.*<tr.*?"orderLink">(\d+)<.*?>Complete<.*?\/tr>.*/msg){print $1}' f5
24978900
$
$
tyler_durden
Actually, after playing a little with awk, I found this:
$ awk -F"[><]" '/orderLink/ { f=1; _ord=$5 } f && /id="status"/ { f=0; if ($5 == "Complete") print _ord ", " $5 }' /tmp/9054329.htm | tail -1
Thanks
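A sketch of the same idea that drops the `tail -1` by remembering the last matching order inside awk. It assumes the layout of the sample file: one cell per line, the order number on the `id="orderLink"` line and the status on the `id="status"` line. The function name `last_complete` is made up for illustration.

```shell
# Hypothetical helper: print the number of the last order whose status is "Complete".
# Assumes one <td> per line, as in the sample file. Reads the named files, or stdin if none.
last_complete() {
    awk -F'[><]' '
        /id="orderLink"/ { ord = $5 }                      # remember the current order number
        /id="status"/ && $5 == "Complete" { last = ord }   # keep it only if this row is Complete
        END { if (last != "") print last }                 # print nothing when no row matched
    ' "$@"
}
```

It can be called as `last_complete /tmp/9054329.htm`. Because it falls back to stdin, it should also work at the end of a pipe.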
drewk
March 22, 2010, 1:36pm
4
Is the HTML file local (i.e., really a file) or on the web (i.e., something you need wget to fetch)?
This PRE regex would get the number you want from that data:
m{<html>.*?<td><span id="orderLink">(.*?)</a>}s
What do you mean by "interate"? What are the conditions for what you are looking for or rejecting?
Please be more specific.
Along the same lines as the awk script -
$
$ perl -lne '/.*orderLink">(\d+)<.*/ and $x=$1; /.*>Complete<.*/ and print $x' f5
24978900
$
tyler_durden
Hi drewk,
The file is already on my local machine. I first use wget to log in and download the page I need. I did it this way mainly because I did not know how to do it online (without downloading the file). Some people mentioned using links, maybe for the next version.
Sorry about my misspelling "itinerate".
Thanks
drewk
March 22, 2010, 8:46pm
7
OK -- itinerate
Try Tyler's perl script (either) with wget or curl:
curl "http://www.yururl.com" | perl -lne 'BEGIN{undef $/}while (/.*<tr.*?"orderLink">(\d+)<.*?>Complete<.*?\/tr>.*/msg){print $1}'
That will download and itinerate
Thanks, I will have a look.
There is a new requirement. I was asked not to search for status = Complete, but for every status other than Rejected.
Can this be done using the current awk?
$ awk -F"[><]" ' /orderLink/ { f=1; _ord=$5; } f && /Rejected/ {
_sta=$5; f=0; print _ord ","}' f1 | tail -1
Thanks in advance
I don't quite understand this statement. Do you want to fetch orderLinks -
(a) with "Rejected" status ?
(b) with statuses other than "Complete" and "Rejected" ?
(c) with statuses other than "Rejected" ?
I shall assume that you want (a).
Can this be done using the current awk ?
$ awk -F"[><]" ' /orderLink/ { f=1; _ord=$5; } f && /Rejected/ {
_sta=$5; f=0; print _ord ","}' f1 | tail -1
...
Just try it on your HTML and see for yourself !
You have your HTML file, you have your awk script; what's stopping you from testing it out ?
Here's what I see when I run it on the HTML file you supplied in your first post -
$ awk -F"[><]" ' /orderLink/ { f=1; _ord=$5; } f && /Rejected/ { _sta=$5; f=0; print _ord ","}' f5
27004805,
$
Is this what you wanted ?
In any case, you could probably simplify the script thusly -
awk -F"[><]" '/orderLink/ {f=1; ord=$5} f && /Rejected/ {print ord}' f5
tyler_durden
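For interpretation (c) above, statuses other than "Rejected", a sketch along the same lines might look like this. It assumes the same one-cell-per-line layout as the sample file, and the function name `last_order_excluding` and its argument order are made up for illustration.

```shell
# Hypothetical helper: print the last order whose status differs from the given one.
# Usage: last_order_excluding STATUS [FILE...]   (reads stdin when no file is given)
last_order_excluding() {
    status=$1
    shift
    awk -F'[><]' -v skip="$status" '
        /id="orderLink"/ { ord = $5 }                 # remember the current order number
        /id="status"/ && $5 != skip { last = ord }    # keep it unless the row has the skipped status
        END { if (last != "") print last }            # print nothing when every row was skipped
    ' "$@"
}
```

For example, `last_order_excluding Rejected f5`. Note that a row with an empty status cell counts as "not Rejected" here, which may or may not be what is wanted.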
Sorry for my terrible writing.
First I was asked to search for orderLinks with status = Complete. But there were too many exceptions (other statuses to be considered), so now I would rather check for "different than Rejected" instead.
In my first example I need to retrieve:
24978900
I added a grep at the end of the awk:
awk -v telf="$1" -F"[><]" '/orderLink/ { f=1; _ord=$5 } f && /productType/ { _pro=$5 } f && /status/ { f=0; print telf "," _ord ", " _pro "," $5 }' "/tmp/$1.htm" | grep -v Rejected
That returns all orders that are NOT Rejected,