Hi All!
I have obtained the following output from the tool "pdftohtml", and it is the input for my problem:
<text top="246" left="160" width="84" height="16" font="3">Business purpose</text>
<text top="260" left="506" width="220" height="16" font="3">giving the right information and new insights </text>
<text top="296" left="160" width="67" height="16" font="3">Characteristic</text>
<text top="296" left="278" width="111" height="16" font="3">Operational processing</text>
<text top="296" left="506" width="120" height="16" font="3">Informational processing</text>
<text top="318" left="160" width="55" height="16" font="3">Orientation</text>
<text top="318" left="278" width="56" height="16" font="3">Transaction</text>
<text top="318" left="506" width="42" height="16" font="3">Analysis</text>
<text top="340" left="160" width="43" height="16" font="3">Function</text>
------
----
Now, I want to write a shell script that checks the value of the "left" attribute in each <text> tag. If this value is equal to 160, the script should save the content enclosed in that <text> tag to an arbitrary file, wrapped in a <p> tag.
So, I want output as follows:
<p>Business purpose</p>
<p>Characteristic</p>
<p>Orientation</p>
<p>Function</p>
------
-----
Any help will be Truly Appreciated. Thanks in advance !!!
Something like this?
sed -n '/left="160"/s/.*>\(.*\)<\/text>.*/<p>\1<\/p>/p' file > outfile
Regards
Thanks Franklin. You Rock!!!
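Another option, in case awk is handier. This is only a minimal sketch: it assumes every <text> element sits on a single line, as in the sample above, and the names input.xml and outfile are placeholders I made up. It splits each line on "<" and ">", so field 2 is the opening tag with its attributes and field 3 is the enclosed text.

```shell
#!/bin/sh
# Sample input taken from the question; input.xml is a placeholder file name.
cat > input.xml <<'EOF'
<text top="246" left="160" width="84" height="16" font="3">Business purpose</text>
<text top="260" left="506" width="220" height="16" font="3">giving the right information and new insights </text>
<text top="296" left="160" width="67" height="16" font="3">Characteristic</text>
<text top="296" left="278" width="111" height="16" font="3">Operational processing</text>
EOF

# With "<" and ">" as field separators:
#   $2 = opening tag plus attributes, $3 = text between the tags.
# Print $3 inside <p>...</p> only for <text> tags whose left attribute is 160.
awk -F'[<>]' '$2 ~ /^text / && $2 ~ / left="160"/ { print "<p>" $3 "</p>" }' input.xml > outfile

cat outfile
# prints:
# <p>Business purpose</p>
# <p>Characteristic</p>
```

If the <text> elements can span several lines or contain nested tags, a real XML parser is the safer route.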
hi,
You could also use Perl SAX (the XML::SAX module) to process it.
input:
<?xml version="1.0"?>
<data>
<text top="246" left="160" width="84" height="16" font="3">Business purpose</text>
<text top="260" left="506" width="220" height="16" font="3">giving the right information and new insights </text>
<text top="296" left="160" width="67" height="16" font="3">Characteristic</text>
<text top="296" left="278" width="111" height="16" font="3">Operational processing</text>
<text top="296" left="506" width="120" height="16" font="3">Informational processing</text>
<text top="318" left="160" width="55" height="16" font="3">Orientation</text>
<text top="318" left="278" width="56" height="16" font="3">Transaction</text>
<text top="318" left="506" width="42" height="16" font="3">Analysis</text>
<text top="340" left="160" width="43" height="16" font="3">Function</text>
</data>
code:
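# File Leo.pm: SAX handler that prints the text of each element with left="160"
package Leo;
use strict;
use warnings;
use base 'XML::SAX::Base';

my $flag = 0;    # true while inside an element whose "left" attribute is 160

sub start_element {
    my ($self, $element) = @_;
    # Each attribute is a hash with (among others) the keys Name and Value
    foreach my $attr (values %{ $element->{Attributes} }) {
        $flag = 1 if $attr->{Name} eq 'left' && $attr->{Value} eq '160';
    }
}

sub characters {
    my ($self, $char) = @_;
    if ($flag) {
        print "<p>", $char->{Data}, "</p>\n";
        $flag = 0;
    }
}

1;

# Main script, kept in a separate file in the same directory as Leo.pm
use strict;
use warnings;
use lib '.';     # newer Perl versions no longer search the current directory
use XML::SAX;
use Leo;
my $parser = XML::SAX::ParserFactory->parser(Handler => Leo->new);
$parser->parse_uri("a.txt");    # a.txt holds the input shown above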
result:
The output you expected.