Hello everyone, I'm new to this forum and new to shell scripting.
My problem: I have HTML files in a directory, and I would like to extract from them some data that lies between two different lines.
Here's my situation:
<td align="default"> oxidizability (mg / l):
data_to_extract
</ td>
This structure is repeated in all of these files.
How do I use awk to do this extraction and write the data into a file.txt?
Thank you all.
Try this:
awk 'p && /<\/ *td>/{p=0}
p
/<td align="default">/{p=1}' htmlfile > file.txt
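To see why this prints only the data line, here is a small self-contained sketch of the same flag idiom. The sample file content and the temporary directory are made up for illustration, and the closing pattern allows an optional space so that it matches both </td> and </ td>.

```shell
# Build a small sample file (hypothetical content modelled on the question).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.html" <<'EOF'
<tr>
<td align="default"> oxidizability (mg / l):
34
</td>
</tr>
EOF

# p is a flag: set once the opening line has been seen, cleared on the closing line.
# Rule order matters: the closing test runs first and the opening test last,
# so neither delimiter line is printed, only the lines in between.
extracted=$(awk '
p && /<\/ *td>/ {p=0}          # closing tag: stop printing
p                              # print the line while the flag is set
/<td align="default">/ {p=1}   # opening tag: start printing from the next line
' "$tmpdir/sample.html")

echo "$extracted"
rm -rf "$tmpdir"
```

Moving the opening test to the top and the closing test to the bottom would print the two delimiter lines as well.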
OK, thanks for the answer, but I need a customization of the command.
I have a group of HTML files inside a directory, and inside each of them lies this structure:
<td align="default"> oxidizability (mg / l):
data_to_extract
</td>
"data_to_extract" is the value that changes, while
<td align="default"> oxidizability (mg / l):
and
</td>
remain the same.
So, assuming I have 3 HTML files, the resulting file.txt should look something like this:
<td align="default"> oxidizability (mg / l):
34
</td> <td align="default"> oxidizability (mg / l):
45
</td> <td align="default"> oxidizability (mg / l):
56
</td>
This is exactly what I need to do.
You could try something like:
awk '
/<td align="default">/{p=1; s=$0}
p && /<\/td>/{print $0 FS s; s=""; p=0}
p' file >> newfile
Sorry, but it still doesn't work. I need to filter exactly on
<td align="default"> oxidizability (mg / l):
not just
<td align="default">
ctsgnb, December 17, 2010, 12:23pm
Please give a representative sample of input file and expected output file.
OK, I made some edits starting from your example, and now it works! You were very helpful, thank you very much!
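The final command was not posted in the thread. As a rough sketch of one way to match the full oxidizability line across several files, something like the following might work. The file names, the extra "other field" block, and the temporary directory are assumptions for illustration only. The opening line is matched as a literal string with index(), so the parentheses and the slash in "(mg / l)" need no regex escaping.

```shell
# Create three hypothetical HTML files, each with an unrelated block
# followed by the oxidizability block.
tmpdir=$(mktemp -d)
for value in 34 45 56; do
  cat > "$tmpdir/report_$value.html" <<EOF
<td align="default"> other field:
ignored
</td>
<td align="default"> oxidizability (mg / l):
$value
</td>
EOF
done

# start holds the exact opening line to look for (passed in as an awk variable).
awk -v start='<td align="default"> oxidizability (mg / l):' '
index($0, start) {p=1}     # literal substring match on the full opening line
p                          # print the opening line, the data, and the closing line
p && /<\/ *td>/ {p=0}      # stop after the closing tag has been printed
' "$tmpdir"/*.html > "$tmpdir/file.txt"

result=$(cat "$tmpdir/file.txt")
echo "$result"
rm -rf "$tmpdir"
```

Here each block is printed on three lines (opening line, value, closing tag) rather than joining the closing tag with the next opening line; reordering the three rules as in the first answer would keep only the values.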