Hello,
I have an HTML file. It is ordinary HTML, as you can see from the sample lines below.
From this file I want to print only the strings that start and end with a double quote, one per line. Sample input:
<tr><td valign=top>25.</td><td><A href="../XMLL/CatalogL.htm">XML</a></td></tr>
<tr><td valign=top>26.</td><td><A href="../XMLNQ/CatogXML-LINQ.htm">
The output should look like this:
"../XMLL/CatalogL.htm"
"../XMLNQ/CatogXML-LINQ.htm"
Is this simple, or is it a lot of work? Could anybody share some code?
Many thanks
Using nawk:
nawk -F'"' '{for(i=2;i<=NF;i+=2) print FS $i FS}' myFile.html
In perl:
perl -ne 'print "$&\n" while /"[^"]*"/g' file
Shorter perl:
perl -nle 'print for /"[^"]*"/g' file
Using sed (prints the last href value on each line):
sed -n 's/^.*href=\("[^"]*"\).*$/\1/p' file
Another sed (this relies on href= being the last = on the line, as in the sample):
sed 's/.*=//;s/>.*//' file
Another awk (prints only the first quoted string on each line):
awk 'BEGIN {FS="\""} NF > 2 {print "\""$2"\""}' myFile
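To see why splitting on the double quote works, here is a small self-contained sketch that recreates the two sample lines in a temporary file and runs a variant of the awk loop. With `"` as the field separator, fields 2, 4, 6, ... are the pieces of text that sat between quotes. The helper name `quoted_strings` and the temporary file are only for illustration, and plain `awk` is used on the assumption that it behaves like `nawk` here. The loop stops at `i < NF` rather than `i <= NF` so that a string with no closing quote on the line is skipped.

```shell
#!/bin/sh
# Print every double-quoted string in a file, one per line.
# quoted_strings is a made-up helper name for this example.
quoted_strings() {
    # -F'"' splits each line on the double quote, so fields 2, 4, 6, ...
    # are the pieces that sat between an opening and a closing quote.
    # Stopping at i < NF skips a trailing piece that has no closing quote.
    awk -F'"' '{ for (i = 2; i < NF; i += 2) print FS $i FS }' "$1"
}

# Recreate the two sample lines from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<tr><td valign=top>25.</td><td><A href="../XMLL/CatalogL.htm">XML</a></td></tr>
<tr><td valign=top>26.</td><td><A href="../XMLNQ/CatogXML-LINQ.htm">
EOF

quoted_strings "$sample"
rm -f "$sample"
```

Lines that contain no double quote have only one field, so the loop body never runs for them and nothing is printed.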
Many thanks for all the replies; they work great. I have a follow-up question: how would I modify this
if the string starts with <div class="codedom"> and ends with <code>? That is,
stringstart = <div class="codedom">
stringend = <code>
I want to print the whole thing, like this:
<div class="codedom"> ......................................
......................................................................
.......................................................................
......................................................................
.......................................................................
......................................................................
.............................................................<code>
How do I modify the commands for this? I don't see any start or end string in the code above.
Thanks
In perl, reading the whole file at once (-0777) so that a match can span several lines:
perl -nl -0777 -e '$,="\n";print /<div class="codedom">.*?<code>/sg' file
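If the start and end markers are on different lines and printing whole lines is acceptable, a line-based range is another option. This is a sketch of a different technique from the perl command above, not a drop-in equivalent: `sed -n '/start/,/end/p'` prints every line from one matching the start pattern through the next line matching the end pattern, including any text that comes before the start marker or after the end marker on those lines. It also assumes the two markers never sit on the same line, because sed only looks for the end pattern on lines after the one where the range started. The helper name `print_blocks` and the sample text are invented for illustration.

```shell
#!/bin/sh
# Print each run of lines from the start marker through the end marker.
# print_blocks is a made-up helper name for this example.
print_blocks() {
    # -n suppresses the default output; p prints only lines inside the range.
    sed -n '/<div class="codedom">/,/<code>/p' "$1"
}

# Invented sample input with one block surrounded by other lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>before</p>
<div class="codedom"> first line
second line
last line <code>
<p>after</p>
EOF

print_blocks "$sample"
rm -f "$sample"
```

On this sample the three lines from `<div class="codedom"> first line` through `last line <code>` are printed, and the `<p>` lines are dropped.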