Hello. I want to write an awk script that searches an HTML file and outputs all the links inside it (e.g. .html, .htm, .jpg, .doc, .pdf, etc.). I also want the links in the output to be split into 3 groups, separated by an empty line: the first group with links to other webpages (.html, .htm, etc.), the second group with links to images (.jpg, .jpeg), and the third group with links to .pdf, .doc or other downloadable files. Next to each link I want to output how many times it occurs in the HTML file.
(I am only doing the links first; once I have cracked this I will be able to do the other formats easily.)
So I have currently got...
BEGIN { FS = " " }
{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^href/) { print $i }
    }
}
#
END { }
This prints out the whole word, e.g. href="index.html" >. I would like it to print just index.html and the number of times it appears in the webpage.
Any help or hints on how I could achieve what is described in the top paragraph would be much appreciated.
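
For reference, here is a minimal sketch of one possible approach to the requirements above, not a definitive solution. It assumes every link is written as a lowercase, double-quoted href="..." attribute, and it treats anything that does not end in .html/.htm or .jpg/.jpeg as belonging to the third group (so a link such as index.html#top would land there too). The names list_links, group, order and count are made up for this illustration. Instead of splitting on spaces, it uses match() to find each href on the line and substr() to keep only the part between the quotes.

```sh
# list_links FILE...: print each href target followed by its count,
# in three groups separated by an empty line (hypothetical helper name).
list_links() {
    awk '
    {
        line = $0
        # Find every href="..." on the line (double-quoted values only).
        while (match(line, /href="[^"]*"/)) {
            # Skip the 6 characters of href=" and drop the closing quote.
            url  = substr(line, RSTART + 6, RLENGTH - 7)
            line = substr(line, RSTART + RLENGTH)
            if (url == "") continue
            # Remember first-seen order, then count the occurrence.
            if (!(url in count)) order[++n] = url
            count[url]++
        }
    }
    END {
        for (g = 1; g <= 3; g++) {
            if (g > 1) print ""        # empty line between groups
            for (k = 1; k <= n; k++)
                if (group(order[k]) == g) print order[k], count[order[k]]
        }
    }
    # 1 = webpages, 2 = images, 3 = everything else (assumed to be downloads)
    function group(name,   u) {
        u = tolower(name)
        if (u ~ /\.(html|htm)$/) return 1
        if (u ~ /\.(jpg|jpeg)$/) return 2
        return 3
    }
    ' "$@"
}
```

It could be run as list_links page.html, where page.html stands for whatever file is being searched. Each output line has the form "index.html 2", i.e. the link followed by the number of times it was seen.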