Nice, it worked. Now, how about taking care of HTML entities, such as the one for the down-arrow symbol?
Gawk doesn't return it, but my parser does. Is there any way to handle it with gawk?
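To make clear what I mean by handling an entity, here is a minimal sketch using Ruby's standard `cgi` library. The sample string, the `&#9660;` entity and the `decode_entities` helper are assumptions for illustration only; they are not copied from the downloaded page, and this is not necessarily what test.rb does.

```ruby
require 'cgi'

# Hypothetical helper: decode numeric entities and the basic named ones
# (&amp; &lt; &gt; &quot;) in a piece of link text.
def decode_entities(text)
  CGI.unescapeHTML(text)
end

# Assumed sample: &#9660; is the numeric entity for U+25BC,
# a down-pointing triangle.
puts decode_entities("more &#9660;")
```

Note that `CGI.unescapeHTML` only knows a handful of named entities, so something like `&nbsp;` would pass through unchanged.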
OK, the first time I ran my test, I copied the HTML using the browser's view source. But since you mentioned wget, here's how I did my next test, using wget to download Google:
$ wget 209.85.132.104
$ awk -F'>' '/^a href/{split($1,F,"\"");print F[2],$NF}' RS='<' index.html
http://mail.google.com/mail/?hl=en&tab=wm Gmail
http://www.google.com/intl/en/options/
/url?sa=p&pref=ig&pval=3&q=http://www.google.com/ig%3Fhl%3Den%26source%3Diglk&usg=AFQjCNFA18XPfgb7dKnXfKz7x7g1GDH1tg iGoogle
/preferences?hl=en Settings
https://www.google.com/accounts/Login?hl=en&continue=http://209.85.132.104/ Sign in
/advanced_search?hl=en Advanced Search
/language_tools?hl=en Language Tools
/intl/en/ads/ Advertising�Programs
/services/ Business Solutions
/intl/en/about.html About Google
http://www.google.com/ncr Go to Google.com
/intl/en/privacy.html Privacy
$ ruby test.rb
-->http://www.google.com/imghp?hl=en&tab=wi, Images
-->http://video.google.com/?hl=en&tab=wv, Videos
-->http://maps.google.com/maps?hl=en&tab=wl, Maps
-->http://news.google.com/nwshp?hl=en&tab=wn, News
-->http://www.google.com/prdhp?hl=en&tab=wf, Shopping
-->http://mail.google.com/mail/?hl=en&tab=wm, Gmail
-->http://www.google.com/intl/en/options/, more �
-->/url?sa=p&pref=ig&pval=3&q=http://www.google.com/ig%3Fhl%3Den%26source%3Diglk&usg=AFQjCNFA18XPfgb7dKnXfKz7x7g1GDH1tg, iGoogle
-->/preferences?hl=en, Settings
-->https://www.google.com/accounts/Login?hl=en&continue=http://209.85.132.104/, Sign in
-->/advanced_search?hl=en, Advanced Search
-->/language_tools?hl=en, Language Tools
-->/intl/en/ads/, Advertising*Programs
-->/services/, Business Solutions
-->/intl/en/about.html, About Google
-->http://www.google.com/ncr, Go to Google.com
-->/intl/en/privacy.html, Privacy
If you look at Google's main page, there is a link called "more" right at the top, with a down arrow next to it, which is reflected in the ruby output as