extracting domain names out of a text file

I need to extract and list domain names from a very large text file. The file contains TLDs (.com, .net, .org and others) as well as third-level domains, e.g. host1.domain.com, and the names are embedded within paragraphs of text.

Domains do not have an http:// prefix, so I'm thinking the only thing to match on would be the TLDs: for example, match ".com" and extract everything before it back to the nearest space character.

How would I go about doing this?

grep, sed and awk?

Thank you gurus!:o

Er, you could use any of them, but Perl is better suited:

perl -ne '/\b\S+\.(com|net|org)\b/ && print $&,"\n"'

grep '\.com'
grep '\.net'
and so on (quote the pattern and escape the dot, otherwise the shell may expand * as a glob and . matches any character).

> cat file06
blah blah www.boston.com more blah
ha ha yech yes nope not yet tomorrow
today www.unix.com future www.unix.org
forever and ever sportsillustrated.cnn.com high

> tr ' ' '\n' < file06 | grep '\.com'
www.boston.com
www.unix.com
sportsillustrated.cnn.com
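If your grep supports -o and -E (GNU grep does), the tr step can be dropped and the domains pulled out directly. A sketch assuming GNU grep, using \b (a GNU extension) so a trailing word character ends the match:

```shell
# Sketch: one-step extraction; the escaped \. keeps the dot literal, so
# unrelated words that merely contain "com" are not matched.
printf 'blah www.boston.com more\nsportsillustrated.cnn.com high\n' |
  grep -oE '[A-Za-z0-9.-]+\.(com|net|org)\b'
```

This prints www.boston.com and sportsillustrated.cnn.com, one per line.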

I am trying to extract .co.uk domains from HTML, using the command:
cat $DIR/oldfile.txt | tr " " "\n" | grep [A-Za-z0-9_\.-].co.uk > $DIR/newfile.txt

The problem is that this command matches:
/>domain.co.uk<br
/>domain.co.uk<br
/>domain.co.uk<br
etc

How do I modify my regexp to match alphanumeric chars only? (apart from the dots and possible hyphens)

Many Thanks,

Hal

Well, if you change it to match alphanumeric only, then you get:

domain.co.ukbr

So I don't think that's what you want. If your grep accepts -o, you can do:

grep -o '[A-Za-z0-9._-]*\.co\.uk'

(Escape the dots outside the bracket expression, and don't put a backslash inside it: within brackets, \ is a literal character, not an escape.)

If not, use sed instead of grep. The leading .* is greedy, so make it stop at a non-domain character just before the capture group, otherwise it swallows the name and leaves only ".co.uk":

sed -n 's/.*[^A-Za-z0-9._-]\([A-Za-z0-9._-]*\.co\.uk\).*/\1/p'
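A quick way to sanity-check the -o approach against the problem line quoted above (a sketch; the input is taken from the post):

```shell
# Sketch: the bracket expression excludes "<" and ">", so the HTML
# remnants around the domain are not part of the match.
printf '/>domain.co.uk<br\n' | grep -o '[A-Za-z0-9.-]*\.co\.uk'
```

This prints just domain.co.uk.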

Thank you Otheus. Working fine with grep -o.