Finding and removing patterns in a large list of URLs

I have a list of URLs, for example:

Google
Google Base
Yahoo!
Yahoo!
Yahoo! Video - It's On
Google

The problem is that Google and Google are duplicates, as are Yahoo! and Yahoo!.

I need to find these canonical www duplicates and append the text "DUP#" in front of both Google and Google, for delimited import into Excel so I can sort and review them by eye.

I have no idea how to begin... sed, awk, perl, cut, etc.?

Many thanks for any input.

#!/usr/bin/perl
use strict;
use warnings;

# Read the list, keeping the original order and counting each line.
open my $fh, '<', 'a.txt' or die "Cannot open a.txt: $!";
my (@urls, %count);
while (<$fh>) {
    chomp;
    push @urls, $_;
    $count{$_}++;
}
close $fh;

# Prefix "DUP#" on every line that occurs more than once, then print.
for (@urls) {
    $_ = "DUP#$_" if $count{$_} > 1;
}
print "$_\n" for @urls;
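
If the sample list above is saved as a.txt, the script should print something like this, tagging only the repeated lines:

DUP#Google
Google Base
DUP#Yahoo!
DUP#Yahoo!
Yahoo! Video - It's On
DUP#Google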

Hi totus,

Hope this can also do the job:

inputfile:
www.Google.com
www.Google Base.com
www.Yahoo!.com
www.Yahoo!.com
www.Yahoo! Video - It's On.com
www.Google.com

command:
sort inputfile | uniq -D | awk '{print $0 "_DUP#"}' > out.csv

output:
www.Google.com_DUP#
www.Google.com_DUP#
www.Yahoo!.com_DUP#
www.Yahoo!.com_DUP#

Thanks
Sha

Hello both of you! Thanks for the tips! However, I made a mistake in representing my data, as vBulletin mucked it up. Here it is in a code snippet:

http://www.google.com
http://google.com
http://www.yahoo.com
http://video.yahoo.com
http://www.yahoo.com
http://knol.google.com

The issue is that www.domain.com and domain.com are dups. I need to identify these in a large list by appending some delimiter to the matches, e.g.:

DUP#http://www.google.com
DUP#http://google.com

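Something along these lines might do it for the www.domain.com vs. domain.com case. It's only a rough sketch in the spirit of the Perl answer above: it builds a canonical key by lowercasing the URL and stripping the scheme and a leading "www.", then prefixes DUP# on every line whose key occurs more than once. The input file name urls.txt is just a placeholder; adjust the normalization if you also care about trailing slashes, ports, etc.

#!/usr/bin/perl
# Rough sketch -- not tested against the real data.
use strict;
use warnings;

# Assumes the URL list is in urls.txt, one URL per line (adjust the name).
open my $fh, '<', 'urls.txt' or die "Cannot open urls.txt: $!";
my (@urls, %count);
while (my $url = <$fh>) {
    chomp $url;
    # Canonical key: lowercase, minus the scheme and a leading "www.".
    my $key = lc $url;
    $key =~ s{^\w+://}{};
    $key =~ s{^www\.}{};
    push @urls, [ $url, $key ];
    $count{$key}++;
}
close $fh;

# Tag every URL whose canonical form occurs more than once.
for my $entry (@urls) {
    my ($url, $key) = @$entry;
    my $prefix = $count{$key} > 1 ? 'DUP#' : '';
    print "$prefix$url\n";
}

On the six sample URLs above, that should give:

DUP#http://www.google.com
DUP#http://google.com
DUP#http://www.yahoo.com
http://video.yahoo.com
DUP#http://www.yahoo.com
http://knol.google.com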