Remove the text (*@yahoo.com, *@hotmail.com) from a file

Hi,

I have several files with different names under the path /path/, with content like the following:

Example of the file content.
abc@test.com: abc@hotmail.com, abc@yahoo.com
xyz@test.com: xyz@hotmail.com
qwer@test.com: qwer@gmail.com,qwer@aol.com,qwer@test2.com
ghjk@test.com: ghjk@test2.com
abcd@xyz.com: abcd@hjkl.com, abcd@fghj.com,abcd@hotmail.com
..
..

I want to remove the email addresses at hotmail.com, yahoo.com, gmail.com, etc.

That is, after the removal the content should look like this:
abc@test.com:
xyz@test.com:
qwer@test.com: qwer@test2.com
ghjk@test.com: ghjk@test2.com
abcd@xyz.com: abcd@hjkl.com, abcd@fghj.com
..
..

I know how to find the lines that contain gmail.com, hotmail.com or yahoo.com, using the command below:

egrep -i '@yahoo.com|@gmail.com|@aol.com|@hotmail.com' /path/*

The output is as below:
abc@test.com: abc@hotmail.com, abc@yahoo.com
xyz@test.com: xyz@hotmail.com
qwer@test.com: qwer@gmail.com,qwer@aol.com
abcd@xyz.com: abcd@hotmail.com
..
..

But I can't think of how to remove them directly from the files. Can someone please advise?

Can this be done with sed?

Thanks,

tr ' ' '\n' < a | while read line
do
echo "$line" | egrep -v "hotmail|yahoo" 2>/dev/null 1>&2
if [ $? -eq 0 ]
then
echo "$line"
fi
done

input-file:

abc@test.com: abc@hotmail.com,abc@yahoo.com
xyz@test.com: xyz@hotmail.com
qwer@test.com: qwer@gmail.com,qwer@aol.com,qwer@test2.com
ghjk@test.com: ghjk@test2.com
abcd@xyz.com: abcd@hjkl.com,abcd@gmail.com,bcd@mil.com

sed with basic regex (GNU sed; \+, \| and \? are GNU extensions):

$ sed 's/\([ ,]\)[^@]\+@\(gmail\|yahoo\|hotmail\)\.com,\?/\1/g' input-file

sed with the extended regex option (easier to read):

sed -r 's/([ ,])[^@]+@(gmail|yahoo|hotmail)\.com,?/\1/g' input-file

Output:

abc@test.com: abc@yahoo.com
xyz@test.com: 
qwer@test.com: qwer@aol.com,qwer@test2.com
ghjk@test.com: ghjk@test2.com
abcd@xyz.com: abcd@hjkl.com,bcd@mil.com

Note: I removed the space after the comma in the input file because it was present in some places and missing in others.

With Perl:

perl -i.bck -pe's/(?:,\s?)?\w+@(?:(?:hot|g)mail|aol|yahoo)\.com,?//g' /path/*

Well, I just noticed that my sed solution does not work for two consecutive banned email addresses; see the first line of the output. Give it a try with awk:

awk -F": *" -v OFS=": " '{print $1, gensub(/[^@, ]+@(gmail|yahoo|hotmail)\.com,?/, "", "g", $2)}' input-file
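For what it's worth, the consecutive-address problem can probably be handled in sed itself. The second address is missed because the first match already swallowed the comma in front of it, so one way around it is to substitute one address at a time and loop with `t` until nothing matches any more. A minimal sketch, assuming a sed that accepts -E (GNU and BSD sed both do); the function name strip_banned is made up for illustration, and the domain list is the one from the egrep in the first post:

```shell
# Hypothetical helper: strip banned addresses from "key: addr, addr,..." lines.
# Reads the files given as arguments, or stdin when there are none.
strip_banned() {
  # -e 1/3: label plus "t" form a loop that removes one banned address per pass
  # -e 2:   the address must follow a space or comma, so the key before ":" is never touched
  # -e 4:   drop any separators left dangling at the end of the line
  sed -E \
    -e ':again' \
    -e 's/([ ,])[^@, ]+@(gmail|yahoo|aol|hotmail)\.com,? */\1/' \
    -e 't again' \
    -e 's/[ ,]+$//' \
    "$@"
}

# Demo on the sample from the first post
strip_banned <<'EOF'
abc@test.com: abc@hotmail.com, abc@yahoo.com
xyz@test.com: xyz@hotmail.com
qwer@test.com: qwer@gmail.com,qwer@aol.com,qwer@test2.com
ghjk@test.com: ghjk@test2.com
abcd@xyz.com: abcd@hjkl.com, abcd@fghj.com,abcd@hotmail.com
EOF
```

On that sample this should give the result asked for in the first post. It only prints to stdout; the files themselves are not modified.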

I think you mean GNU Awk.

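awk 'BEGIN{FS="[:,]"}
{
 for(i=1;i<=NF;i++) {
  # keep only the fields that are not on a banned domain
  if( $i !~ /yahoo|gmail|hotmail/) {
    printf "%s ", $i   # explicit format, so a % in the data is not interpreted
  }
 }
 print ""
}' file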

Yes I meant gawk. gensub() is only available in gawk.
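If gawk is not available, roughly the same thing can be done with plain awk by splitting the list on commas, dropping the banned entries and joining the rest back together. This is only a sketch under a few assumptions: the function name filter_list is made up, the domain list comes from the egrep in the first post, and the separator is normalised to ", " on output, which may or may not be what you want.

```shell
# Hypothetical helper: filter "key: addr, addr,..." lines without gensub().
# Reads the files given as arguments, or stdin when there are none.
filter_list() {
  awk '
    {
      colon = index($0, ":")
      if (colon == 0) { print; next }        # not a "key: list" line, pass it through
      key  = substr($0, 1, colon - 1)
      rest = substr($0, colon + 1)
      n = split(rest, addr, ",")
      out = ""
      for (i = 1; i <= n; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", addr[i]) # trim blanks around each address
        if (addr[i] == "") continue
        # skip banned domains, case-insensitively like egrep -i
        if (tolower(addr[i]) ~ /@(gmail|yahoo|aol|hotmail)\.com$/) continue
        out = (out == "" ? "" : out ", ") addr[i]
      }
      print key ":" (out == "" ? "" : " " out)
    }
  ' "$@"
}

# Demo on two lines of the sample from the first post
filter_list <<'EOF'
qwer@test.com: qwer@gmail.com,qwer@aol.com,qwer@test2.com
abcd@xyz.com: abcd@hjkl.com, abcd@fghj.com,abcd@hotmail.com
EOF
```

Because each address is tested on its own, consecutive banned addresses and inconsistent spacing after the commas should not matter here.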

Your perl solution produces this on my sample file:

abc@test.com: 
xyz@test.com: 
qwer@test.com: qwer@test2.com
ghjk@test.com: ghjk@test2.com
abcd@xyz.com: abcd@hjkl.combcd@mil.com

What about:

perl -pe's/(,\s?)?\w+@(?:(?:hot|g)mail|aol|yahoo)\.com,?/\1/g' input-file

Yep, good point. But:

zsh-4.3.4% perl -pe's/(,\s?)?\w+@(?:(?:hot|g)mail|aol|yahoo)\.com,?/\1/g'<<<'abcd@fghj.com,abcd@hotmail.com'
abcd@fghj.com,

If I'm not missing something again:

s/\w+@(?:(?:hot|g)mail|aol|yahoo)\.com,?//g;s/,$//

It won't handle this of course:

email@yahoo.com ,

So, maybe:

s/\w+@(?:(?:hot|g)mail|aol|yahoo)\.com(?:\s?,)?//g;s/,$//

Isn't this the same as

tr ' ' '\n' < a | egrep -v "hotmail|yahoo" 2>/dev/null 1>&2
...

without the extra while loop.

... and why redirect standard error?

Nah... just cut and pasted, forgot to remove them. It's not needed.