Selective grep

I need to grep out only the email addresses from a column. Each address has extra characters prepended and appended:

 F=<sss1@domain.com>
 <sss2@domain.com>
(sss3@domain.com)
 <sss4@domain.com>

Whatever is added before or after the address, I should be able to extract only the email.

Hi

$ grep -o '[[:alnum:]]*@[[:alpha:]]*\.com' file
sss1@domain.com
sss2@domain.com
sss3@domain.com
sss4@domain.com

Guru.

Thank you guruprasadpr,

What if there are mixed TLDs, such as .com, .net, .co.in, .in, etc.?

A slight extension to guruprasadpr's solution:

grep -Eo '[[:alnum:]]*@[[:alpha:]]*(\.[a-z]{2,4})+' file
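
As a rough check of the pattern above, here is a minimal sketch that runs it against a small sample with mixed TLDs. The temporary file and the addresses in it are made up for illustration; -o is assumed to be available (GNU and BSD grep both provide it). Note that [[:alpha:]]* only covers domain names made of letters, so a domain containing digits or hyphens would need a wider class.

```shell
# Hypothetical sample data; the addresses are made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 F=<sss1@domain.com>
 <sss2@domain.net>
(sss3@domain.co.in)
 <sss4@domain.in>
EOF

# -E enables extended regexes; -o prints only the matched part of each line.
# (\.[a-z]{2,4})+ accepts one or more dot-separated labels such as .co.in.
result=$(grep -Eo '[[:alnum:]]*@[[:alpha:]]*(\.[a-z]{2,4})+' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```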
 
$ nawk -F"[<>()]" '{print $2}' test.txt
sss1@domain.com
sss2@domain.com
sss3@domain.com
sss4@domain.com
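
On systems without nawk, the same field-splitting idea should work with plain awk. A minimal sketch, using a temporary file built from the lines in the question (the file itself is an assumption); it only works when every address is wrapped in <> or ():

```shell
# Hypothetical sample file built from the lines in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 F=<sss1@domain.com>
 <sss2@domain.com>
(sss3@domain.com)
 <sss4@domain.com>
EOF

# Split each line on any of < > ( ); the address is then the second field.
emails=$(awk -F'[<>()]' '{print $2}' "$sample")
printf '%s\n' "$emails"
rm -f "$sample"
```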

sed 's/.*[<(]\([^>)]*\)[>)]/\1/g' infile

# cat /root/gmail.txt
now_u.k12@gmail.com
c.gg@gmail.com
s_klk@gmail.com

When the address contains a _ or . character, grep gives the wrong result:

# cat /root/gmail.txt | grep -o '[[:alnum:]]*@gmail.com' | sort | uniq -c | sort -nk 1
      1 gg@gmail.com
      1 k12@gmail.com
      1 klk@gmail.com

How to solve this?

Use [a-zA-Z0-9._] instead of [[:alnum:]]

Try:

grep -o '[[:alnum:]._]*@gmail\.com'

Both of the above solutions work.
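
For reference, a minimal sketch comparing the original and the widened character class on the gmail sample. The temporary file stands in for /root/gmail.txt and is an assumption; the dot in gmail\.com is escaped so that it matches only a literal dot.

```shell
# Hypothetical copy of the sample addresses from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
now_u.k12@gmail.com
c.gg@gmail.com
s_klk@gmail.com
EOF

# Original class: the match can only start after the last '_' or '.',
# so the local part is truncated.
truncated=$(grep -o '[[:alnum:]]*@gmail\.com' "$sample")

# Widened class: '.' and '_' are allowed, so the full address is kept.
full=$(grep -o '[[:alnum:]._]*@gmail\.com' "$sample")

printf 'before:\n%s\nafter:\n%s\n' "$truncated" "$full"
rm -f "$sample"
```

If the addresses may also contain characters such as - or +, the bracket expression would need those added as well.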