I need to extract only the email addresses from a column. Each address has extra characters prepended and appended:
F=<sss1@domain.com>
<sss2@domain.com>
(sss3@domain.com)
<sss4@domain.com>
Whatever is added before or after the address, I should be able to extract only the emails.
Hi
$ grep -o '[[:alnum:]]*@[[:alpha:]]*\.com' file
sss1@domain.com
sss2@domain.com
sss3@domain.com
sss4@domain.com
Guru.
Thank you guruprasadpr,
What if there are mixed TLDs, such as .com, .net, .co.in, .in, etc.?
A slight extension to guruprasadpr's solution:
grep -Eo '[[:alnum:]]*@[[:alpha:]]*(\.[a-z]{2,4})+' file
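To see how that extended pattern behaves on mixed TLDs, here is a minimal sketch. The function name extract_emails and the .net and .co.in sample addresses are made up for illustration; the regex is the one from the post above, and -o is assumed to be available (it is in GNU and BSD grep).

```shell
# Hypothetical helper: reads lines on stdin, prints only the matched addresses.
extract_emails() {
    # -E enables extended regex, -o prints only the matching part of each line.
    # (\.[a-z]{2,4})+ accepts one or more dot-separated suffixes, e.g. .com or .co.in
    grep -Eo '[[:alnum:]]*@[[:alpha:]]*(\.[a-z]{2,4})+'
}

# Made-up sample input with different wrappers and TLDs.
printf '%s\n' 'F=<sss1@domain.com>' '(sss2@domain.net)' '<sss3@domain.co.in>' | extract_emails
```

Because the suffix group can repeat, two-part endings such as .co.in are captured whole rather than being cut off after .co.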
$ nawk -F"[<>()]" '{print $2}' test.txt
sss1@domain.com
sss2@domain.com
sss3@domain.com
sss4@domain.com
sed 's/.*[<(]\([^>)]*\)[>)]/\1/g' infile
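The awk and sed commands above work on the delimiters rather than on the shape of the address, so they rely on the brackets being present. A small sketch of the sed variant, assuming a hypothetical wrapper name strip_brackets and one made-up line without brackets:

```shell
# Hypothetical helper: keeps only the text between <...> or (...) on each line.
strip_brackets() {
    # .*[<(]      everything up to and including the opening bracket
    # \([^>)]*\)  captured text up to the closing bracket
    # [>)]        the closing bracket itself
    sed 's/.*[<(]\([^>)]*\)[>)]/\1/'
}

# The third line is invented to show what happens when no brackets are present.
printf '%s\n' 'F=<sss1@domain.com>' '(sss3@domain.com)' 'no brackets here' | strip_brackets
```

Note that sed prints lines that do not match unchanged, so the last line passes through as-is, whereas grep -o would print nothing for it.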
# cat /root/gmail.txt
now_u.k12@gmail.com
c.gg@gmail.com
s_klk@gmail.com
When the email contains a _ or . character, it gives the wrong result:
# cat /root/gmail.txt | grep -o '[[:alnum:]]*@gmail.com' |sort|uniq -c|sort -nk 1
1 gg@gmail.com
1 k12@gmail.com
1 klk@gmail.com
How can I solve this?
Use [a-zA-Z0-9._] instead of [[:alnum:]].
Try:
grep -o '[[:alnum:]._]*@gmail\.com'
Both above solutions work.
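For anyone wondering why the original pattern truncated the addresses: [[:alnum:]] covers only letters and digits, so the match can only start after the last _ or . in the local part. A rough side-by-side sketch, using the three addresses from the question (the function names narrow and wide are just labels for this example):

```shell
# Addresses copied from the /root/gmail.txt listing in the question.
sample='now_u.k12@gmail.com
c.gg@gmail.com
s_klk@gmail.com'

# Original class: letters and digits only, so the match starts after the last _ or .
narrow() { grep -o '[[:alnum:]]*@gmail\.com'; }

# Widened class: also allows . and _ in the local part.
wide() { grep -o '[[:alnum:]._]*@gmail\.com'; }

printf '%s\n' "$sample" | narrow   # truncated local parts
printf '%s\n' "$sample" | wide     # full addresses
```

Real addresses can contain other characters as well, such as + or -, so the bracket expression may need widening further depending on the data.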