SED With Regex to extract Email Address

Hi Folks,
In my program, I have a variable which consists of multiple lines. I need to use each line as an input. My intention is to extract the email address of the user in each line and use it for further processing.

The email address could be anywhere in the whole line. But there will be only one and I need to extract it. The most complex possible format of the email ID is:
first-name.last-name@xyz.com

In other words, first and last names are separated by a period (.) and the first and last names may have a hyphen (-). The domain name (@xyz.com) is fixed and only has letters, no numbers.

We have a Solaris OS and I am using Korn shell. I read some examples of the sed command and have made a few attempts to use it. But every time, no matter what regex I use, I get the entire input as the output. I must be doing something very basic wrong. Could you please suggest a fix?

$ echo '92' | sed '/[0-9]+/p'
92

$ echo 'email92' | sed '/[0-9]+/p'
email92

$ echo "abc.xyz@comp.com" | sed '/\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)\(\.\)\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)@comp.com/p'
abc.xyz@comp.com

$ echo "Email address is abc.xyz@comp.com" | sed '/\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)\(\.\)\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)@comp.com/p'
Email address is abc.xyz@comp.com

Note that below is the regex that I arrived at to extract the email address.

/\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)\(\.\)\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)@comp.com

Any help is greatly appreciated.

Feel free to adjust it with an additional [A-Z] range or whatever, but here is an idea:

(The trick is to add a space at the beginning of each line, so that if the line contains only the mail address, the next regular expression, which looks for a mail address preceded by a space, still matches.)

echo "Email address is abc.xyz@comp.com" | sed 's/.*/ &/;s/.* \([^ @]*@[^ @]*.com\).*/\1/'
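As for why the original attempts echo the whole input, here is a minimal sketch, assuming a POSIX sed with basic regular expressions: without -n, sed prints every line whether or not the address matches, and in a basic regular expression "+" is a literal plus sign, so "one or more" has to be written as \{1,\}. The function name first_digits is just made up for the illustration.

```shell
# Print only the first run of digits found on each input line.
# -n turns off automatic printing; the p flag prints a line only
# when the substitution actually succeeded.
first_digits() {
  sed -n 's/[^0-9]*\([0-9]\{1,\}\).*/\1/p'
}

# No match here ("+" is literal), yet the line is still auto-printed.
echo 'email92' | sed '/[0-9]+/p'

# Prints only the digits.
echo 'email92' | first_digits
```

The same -n plus s/.../\1/p combination also keeps lines without any match out of the output.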

(Just remember that if you want the hyphen to be taken as a literal in a bracket list, it should be placed at the end of the list.

Quick example: [:#@-])
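A quick way to check the hyphen rule, as a sketch (the function name and the sample string are made up for the demonstration): every character of the list, including the trailing hyphen, is replaced by an underscore.

```shell
# The "-" at the end of the list is a literal hyphen, not a range operator.
replace_punct() {
  sed 's/[:#@-]/_/g'
}

echo 'a-b:c#d@e' | replace_punct
```

With the sample input above this should print a_b_c_d_e.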

also consider :

... | sed 's/.*/ &/;s/.* \([A-Za-z0-9.-]*@[A-Za-z0-9.-]*.com\).*/\1/'

but this would also match an address with an empty user and/or domain, like: @.com

So if you want each part to be at least one character long (and you don't want that first character to be a hyphen or a dot), you can for example use:

... | sed 's/.*/ &/;s/.* \([A-Za-z0-9][A-Za-z0-9.-]*@[A-Za-z0-9][A-Za-z0-9.-]*.com\).*/\1/'

Just adjust it to your requirements
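For the original use case (a variable holding several lines, each processed in turn), a loop along these lines might do. This is only a sketch: the variable lines, its sample content and the function name extract_email are assumptions, and the fixed domain is taken to be xyz.com as in the question.

```shell
# Hypothetical multi-line variable standing in for the real input.
lines='first-name1.last-name@xyz.com
bla bla first-name2.last-name@xyz.com blebla
no address on this line'

# Read lines on standard input and print the address found on each one.
# -n and the p flag make sure lines without an address print nothing.
extract_email() {
  sed -n 's/.*/ &/;s/.* \([A-Za-z0-9][A-Za-z0-9.-]*@xyz\.com\).*/\1/p'
}

printf '%s\n' "$lines" | while IFS= read -r line; do
  email=$(printf '%s\n' "$line" | extract_email)
  if [ -n "$email" ]; then
    printf 'found: %s\n' "$email"
  fi
done
```

Calling sed once per line is slow for big inputs. If the addresses are only needed as a list, piping the whole variable through extract_email once gives the same addresses.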

If you know that the mail address always appears at the end of the line and after a space, you can simply do:

echo "Email address is abc.xyz@comp.com" | sed 's/.* //g'


# cat tst
first-name1.last-name@xyz.com
bla bla first-name2.last-name@xyz.com blebla blealdsfl
first-name3.last-name@xyz.com is a valid mail address.com
This line with valid adress.com: first-name4.last-name@xyz.com next nested address.com
last valid mail address.com first-name5.last-name@xyz.com
# sed 's/.*/ &/;s/.* \([^ @]*@[^ @]*.com\).*/\1/' tst
first-name1.last-name@xyz.com
first-name2.last-name@xyz.com
first-name3.last-name@xyz.com
first-name4.last-name@xyz.com
first-name5.last-name@xyz.com

(OK, I didn't handle the case where the separator is a tab instead of a space, but it's easy to tweak the code, or to run the input through tr -s '[:blank:]' ' ' before applying the sed statement.)
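A sketch of the tr variant, assuming a tr that understands the [:blank:] class (on Solaris that may mean /usr/xpg4/bin/tr rather than /usr/bin/tr). The function name and the sample line are made up.

```shell
# Turn every run of spaces and tabs into a single space, then extract.
extract_with_tabs() {
  tr -s '[:blank:]' ' ' |
    sed 's/.*/ &/;s/.* \([^ @]*@[^ @]*.com\).*/\1/'
}

# printf expands \t, so the address below is delimited by tabs.
printf 'name:\tfirst-name6.last-name@xyz.com\tend\n' | extract_with_tabs
```

This should print just first-name6.last-name@xyz.com.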

For the sake of another way to do it: I was interested in this question and did a little searching. I found a Perl script at this site: Extract email addresses from big file. - Unix / Linux / BSD

Now, I do not know Perl, but it seems to work.

$ cat x
#!/bin/ksh
echo "Email address is ab-c.x-yz@comp.com" |perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}'

$ ./x
ab-c.x-yz@comp.com
$

I would be inclined to prefer [^ \t@] to \w for the bit before the @, since a surprising range of characters is allowed in the local part of email addresses:
Email address - Wikipedia, the free encyclopedia
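The same idea can be carried over to the sed approach, roughly like this. It is a sketch only: POSIX sed does not necessarily understand \t inside a bracket list, so a literal tab is put into a shell variable first. The names tab and extract_loose and the sample address are made up.

```shell
# A literal tab character, for use inside the bracket lists below.
tab=$(printf '\t')

# Accept anything except space, tab and @ on both sides of the @.
extract_loose() {
  sed -n "s/.*/ &/;s/.*[ ${tab}]\([^ ${tab}@]\{1,\}@[^ ${tab}@]\{1,\}\).*/\1/p"
}

echo "mail o'reilly+list@comp.com now" | extract_loose
```

With that input it should print o'reilly+list@comp.com, which a [A-Za-z0-9.-] list would have cut short.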


Search 'RFC2822 regex' - the regular expression for the official standard for addresses is, um, long :).

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

  

Jeepers! Looks like a cat ran across the keyboard a few times!