Removing domain suffix with SED

krypton · March 31, 2010, 3:51am

Hi Experts,

I have a syslog file containing entries from thousands of different hosts, and I want to adjust it by removing the domain suffix from each hostname.

My previous attempts haven't managed to match all the different lengths of the subdomains being logged.

Could somebody suggest the sed syntax to accomplish this?

The domains range from 1 to 5 subdomain levels, and the subdomain names vary in length, e.g.:

host.please.remove.me

Any assistance here would be appreciated.

Best Regards,

K

ktrimu · March 31, 2010, 4:28am

awk -F. '{print $1}' filename

krypton · March 31, 2010, 4:48am

Hi Ktrimu,

thanks for the quick reply here.

Using awk does indeed remove the domain suffix, but it also removes everything in the syslog line after the hostname/domain.

I actually want to remove just the domain suffix, leaving the rest of the log line intact.

Apologies if I didn't state this clearly in the earlier post.

Regards,

K
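For reference, here is a hedged sketch of how the awk approach could be limited to the hostname field only, so the rest of the line survives. It assumes the traditional syslogd file layout (`Mar 31 03:51:00 hostname tag: message`), where the hostname is the fourth whitespace-separated field; since no sample line was posted, that layout is an assumption. `strip_host_field` and `HOSTFIELD` are hypothetical names.

```shell
# Sketch only: shorten the hostname field and keep the rest of the line.
# ASSUMPTION: lines look like "Mar 31 03:51:00 host.please.remove.me tag: message",
# i.e. the hostname is whitespace-separated field 4. HOSTFIELD (hypothetical)
# lets you point it at a different field number.
strip_host_field() {
    awk -v f="${HOSTFIELD:-4}" '{
        # Remove the first "." and everything after it, in field f only.
        sub(/\..*/, "", $f)
        print
    }' "$@"
}

# Usage: strip_host_field logfile > logfile.short
```

One caveat of this design: when awk modifies a field it rebuilds the whole record using single spaces, so a padded date such as `Mar  1` comes out as `Mar 1`. A sed substitution that edits the line in place does not have this side effect.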

alister · March 31, 2010, 8:45am

It's probably a simple problem for a sed guru, if only they knew what the line looked like. Always provide a sample of the input you want people to help you with, and don't forget to mention any special cases if there are any.

Alister

bakunin · March 31, 2010, 11:41am

As I don't know the details of the actual file, here are some suggestions:

sed 's/ \([^ .]*\)\.[^ ]* / \1 /g'

This replaces any run of characters that are neither spaces nor full stops, followed by a full stop and then a series of non-space characters, with the part before the first full stop:

abc.def.ghi.jkl -> abc

The regexp uses the surrounding spaces to detect word boundaries, so it will not match "words" at the beginning or end of a line. Furthermore, any other "word" containing a full stop would be shortened too (e.g. "my.name@emailaddress.com" -> "my").
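To make these limitations concrete, here is a small sketch that wraps the first expression in a shell function (`shorten_inner` is a hypothetical name) and runs it on a few made-up lines. It also shows one more caveat: because each match consumes its trailing space, the second of two FQDNs separated by a single space is skipped in the same pass.

```shell
# Hypothetical wrapper around the first expression above.
# A dotted "word" is only shortened when it has a space on BOTH sides.
shorten_inner() {
    sed 's/ \([^ .]*\)\.[^ ]* / \1 /g'
}

# Made-up sample lines, one per case.
printf '%s\n' \
    'Mar 31 03:51:00 host.please.remove.me sshd[123]: msg' \
    'host.please.remove.me at line start' \
    'at line end host.please.remove.me' \
    'mail to my.name@emailaddress.com now' \
    'two hosts a.example.com b.example.com here' |
shorten_inner
# Expected output:
#   Mar 31 03:51:00 host sshd[123]: msg
#   host.please.remove.me at line start        (no leading space: unchanged)
#   at line end host.please.remove.me          (no trailing space: unchanged)
#   mail to my now                             (e-mail address shortened too)
#   two hosts a b.example.com here             (second FQDN skipped)
```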

To catch FQDNs at the beginning or end of a line in the same way, you could add two more expressions:

sed 's/^\([^ .]*\)\.[^ ]* /\1 /
     s/ \([^ .]*\)\.[^ ]*$/ \1/'
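One way to use all three expressions together is a single sed call, sketched below under the assumption that they are meant to be combined (`shorten_all` is a hypothetical name; each `-e` simply adds one more expression to the script). Running the line-end expression last also catches an FQDN that the middle expression skipped, as long as it is the last word on the line. A line consisting of nothing but an FQDN is still left unchanged, because every expression needs a space on at least one side.

```shell
# Sketch: the line-start, middle and line-end expressions in one sed call.
# shorten_all is a hypothetical name.
shorten_all() {
    sed -e 's/^\([^ .]*\)\.[^ ]* /\1 /' \
        -e 's/ \([^ .]*\)\.[^ ]* / \1 /g' \
        -e 's/ \([^ .]*\)\.[^ ]*$/ \1/'
}

# Usage: shorten_all < logfile > logfile.short
```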

If you can restrict the valid characters for domain names any further, you can of course fine-tune this mechanism to catch fewer non-domain names. To address my email address example above, for instance:

sed 's/ \([^ @.]*\)\.[^ @]* / \1 /g'

This would leave email addresses untouched, because "@" is no longer considered a valid character in an FQDN.
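A quick check of this variant on made-up lines (`shorten_no_email` is a hypothetical wrapper name): the e-mail address passes through unchanged, while an ordinary FQDN between spaces is still shortened. If you also use the line-start and line-end expressions, the same `@` exclusion would presumably need to be added to them as well.

```shell
# Hypothetical wrapper around the e-mail-safe expression.
# "@" is excluded on both sides of the first full stop.
shorten_no_email() {
    sed 's/ \([^ @.]*\)\.[^ @]* / \1 /g'
}

# Usage: shorten_no_email < logfile > logfile.short
```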

I hope this helps.

bakunin

ygemici · March 31, 2010, 4:44pm

Maybe like this:

[root@rhnserver ~]# cat domain
cmachine.domain.com
xmachine.xdomain.domain.com
ymachine.subdomain.ydomain.domain.com
wmachine23.subsubdomain.subdomain.wdomain.domain.com
zmachine23.subsubbdomain.subbbdomain.subadomain.zdomaincom.domain.com

[root@rhnserver ~]# sed 's/\([[:alnum:]][[:alnum:]]*\)\.\([[:graph:]][[:graph:]]*\)/\1/g' domain
cmachine
xmachine
ymachine
wmachine23
zmachine23
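Applied to a whole syslog line rather than a bare hostname list, this expression behaves a little differently from the space-delimited ones. It needs no surrounding spaces, so FQDNs at the start or end of a line are handled, and a hyphenated name such as `web-01.example.com` still comes out as `web-01`. However, `[[:graph:]]` matches any printable non-space character, so every dotted token is cut at its first full stop (an IP address, for example), and any punctuation attached to it up to the next whitespace is removed too. The sample lines below are made up and `shorten_graph` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper around the [[:alnum:]]/[[:graph:]] expression.
# Keeps the leading alphanumeric run and drops "." plus everything
# up to the next whitespace character.
shorten_graph() {
    sed 's/\([[:alnum:]][[:alnum:]]*\)\.\([[:graph:]][[:graph:]]*\)/\1/g'
}

# Usage: shorten_graph < logfile > logfile.short
```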

krypton · April 1, 2010, 5:24am

Hi All,

thanks for your quick and informative replies to my query.

In the end the following expressions met my requirements:

sed 's/\([[:alnum:]][[:alnum:]]*\)\.\([[:graph:]][[:graph:]]*\)/\1/g'

sed 's/ \([^ @.]*\)\.[^ @]* / \1 /g'

Thanks for the help.

K