Non-alpha characters in sed + making it fast?

rich1 · October 1, 2010, 11:12am

Hello, I'm trying to write the fastest sed command possible (large files will be processed) to replace RICH with NICK in a file that looks like the one below. If an occurrence of RICH is uppercase, replace it with uppercase; if it's lowercase, replace it with lowercase.

SOMTHING_RICH_SOMTHING <- replace here
hellorichwhatareyoudoing <- do not replace here, forms part of a word
SOMTHING.RICH.SOMTHING <- replace here
somthing_rich_somthing <- replace here
HELLO-RICH-HELLO <- replace here

I've put a /RICH/ address in front of the substitution to speed up execution. How do I alter the sed command to meet the requirements above?

sed '/RICH/ s/RICH/NICK/g' filename   # executes more quickly

dragon.1431 · October 1, 2010, 12:15pm

hi,
try this:

s/\(.*[_.-]\)\([Rr][iI][Cc][Hh]\)\([_.-].*\)/\1NICK\3/g

i got the following output for your input:

SOMTHING_NICK_SOMTHING 
hellorichwhatareyoudoing 
SOMTHING.NICK.SOMTHING 
somthing_NICK_somthing 
HELLO-NICK-HELLO

Or use this simpler one:

s/\(.*[_.-]\)\(rich\)\([_.-].*\)/\1NICK\3/ig

The i flag (a GNU sed extension) makes the match case-insensitive (I forgot this earlier).

With the first approach, mixed-case spellings such as these will also be matched:

RIch
RicH, etc.
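A minimal sketch of dragon.1431's two variants side by side, assuming GNU sed (the I flag in the second one is a GNU extension; dragon wrote it as i, which GNU sed also accepts). The function names replace_bracket and replace_iflag are hypothetical, chosen for illustration. One thing worth noticing: because each pattern starts with .*, a single match swallows the whole line, so the g flag never gets a second chance and at most one RICH per line is replaced (the last one enclosed by separators).

```shell
# Assumes GNU sed. Both functions read stdin and write stdout.

# Portable case-insensitivity: one bracket expression per letter.
replace_bracket() {
    sed 's/\(.*[_.-]\)\([Rr][iI][Cc][Hh]\)\([_.-].*\)/\1NICK\3/g'
}

# GNU-only: the I flag makes the whole regex case-insensitive.
replace_iflag() {
    sed 's/\(.*[_.-]\)\(rich\)\([_.-].*\)/\1NICK\3/Ig'
}
```

For example, echo 'a_rich_b_rich_c' | replace_bracket prints a_rich_b_NICK_c: the greedy .* in the first group reaches past the first rich, so only the second one is replaced.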

rich1 · October 1, 2010, 1:11pm


Not quite what I was expecting :) My fault, probably, for not stating the requirement clearly enough:

richard@opensolaris:~/share/cleaner$ echo "RICH_SOMTHING" | sed 's/\(.*[_.-]\)\([Rr][iI][Cc][Hh]\)\([_.-].*\)/\1NICK\3/g'
RICH_SOMTHING

richard@opensolaris:~/share/cleaner$ echo "SOMTHING@RICH" | sed 's/\(.*[_.-]\)\([Rr][iI][Cc][Hh]\)\([_.-].*\)/\1NICK\3/g'
SOMTHING@RICH

---------- Post updated at 05:48 PM ---------- Previous update was at 05:29 PM ----------

Basically, the rule is: if there are any letters (with no spaces) directly to the left or right of the string I'm replacing, don't replace it. So the following must be left alone:

HELLORICHHELLO
HELLORICH
RICHHELLO

Anything other than the above is fair game, with the added condition that if the string being replaced is lowercase, it should be replaced with lowercase, and if it's uppercase, with uppercase.

---------- Post updated at 06:11 PM ---------- Previous update was at 05:48 PM ----------

Nearly got it...

richard@opensolaris:~/share/cleaner$ echo "@rich" | sed 's/\(.*[_.-.@]\)\(rich\)\([_.-.@].*\)/\1NICK\3/ig'
@rich
richard@opensolaris:~/share/cleaner$ echo "@RICH@" | sed 's/\(.*[_.-.@]\)\(rich\)\([_.-.@].*\)/\1NICK\3/ig'
@NICK@
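One detail in the attempt above is easy to miss: inside a bracket expression, a - between two characters denotes a range, so [_.-.@] contains _, the one-character range .-. and @, but not the hyphen itself. A hyphen is only literal when it comes first or last in the brackets, as in dragon.1431's [_.-]. The hypothetical helper functions below illustrate the difference with any POSIX sed. Neither output shown above is affected, since neither test string contains a hyphen; the unreplaced @rich is a separate issue, because the pattern demands a separator on both sides and so can never match at the very end of a line.

```shell
# '-' between '.' and '.' forms the range .-. so the set is { _ . @ } (no hyphen).
without_hyphen() { sed 's/[_.-.@]/#/g'; }

# '-' placed first is a literal member, so the set is { - _ . @ }.
with_hyphen() { sed 's/[-_.@]/#/g'; }
```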

dragon.1431 · October 1, 2010, 1:15pm

Hi,
please give your exact input and expected output.
Note that I posted code based on the requirements in your first post.

rich1 · October 3, 2010, 5:08am

input:

hellorichhello
HELLORICHHELLO
SOMTHING-RICH-SOMTHING
SOMTHING@RICH
@RICH
RICH@
-RICH
RICH-
SOMTHING_RICH_SOMTHING
_RICH
RICH_
SOMTHING RICH SOMTHING
somthing rich somthing

output:

hellorichhello
HELLORICHHELLO
SOMTHING-NICK-SOMTHING
SOMTHING@NICK
@NICK
NICK@
-NICK
NICK-
SOMTHING_NICK_SOMTHING
_NICK
NICK_
SOMTHING NICK SOMTHING
somthing NICK somthing

Basically, if the string to be replaced has an alphabetic character directly to its left or right (or both), don't replace it. Otherwise, i.e. when each side is either a non-alphabetic character (including a space) or the start/end of the line, replace it.
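A minimal sketch of that rule, assuming GNU sed (\| alternation inside a group and the I flag are GNU extensions). It replaces with NICK regardless of case, matching the expected output above; the function name nickify is hypothetical. It uses [^[:alpha:]], taken literally from "alphabetic character", so a digit next to RICH counts as a separator here; swap in [^[:alnum:]] if digits should also block the replacement.

```shell
# Assumes GNU sed.
# Left context (\1): start of line, or one non-letter character.
# Right context (\2): one non-letter character, or end of line.
# I flag: rich, RICH, Rich, rIcH ... all match; each becomes NICK.
nickify() {
    sed 's/\(^\|[^[:alpha:]]\)rich\([^[:alpha:]]\|$\)/\1NICK\2/Ig'
}
```

Run it as nickify < infile.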

Scrutinizer · October 3, 2010, 5:25am

sed 's/\(^\|[^[:alnum:]]\)\(rich\|RICH\|Rich\)\([^[:alnum:]]\|$\)/\1NICK\3/g' infile

rich1 · October 5, 2010, 3:02am

I won't even pretend I understand how this sed works, but it does! Is it possible to replace uppercase with uppercase and lowercase with lowercase, or would I have to run two different seds? Cheers

dragon.1431 · October 5, 2010, 3:51am

@Scrutinizer,

could you please explain this part?

Especially [^[:alnum:]]

Thanks in advance.

Scrutinizer · October 5, 2010, 4:19am

Yes, AFAIK you would have to use one sed with multiple substitute statements.

Hi, it means: if rich, RICH or Rich is bounded on each side by either a non-alphanumeric character ([^[:alnum:]]) or the beginning (^) / end ($) of the line, replace it with "NICK". The \1 and \3 in the replacement put the surrounding characters back unchanged.
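To make the explanation concrete, here is the same command with the pattern assembled from three pieces. The variable names left, word and right and the function name scrutinizer are illustrative only; the resulting sed expression is character-for-character the one posted above, so it likewise assumes GNU sed.

```shell
# Assumes GNU sed (\| alternation is a GNU extension).
left='\(^\|[^[:alnum:]]\)'   # \1: start of line, or one char that is not a letter/digit
word='\(rich\|RICH\|Rich\)'  # \2: the three accepted spellings
right='\([^[:alnum:]]\|$\)'  # \3: one char that is not a letter/digit, or end of line

# \1 and \3 are written back, so only the word itself becomes NICK.
scrutinizer() { sed "s/${left}${word}${right}/\\1NICK\\3/g"; }
```

Because [:alnum:] includes digits, RICH2 is left untouched, and a mixed spelling such as rIcH is not one of the three alternatives, so it is left alone too.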

rich1 · October 5, 2010, 6:37am

Hi Scrutinizer, just to clarify, are you saying that this:

sed 's/\(^\|[^[:alnum:]]\)\(rich\|RICH\|Rich\)\([^[:alnum:]]\|$\)/\1NICK\3/g' infile

will replace 'rich' with 'nick' and 'RICH' with 'NICK'?

When I run it, it always replaces with 'NICK', no matter which version of 'RICH' I pass in. Is this intended?

Much appreciated.

Scrutinizer · October 5, 2010, 6:49am

Hi, no, I meant something like this:

sed -e 's/\(^\|[^[:alnum:]]\)rich\([^[:alnum:]]\|$\)/\1nick\2/g' \
    -e 's/\(^\|[^[:alnum:]]\)RICH\([^[:alnum:]]\|$\)/\1NICK\2/g' infile

or

sed 's/\(^\|[^[:alnum:]]\)rich\([^[:alnum:]]\|$\)/\1nick\2/g
     s/\(^\|[^[:alnum:]]\)RICH\([^[:alnum:]]\|$\)/\1NICK\2/g' infile
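A possible extension, sketched under two assumptions of ours: that the Rich spelling from the earlier command should map to Nick, and that GNU sed is used. One limitation of the g flag in these commands is that matches cannot overlap, so in RICH-RICH the - consumed by the first match is no longer available as the left boundary of the second, and the output is NICK-RICH. Dropping g and looping with :a ... ta until no substitution succeeds handles such adjacent occurrences; the loop ends because every pass removes at least one target word. The function name nickify_keep_case is hypothetical.

```shell
# Assumes GNU sed.
# No g flag: each pass replaces at most one occurrence per spelling.
# 'ta' jumps back to ':a' as long as some substitution succeeded in this pass,
# so occurrences sharing a separator (RICH-RICH) are all handled.
nickify_keep_case() {
    sed -e ':a' \
        -e 's/\(^\|[^[:alnum:]]\)rich\([^[:alnum:]]\|$\)/\1nick\2/' \
        -e 's/\(^\|[^[:alnum:]]\)RICH\([^[:alnum:]]\|$\)/\1NICK\2/' \
        -e 's/\(^\|[^[:alnum:]]\)Rich\([^[:alnum:]]\|$\)/\1Nick\2/' \
        -e 'ta'
}
```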