sed and the use of !

steve54 · July 1, 2013, 10:36am

I only use sed now and then, and I have come across a script using "!" that has me puzzled. I've searched for enlightenment but had no luck; all I can find is that "!" is a "not" operator. The script below is used in conjunction with

"find . -type f -print | xargs -iF scriptbelow.zsh F"

#!/bin/zsh
echo $1
cat $1|sed '
s!:FRED.*:!:!
s!:BERT.*:!:!
s!:ALF.*;!;!
s!JOHN$!!
s!:FRANK.*:!:!
s!:JACK.*:!:!
s!:$!!
s!:ANDY.*:!:!'  >$1.tmp
mv $1.tmp $1

I've reduced the number of lines starting with "s!" (i.e. removed the duplicate lines) and changed the variables (field names) to people's names.

The script, I believe, is used to remove one or more words from multiple text files. The text files come in two forms:
a) with just variables
b) with variables+value
and can consist of one or more lines (of varying length) with fields separated by colons, e.g.
ONE:TWO:THREE:BERT:FOUR:FIVE;
or
ONE,1:TWO,2:THREE,3:BERT,999:FOUR,:FIVE,5;

I played around with the script and it appeared to remove more than just the apparent target. Why do it this way? Why the different numbers of ":"? Notice that "s!JOHN$!!" has no leading ":" (i.e. it does not start with "s!:"). Is this an error?

Skrynesaver · July 1, 2013, 10:54am

.* captures to the end of the line, or to the last colon in the line in the case of :FRED.*:
[^:]* captures to the next colon
JOHN$ only matches JOHN at the end of the line.
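
The difference between the two forms can be seen on the sample line from the first post. This is only a sketch: the variable names are illustrative, and note that :BERT[^:]* would also match a longer name such as BERTHA.

```shell
line='ONE:TWO:THREE:BERT:FOUR:FIVE;'

# Greedy: .* runs on to the LAST colon in the line, so FOUR is removed too.
greedy=$(printf '%s\n' "$line" | sed 's!:BERT.*:!:!')

# Bounded: [^:]* stops at the NEXT colon, so only the BERT field is removed.
bounded=$(printf '%s\n' "$line" | sed 's!:BERT[^:]*:!:!')

printf '%s\n' "$greedy" "$bounded"
```

The first line printed is ONE:TWO:THREE:FIVE; and the second is ONE:TWO:THREE:FOUR:FIVE;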

Subbeh · July 1, 2013, 11:09am

"!" is just the delimiter for sed. You can use pretty much any character as the delimiter with sed; normally it's "/".

Don_Cragun · July 1, 2013, 11:11am

The most common field separator in the sed substitute command is the slash character:

[StartAddress,EndAddress]s/regex/replacement/[flags]

but the slash can be replaced by any character except backslash \ and <newline>. The sed commands you're showing seem to be used to play around with the password file. I would assume the script uses ! as the delimiter instead of / because some fields in the files you'll be editing contain slash characters and some of the regex or replacement fields in your substitute commands contain slash characters.
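
As a small illustration of why a different delimiter helps, the three commands below perform the same substitution on a made-up path (the path and variable names are just examples). With / as the delimiter, every slash in the pattern and replacement has to be escaped; with ! or | none of them do.

```shell
path='/usr/local/bin'

# Default delimiter: each / inside the pattern and replacement needs a backslash.
with_slash=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# Alternative delimiters: the same substitution with nothing to escape.
with_bang=$(printf '%s\n' "$path" | sed 's!/usr/local!/opt!')
with_pipe=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

printf '%s\n' "$with_slash" "$with_bang" "$with_pipe"
```

All three print /opt/bin.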

The command s!:FRED.*:!:! looks for a line containing the string :FRED followed by the longest string of characters it can find followed by another : and replaces all of that with a : .

The command s!JOHN$!! removes the string JOHN if it appears at the end of the line. (The $ in the regular expression anchors the match to the end of the line.)

The other substitute commands are similar to one of the two above.
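
The two end-of-line commands can be tried on made-up lines. The sketch below (the sample lines and the clean helper are illustrative, not part of the original script) suggests one reason s!:$!! may be in the script: once JOHN has been stripped from the end of a line, a dangling colon is left behind.

```shell
# Apply only the two end-anchored commands from the script.
clean() {
    sed -e 's!JOHN$!!' -e 's!:$!!'
}

# JOHN at the end of the line is removed, then the trailing colon.
at_end=$(printf '%s\n' 'ONE:TWO:JOHN' | clean)

# JOHN in the middle of the line is not touched, because of the $ anchor.
in_middle=$(printf '%s\n' 'ONE:JOHN:TWO' | clean)

printf '%s\n' "$at_end" "$in_middle"
```

This prints ONE:TWO followed by ONE:JOHN:TWO.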

Without seeing a specification of what changes this sed script is intended to perform, we have absolutely no way of knowing whether or not the substitute commands implement the correct transformations on your input file.

Note also that using cat to feed the input to sed is a waste of system resources, since sed accepts filename operands to specify the input files to be processed. Also note that if something goes wrong with the sed command, you will wipe out the file you're trying to edit. A more efficient, safer approach would be:

#!/bin/zsh
echo $1
if sed '
s!:FRED.*:!:!
s!:BERT.*:!:!
s!:ALF.*;!;!
s!JOHN$!!
s!:FRANK.*:!:!
s!:JACK.*:!:!
s!:$!!
s!:ANDY.*:!:!' "$1" >"$1.tmp"
then    mv "$1.tmp" "$1"
else    printf "sed failed trying to process %s\n" "$1"
fi

steve54 · July 1, 2013, 3:54pm

Thank you all so much for providing such speedy useful info. And wow I had no idea you could change the delimiter without specifying the change (thinking of awk -F). The purpose of the script is simply to remove a variable (which may be on its own or paired with a value) as in remove all instances of eg "FredXX". So this means amending lines in files as follows:

"Bill:FredXX:Joe" -> "Bill:FredXX:Joe"
"Bill,123:FredXX,45678:Joe,3" -> "Bill,123:Joe,3"

So from above it looks like this will work: "The command s!:FRED.*:!:! looks for a line containing the string :FRED followed by the longest string of characters it can find followed by another : and replaces all of that with a :"

I was thinking that variables could be at the start, middle, or end of a line, but it looks like the script is hard-coded knowing this not to be the case; e.g. we also have s!JOHN$!! (find at the end of the line). The thing is, I have a feeling that the script has errors; e.g. s!:ALF.*;!;! uses ";" not ":", so maybe that's a typo? I created a dummy file and the results were not as expected. I reproduced part of this using the following:

% echo one:FRED634r543:qwerty:two | sed 's!:FRED6.*:!:!'
one:two

So you can see that the "qwerty" variable is also removed.

Finally was puzzled by s!:$!! then thanks to your help realised it is the same as s/:$// so search for ":" at end of line and remove it.

Don_Cragun · July 1, 2013, 5:56pm

I'm puzzled by the first example in your post ("Bill:FredXX:Joe" -> "Bill:FredXX:Joe"). From your explanation, I expected FredXX to be removed.

I also assume that when you're searching for FRED6 , FRED61 should not be modified, but FRED6,string should be removed. From one of your earlier posts in this thread, I assume that your input file contains fields that are separated by a colon and each line is terminated by a semicolon (followed by a <newline>). If that is the case, I think the following may do what you want:

printf '%s\n' "FRED6;" "FRED61;" "FRED6,45678;" "abc:FRED6;" "abc:FRED61;" \
        "abc:FRED6,45678;" "abc:FRED6:def;" "abc:FRED61:def;" \
        "abc:FRED6,45678:def;" "abc:FRED6:GEORGE:def;" \
        "abc:FRED61:GEORGE:def;" "abc:FRED6,45678:GEORGE:def;" "FRED6:def;" \
        "FRED61:def;" "FRED6,45678:def;" "abc:No Changes to this:def;" |
        sed    '\!^FRED6\(,[^:]*\)\{0,1\};!d
                s!^FRED6\(,[^:]*\)\{0,1\}:!!
                s!:FRED6\(,[^:]*\)\{0,1\}:!:!
                s!:FRED6\(,[^:]*\)\{0,1\};!;!'

Obviously, you don't need the printf statement; it is just there to provide test input to verify that the sed command works as expected.

The 1st sed command deletes a line that has only one entry when that entry matches the target. The 2nd sed command removes a matching entry at the start of a line if there is more than one entry on that line. The 3rd sed command removes a matching entry when there are other entries before and after the matching entry. And the last sed command removes a matching entry if it is the last entry on the line.

Note, however, that this script might not work correctly if a matching entry appears in more than one field on an input line.
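
To illustrate that limitation, here is a sketch on a made-up line with two adjacent matching fields. Adding the g flag is not enough, because the colon that ends the first match is consumed and cannot start the second one. One possible workaround, shown here as an assumption rather than as part of the script above, is to repeat the substitution with a label and the t command until nothing more changes.

```shell
line='abc:FRED6:FRED6,1:def;'

# With g: the first match ":FRED6:" uses up the colon the second match needs.
with_g=$(printf '%s\n' "$line" |
    sed 's!:FRED6\(,[^:]*\)\{0,1\}:!:!g')

# With a loop: "t again" branches back to the label after each successful
# substitution, so the command is retried on the already-edited line.
looped=$(printf '%s\n' "$line" |
    sed -e ':again' -e 's!:FRED6\(,[^:]*\)\{0,1\}:!:!' -e 't again')

printf '%s\n' "$with_g" "$looped"
```

The g version leaves abc:FRED6,1:def; while the looped version yields abc:def;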

Hope this helps...

steve54 · July 2, 2013, 3:19am

Apologies Don, everyone,

"Bill:FredXX:Joe" -> "Bill:FredXX:Joe"

should have read

"Bill:FredXX:Joe" -> "Bill:Joe"

Don, you are correct about "fields that are separated by a colon and each line is terminated by a semicolon (followed by a <newline>)", but it may also be the case that some files are not terminated by a semicolon, and some files have "field,value" in place of "field" (as above). So in this case the following is not true: "when you're searching for FRED6 , FRED61 should not be modified"; i.e. it should be modified, or rather removed, as FRED61 is just another value. I don't like the way this is implemented; it would seem cleaner to me to have a file containing a list of fields to be removed and a script that operates on that file, though this would be a harder script to create.

Don, sorry I don't get the code

printf '%s\n' "FRED6;"...etc

appears to print out everything regardless of whether sed is in place or not

Don_Cragun · July 2, 2013, 4:06am

You may notice that the last line of the printf statement ends with a | . The printf command is feeding sample data through a pipe into the sed command. It is just there to show that all occurrences of FRED6 and FRED6,string will be removed from the output if they appear at the start of a line followed by a colon or semicolon; at the end of a line followed by a semicolon (changing the preceding colon to a semicolon in this case); or in the middle of a line, following and followed by other "names". It will NOT modify FRED61 or ALFRED6 ; only FRED6 and FRED6 immediately followed by a comma.

Changing my script to add an intermediate file and having sed read from that file involves the following huge changes: Take my original script:

printf '%s\n' "FRED6;" "FRED61;" "FRED6,45678;" "abc:FRED6;" "abc:FRED61;" \
        "abc:FRED6,45678;" "abc:FRED6:def;" "abc:FRED61:def;" \
        "abc:FRED6,45678:def;" "abc:FRED6:GEORGE:def;" \
        "abc:FRED61:GEORGE:def;" "abc:FRED6,45678:GEORGE:def;" "FRED6:def;" \
        "FRED61:def;" "FRED6,45678:def;" "abc:No Changes to this:def;" |
        sed    '\!^FRED6\(,[^:]*\)\{0,1\};!d
                s!^FRED6\(,[^:]*\)\{0,1\}:!!
                s!:FRED6\(,[^:]*\)\{0,1\}:!:!
                s!:FRED6\(,[^:]*\)\{0,1\};!;!'

and change it to:

printf '%s\n' "FRED6;" "FRED61;" "FRED6,45678;" "abc:FRED6;" "abc:FRED61;" \
        "abc:FRED6,45678;" "abc:FRED6:def;" "abc:FRED61:def;" \
        "abc:FRED6,45678:def;" "abc:FRED6:GEORGE:def;" \
        "abc:FRED61:GEORGE:def;" "abc:FRED6,45678:GEORGE:def;" "FRED6:def;" \
        "FRED61:def;" "FRED6,45678:def;" "abc:No Changes to this:def;" > tmpfile
        sed    '\!^FRED6\(,[^:]*\)\{0,1\};!d
                s!^FRED6\(,[^:]*\)\{0,1\}:!!
                s!:FRED6\(,[^:]*\)\{0,1\}:!:!
                s!:FRED6\(,[^:]*\)\{0,1\};!;!' tmpfile

Either way, the output produced is:

FRED61;
abc;
abc:FRED61;
abc;
abc:def;
abc:FRED61:def;
abc:def;
abc:GEORGE:def;
abc:FRED61:GEORGE:def;
abc:GEORGE:def;
def;
FRED61:def;
def;
abc:No Changes to this:def;

All occurrences of FRED61 are untouched; all occurrences of FRED6 (as a name by itself and as a name followed by a comma and a string of non-colon characters following the comma) have been removed from the output.

The sample data produced by the printf command is:

FRED6;
FRED61;
FRED6,45678;
abc:FRED6;
abc:FRED61;
abc:FRED6,45678;
abc:FRED6:def;
abc:FRED61:def;
abc:FRED6,45678:def;
abc:FRED6:GEORGE:def;
abc:FRED61:GEORGE:def;
abc:FRED6,45678:GEORGE:def;
FRED6:def;
FRED61:def;
FRED6,45678:def;
abc:No Changes to this:def;

Every FRED6 and FRED6,45678 entry in this data is removed by the sed script.

PS This script still assumes that lines are terminated by a semicolon as was shown in the earlier sample lines of input you provided, which were:

ONE:TWO:THREE:BERT:FOUR:FIVE;
ONE,1:TWO,2:THREE,3:BERT,999:FOUR,:FIVE,5;

If it is important to handle lines that don't end in a semicolon, I need to know whether (missing) semicolons should be added to the ends of lines that don't have them.

steve54 · July 2, 2013, 6:48am

Hi Don,
I don't know if it's because I'm running on Solaris, but I get different output. Here's an xterm with the commands pasted in and run.

% printf '%s\n' "FRED6;" "FRED61;" "FRED6,45678;" "abc:FRED6;" "abc:FRED61;" \
>         "abc:FRED6,45678;" "abc:FRED6:def;" "abc:FRED61:def;" \
>         "abc:FRED6,45678:def;" "abc:FRED6:GEORGE:def;" \
>         "abc:FRED61:GEORGE:def;" "abc:FRED6,45678:GEORGE:def;" "FRED6:def;" \
>         "FRED61:def;" "FRED6,45678:def;" "abc:No Changes to this:def;" |
pipe>         sed    '\!^FRED6\(,[^:]*\)\{0,1\};!d
pipe quote>                 s!^FRED6\(,[^:]*\)\{0,1\}:!!
pipe quote>                 s!:FRED6\(,[^:]*\)\{0,1\}:!:!
pipe quote>                 s!:FRED6\(,[^:]*\)\{0,1\};!;!'

FRED6;
FRED61;
FRED6,45678;
abc:FRED6;
abc:FRED61;
abc:FRED6,45678;
abc:FRED6:def;
abc:FRED61:def;
abc:FRED6,45678:def;
abc:FRED6:GEORGE:def;
abc:FRED61:GEORGE:def;
abc:FRED6,45678:GEORGE:def;
FRED6:def;
FRED61:def;
FRED6,45678:def;
abc:No Changes to this:def;

===========================================================

% printf '%s\n' "FRED6;" "FRED61;" "FRED6,45678;" "abc:FRED6;" "abc:FRED61;" \
>         "abc:FRED6,45678;" "abc:FRED6:def;" "abc:FRED61:def;" \
>         "abc:FRED6,45678:def;" "abc:FRED6:GEORGE:def;" \
>         "abc:FRED61:GEORGE:def;" "abc:FRED6,45678:GEORGE:def;" "FRED6:def;" \
>         "FRED61:def;" "FRED6,45678:def;" "abc:No Changes to this:def;" > tmpfile

zsh: file exists: tmpfile
AH-IN-UMF13{msgmedia}568%         sed    '\!^FRED6\(,[^:]*\)\{0,1\};!d
quote>                 s!^FRED6\(,[^:]*\)\{0,1\}:!!
quote>                 s!:FRED6\(,[^:]*\)\{0,1\}:!:!
quote>                 s!:FRED6\(,[^:]*\)\{0,1\};!;!' tmpfile

FRED6;
FRED61;
FRED6,45678;
abc:FRED6;
abc:FRED61;
abc:FRED6,45678;
abc:FRED6:def;
abc:FRED61:def;
abc:FRED6,45678:def;
abc:FRED6:GEORGE:def;
abc:FRED61:GEORGE:def;
abc:FRED6,45678:GEORGE:def;
FRED6:def;
FRED61:def;
FRED6,45678:def;
abc:No Changes to this:def;

===========================================================
Run just the sed against the tmpfile

%  sed    '\!^FRED6\(,[^:]*\)\{0,1\};!d
quote>                 s!^FRED6\(,[^:]*\)\{0,1\}:!!
quote>                 s!:FRED6\(,[^:]*\)\{0,1\}:!:!
quote>                 s!:FRED6\(,[^:]*\)\{0,1\};!;!' tmpfile

FRED6;
FRED61;
FRED6,45678;
abc:FRED6;
abc:FRED61;
abc:FRED6,45678;
abc:FRED6:def;
abc:FRED61:def;
abc:FRED6,45678:def;
abc:FRED6:GEORGE:def;
abc:FRED61:GEORGE:def;
abc:FRED6,45678:GEORGE:def;
FRED6:def;
FRED61:def;
FRED6,45678:def;
abc:No Changes to this:def;

As for the files, format is either

1. X:Y:Z

or

2 X,value1:Y,value2:Z,value3;

Don_Cragun · July 2, 2013, 3:03pm

There is nothing in the sed command I provided that shouldn't work on any system that provides a sed that meets POSIX requirements. The output I showed you was from tests I ran on OS X on a MacBook Pro laptop. While digging around this morning, I found a note in the Linux sed(1) man page:

So, if you are using a Linux system, that might be the problem.

What does the command:

uname -a

print on your system?

What shell are you using? If you don't know, show us the output from the command:

echo "$SHELL : $shell"

You didn't answer my question about trailing semicolons in your input: If a line in your input does not end with a semicolon, do you want this script to add one?

In your last post, you said:

Does that mean that ,value only appears on lines that end with a semicolon and that every field in a line that ends with a semicolon will have a ,value on every field in that line???

Using metanotation where everything between [ and ] is optional and where [stuff]... means that the optional stuff can appear zero or more times, is the following an accurate representation of your input file format requirements?:

name[,value][:name[,value]]...[;]

where each occurrence of name is a string of one or more characters that are not colon, comma, or semicolon; and each occurrence of value is a string of zero or more characters that are neither colon nor semicolon?
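
One way to check sample lines against that proposed format is to translate it into an extended regular expression for grep -E. This is only a sketch of the format as described in this post; the is_valid helper and the test lines are illustrative.

```shell
# ERE for: name[,value][:name[,value]]...[;]
#   name  = one or more characters that are not colon, comma, or semicolon
#   value = zero or more characters that are neither colon nor semicolon
format='^[^:,;]+(,[^:;]*)?(:[^:,;]+(,[^:;]*)?)*;?$'

# is_valid LINE: succeeds when LINE fits the format above.
is_valid() {
    printf '%s\n' "$1" | grep -Eq "$format"
}

for line in 'ONE:TWO:THREE:BERT:FOUR:FIVE;' \
            'ONE,1:TWO,2:THREE,3:BERT,999:FOUR,:FIVE,5;' \
            'X:Y:Z' \
            'ONE::TWO;'
do
    if is_valid "$line"; then
        printf 'ok   %s\n' "$line"
    else
        printf 'bad  %s\n' "$line"
    fi
done
```

The first three lines are reported as ok; the last one is reported as bad because it contains an empty name.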

steve54 · July 3, 2013, 7:39am

No, not Linux: I'm using /bin/zsh on a Sun box (uname -irs prints SunOS 5.10 SUNW,Netra-T5220). I really do not understand why there should be a problem with this box:
printf '%s\n' "FRED6;" | sed '\!^FRED6\(,[^:]*\)\{0,1\};!d'
echo "FRED6;" | sed '\!^FRED6\(,[^:]*\)\{0,1\};!d'

both print FRED6;

Yes to = "Does that mean that ,value only appears on lines that end with a semicolon and that every field in a line that ends with a semicolon will have a ,value on every field in that line"

Nothing is to be added to the files - it's a straightforward removal of a variable, and its value if it has one. It looks like - thanks to your help - the script should work but I really don't like the use of wildcards and think using a file with a list of all the variables to be removed would be safer.
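
A list-driven version along those lines could be sketched as follows. This is not a tested replacement for the original script: it assumes the matching rules from the sed commands earlier in the thread (exact name, optional ,value, lines ending in a semicolon), assumes the names contain no regular expression metacharacters and no "!", and uses hypothetical file names (names.txt, remove.sed) and a hypothetical helper (build_sed_script).

```shell
# build_sed_script FILE: print a sed script that removes every field
# named in FILE (one name per line), with or without a ",value" part.
build_sed_script() {
    v='\(,[^:]*\)\{0,1\}'          # optional ",value"
    printf '%s\n' ':again'         # loop label
    while IFS= read -r name; do
        [ -n "$name" ] || continue # skip empty lines
        printf '%s\n' \
            "\\!^${name}${v};!d" \
            "s!^${name}${v}:!!" \
            "s!:${name}${v}:!:!" \
            "s!:${name}${v};!;!"
    done < "$1"
    printf '%s\n' 't again'        # repeat while anything was changed
}

tmpdir=$(mktemp -d)
printf '%s\n' FRED6 BERT > "$tmpdir/names.txt"
build_sed_script "$tmpdir/names.txt" > "$tmpdir/remove.sed"

result=$(printf '%s\n' 'ONE,1:BERT,999:FRED6:FIVE,5;' 'FRED61:abc;' 'BERT;' |
    sed -f "$tmpdir/remove.sed")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the sample input this prints ONE,1:FIVE,5; and FRED61:abc; and the line containing only BERT; is deleted. Keeping the names in a separate file means the list can be reviewed or changed without touching the sed commands themselves.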

Don_Cragun · July 3, 2013, 3:24pm

Do you get the same results if you specify /usr/xpg4/bin/sed instead of just sed in the script?

Many Solaris systems have the GNU utilities loaded into a directory and on those systems some users set their command search path to pick up the GNU utilities in preference to the "standard" utilities. What output do you get from the command line:

type sed;sed --version

steve54 · July 4, 2013, 3:41pm

Absolutely amazing, I had no idea about this. You are absolutely correct: using /usr/xpg4/bin/sed in place of sed (which resolves to /bin/sed) works. (By the way, it's also interesting that there is no sed --version, so it seems there is no easy way to identify which one you are using; it comes with the OS, maybe.)

% printf '%s\n' "FRED6;" "FRED61;" "FRED6,45678;" "abc:FRED6;" "abc:FRED61;" \
         "abc:FRED6,45678;" "abc:FRED6:def;" "abc:FRED61:def;" \
         "abc:FRED6,45678:def;" "abc:FRED6:GEORGE:def;" \
         "abc:FRED61:GEORGE:def;" "abc:FRED6,45678:GEORGE:def;" "FRED6:def;" \
         "FRED61:def;" "FRED6,45678:def;" "abc:No Changes to this:def;" | /usr/xpg4/bin/sed    '
                 \!^FRED6\(,[^:]*\)\{0,1\};!d
                 s!^FRED6\(,[^:]*\)\{0,1\}:!!
                 s!:FRED6\(,[^:]*\)\{0,1\}:!:!
                 s!:FRED6\(,[^:]*\)\{0,1\};!;!' > tmpfile

% cat tmpfile
FRED61;
abc;
abc:FRED61;
abc;
abc:def;
abc:FRED61:def;
abc:def;
abc:GEORGE:def;
abc:FRED61:GEORGE:def;
abc:GEORGE:def;
def;
FRED61:def;
def;
abc:No Changes to this:def;
%

Don_Cragun · July 4, 2013, 4:06pm

I'm glad to hear that you got it to work.

If your "default" sed had been a GNU version of sed, sed --version would have printed version information for that implementation of sed instead of a diagnostic saying it didn't know what --version meant. Since you are using Solaris 10 sed utilities, the command:

what /usr/bin/sed /usr/xpg4/bin/sed

will probably show you SCCS version information for those two sed utilities. (Some sys admins strip this information when installing Solaris systems to save a little disk space.)
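
Since /usr/xpg4/bin/sed handled the script here where the default sed did not, a script that has to run on Solaris as well as on other systems might choose its sed explicitly. A minimal sketch, assuming the only special case is the Solaris XPG4 path (the SED and out variable names are illustrative):

```shell
# Prefer the XPG4 sed where it exists (as on Solaris 10);
# otherwise fall back to whichever sed is first in PATH.
if [ -x /usr/xpg4/bin/sed ]; then
    SED=/usr/xpg4/bin/sed
else
    SED=sed
fi

# Quick check using the first command from the script in this thread:
# "FRED6;" should be deleted and "FRED61;" should pass through.
out=$(printf '%s\n' 'FRED6;' 'FRED61;' |
    "$SED" '\!^FRED6\(,[^:]*\)\{0,1\};!d')
printf '%s\n' "$out"
```

If the chosen sed behaves like the one used for the expected output earlier in the thread, this prints only FRED61;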