Substitute a character with sed

hi all,
i'd like to modify a file with sed: i want to substitute the character "-" with "/".
how can i do this?
Thanks to all
regards
Francesco

Hi,
this command will overwrite the file:

sed -i 's/-/\//g' file

or, with a different delimiter so the slash needs no escaping:

sed -i 's#-#/#g' file

This one writes to stdout instead:

tr '-' '/' <file
1 Like

Please note that the -i switch is NOT part of standard sed and is only understood by GNU sed. In addition i strongly suggest NOT to use it even if it is available:

sed -i '....'  /path/to/file                              # not recommended

if sed '....' /path/to/file > /path/to/file.tmp ; then    # recommended way
     mv  /path/to/file.tmp  /path/to/file
     chown <correct ownership> /path/to/file              # optional
     chmod <correct filemode> /path/to/file               # optional
else
     <error handling procedure here>
fi

Basically sed always relies on a second file for output because, working as a stream editor (that is: as a pipeline), it cannot do it any other way. The -i switch does not change that at all, it just makes sed do it "behind the curtain": a second (temporary) file is created and then moved over the original file.

The problem with this is that the inode number (and all the other metadata) of the file changes, of course. If you do it manually, as i suggested, this is obvious, because you visibly have a new file with which you replace the old one. With the -i switch you seem to have the same file - but you don't! So if you rely on the file ownership or the filemode staying constant but have a directory with sticky bits set, or if you rely on the inode number staying constant, you are going to be disappointed.
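If you want to see this for yourself, here is a minimal demonstration (the file name demo.txt is made up for illustration; -i requires GNU sed):

printf 'a-b-c\n' > demo.txt
ls -i demo.txt                    # note the inode number
sed -i 's/-/\//g' demo.txt        # in-place edit (GNU sed)
ls -i demo.txt                    # a different inode: it really is a new file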

The even bigger problem (IMHO) is that a temporary file gets created but - unlike with the standard way of doing things - you have no control over the process. In case of an error the temporary file might stay behind, wasting space. Or it may be created in a filesystem where you don't expect it, and that filesystem may be too small. Or....

As a last concern i'd like to offer: it is not portable. The way i sketched out above will work everywhere; the method using the -i switch will only work on select systems. This usually will not pop up for years, but the incompatibility will raise its ugly head the moment you need it least - and it will bite you in the behind.

(Sad truth: this will happen anyway sooner or later. If you can minimise the opportunities it can happen it is likely to hurt a bit less - and i for my part am pretty particular when it comes to parts of me i like to use for sitting.)

I hope this helps.

bakunin

2 Likes

Hi bakunin,

thanks for the remarks about different aspects of sed -i .

Current GNU sed reapplies permissions and ownership after an in-place edit (the inode number changes, of course). I was not aware of that until now. Thanks for the hint - and thanks to the GNU developers, they did a good job here too.

See here (example run as root, so the ownership reapplication is demonstrated):

# sed --version
sed (GNU sed) 4.4

# ls -l test
-r--r--r-- 1 www-data www-data 5 Mär 14 14:16 test

# strace -o strace.out sed -r -i  's/bla/blub/' test

# ls -l test
-r--r--r-- 1 www-data www-data 5 Mär 14 14:17 test

# relevant parts within strace.out
# grep -E "(chmod|chown)" strace.out 
fchown(4, 33, 33)                       = 0
fchmod(4, 0100444)                      = 0

Yes, that's true. And as far as I understand, you are working in a more "pure" Unix environment (AIX, Solaris, ...) where it is very important to have scripts running on different platforms - and I assume the scripts you write will very often run on different platforms.

For my part I heavily rely on the GNU tools, and as far as I'm concerned I work nearly 100% in a Linux environment (with different distributions), where the GNU collection is the default toolset in all cases. I think the compatibility of my many scripts would break in lots of places anyway, because there are lots of differences between the tools: different commands, same commands with different switches, ... . I think I would have a lot of extra work if I were - for one thing - to refrain from using the advanced feature set of the GNU tools and - for another - to care about compatibility with systems I will probably never get in touch with.

Different worlds - Different rules ?

2 Likes

This seems to be a new development - thank you for the update. Older versions (upon the knowledge of which i drew when i wrote the above) didn't do this, though.

Not always, but most times, and yes, that shaped my habits. It is not about "pure UNIX" versus "Linux", though; it is more about "every UNIX, and every Linux too". When i write scripts i write them with absolutely every possibility in mind, and if i have only, say, Solaris and AIX to serve but i know something won't work that way in HP-UX, i will try to find a way that works there too. Otherwise chances are you are finished with the script and two days later HP-UX is introduced and you restart from zero.

If i have to deal with things which are different in every system i try to encapsulate them as much as possible. Most times i try to isolate the logic and use layer-functions which then only do the OS-dependent parts like this:

do_something ()
{
     case "$OS" in
          AIX)
               <AIX-related stuff here>
               ;;

          SunOS)
               <SunOS-related stuff here>
               ;;

          .....
     esac
}

Everything else is decided/processed/triggered/... above and finally the OS-dependent part is called.
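To make that concrete, here is a minimal runnable sketch of such a layer function (the function name list_disks and the commands chosen for each branch are my illustration, not part of the original post):

#!/bin/sh
# One OS-dependent layer function; all logic above it stays OS-agnostic.

OS=$(uname -s)

list_disks ()
{
     case "$OS" in
          AIX)
               lsdev -Cc disk
               ;;

          SunOS)
               format < /dev/null
               ;;

          Linux)
               lsblk -d
               ;;

          *)
               echo "list_disks: unsupported OS: $OS" >&2
               return 1
               ;;
     esac
}

list_disks     # the caller never sees the OS-specific details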

Well, the work you talk about you do only once. Second, i try to write my scripts like any software developer writes his programs: as general (and generalised) as possible. As a freelancer i regularly change customers and work in different surroundings. It often happened to me that i was asked to write a certain procedure for systems A, B and C, and i just smiled, took out a USB stick and said "on this you'll find exactly that". I might have written it for systems D, E and F, but as i write as generalised and OS-agnostic as possible, it usually works on A, B and C too, or at least does so with minor changes. If it takes you 10 minutes to adapt a script you would otherwise have spent two weeks creating anew (because it took you two weeks to write it in the first place), you know that the one day you spent making it as independent from the actual surroundings as possible was well invested. It was well invested not only for me but also for the customer, who got a flexible and extensible solution.

Before doing systems administration i was a programmer, and i write scripts the same way i wrote programs: with adaptability and maintainability in mind. If you have a 50-line script it doesn't matter whether you produce a well-written piece of software or a kludge. If your scripts hit and break the 1000-line mark you had better have your source code under control, because otherwise you will never get finished. Yes, i have several 1000-line scripts doing things for me.

And the old "if it was hard to write it should be hard to read" (and even harder to understand) is a great source of amusement, but once you have stopped chuckling over it you do it as it should be done.

As i said here, we are essentially artists, as "technics" is derived from the Greek word for "art". A painting isn't finished when everything is covered with paint but when you - as the painter - feel that sense of accomplishment and satisfaction with what you did. A program (= script) is finished not when it runs without an error but when it runs AND has that inherent (and hard to describe) quality of "rightness": it is documented, has error handling, will be able to deal with all conceivable conditions, and more. In one word, it is finished not when it is running but when it is "good".

The great sculptor Bernini, considered to be one of the greatest of all time, didn't stop when the plaster was used up but when he thought there was nothing left to do on a statue. This is what made him "the Bernini" and what raised him above other sculptors - not the number of works, not the amount of material used. Write your programs like you would do any other art: put your everything, the essence of your being, into it, and don't stop until you think you could make your whole existence depend on it.

bakunin

2 Likes

Bakunin's replacement code in post #4 is portable but still creates a new inode for the input file ("/path/to/file").
This not only necessitates a repair of the original owner/group/permissions, it also makes a previously hard-linked file stand-alone - and this damage cannot be repaired.
GNU sed -i (and perl -i) cannot preserve the original linkage either.
If we are going for a real improvement, then we avoid the mv command!

if sed '....' /path/to/file > /path/to/file.tmp ; then
     cp  /path/to/file.tmp  /path/to/file
else
     <error handling procedure here>
fi

The cp command preserves the original inode (owner/group/permissions and linkage) - only the timestamp changes, and that is as it should be.
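A quick way to convince yourself (file names made up; ls -i prints the inode number):

printf 'x-y\n' > file
ls -i file                        # remember this inode
sed 's/-/\//g' file > file.tmp
cp file.tmp file && rm file.tmp   # copy back instead of mv
ls -i file                        # same inode as before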

1 Like

GNU sed has a -c/--copy option which keeps hard and soft links. But this is not an atomic operation (the mv variant is atomic).

Alternatively there is sponge. (Not very portable either :wink: ).

sponge soaks up all of stdin and only then writes it to the file given as its argument:

sed -e s/bla/blub/ file.txt | sponge file.txt

One should know when atomic updates are more important than keeping files/inodes and vice versa. Use the solution which fits your case.
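If sponge is not available, a sketch of a portable stand-in is a temporary file catted back over the original (mktemp is widespread but not strictly POSIX; like cp this keeps the inode, and like sponge it is not atomic):

tmp=$(mktemp) &&
sed -e 's/bla/blub/' file.txt > "$tmp" &&
cat "$tmp" > file.txt
rm -f "$tmp"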

Note
GNU sed version 4.7.6 (the current sed git version) does not contain --copy. There is a man page on the web (this one: sed(1) - Linux man page) which states there is a --copy, but it does not seem to actually be there.

1 Like

Another option is sed's y (transliterate) command, which, like tr, maps characters one-for-one (no g flag needed):

sed 'y%-%/%' file

Hello Bakunin sir,

With full respect I completely agree with you on this part. I would like to add one point here: GNU sed has an -i.bak option too, which takes a backup of Input_file before editing it in place. Here is an example of the same.

Let's say following is my Input_file:

cat Input_file
A                           B        C                  D

Now when I run following command:

sed -i.bak 's/^A/Singh/'   Input_file

I could see that a backup file named Input_file.bak is created with the actual (original) content, and Input_file is edited in place with the new string. IMHO maybe we could use this to be on the safer side?

-rw-r--r--  1 singh  test_singh_bla   411 Mar 15 09:00 Input_file.bak
-rw-r--r--  1 singh  test_singh_bla   415 Mar 15 09:01 Input_file

Thanks,
R. Singh

1 Like

First: I didn't know that, so thank you for teaching me something new.

The point was not whether it takes a backup or not, nor whether it changes the inode or not (as MadeInGermany mentioned - thank you for providing a way to circumvent that too, should the need arise). It is worth noting, though, that using mv within filesystem boundaries is faster than cp; across filesystems mv and cp are probably equally fast.

For me the main point is portability. I can choose between the POSIX way of doing things, which works on all systems, and the GNU way, which works on less than all systems. For the same reason i would also hesitate to use a feature in the POSIX standard which i know would not work (or would work differently) on GNU. This is not limited to sed options or anything in particular at all; the same goes for anything else i do on a UNIX (or Linux) system.

The second reason i suggest not to use -i is NOT that it rolls some usually necessary actions into one: i talked about not preserving the inode or filemode or owner, but it won't always matter whether they are preserved or not. The point is that it makes it look like it is the same file while it is not. One can do everything and anything but one thing: lie to your users. If you do something, whatever it is, do it - but don't pretend you do something else.

bakunin

1 Like

You're right. The exact method is not fully explained in the man page; there's only a very short sentence about -i .

In the sed info-page however it is explained more in detail.

'-i[SUFFIX]'
'--in-place[=SUFFIX]'
     This option specifies that files are to be edited in-place.  GNU
     'sed' does this by creating a temporary file and sending output to
     this file rather than to the standard output.

I would feel better if someone weren't called a liar quite so easily.

---

As regards portability, I think the message is clear:
if one wants portability, stick to POSIX. Don't use GNU, bash, php, python, ruby, whatever, ...

Speaking strictly as a scientist: python, ruby, and julia are extremely important for research and analysis.

GNU supports the environment variable POSIXLY_CORRECT . This helps a lot when you are porting code back and forth between several platforms; I would suggest that you check it out. Defining it changes behavior to be pretty close to POSIX - not perfect, IMO.
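A small illustration of the effect, assuming GNU coreutils (there, if no block-size variable is set, POSIXLY_CORRECT switches the default block size of df/du/ls from 1024 to the POSIX-mandated 512 bytes):

df .                      # sizes in 1K blocks (GNU default)
POSIXLY_CORRECT=1 df .    # sizes in 512-byte blocks (POSIX behavior)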

Linux is effectively inescapable in modern data centers. Even appliances like routers run on embedded Linux. So does your refrigerator ....

GNU.org position on POSIX:
GNU Coding Standards: Non-GNU Standards

Some OSes kind of go off the deep end in an attempt to be a sort of "all versions of all standards" system - which is kind of what you are discussing here.

Solaris is an example:

Compliance with standards in that OS - for a new user or somebody porting an app - can be confusing. You get different versions of awk , tr and so on, depending on which of several possible paths (e.g. /usr/xpg4/bin versus /usr/bin ), and in which order, you have set up in PATH . This can break a lot of things that worked correctly on HP-UX but not on Solaris, for example. I spent a lot of time tweaking the PATH variable for different applications and their associated users.
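For example, a sketch of putting the standards-conforming tools first on Solaris (which directories exist, and which you want first, depends on the Solaris release and on the standard you target):

PATH=/usr/xpg6/bin:/usr/xpg4/bin:/usr/bin:$PATH
export PATH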

XPG4 -
man pages section 5: Standards, Environments, and Macros

Don Cragun here actively works on standards, and definitely will have some opinions here.

Maybe the standards part should be split off from this thread? (Last time I requested a thread split there was a broken thread afterwards in the forum, which might have had nothing to do with that, but I just wanted to mention it.)

--- Post updated at 03:08 PM ---

I would throw in another question:

Does POSIX contain a programming language environment?
Is there a programming language available in every environment?

In contrast to Linux, the situation of HP-UX, Solaris, AIX, ... in terms of easily available and easily installable software seems to me (as a Linux-only user) like the availability of water in the ocean versus the desert.

So I'm wondering if there is any programming language installed on those unices, so that there may be one programming environment available everywhere. I'm not talking about sh/ksh/bash; I do not consider them real programming languages. Yes, one can write very large shell scripts, but I think it's a mess: bad maintainability, slow speed, a high resource footprint (new processes created for most things), an unsafe programming environment, an inferior feature set.

I would imagine perl 5 could be everywhere (perl 5 is 25 years old now; having perl means just perl-base, no modules). Despite being the opposite of my favourite programming language, if I had to (or wanted to) write portable code, perl would be a lot better than any shell. The developer of inxi (a hardware information tool, Inxi at github) went that way - inxi is a 20,000-line perl script written with very high compatibility as a target (before switching to perl he used bash and GNU awk).

Yes, /usr/bin/perl -> perl5 should be everywhere. But do not expect certain perl modules!
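To tie this back to the original question, the substitution from post #1 needs nothing beyond perl-base (and the -i.bak variant has the same new-inode caveats as sed -i):

perl -pe 'tr{-}{/}' file          # to stdout
perl -i.bak -pe 'tr{-}{/}' file   # in place, with backup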

When I was working at Sun, you could set PATH to include various *bin directories in various different orders to get an environment that met the requirements of several standards including (but not limited to) SVID3, XPG3, XPG4, XPG5, SUSv1, SUSv2, each of the revisions of the POSIX.1 and POSIX.2 standards (and most of the other POSIX standards except for POSIX.3), and also some versions of BSD and UNIX System V.

I make no claims about what has happened to that guaranteed backwards compatibility between releases since Oracle bought out Sun. (It may still work, I just don't know.)

If you don't consider shell and awk to be programming languages, the POSIX standards do still provide the C programming language. The current version of the POSIX standards is built on top of the 1999 ISO C standard. The next revision of the POSIX standards will be built on top of the 2011 or 2017 C standard (depending on whether the Austin Group or the ISO C committee gets their next revision ready first). At one time there were POSIX working groups working on Ada and C++ bindings to POSIX.1, but those groups are no longer active. The Austin Group does still work closely with both the ISO C and ISO C++ working groups, and although there is no C++ compiler specified in the POSIX.2 standard, the ISO C++ standard libraries are intended to be fully compatible with the POSIX.1 standards.

I was once offered up as a sacrificial lamb to attend some meetings helping to define standards for the Linux operating system. I was immediately deemed to be an enemy of everything that committee wanted to do. I believed that standards were created to help users write portable code; everyone else in the room thought standards were written to constrain operating systems development. Where the SVID, XPG, SUS, and POSIX standards specified requirements that conforming operating systems had to supply, the LSB (Linux Standards Base) went a different way. The LSB defined the names of some functions that the Linux operating system had to supply, but only gave very brief descriptions of what those functions did. The definition of how those functions behaved is given by the source code for the Linux operating system, not by any document that an applications programmer could use to determine how to write a portable application that would run on any Linux system. If you write an application that conforms to the LSB and it works on one Linux system today, there is no guarantee that it will work on any Linux operating system tomorrow.

That was twenty-five years ago. Today, some Linux vendors actively participate in the Austin Group and try to mostly comply with POSIX requirements. (Note mostly; not completely. No Linux vendor has gotten a POSIX or UNIX certification demonstrating that their implementation passes the verification suites testing conformance to the POSIX standards nor the Single UNIX Specifications yet.)

Note that all of the above is my own biased opinion. Other people may well have different opinions.

1 Like

Thanks all for the participation.

I hope I did not state my views as if they were the truth; I just presented them as my own preferences (e. g. the statement that shell scripting is not a real programming language). Others may have other preferences. (@Bakunin: I'm sure you have a great library of portable scripts you can use whenever you need them, in any environment.)

Overall my views have not changed. Portability may be an important goal in writing scripts, but one does not get it for free. And it's not cheap, because it impacts other objectives one might have in programming, which may be...

  • Ease of programming
  • Efficiency of Resources
  • Security
  • Robustness
  • Maintainability/Readability
  • Slim Programming Runtime Environment

But one can do easy things to preserve portability, and need not break it lightly (using BRE instead of ERE, ...).
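A small example of what that means in practice (the -E switch for EREs was only standardized recently and is missing from older seds, while the BRE form works everywhere):

sed 's/--*/\//g' file     # BRE: one or more dashes, works in every sed
sed -E 's/-+/\//g' file   # ERE: shorter, but needs a sed that knows -E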

Despite being a bit interested in standardization, I was only reading news about that topic here and there. Thanks for the details, Don. I gave up on that 20 years ago. Linux is absolutely great in terms of standardization, in the sense that every distribution has its own standard (see xkcd: Standards, https://xkcd.com/927/). Just kidding. The Linux folks are not even able to agree on which file the hostname of the system should reside in. In the end the decision was made by systemd, which has it at the fixed location /etc/hostname.

In all my time there has never been a common way to, e. g., configure persistent networking. There are about 50 different ways to do it.

I do not know that, but I assume you know what you are speaking of. The question for me is what is needed as a runtime base - as a compatibility layer. I think it's not that good to have 100 GB of every possible and well-working programming language installed on every system. I would rather keep that base not too fat.