Pattern Replacement

palash2k · April 24, 2008, 2:51am

I need to replace one pattern with another in every file on my entire file system. There are thousands of files on the system. Say the pattern is "calcuta"; I have to replace it with "kolkata" in all the files that contain "calcuta".

So far I am only able to list the files in the entire file system that contain the pattern, using this command:

find / -type f | xargs grep -il "calcuta"

Using vi I can do the job, but it is a very lengthy process: I have to open each file in turn to replace the pattern.

Can anybody please help me do the job more easily? Is it possible to do it in a single command line? Is there an extension to what I tried that would do the rest?

era · April 24, 2008, 3:22am

If your sed supports the -i option, it can do the replacement in-place. Otherwise you will need a small script which writes to a temporary file and then replaces the original file with that.

find / -type f | xargs sed -i 's/calcuta/kolkata/g'
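For a sed without -i, the temporary-file script mentioned above could look roughly like the following. This is only a sketch under assumptions: the function name replace_in_file and the scratch directory are made up for illustration, and the demo deliberately runs on a throwaway directory rather than on /.

```shell
#!/bin/sh
# Hypothetical helper: replace calcuta with kolkata in one file via a temp file.
replace_in_file() {
    f=$1
    tmp="$f.tmp.$$"
    # Write the edited copy first; replace the original only if sed succeeded.
    if sed 's/calcuta/kolkata/g' "$f" > "$tmp"; then
        mv "$tmp" "$f"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo on a scratch directory (an assumption for illustration; not /).
demo=$(mktemp -d)
printf 'born in calcuta\n' > "$demo/a.txt"
printf 'nothing here\n'    > "$demo/b.txt"

# Only touch the files that actually contain the pattern.
find "$demo" -type f -exec grep -l 'calcuta' {} + | while IFS= read -r f; do
    replace_in_file "$f"
done

cat "$demo/a.txt"
```

Note that mv puts a newly created file in place of the original, so ownership and permissions may differ from the original file; whether that matters depends on the files involved.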

The sed manual page is probably not a good place to start learning, but any decent Unix tutorial or book will have a section about sed.

rubin · April 24, 2008, 11:08pm

Or try this:

find . -type f  | while read files
  do
     echo "$(awk '{gsub(/calcuta/,"kolkata")}1' "$files")" > "$files"
  done

Tested on a number of files about 1Mb in size.

era · April 25, 2008, 12:15am

What's the echo for? Why not just

awk '{ gsub(/calcuta/, "kolkata") }1' "$files" >"$files"

... within the same while loop?

In particular, echo on some systems does a fair amount of backslash parsing etc, which will cause undesired changes in the output.
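A small sketch of the usual way around the echo concern: printf with a '%s\n' format prints its argument literally, whereas what echo does with backslashes varies between shells. The sample string is made up for illustration.

```shell
#!/bin/sh
# Sample text containing backslashes (an illustrative value, not from the thread).
s='C:\new\table'

# printf '%s\n' does not interpret backslash sequences in its argument.
out=$(printf '%s\n' "$s")
printf '%s\n' "$out"
```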

rubin · April 25, 2008, 1:23am

Because going that way :

awk ...       "$files" > "$files"

the files will get completely emptied out. Just test it and see it for yourself.
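The emptying is easy to reproduce on a scratch file. The sketch below assumes nothing beyond mktemp, awk and wc; the variable names are only for illustration. The shell opens the output file, truncating it, before awk gets to read it. The echo "$(...)" variant avoids this, as far as I understand, because the command substitution is expanded before the redirection is performed.

```shell
#!/bin/sh
# Scratch file for the demonstration (path chosen by mktemp).
f=$(mktemp)
printf 'calcuta\n' > "$f"

# The redirection truncates "$f" before awk starts reading it.
awk '{ gsub(/calcuta/, "kolkata") }1' "$f" > "$f"

# Report the size in bytes after the command has run.
size=$(wc -c < "$f" | tr -d ' ')
echo "size after: $size"
rm -f "$f"
```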

While the command was written quickly, I agree it's not the best solution. Anyway, with proper quoting on the files that I tested, I got the results that I expected.

era · April 25, 2008, 2:23am

Ah yes, of course. You could also simply use a temporary file, or wiggle with redirection.

rubin · April 25, 2008, 1:31pm

find / -type f | while read files
 do
  awk -v a="$files" '{ gsub(/calcuta/,"kolkata") ; print > a".tmp" }' "$files"
  mv "$files".tmp "$files" 
 done

or

find / -type f | xargs -i awk -v a='{}' '{ gsub(/calcuta/,"kolkata"); print > a".tmp" } END{print "mv", a".tmp", a }' '{}' | sh

Tested on Ubuntu and BSD.

bakunin · April 25, 2008, 2:55pm

I can't understand why "find ... | xargs ..." is so widespread. "find" has had an "-exec" clause for as long as I can remember (and that is pretty long), so why not use it?

find / -type f -exec sh -c 'sed "s/calcuta/kolkata/g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"' sh {} \;

I hope this helps.

bakunin

rubin · April 25, 2008, 5:57pm

Very good point, but -exec does have an issue, better explained in this extract taken from softpanorama.org:

Link: softpanorama.org/Tools/Find/find_mini_tutorial.shtml

Also, unixreview.com has a nice article in this regard (Examining the "Too Many Arguments" Problem), and on xargs in general as a powerful Unix tool.

Link: unixreview.com/documents/s=8274/sam0306g/

Hope you'll find them useful.

reborg · April 25, 2008, 7:21pm

The same result can be achieved without the pipe or xargs if the find command on a system supports the

-exec <command> +

syntax.

era · April 26, 2008, 12:04am

In this particular context you interpolate the individual file name into the command anyway, so the multiple-files-at-a-time argument doesn't hold.

Personally, I simply find xargs easier to develop and debug. Also find's -exec syntax is kind of awkward. Constructing a command line and piping it to sh is also kind of a lazy man's solution, but often easy to write and understand.

reborg · April 26, 2008, 9:39am

Are you sure about that? Wouldn't that depend on what exactly you do for the command?

       -exec command {} +
              This variant of the -exec option runs the specified  command  on
              the  selected  files, but the command line is built by appending
              each selected file name at the end; the total number of  invoca-
              tions  of  the  command  will  be  much  less than the number of
              matched files.  The command line is built in much the  same  way
              that  xargs builds its command lines.  Only one instance of {}
              is allowed within the command.  The command is executed  in  the
              starting directory.

For example:

find / -type f -exec perl -pi -e 's/calcuta/kolkata/g' {} +

or with GNU sed:

find / -type f -exec sed -i 's/calcuta/kolkata/g' {} +
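To see the batching that the man page excerpt describes, one can count how often the command is started in each form. This is a rough sketch on a scratch directory with five made-up files; the inner sh -c 'echo run' is just a stand-in that prints one line per invocation.

```shell
#!/bin/sh
# Scratch directory with five small files (names invented for the demo).
demo=$(mktemp -d)
for i in 1 2 3 4 5; do
    printf 'calcuta\n' > "$demo/file$i"
done

# With \; the command is started once per file: one output line per file.
per_file=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# With + the file names are appended to one command line: one line per batch.
batched=$(find "$demo" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "per-file invocations: $per_file, batched invocations: $batched"
rm -rf "$demo"
```

With only five short file names everything fits into a single batch; on a whole file system find would start the command several times, but still far fewer times than there are files.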

era · April 26, 2008, 12:05pm

Fair enough, you win (and this was news to me; hadn't stumbled over -exec + before -- thanks for the tip!)