Search and replace multi-line text in files

marz · October 6, 2005, 10:51am

Hello
I need to search for a multi-line text in a file exfile1 and replace it with another text. The text to search for is in exfile2 and the replacement text is in exfile3.

I work with the Korn shell under AIX and need to do this with a lot of files. (The files are PostScript and need to be edited before printing on our old card plotter, which cannot handle bitmaps.)

exfile1:
asdasdasdasd
asdasdasdasd
abc
def
ghi
sdasdasdasda
asdasdasdada

exfile2:
abc
def
ghi

exfile3:
jkl
mno
pqr

I have tried sed with little success.
Any ideas?

Ygor · October 6, 2005, 8:50pm

sed is awkward for this because its processing is line-based. You can use awk if you set the record separator to the empty string, so that a file without blank lines is read as a single record, like this...

awk ' BEGIN { RS="" }
      FILENAME==ARGV[1] { s=$0 }
      FILENAME==ARGV[2] { r=$0 }
      FILENAME==ARGV[3] { sub(s,r) ; print }
    ' exfile2 exfile3 exfile1 > exfile4

...which gives...

asdasdasdasd
asdasdasdasd
jkl
mno
pqr
sdasdasdasda
asdasdasdada
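
One caveat: sub(s,r) treats s as a regular expression and an & in r as "the matched text", so search or replacement text with characters like . ( [ \ or & may not behave as expected. A minimal sketch of a literal variant follows, using index() and substr() instead of sub(). The function name literal_replace is made up for illustration, the files are collected line by line rather than through RS="", and it has not been tried on AIX awk, which may have its own limits on string length.

```shell
#!/bin/sh
# Usage: literal_replace SEARCH_FILE REPLACEMENT_FILE INPUT_FILE > OUTPUT_FILE
# (hypothetical helper name; argument order follows the awk example above)
literal_replace() {
    awk '
        # Append every line of each file to its own buffer.
        FILENAME == ARGV[1] { s = s $0 "\n"; next }
        FILENAME == ARGV[2] { r = r $0 "\n"; next }
                            { d = d $0 "\n" }
        END {
            # index() is a plain substring search, so no character is special.
            while (s != "" && (i = index(d, s)) > 0) {
                printf "%s%s", substr(d, 1, i - 1), r
                d = substr(d, i + length(s))
            }
            printf "%s", d
        }
    ' "$1" "$2" "$3"
}
```

It would be called as: literal_replace exfile2 exfile3 exfile1 > exfile4. Like the original, it replaces the text wherever it occurs, not only at the start of a line.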

marz · October 7, 2005, 2:12am

Thank you very much for the solution.

Works great!

marz · October 7, 2005, 10:06am

I have tested a little more and I have a problem.
All the files are bigger than 10,239 bytes and cannot be processed by the awk command.

Error:
"awk: 0602-534 Input line xxxxxxxx cannot be longer than 10,239 bytes."

Any ideas on how to solve this problem?

Best regards
marz

akrathi · October 7, 2005, 3:22pm

I am running awk on a file of 84461608 bytes without
any trouble. Can you please explain exactly what you are doing?

reborg · October 7, 2005, 4:41pm

My guess is that:

You are working on a different platform.
You are working with GNU awk; the OP is not.

Perderabo · October 7, 2005, 5:05pm

My guess is that he is not using RS="" and his file does not contain a line with more than 10,239 bytes.
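
To illustrate the difference, here is a small sketch that counts the records awk sees with the default RS and with RS="". The file name demo.txt and its contents are made up for the demonstration. With RS="" records are separated by blank lines, so a file without blank lines becomes one single record, and that record is what runs into the length limit.

```shell
#!/bin/sh
# Hypothetical demo file: five lines, no blank lines.
printf 'one\ntwo\nthree\nfour\nfive\n' > demo.txt

# Default RS: one record per line.
lines=$(awk 'END { print NR }' demo.txt)

# RS="": records are separated by blank lines, so the whole file is one record.
paras=$(awk 'BEGIN { RS = "" } END { print NR }' demo.txt)

echo "default RS: $lines records, RS=\"\": $paras record(s)"
```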

marz · October 9, 2005, 1:23pm

I am running AIX 5.2 on an IBM RS/6000.
I ran this script with RS="" and it worked great on small files. But when I started testing with bigger files this error occurred.
The search, replace and input files are all bigger than 10k.
I will test more on Monday.

Perderabo · October 9, 2005, 2:50pm

Maybe this will do it...

#! /usr/bin/ksh

DATA=data
OLD=old
NEW=new
TMP=repfiletmp$$


target1=$(sed 1q < $OLD)
targetnum=$(wc -l < $OLD)
echo target: $targetnum $target1
echo tmpfile = $TMP


exec < $DATA
IFS=""
while read line ; do
        if [[ $line = $target1 ]] ; then
                tmpct=1
                echo $line > $TMP
                while ((tmpct<targetnum)) ; do
                        read line && echo $line >> $TMP
                        ((tmpct=tmpct+1))
                done
                if cmp -s $OLD $TMP ; then
                        cat $NEW
                else
                        cat $TMP
                fi
                rm $TMP

        else
                echo $line
        fi
done

exit 0

Ygor · October 9, 2005, 10:22pm

This is the same idea as before but using perl instead of awk...

perl -e '
   undef $/;
   open(SEAR, "< exfile2");
   open(REPL, "< exfile3");
   open(INFI, "< exfile1");
   open(OUTF, "> exfile4");
   $sear = <SEAR>;
   $repl = <REPL>;
   $data = <INFI>;
   $data =~ s/$sear/$repl/g;
   print OUTF $data;
'

marz · October 10, 2005, 8:05am

It almost did. PostScript files contain \n, \, / and so on. With some modifications this code works!

I think the perl variant that Ygor posted is faster, but I have not been able to get it to work with \n, \, / and such.

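DATA=ain.ps     # input file to edit
OLD=aold.ps     # block of lines to search for
NEW=anew.ps     # replacement block
OUT=aout.ps     # result
rm -f "$OUT"

TMP=repfiletmp$$
target1=$(sed 1q < "$OLD")      # first line of the search block
targetnum=$(wc -l < "$OLD")     # number of lines in the search block
echo "target: $targetnum $target1"
echo "tmpfile = $TMP"

exec < "$DATA"
IFS=""
# read -r and print -r keep backslashes literal
while read -r line ; do
        # quote the right-hand side so it is compared literally, not as a pattern
        if [[ $line = "$target1" ]] ; then
                tmpct=1
                print -r -- "$line" > "$TMP"
                # collect as many lines as the search block has
                while ((tmpct<targetnum)) ; do
                        read -r line && print -r -- "$line" >> "$TMP"
                        ((tmpct=tmpct+1))
                done
                # emit the replacement only if the whole block matches
                if cmp -s "$OLD" "$TMP" ; then
                        cat "$NEW" >> "$OUT"
                else
                        cat "$TMP" >> "$OUT"
                fi
                rm "$TMP"
        else
                print -r -- "$line" >> "$OUT"
        fi
done
exit 0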
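
The modification that matters most is read -r: without -r, read treats a backslash as an escape character and drops it, which mangles PostScript strings. A small sketch of the difference follows. The sample line is made up, and printf '%s' is used here in place of ksh's print -r so that it also runs in a plain POSIX shell.

```shell
#!/bin/sh
# A PostScript-like line containing a backslash (made up for the demo).
sample='(a\nb) show /Times-Roman'

# Without -r the backslash is consumed by read.
plain=$(printf '%s\n' "$sample" | { IFS= read line; printf '%s' "$line"; })

# With -r the line is kept exactly as it is in the file.
raw=$(printf '%s\n' "$sample" | { IFS= read -r line; printf '%s' "$line"; })

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"
```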