marz
October 6, 2005, 10:51am
1
Hello
I need to search for a multi-line text in a file exfile1 and replace that text with another text. The text to search for is in exfile2 and the replacement text is in exfile3.
I work with the Korn shell under AIX and need to do this with a lot of files. The files are PostScript and need to be edited before printing on our old card plotter, which cannot handle bitmaps.
exfile1:
asdasdasdasd
asdasdasdasd
abc
def
ghi
sdasdasdasda
asdasdasdada
exfile2:
abc
def
ghi
exfile3:
jkl
mno
pqr
I have tried with sed, with little success.
Any ideas?
Ygor
October 6, 2005, 8:50pm
2
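sed is awkward for this because its processing is line-based. You can use awk if you set the record separator to the empty string, so that each blank-line-delimited block (here the whole file, since the examples contain no blank lines) is read as one record, like this...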
awk ' BEGIN { RS="" }
FILENAME==ARGV[1] { s=$0 }
FILENAME==ARGV[2] { r=$0 }
FILENAME==ARGV[3] { sub(s,r) ; print }
' exfile2 exfile3 exfile1 > exfile4
..which gives...
asdasdasdasd
asdasdasdasd
jkl
mno
pqr
sdasdasdasda
asdasdasdada
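One caveat with sub(): it treats the search text as a regular expression and gives & a special meaning in the replacement, and PostScript is full of parentheses, brackets and backslashes. A minimal sketch of a literal-match variant, assuming the same three-file layout and no blank lines in the files; the function name replace_block is made up for illustration:

```shell
# replace_block SEARCH_FILE REPLACEMENT_FILE INPUT_FILE  (hypothetical helper)
# Prints INPUT_FILE with every literal occurrence of the search text replaced.
replace_block() {
    awk '
        BEGIN { RS = "" }                      # paragraph mode, as in the post above
        FILENAME == ARGV[1] { s = $0; next }   # search text
        FILENAME == ARGV[2] { r = $0; next }   # replacement text
        {
            rest = $0
            out = ""
            # index() is a plain substring search, so regex characters in s are inert.
            while (length(s) > 0 && (i = index(rest, s)) > 0) {
                out = out substr(rest, 1, i - 1) r
                rest = substr(rest, i + length(s))
            }
            print out rest
        }
    ' "$1" "$2" "$3"
}
```

Usage would be along the lines of: replace_block exfile2 exfile3 exfile1 > exfile4. It still reads each block as one record, so any record-length limit in the local awk applies here too.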
marz
October 7, 2005, 2:12am
3
Thank you very much for the solution.
Works great!
marz
October 7, 2005, 10:06am
4
I have tested a little more and I have a problem.
All the files are bigger than 10,239 bytes and cannot be processed by the awk function.
Error:
"awk: 0602-534 Input line xxxxxxxx cannot be longer than 10,239 bytes."
Any idea to solve this problem?
Best regards
marz
I am running awk on a file of 84461608 bytes without any trouble. Can you please explain what you are doing with yours?
reborg
October 7, 2005, 4:41pm
6
My guess is that you are:
Working on a different platform
Working with GNU awk; the OP is not.
My guess is that he is not using RS="" and his file does not contain a line with more than 10,239 bytes.
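The difference presumably comes from what awk counts as an "input line". With RS="" each blank-line-delimited block is one record, so a file with no blank lines becomes a single huge record even if every individual line is short. A rough sketch for checking this on a given file; the function name max_block_bytes is hypothetical, and length() counts characters, which equals bytes only in a single-byte locale:

```shell
# max_block_bytes FILE  (hypothetical helper)
# Prints the size of the largest blank-line-delimited block in FILE,
# reading line by line so that no single record gets large.
max_block_bytes() {
    awk '
        # An empty line ends the current block.
        /^$/ { if (n > max) max = n; n = 0; next }
        # Otherwise add the line length plus one for its newline.
        { n += length($0) + 1 }
        END { if (n > max) max = n; print max + 0 }
    ' "$1"
}
```

If the number printed for a file is above 10,239, the RS="" approach would be expected to fail on that file with the awk that reported the error.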
marz
October 9, 2005, 1:23pm
8
I am running AIX 5.2 on IBM RS/6000.
I ran this script with RS="" and it worked great on small files. But when I started testing with bigger files this error occurred.
The search, replace and input files are all bigger than 10k.
I will test more on monday.
Maybe this will do it...
#! /usr/bin/ksh
DATA=data
OLD=old
NEW=new
TMP=repfiletmp$$
target1=$(sed 1q < $OLD)
targetnum=$(wc -l < $OLD)
echo target: $targetnum $target1
echo tmpfile = $TMP
exec < $DATA
IFS=""
while read line ; do
    if [[ $line = $target1 ]] ; then
        tmpct=1
        echo $line > $TMP
        while ((tmpct<targetnum)) ; do
            read line && echo $line >> $TMP
            ((tmpct=tmpct+1))
        done
        if cmp -s $OLD $TMP ; then
            cat $NEW
        else
            cat $TMP
        fi
        rm $TMP
    else
        echo $line
    fi
done
exit 0
Ygor
October 9, 2005, 10:22pm
10
This is the same idea as before but using perl instead of awk...
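perl -e '
undef $/;                                  # slurp mode: read each file in one go
open(SEAR, "< exfile2") or die "exfile2: $!";
open(REPL, "< exfile3") or die "exfile3: $!";
open(INFI, "< exfile1") or die "exfile1: $!";
open(OUTF, "> exfile4") or die "exfile4: $!";
$sear = <SEAR>;
$repl = <REPL>;
$data = <INFI>;
# \Q...\E makes the search text literal instead of a regular expression
$data =~ s/\Q$sear\E/$repl/g;
print OUTF $data;
close(OUTF) or die "exfile4: $!";
'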
marz
October 10, 2005, 8:05am
11
It almost did. PostScript files contain sequences such as \n, \ and /, which read and echo mangle, so I switched to read -r and print -r. With those modifications this code works!
I think the perl variant that Ygor posted is faster, but I have not been able to get it to work with \n, \, / and such.
#! /usr/bin/ksh
DATA=ain.ps
OLD=aold.ps
NEW=anew.ps
OUT=aout.ps
rm -f "$OUT"
TMP=repfiletmp$$
target1=$(sed 1q < "$OLD")
targetnum=$(wc -l < "$OLD")
print -r -- "target: $targetnum $target1"
echo tmpfile = $TMP
exec < "$DATA"
IFS=""
while read -r line ; do
    # Quote the right-hand side so it is compared literally, not as a pattern
    if [[ $line = "$target1" ]] ; then
        tmpct=1
        print -r -- "$line" > "$TMP"
        # Collect as many lines as the search text has
        while ((tmpct<targetnum)) ; do
            read -r line && print -r -- "$line" >> "$TMP"
            ((tmpct=tmpct+1))
        done
        # Emit the replacement only if the collected block matches exactly
        if cmp -s "$OLD" "$TMP" ; then
            cat "$NEW" >> "$OUT"
        else
            cat "$TMP" >> "$OUT"
        fi
        rm "$TMP"
    else
        print -r -- "$line" >> "$OUT"
    fi
done
exit 0
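The backslash problem that read -r and print -r address can be seen in isolation. A small sketch, using printf instead of ksh's print so that it also runs in other POSIX shells; the sample line is invented for illustration:

```shell
# An invented line with PostScript-style backslash sequences.
sample='(line one\nline two \\ end) show'

# Without -r, read treats each backslash as an escape character and removes it.
plain=$(printf '%s\n' "$sample" | { IFS= read line; printf '%s\n' "$line"; })

# With -r, the line comes through unchanged.
raw=$(printf '%s\n' "$sample" | { IFS= read -r line; printf '%s\n' "$line"; })

if [ "$raw" = "$sample" ] && [ "$plain" != "$sample" ]; then
    printf '%s\n' "read -r preserved the backslashes; plain read did not"
fi
```

A block collected with the plain read would therefore no longer match the search file byte for byte, and cmp would report a difference.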
perderabo:
Maybe this will do it...