Hi.
This is a long solution, but it is modular for the sake of generality. It consists primarily of two perl scripts. The first reads paragraphs and turns each one into a single line, using a stand-in character for the embedded newlines (I used "%"). The second perl script does the opposite: it takes the long lines with embedded "%" and splits them back into separate lines.
With those two on the outside, we can manipulate the file as we wish with line-oriented *nix tools. In this demonstration, grep -v removes the paragraphs (now long lines) that match any of the phrases you list in a pattern file.
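For a quick feel for the fold / filter / unfold idea, here is a minimal shell sketch that uses awk in paragraph mode as a stand-in for a pair of scripts. It is an alternative illustration, not the perl code: the function names fold_paragraphs and unfold_paragraphs are made up for this example, and the stand-in is assumed to be a character such as "%" that has no special meaning in a regular expression.

```shell
#!/bin/bash
# Sketch only: fold_paragraphs and unfold_paragraphs are hypothetical names.

FAKE_RS="%"   # stand-in for newline; assumed not to occur in the data

# Join the lines of each paragraph with the stand-in character,
# printing one paragraph per output line. RS = "" puts awk in
# paragraph mode, so records are separated by empty lines.
fold_paragraphs() {
  awk -v sep="$FAKE_RS" \
    'BEGIN { RS = ""; FS = "\n"; OFS = sep } { $1 = $1; print }' "$@"
}

# Turn each stand-in back into a newline and print an empty line
# after every paragraph. sep is used as a regular expression here,
# which is why a plain character such as "%" is assumed.
unfold_paragraphs() {
  awk -v sep="$FAKE_RS" '{ gsub(sep, "\n"); print; print "" }'
}
```

A call might look like: fold_paragraphs data1 | grep -v -f patterns | unfold_paragraphs. In this sketch the folded line does not end with the stand-in, so a pattern that must match at the end of a paragraph would use grep's "$" anchor instead.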
Here are the perl scripts:
#!/usr/bin/perl
# @(#) p1 Demonstrate paragraphs into lines, substitution for newline.
use warnings;
use strict;
my($debug);
$debug = 0;
$debug = 1;
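# stand-in character for newline; it must not occur in the data
my($FAKE_RS) = "%";
# read paragraphs: each record ends at an empty line
$/ = "\n\n";
while ( <> ) {
  # fold every newline of the paragraph, including the two that end it,
  # into the stand-in character
  s/\n/$FAKE_RS/msg;
  print "$_\n";
}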
exit(0);
and:
#!/usr/bin/perl
# @(#) p2 Demonstrate read lines, substituting newline.
use warnings;
use strict;
my($debug);
$debug = 0;
$debug = 1;
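my($FAKE_RS) = "%";
while ( <> ) {
  # the doubled stand-in at the end of a paragraph becomes one newline;
  # with the line's own newline that restores the empty separator line.
  # \Q...\E keeps the stand-in literal if it is ever changed to a
  # regex metacharacter.
  s/\Q$FAKE_RS$FAKE_RS\E/\n/msg;
  # every remaining stand-in becomes a newline again
  s/\Q$FAKE_RS\E/\n/msg;
  print "$_";
}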
exit(0);
These can be driven by a shell script:
#!/bin/bash -
# @(#) s1 Demonstrate pipeline for paragraph matching.
echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version =o $(_eat $0 $1)
set -o nounset
echo
FILE1=data1
FILE2=data2
echo
echo " Data file $FILE1:"
cat $FILE1
echo
echo " Data file $FILE2:"
cat $FILE2
echo
echo " Results:"
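# p1 folds each paragraph into one line, grep drops the lines that match
# any pattern in $FILE2, and p2 unfolds what is left.
# t1 and t2 keep the intermediate stages for inspection.
./p1 "$FILE1" |
  tee t1 |
  grep -v -f "$FILE2" |
  tee t2 |
  ./p2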
exit 0
Producing:
% ./s1
(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0
Data file data1:
Switch not possible Error code 1234
Process number 678

Log not available Error code 567
Process number 874

Log not available Error code 333
Process number 34

Log not available Error code 33334234
Process number 012

Log not available Error code 333 hello
Process number 012

Log not available Error code 567
Process number 8743434

Log not available Error code 567
Process number 874 ok

Log not available Error code 999
Process number 777 missing

Data file data2:
Error code 1234
Error code 333
Process number 874%

Results:
Log not available Error code 567
Process number 8743434

Log not available Error code 567
Process number 874 ok

Log not available Error code 999
Process number 777 missing
The patterns to be excluded can contain a "%" to force a match only where the pattern ends at a newline, as with pattern 3. If the character "%" occurs as ordinary text in either data file, then a different stand-in character would have to be chosen, and the perl scripts and the pattern data file changed to match. However, the manipulation is generally unaware that it is operating on paragraphs. It sees everything as just a line, a long line to be sure, but just a line. This places the complexity outside the scope of the real operation you wish to perform.
The intermediate files t1 and t2 may be viewed to see in more detail how the process works. The tee commands may be removed when desired: just delete those lines, which is why each one sits on its own line in the pipeline.
The paragraphs must be separated by truly empty lines: the separator line may contain no spaces, TABs, etc., only a newline.
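One way to guard against a clash is to test the input data for the stand-in character before running the pipeline. The following is only a sketch, and the function name standin_is_free is made up for this example. It is meant for the data file rather than the pattern file, since the pattern file may contain the character on purpose.

```shell
#!/bin/bash
# Sketch only: standin_is_free is a hypothetical helper name.

# Succeed only if the stand-in character occurs in none of the given files.
# grep -F treats the stand-in as a fixed string; -q suppresses output.
standin_is_free() {
  local standin=$1
  shift
  local f
  for f in "$@"; do
    if grep -q -F -e "$standin" "$f"; then
      echo "stand-in '$standin' occurs in $f" >&2
      return 1
    fi
  done
  return 0
}
```

For example: standin_is_free % data1 || exit 1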
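If the input might have separator lines that contain stray spaces or TABs, it could be cleaned up before the first script sees it. This is a small sketch under that assumption; the function name blank_to_empty is made up for this example.

```shell
#!/bin/bash
# Sketch only: blank_to_empty is a hypothetical helper name.

# Rewrite lines that hold only whitespace (spaces, TABs) as truly
# empty lines, leaving all other lines unchanged.
blank_to_empty() {
  sed 's/^[[:space:]]*$//' "$@"
}
```

It would sit at the front of the pipeline, for example: blank_to_empty data1 | ./p1 | ... (p1 reads standard input when no file name is given).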
See man pages for details ... cheers, drl