How to search for multiple lines and put them into one paragraph?

Dear all,

I'm trying to manipulate a data file and put certain lines into one paragraph.
What I actually want to do is search for some lines in a data file. These lines begin with "1\1\GINC-" and end with "\\@" or the following two empty lines, as shown in blue.

Part of the text is shown below. (The markers "==>|" and "|<==" are only used to show that each line begins with a space; they are not contained in the original file. The original file is also shown at the end of this post so you can test your script.)

==>| Symmetry A    KE= 5.003963609647D+02|<==
==>| Symmetry B    KE= 2.202693107013D+02|<==
==>| Leave Link  601 at Sat Nov 27 09:26:00 2010, MaxMem=  671088640 cpu:|<==
==>| (Enter /home/bpliu/gdata/gaussian/g09/l9999.exe)|<==
==>| 1\1\GINC-NODE59\SP\UCCSD(T)-FC\GenECP\Cr1H2O2Si1(3)\BPLIU\27-Nov-2010\|<==
==>| 0\\#P scf=(maxcycle=1000,qc) uccsd-t(maxcycle=1000)/genecp gfprint\\3a|<==
==>| 0-b3lyp\\0,3\Si,0,0.014066,-1.355809,0.\O,0,1.238257,-0.219629,0.00000|<==
==>| 1\Cr,0,-0.01268,1.131665,0.\O,0,-1.234272,-0.246214,-0.000002\H,0,0.02|<==
==>| 3229,-2.208103,1.211679\H,0,0.023232,-2.208103,-1.21168\\Version=EM64L|<==
==>| -G09RevA.01\State=3-A\HF=-525.4164345\MP2=-526.0063196\MP3=-526.013047|<==
==>| 9\MP4D=-526.0298231\MP4DQ=-526.0236019\PUHF=-525.3995614\PMP2-0=-525.9|<==
==>| 887651\PMP3-0=-525.9948332\MP4SDQ=-526.0382803\CCSD=-526.0625561\CCSD(|<==
==>| T)=-526.1090413\S2=2.885749\S2-1=2.957951\S2A=2.012381\RMSD=0.000e+00\|<==
==>| PG=C02 [C2(Si1Cr1),X(H2O2)]\\@|<==
==>||<==
==>||<==
==>| ... UNTIL SCIENCE IS MIXED WITH EMOTION AND APPEALS TO THE HEART AND|<==
==>| IMAGINATION , IT IS LIKE DEAD INORGANIC MATTER; AND WHEN IT IS SO MIXED|<==
==>| AND SO TRANSFORMED IT IS LITERATURE.|<==

The expected output file is,

==>|1\1\GINC-NODE59\SP\UCCSD(T)-FC\GenECP\Cr1H2O2Si1(3)\BPLIU\27-Nov-2010\0\\#P scf=(maxcycle=1000,qc) uccsd-t(maxcycle=1000)/genecp gfprint\\3a0-b3lyp\\0,3\Si,0,0.014066,-1.355809,0.\O,0,1.238257,-0.219629,0.000001\Cr,0,-0.01268,1.131665,0.\O,0,-1.234272,-0.246214,-0.000002\H,0,0.023229,-2.208103,1.211679\H,0,0.023232,-2.208103,-1.21168\\Version=EM64L-G09RevA.01\State=3-A\HF=-525.4164345\MP2=-526.0063196\MP3=-526.0130479\MP4D=-526.0298231\MP4DQ=-526.0236019\PUHF=-525.3995614\PMP2-0=-525.9887651\PMP3-0=-525.9948332\MP4SDQ=-526.0382803\CCSD=-526.0625561\CCSD(T)=-526.1090413\S2=2.885749\S2-1=2.957951\S2A=2.012381\RMSD=0.000e+00\PG=C02 [C2(Si1Cr1),X(H2O2)]\\@|<==

The original file is given below for testing.

 Symmetry A    KE= 5.003963609647D+02
 Symmetry B    KE= 2.202693107013D+02
 Leave Link  601 at Sat Nov 27 09:26:00 2010, MaxMem=  671088640 cpu:
 (Enter /home/bpliu/gdata/gaussian/g09/l9999.exe)
  1\1\GINC-NODE59\SP\UCCSD(T)-FC\GenECP\Cr1H2O2Si1(3)\BPLIU\27-Nov-2010\
 0\\#P scf=(maxcycle=1000,qc) uccsd-t(maxcycle=1000)/genecp gfprint\\3a
 0-b3lyp\\0,3\Si,0,0.014066,-1.355809,0.\O,0,1.238257,-0.219629,0.00000
 1\Cr,0,-0.01268,1.131665,0.\O,0,-1.234272,-0.246214,-0.000002\H,0,0.02
 3229,-2.208103,1.211679\H,0,0.023232,-2.208103,-1.21168\\Version=EM64L
 -G09RevA.01\State=3-A\HF=-525.4164345\MP2=-526.0063196\MP3=-526.013047
 9\MP4D=-526.0298231\MP4DQ=-526.0236019\PUHF=-525.3995614\PMP2-0=-525.9
 887651\PMP3-0=-525.9948332\MP4SDQ=-526.0382803\CCSD=-526.0625561\CCSD(
 T)=-526.1090413\S2=2.885749\S2-1=2.957951\S2A=2.012381\RMSD=0.000e+00\
 PG=C02 [C2(Si1Cr1),X(H2O2)]\\@


 ... UNTIL SCIENCE IS MIXED WITH EMOTION AND APPEALS TO THE HEART AND
 IMAGINATION , IT IS LIKE DEAD INORGANIC MATTER; AND WHEN IT IS SO MIXED
 AND SO TRANSFORMED IT IS LITERATURE.

Sorry for my poor English. I hope I have made myself clear. In short, get the blue part and put it into one paragraph.
Thank you in advance for your kind help.

ZHEN

What have you attempted so far?

I may have misunderstood what you meant by "What have you attempted so far?". I thought I had posted an unclear question about my problem.

The problem is still there. I have not found a correct script to do this. Please help.

ZHEN

Zhen, your question was clear, I was just asking what you had tried yourself...

You could give this a try:

awk '!NF{p=0}p{$1=$1;print}/^ \(/{p=1}' ORS= infile > outfile

Thanks a lot. It works well.

I also searched Google and found another way to do this,

sed -n '/^\ 1\\1\\GINC-/,/\\\\\@/p'   < input                       >  temp
tr "\n" " "                           < temp   | sed "s. ..g"       >  output
rm temp

Both work for me!

Thanks for your kind help!

ZHEN

@scrutinizer

Can you please explain your command?

'!NF{p=0}p{$1=$1;print}/^ \(/{p=1}'

The idea is to use 'sed' to extract the continuous block of data that begins with "1\1\GINC-" and ends with "\\@". These two markers are unique in my data file, so it works!
The second line, 'tr', replaces the "\n" at the end of each line with a space, which puts the multiple lines onto a single line. The piped command 'sed "s. ..g"' then removes the spaces, including the leading space that each line has in the original data file. It removes every single space in the text, so some separate words are joined into one continuous string. I don't mind this, because my interest is in the numbers stored in the data, which may be broken by the "\n" at the end of each line and the white space at the beginning of each line. So the work is done.
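
The same two steps could probably also be written as one pipeline without the temporary file. This is only a sketch under some assumptions: the function name join_archive is made up for illustration, every line of the block is assumed to start with exactly one space as in the sample, and only that leading space is removed, so the spaces inside the text (for example in "PG=C02 [C2(Si1Cr1),X(H2O2)]") are kept.

```shell
# join_archive FILE  (hypothetical helper name, not part of the original posts)
# 1. sed -n ... p : keep only the lines from " 1\1\GINC-" up to the line with "\\@"
# 2. sed 's/^ //' : remove the single leading space of each line
# 3. tr -d '\n'   : delete the newlines so the block becomes one line
# The final echo ends the output with one newline.
join_archive() {
    sed -n '/^ 1\\1\\GINC-/,/\\\\@/p' "$1" |
        sed 's/^ //' |
        tr -d '\n'
    echo
}
```

It would be called as: join_archive input > output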

Thanks. If I was not able to make myself clear, pardon me for my English. Learning good English is my dream.

ZHEN

I guess it's clear even if p{$1=$1;print} seems peculiar to me...

Thanks scrutinizer :wink:

You're welcome. If p=1 then the part in curly brackets gets executed. $1=$1 is a way of applying formatting with OFS as field separator: assigning to a field makes awk rebuild the record from its fields, joined by OFS, which here also drops the leading space... (See http://www.unix.com/shell-programming-scripting/148216-view-ouput-file.html#7, perhaps I explained it more clearly there.)
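
For reading purposes, the one-liner can be spread over several lines with a comment per rule. This is just a sketch of the same three rules: the wrapper name join_after_enter is made up, and it relies on what holds in the sample, namely that the block starts right after a line beginning with " (" and ends at the first empty line.

```shell
# join_after_enter FILE  (hypothetical wrapper name)
# Same rules as the one-liner, in the same order:
#   !NF    : a line without fields (an empty line) switches printing off
#   p      : while printing is on, rebuild the record and print it
#   /^ \(/ : a line starting with " (" switches printing on for the following lines
# ORS= makes print add nothing after each record, so the lines are joined.
join_after_enter() {
    awk '
        !NF    { p = 0 }
        p      { $1 = $1; print }
        /^ \(/ { p = 1 }
    ' ORS= "$1"
}
```

Because the /^ \(/ rule comes last, the "(Enter ...)" line itself is not printed; only the lines after it are.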

@scrutinizer:
Thanks for your answer again. One more thing, can p{$1=$1;print} be "read" as: if p is true, then print $1=$1?

I followed your previous link and I understand your command, but why can we do $2=$2, $3=$3 etc. but not $0=$0?

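Almost, it can be read as: if p is true, then assign $1=$1 and then print $0.
Second question: because $0 signifies the record itself, not one of the fields... Assigning to $0 only splits the record into fields again; it is the assignment to a field that makes awk rebuild $0 with OFS.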
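
A quick way to see the difference is to set OFS to something visible. This is a minimal sketch with a made-up input line and a "-" as OFS; the variable names same and joined are only for illustration.

```shell
# $0=$0 splits the record into fields again but leaves its text as it was.
same=$(echo 'a b c' | awk -v OFS=- '{ $0 = $0; print }')

# $1=$1 assigns to a field, so $0 is rebuilt with OFS between the fields.
joined=$(echo 'a b c' | awk -v OFS=- '{ $1 = $1; print }')

echo "$same"     # a b c
echo "$joined"   # a-b-c
```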