Extract a block of text?

Hello all,

I have a large output file from which I would like to extract a single block of text.

An example block of text is shown below:

      ***** EQUILIBRIUM GEOMETRY LOCATED *****
 COORDINATES OF ALL ATOMS ARE (ANGS)
   ATOM   CHARGE       X              Y              Z
 ------------------------------------------------------------
 MOLYBDENUM 42.0   5.9067578125   5.0087332497  17.4699146400
 SULFUR     16.0   7.9742837782   3.7588015097  17.3910898169
 SULFUR     16.0   5.0973219622   3.0091611327  16.3724427108
 SULFUR     16.0   3.8536412225   4.7600928861  18.7261323168
 SULFUR     16.0   6.7241053728   5.6252659948  19.6631739883
 SULFUR     16.0   4.4480017991   6.0998251866  15.8770432027
 SULFUR     16.0   7.3603883558   6.8227401283  16.8054397187
 FLUORINE    9.0   5.8587551406  -0.4318887949  16.7077822115
 FLUORINE    9.0   4.8684829005   0.5366582782  15.0410139777
 FLUORINE    9.0   6.9652096608   0.0213710608  14.8874157686
 FLUORINE    9.0   9.8286766391   1.6474365190  17.5253067335
 FLUORINE    9.0   9.3721734932   1.1952324810  15.4562302461
 FLUORINE    9.0   2.3592720767   6.5544854427  21.3357174293
 FLUORINE    9.0   3.1699656631   4.9028713258  22.5014597961
 FLUORINE    9.0   1.9319720986   4.5053494937  20.7732346644
 FLUORINE    9.0   4.7451572178   7.0343764808  22.6351068043
 FLUORINE    9.0   6.8559715258   6.9105198521  22.1746011766
 FLUORINE    9.0   5.8455988240   5.1957182885  23.0296196726
 CARBON      6.0   7.6418735826   2.1732663119  16.7663636854
 CARBON      6.0   6.3910488857   1.8518385954  16.3063208287
 CARBON      6.0   8.8417049356   1.2443576492  16.7001587395
 CARBON      6.0   6.0232071075   0.4940281964  15.7351792136
 CARBON      6.0   4.1288361674   5.3066236659  20.3512668012
 CARBON      6.0   5.3747193873   5.7159036535  20.7514082208
 CARBON      6.0   2.9007274452   5.3180135016  21.2426525584
 CARBON      6.0   5.7024203684   6.2124662475  22.1483988647
 CARBON      6.0   5.3190614021   7.4034151598  15.1297202687
 CARBON      6.0   6.5982269835   7.7062300006  15.5200187491
 FLUORINE    9.0   8.5364306907  -0.0238472200  17.0558426396
 CARBON      6.0   4.5748883318   8.1326876115  14.0248882589
 FLUORINE    9.0   4.7379257177   9.4734646857  14.0919813199
 FLUORINE    9.0   4.9873657426   7.7398577115  12.7976197748
 FLUORINE    9.0   3.2469470625   7.9053664599  14.0777358797
 CARBON      6.0   7.4147633291   8.8527083808  14.9501654595
 FLUORINE    9.0   7.0294168591  10.0458152013  15.4588563333
 FLUORINE    9.0   8.7276249402   8.7205894905  15.2268083921
 FLUORINE    9.0   7.3114031179   8.9380862348  13.6046918391

What I need is the text under the line

      ***** EQUILIBRIUM GEOMETRY LOCATED *****

until the next blank line, i.e. the blank line right after

 FLUORINE    9.0   7.3114031179   8.9380862348  13.6046918391

Also, the columns should be separated by spaces or tabs, i.e. the first column should be the atom name "MOLYBDENUM", the second column should be the atomic number "42.0", etc.

Thanks in advance
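A minimal sketch that handles both requests in a single awk pass. It assumes the block always ends at the first blank line and that every atom row has exactly five fields (name, charge, X, Y, Z); `extract_geometry` is a hypothetical function name, not something from the thread:

```shell
#!/bin/bash
# extract_geometry FILE (hypothetical helper): print the atom rows found under
# "EQUILIBRIUM GEOMETRY LOCATED", with a single tab between the columns.
extract_geometry() {
    awk '
        /EQUILIBRIUM GEOMETRY LOCATED/ { found = 1; next }  # start after the header
        !found { next }                                     # skip everything before it
        NF == 0 { exit }                                    # first blank line ends the block
        NF == 5 && $2 ~ /^[0-9]+\.[0-9]+$/ {                # atom rows: NAME CHARGE X Y Z
            print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5
        }
    ' "$1"
}
```

Usage would be something like `extract_geometry filename.txt > geometry.txt`. The numeric test on the second field is what keeps the "COORDINATES..." and "ATOM CHARGE X Y Z" header lines and the dashed rule out of the result.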

Ok, I can do the first thing...

 awk '/EQUILIBRIUM GEOMETRY LOCATED/ {found=1} found && NF==0 {exit} found' filename.txt

However, I don't understand what you mean by "each of the columns of text should be separated"... By what? Tabs, spaces (justified?), commas?

Cheers...
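Whichever separator turns out to be wanted, awk can rebuild each line with it. This is a small sketch; `join_fields` is a hypothetical helper name:

```shell
#!/bin/bash
# join_fields SEP (hypothetical helper): read lines on stdin and rewrite each
# one with SEP between its whitespace-separated fields.
join_fields() {
    # Assigning $1 to itself makes awk rebuild $0 with OFS between the fields.
    # awk -v also expands escapes, so join_fields '\t' gives tab-separated output.
    awk -v OFS="$1" '{ $1 = $1; print }'
}
```

For example, `... | join_fields ','` gives comma-separated columns and `... | join_fields '\t'` gives tab-separated ones. Leading spaces disappear as a side effect of the rebuild.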

It is always helpful to know what system you're on, so please get in the habit of posting it so we don't have to ask. GNU grep's -m option (stop after NUM matching lines) makes starting at a certain point in a file easy, but the AIX and Solaris versions of grep may not have it.
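A sketch of the grep -m approach, assuming GNU grep. Its manual states that when standard input is a regular file, grep leaves the read position just after the last matching line, so the next command in the same `{ ...; } < file` group carries on from there. `block_after` is a hypothetical helper name:

```shell
#!/bin/bash
# block_after PATTERN FILE (hypothetical helper): print the first line matching
# PATTERN plus every following line up to, but not including, the first blank line.
# Relies on GNU grep -m repositioning stdin just after the match.
block_after() {
    {
        grep -m 1 -e "$1"                # print the header line and stop there
        sed -n '/^[[:space:]]*$/q; p'    # continue from that point until a blank line
    } < "$2"
}
```

Usage: `block_after 'EQUILIBRIUM GEOMETRY LOCATED' filename.txt`. This depends on stdin being a real file. It will not work when the data is piped in.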

Using sed:

sed -n '/EQUILIBRIUM GEOMETRY LOCATED/,/^[[:space:]]*$/p' infile
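The sed range above also prints the header line and the blank line that ends the range. If only the lines under the header are wanted, a variant along these lines could be used. It quits at the blank line, so nothing later in the file is scanned. `geometry_lines` is a hypothetical name, and GNU sed is assumed:

```shell
#!/bin/bash
# geometry_lines FILE (hypothetical helper): print only the lines under the
# header, stopping (without printing) at the first blank line.
geometry_lines() {
    sed -n '/EQUILIBRIUM GEOMETRY LOCATED/,/^[[:space:]]*$/{
        /EQUILIBRIUM GEOMETRY LOCATED/d
        /^[[:space:]]*$/q
        p
    }' "$1"
}
```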

uname -a gives:

Linux 2.6.18-194.21.1.el5 x86_64 GNU/Linux

A Perl solution that also lines up the columns:

$
$ cat form
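#! /usr/bin/perl -wn
# -n wraps this code in a "LINE: while (<>) { ... }" loop, hence next/last LINE.
# $state is "skip" before the header, "copy" for header lines, "proc" for atom rows.
BEGIN {$state="skip";};
/EQUILIBRIUM GEOMETRY LOCATED/ and do {$state="copy";};
$state eq "skip" and do {next LINE;};
/^\s*$/          and do {last LINE;};                        # blank line ends the block
/^\s*-+\s*$/     and do {$state="proc"; print; next LINE};   # dashed rule: atom rows follow
$state eq "copy" and do {print ; next LINE;};
$state eq "proc" and do { printf "%-10s   %6s  %16s %16s %16s\n", split(" ", $_);};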
$
$
$
$ ./form < datafile | head
***** EQUILIBRIUM GEOMETRY LOCATED *****
COORDINATES OF ALL ATOMS ARE (ANGS)
ATOM CHARGE X Y Z
------------------------------------------------------------
MOLYBDENUM     42.0      5.9067578125     5.0087332497    17.4699146400
SULFUR         16.0      7.9742837782     3.7588015097    17.3910898169
SULFUR         16.0      5.0973219622     3.0091611327    16.3724427108
SULFUR         16.0      3.8536412225     4.7600928861    18.7261323168
SULFUR         16.0      6.7241053728     5.6252659948    19.6631739883
$

Hi

Thanks very much for your reply, but it doesn't seem to work for me.

Nothing happens when I type in this line of code. The sed version doesn't do anything either.

Thanks, though.

---------- Post updated at 03:12 PM ---------- Previous update was at 03:03 PM ----------

How do I use this? What part do I place in the shell script to make this usable?

Ummm... it is a script. Make it executable (chmod +x form) and just run it.

Yes, thanks. I realize that. But I have more than one output file, so I don't understand why there is part of my example output file in your suggestion.

I'm not sure what the problem is. I did leave in a debug statement, and I have edited my post to remove it. I piped the script through head to cut down on the output, which is why only part of your data shows. You should not use head: run the script yourself without it and you will see all of the data.

If you must have the Perl script embedded in another language, it would help if you would reveal which language. Guessing bash... here you go. But note that I'm piping through head again, so you still see only part of your data.

$ cat form2
#! /bin/bash
echo I am a bash script
exec < datafile
#
#
perl -wn -e '
# -n wraps this code in a LINE: while (<>) loop; state goes skip -> copy -> proc
BEGIN {$state="skip";};
/EQUILIBRIUM GEOMETRY LOCATED/ and do {$state="copy";};
$state eq "skip" and do {next LINE;};
/^\s*$/          and do {last LINE;};
/^\s*-+\s*$/     and do {$state="proc"; print; next LINE};
$state eq "copy" and do {print ; next LINE;};
$state eq "proc" and do { printf "%-10s   %6s  %16s %16s %16s\n", split(" ", $_);};
'
#
exit 0
$ ./form2 | head
I am a bash script
***** EQUILIBRIUM GEOMETRY LOCATED *****
COORDINATES OF ALL ATOMS ARE (ANGS)
ATOM CHARGE X Y Z
------------------------------------------------------------
MOLYBDENUM     42.0      5.9067578125     5.0087332497    17.4699146400
SULFUR         16.0      7.9742837782     3.7588015097    17.3910898169
SULFUR         16.0      5.0973219622     3.0091611327    16.3724427108
SULFUR         16.0      3.8536412225     4.7600928861    18.7261323168
SULFUR         16.0      6.7241053728     5.6252659948    19.6631739883
$

Yes - I would need it as a bash script. Thanks for your help. I will let you know how it goes.

---------- Post updated at 03:40 PM ---------- Previous update was at 03:36 PM ----------

One other thing - where do I put my input file?

For previous scripts, I usually use a variable, e.g. $DAT.log, for the name of the output file.

Then on the command line I run DAT=name_of_output_file ./shellscript.sh

Is this how it would work with your suggestion?