"How to get an exact string from a txt file?"

I have many Gaussian output files, each containing a string that starts with "HF=" followed by a different value. I'm trying to extract this exact string from these text files.

Example 1:

 2.524075,-0.563322,-1.285286\H,0,-2.544438,-0.678834,1.199166\H,0,2.18
 5962,-1.978001,0.018499\\Version=EM64T-G03RevE.01\State=5-A\HF=-1277.1
 557592\S2=6.033269\S2-1=0.\S2A=6.000179\RMSD=8.037e-05\Thermal=0.\Dipo
 le=-0.1643425,0.9094768,0.0321427\PG=C01 [X(C4H9Cr1O1)]\\@

Example 2:

 6256\H,0,-2.980236,0.00009,-0.45647\\Version=EM64T-G03RevE.01\State=5-
 A\HF=-1198.5241253\S2=6.077753\S2-1=0.\S2A=6.000457\RMSD=4.977e-05\The
 rmal=0.\Dipole=0.8534315,0.002042,-0.5745813\PG=C01 [X(C2H5Cr1O1)]\\@

How can I extract "HF=-1277.1557592" (the value may be on one line or split across two lines) and write it to a new file?

If you have a good suggestion, please help me!
Thanks!

If "example 1" and "example 2" are each a single line (i.e. they do not span multiple lines in your text file), then here's a way to do it with Perl:

$
$ cat -n f1
     1  2.524075,-0.563322,-1.285286\H,0,-2.544438,-0.678834,1.199166\H,0,2.185962,-1.978001,0.018499\\Version=EM64T-G03RevE.01\State=5-A\HF=-1277.1557592\S2=6.033269\S2-1=0.\S2A=6.000179\RMSD=8.037e-05\Thermal=0.\Dipole=-0.1643425,0.9094768,0.0321427\PG=C01 [X(C4H9Cr1O1)]\\@
     2  6256\H,0,-2.980236,0.00009,-0.45647\\Version=EM64T-G03RevE.01\State=5-A\HF=-1198.5241253\S2=6.077753\S2-1=0.\S2A=6.000457\RMSD=4.977e-05\Thermal=0.\Dipole=0.8534315,0.002042,-0.5745813\PG=C01 [X(C2H5Cr1O1)]\\@
$
$ perl -lne '/.*(HF=[^\\]+)\\.*/ and print $1' f1
HF=-1277.1557592
HF=-1198.5241253
$
$

You can redirect the output to a new file using the shell's redirection operator.
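A minimal sketch of that, assuming the files are named *.log and every archive entry sits on a single line as in f1. It swaps in grep -o (available in GNU and BSD grep) for the Perl one-liner, and extract_hf and HF.all are hypothetical names:

```shell
#!/bin/bash
# extract_hf (hypothetical helper): for each file given, print the file name
# and then every HF=... field found in it.
# Assumes each archive entry is on a single line, as in f1 above.
extract_hf() {
    for f in "$@"; do
        printf '%s\n' "$f"
        # -o prints only the matched text; [^\\]* stops at the next backslash.
        grep -o 'HF=[^\\]*' "$f"
    done
}

# Usage: the shell's ">" sends everything the loop prints into a new file.
# extract_hf *.log > HF.all
```

Printing the file name before each value gives a name/value layout, which makes it easy to tell which log each energy came from.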

tyler_durden

##
Sorry, I didn't read your post carefully.
For the values that span multiple lines, here's how you can do it with Perl:

$
$ cat -n f2
     1  example 1,
     2  2.524075,-0.563322,-1.285286\H,0,-2.544438,-0.678834,1.199166\H,0,2.18
     3  5962,-1.978001,0.018499\\Version=EM64T-G03RevE.01\State=5-A\HF=-1277.1
     4  557592\S2=6.033269\S2-1=0.\S2A=6.000179\RMSD=8.037e-05\Thermal=0.\Dipo
     5  le=-0.1643425,0.9094768,0.0321427\PG=C01 [X(C4H9Cr1O1)]\\@
     6  example 2,
     7  6256\H,0,-2.980236,0.00009,-0.45647\\Version=EM64T-G03RevE.01\State=5-
     8  A\HF=-1198.5241253\S2=6.077753\S2-1=0.\S2A=6.000457\RMSD=4.977e-05\The
     9  rmal=0.\Dipole=0.8534315,0.002042,-0.5745813\PG=C01 [X(C2H5Cr1O1)]\\@
$
$ perl -lne 'BEGIN {undef $/} while(/.*(HF=[^\\]+)\\.*/mg){$x=$1; $x=~s/\n//g; print $x}' f2
HF=-1277.1557592
HF=-1198.5241253
$

tyler_durden

This code works very well, thanks!
The output file looks like this:

E201GECP.log
HF=-1270.9 206497
E202GECP.log
HF=-1270.91 25011
E301GECP.log
HF=-1349.1118043
E302GECP.log
HF=-1349.1068494
E313GECP.log
HF=-1349.1072849
E401GECP.log
HF=-1427.2934015
E407-2.log
HF=-1427.2920096

The only problem is that there is a space whenever the original number was split across two lines. How can I fix this?
Thanks!

Using bash:

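#!/bin/bash
# Works only if each record is one long single line in "file".
var=$(<"file")
IFS='\'       # split fields on backslashes
set -f        # disable globbing so fields like "PG=C01 [X(C4H9Cr1O1)]" are not expanded
set -- $var   # positional parameters = backslash-separated fields
for i in "$@"
do
    case $i in HF=*) echo "$i";; esac
done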

No, they are on different lines. Some values are stored on a single line, but others span two lines (from the end of one line to the beginning of the next).
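Since some values wrap onto the next line, the bash approach above only works once the line breaks are gone. A minimal sketch under that assumption: read the file, delete every whitespace character with bash's ${var//pattern/replacement} expansion, then split on backslashes. hf_from_file is a hypothetical name, and the sketch needs bash rather than plain sh:

```shell
#!/bin/bash
# hf_from_file (hypothetical helper): print every HF=... field in a Gaussian
# archive, even when the value is wrapped across two lines.
hf_from_file() {
    local text fields field
    text=$(<"$1")
    # Delete every whitespace character (newlines, spaces, tabs) so a value
    # split at a line break becomes contiguous again.
    text=${text//[[:space:]]/}
    # Split the now single-line text on backslashes into an array.
    IFS='\' read -r -a fields <<< "$text"
    for field in "${fields[@]}"; do
        case $field in HF=*) printf '%s\n' "$field";; esac
    done
}

# Usage: hf_from_file E201GECP.log > HF.all
```

Deleting all whitespace, not just newlines, also takes care of a trailing space at the wrap point, assuming an HF value never contains whitespace of its own.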

If there is a space in your output file, then there is a space in the HF value where it spans multiple lines, most likely a trailing space at the end of the first line.

The HF values in my test file looked like this:

$
$ cat f2
example 1,
2.524075,-0.563322,-1.285286\H,0,-2.544438,-0.678834,1.199166\H,0,2.18
5962,-1.978001,0.018499\\Version=EM64T-G03RevE.01\State=5-A\HF=-1277.1
557592\S2=6.033269\S2-1=0.\S2A=6.000179\RMSD=8.037e-05\Thermal=0.\Dipo
le=-0.1643425,0.9094768,0.0321427\PG=C01 [X(C4H9Cr1O1)]\\@
example 2,
6256\H,0,-2.980236,0.00009,-0.45647\\Version=EM64T-G03RevE.01\State=5-
A\HF=-1198.5241253\S2=6.077753\S2-1=0.\S2A=6.000457\RMSD=4.977e-05\The
rmal=0.\Dipole=0.8534315,0.002042,-0.5745813\PG=C01 [X(C2H5Cr1O1)]\\@
$
$
$ # check HF values
$ perl -lne '/HF/ and print "==>|",$_,"|<=="' f2
==>|5962,-1.978001,0.018499\\Version=EM64T-G03RevE.01\State=5-A\HF=-1277.1|<==
==>|A\HF=-1198.5241253\S2=6.077753\S2-1=0.\S2A=6.000457\RMSD=4.977e-05\The|<==
$
$

Your input file, however, probably looks like this (note the trailing space after -1277.1):

$
$ perl -lne '/HF/ and print "==>|",$_,"|<=="' f2
==>|5962,-1.978001,0.018499\\Version=EM64T-G03RevE.01\State=5-A\HF=-1277.1 |<==
==>|A\HF=-1198.5241253\S2=6.077753\S2-1=0.\S2A=6.000457\RMSD=4.977e-05\The|<==
$
$

Notice how the Perl script I posted earlier produces incorrect output in that case:

$
$ perl -lne 'BEGIN {undef $/} while(/.*(HF=[^\\]+)\\.*/mg){$x=$1; $x=~s/\n//g; print $x}' f2
HF=-1277.1 557592
HF=-1198.5241253
$

That's because I am removing the newline but not the space. The following Perl script assumes that there could be one or more spaces in the HF value, and removes them.

$
$ # check HF value
$ perl -lne '/HF/ and print "==>|",$_,"|<=="' f2
==>|5962,-1.978001,0.018499\\Version=EM64T-G03RevE.01\State=5-A\HF=-1277.1     |<==
==>|A\HF=-1198.5241253\S2=6.077753\S2-1=0.\S2A=6.000457\RMSD=4.977e-05\The|<==
$
$ # modified Perl script
$ perl -lne 'BEGIN {undef $/} while(/.*(HF=[^\\]+)\\.*/mg){$x=$1; $x=~s/[\n ]//g; print $x}' f2
HF=-1277.1557592
HF=-1198.5241253
$
$

If there could be tabs as well, add "\t" inside the square brackets of the s/// operator. Alternatively, use "\s", which matches any whitespace character (space, tab, newline and so on): s/\s//g.
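The same "remove all whitespace, then extract" idea can be sketched without Perl, using tr's [:space:] class in place of \s. This is a minimal sketch assuming the standard tr and grep utilities; hf_values is a hypothetical name:

```shell
#!/bin/bash
# hf_values (hypothetical helper): print every HF=... field in FILE, removing
# all whitespace first (newlines, spaces and tabs), like s/\s//g in Perl.
hf_values() {
    # 1) delete every whitespace character, joining wrapped lines;
    # 2) turn each backslash into a newline, giving one field per line;
    # 3) keep only the fields that start with "HF=".
    tr -d '[:space:]' < "$1" | tr '\\' '\n' | grep '^HF='
}

# Usage: hf_values E201GECP.log > HF.all
```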

HTH,
tyler_durden

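# Print every HF=... field (up to the next backslash) from all .txt files;
# -h omits file names, -o prints only the matched text.
# Caveat: grep works line by line, so a value wrapped onto the next line is cut short.
grep -ho 'HF=[^\]*' *.txt > HF.all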
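use strict;
use warnings;
# Each Gaussian archive block ends with "@" followed by a newline,
# so read one block per record.
local $/ = "@\n";
open my $fh, '<', 'a.txt' or die "Cannot open a.txt: $!";
while (<$fh>) {
    # Capture "HF=" and everything up to the next backslash,
    # even if the value wraps onto the next line.
    if (/\\(HF=[^\\]+)\\/) {
        my $str = $1;
        $str =~ s/\s+//g;    # drop newlines/spaces introduced by line wrapping
        print "$str\n";
    }
}
close $fh;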