Reading a log file

Hi,

I'm trying to write a script that goes through a few folders, reads some log files, and makes a list from the data.

Each log file contains several steps, each with its own final energy, but I need the last final energy that comes after the keyword "HURRAY".

So far I have come up with this code:

$MYDIR=$(pwd)
DIRS=`ls -l $MYDIR | egrep '^d' | awk '{print $9}'`

for DIR in $DIRS
do
echo  ${DIR}
cd ${DIR}
optimized=0
while IFS= read -r line; do
  case $line in
    *"HURRAY"*)
      optimized=1;#echo nice
      continue
      ;;
    "FINAL SINGLE POINT ENERGY      "*)
      [ "$optimized" = 1 ] || continue
      #final_energy=`awk '{ print 5 }' $line`
      #echo "Found optimized final energy: $final_energy"
      ;;
    "final energy")
  esac
done <input-s.out

#Read the coordinates
#n=`awk '{ print $0;exit }' job1.xyz`; #echo $n
#awk -v VAR="$n" 'NR>=3 && NR<=2+VAR'  job1.xyz

cd ..
done

The code almost works, but there are a few errors.
First, I get this error:

line 1: =/Users/ray/Documents/orca/bu: No such file or directory

(this is the current folder)

Second, awk is not able to split the variable; it expects a file. How can I tell it to split a variable instead?

Update: here's a link to an example log file:
https://onedrive.live.com/redir?resid=48E6DEE5D6109E04!168165&authkey=!AD5NVvvzCip7FBw&ithint=file%2Cout

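The error on line 1 comes from the assignment $MYDIR=$(pwd): with the leading $, the shell expands the (empty) variable and then tries to run =/Users/ray/Documents/orca/bu as a command. Write MYDIR=$(pwd) instead. Separately, you use $DIR without quoting it, which splits it on spaces into several strings; put it in double quotes. But I don't think you need cd at all.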

It'd be easier to search for your log files and read from them than finding dirs and cd-ing into them.

Please show a sample of your input data, too. Without that, I'm just guessing. You should do either the whole thing or none of it in awk; calling awk on one line at a time is pointlessly wasteful.
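
On the question of splitting a variable rather than a file, here is a minimal sketch. It assumes the line looks like the one matched in the original script; the sample value is only for illustration. The shell can split the variable itself with set --, or the variable can be fed to awk on standard input.

```shell
# Sample line for illustration; the real one would come from the log file.
line='FINAL SINGLE POINT ENERGY      -39.022584378179'

# Option 1: let the shell split on whitespace into $1, $2, ...
set -f            # turn off globbing so the unquoted expansion is safe
set -- $line      # FINAL=$1 SINGLE=$2 POINT=$3 ENERGY=$4 value=$5
set +f
final_energy=$5
echo "shell: $final_energy"

# Option 2: awk reads standard input when no file is named.
via_awk=$(printf '%s\n' "$line" | awk '{ print $5 }')
echo "awk:   $via_awk"
```

Option 1 starts no extra process, which matters inside a loop over many lines.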

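# Create a file, /tmp/$$, where $$ is this script's process ID.
# It will have lines like: /path/to/input-s.out<TAB>final-energy
find . -name 'input-s.out' | while IFS= read -r FILENAME
do
        # Inside the awk program, FILENAME is awk's built-in name of the current input file.
        awk '/HURRAY/ { opt=1 } ; opt && /^FINAL SINGLE POINT ENERGY/ { print FILENAME, $5 }' OFS="\t" "$FILENAME"
done > /tmp/$$

while IFS=$'\t' read -r FILE ENERGY
do
        # Do whatever you want to do with this energy data.
        # The loop body must not be empty, so print the pair as a placeholder.
        printf '%s\t%s\n' "$FILE" "$ENERGY"
done < /tmp/$$

rm -f /tmp/$$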

thanks so much Corona688 I updated the post adding an example log file.

Where? I don't see it.

[edit] Oh, you posted a OneDrive link instead of pasting any text.

Astonishingly, I seem to have guessed right on the first try, for the awk bit at least. Does the rest work for you too?

OK, there's more to it; let me try to explain it logically.
Inside my home folder there are subfolders, and inside each subfolder there are log files.
Each subfolder is named after the formula of a compound. Each compound has four log files in its folder, and their names all start with the compound's formula, the same as the subfolder name.
An example:
CHSiH3 (subfolder) -> CHSiH3-t.out, CHSiH3-s.out, CHSiH3-CC-s.out, CHSiH3-CC-t.out

So, following your suggestion, a search from the current directory should find four log files per subdirectory, all with the .out extension.

I want to classify the energies based on the names of the log files:
t -> triplet; s (or no t) -> singlet; CC -> coupled cluster; no CC -> DFT

So in the end I can have tabular data, like a CSV file:

Compound  State    Method  Energy
CHSiH3    triplet  CC      10.3243
CHSiH3    singlet  DFT     9.9498
....
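
The naming rule above can be sketched in plain shell. This is only an illustration: the function name classify is made up, and it assumes names shaped like COMPOUND-s.out, COMPOUND-t.out, COMPOUND-CC-s.out, or COMPOUND-CC-t.out.

```shell
# classify FILE: print compound, state, and method derived from the file name.
# Hypothetical helper; assumes the naming scheme described above.
classify() {
    name=${1##*/}          # drop any directory part
    name=${name%.out}      # drop the .out extension
    compound=${name%%-*}   # everything before the first dash

    case $name in
        *-CC-*) method=CC ;;    # a CC tag in the middle means coupled cluster
        *)      method=DFT ;;   # otherwise DFT
    esac

    case $name in
        *-t) state=triplet ;;   # trailing -t means triplet
        *)   state=singlet ;;   # -s or anything else means singlet
    esac

    printf '%s\t%s\t%s\n' "$compound" "$state" "$method"
}

classify CHSiH3/CHSiH3-CC-t.out
classify CHSiH3/CHSiH3-s.out
```

The energy column is left out here, since it has to come from the file contents rather than the name.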

Do you think you can help me with this? I feel a bit lost here.

So, even though there are four per folder, they can all be processed individually, as long as their names are taken into account? Or do you need them grouped in fours?

I am also feeling a bit lost, because knowing there are four files per folder doesn't explain what you want done with them all.

I just want to read the energies from them and make a table like the one I showed in the former post.
I can only classify the energies in that table if I have the file names.
The file names differ from folder to folder, so the files can only be found by their .out extension.

Specifications are a difficult thing to write; some guessing led me to offer this as a zeroth approximation for the four files in one directory as posted above:

awk '
BEGIN   {print "Compound\tState\tMethod\tEnergy"}
FNR==1  {n=split(FILENAME, T, "-")
         printf "%s\t%s\t%s\n", T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", T[2]=="CC"?"CC":"DFT"
        }
' *.out
Compound    State    Method    Energy
CHSiH3    singlet    CC
CHSiH3    triplet    CC
CHSiH3    singlet    DFT
CHSiH3    triplet    DFT

I'm in no position to fill in the energy column, as the sample input file you posted has 97 occurrences of the word "energy" in it, and even the "FINAL SINGLE POINT ENERGY" phrase that you try to match in your code snippet comes up with 6 different values in that file.

The energy that needs to be placed there is the "FINAL SINGLE POINT ENERGY" that appears after the keyword "HURRAY" shows up.
I had written code in Mathematica to deal with it, but shell is so different and I'm mixed up!

Try

awk '
BEGIN           {print "Compound\tState\tMethod\tEnergy"}
FNR==1          {n=split(FILENAME, T, "-")
                 printf "%s\t%s\t%s\t", T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", T[2]=="CC"?"CC":"DFT"
                 FOUND=0
                }
/HURRAY/        {FOUND=1
                }
FOUND &&
/^FINAL.*ERGY/  {print $NF
                }
' *.out
Compound    State    Method    Energy
/tmp/input    singlet    DFT    -39.022584378179

And - try to be way more specific, precise, and detailed in your requests to follow!

Sorry, I get a syntax error on this line:

printf "%s\t%s\t%s\t", T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", T[2]=="CC"?"CC":"DFT"
awk: syntax error at source line 4
 context is
	                 printf "%s\t%s\t%s\n", T[1], >>>  substr(T[n],1,1)== <<< 
awk: illegal statement at source line 4
awk: illegal statement at source line 4

Use nawk on Solaris.

I'm on Mac OS X 10.10.

Hm.

Try replacing that statement with

printf("%s\t%s\t%s\t", T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", T[2]=="CC"?"CC":"DFT");
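
A possible explanation, offered as an assumption rather than something confirmed in this thread: the POSIX awk grammar does not allow an unparenthesized comparison such as == in the argument list of print or printf when the list itself has no parentheses, and some awk implementations enforce that. Computing each value into a variable first, with the comparison in parentheses, sidesteps the question. The file name below is just the example from earlier in the thread.

```shell
# Build one table row from a file name, keeping comparisons out of the printf list.
row=$(awk 'BEGIN {
    n = split("CHSiH3-CC-t.out", T, "-")
    # Parenthesize each comparison before the ternary operator.
    state  = (substr(T[n], 1, 1) == "t") ? "triplet" : "singlet"
    method = (T[2] == "CC") ? "CC" : "DFT"
    printf "%s\t%s\t%s\n", T[1], state, method
}')
echo "$row"
```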

One last question:

how can I make the script print "not converged or error"
if the last condition was not met?

FOUND &&
/^FINAL.*ERGY/  {print $NF
                }

Try adding

FOUND &&
/^FINAL.*ERGY/  {print $NF
                 CONV=1
                }
END             {if (!CONV) print "not converged or error"
                }
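
One caveat with the END approach: END runs once, after all files have been read, and CONV is never reset, so the message can appear at most once for the whole run and not at all if any single file converged. Below is a sketch of a per-file variant that also keeps only the last matching energy. The names energy_table and flush_row and the variable names are made up for illustration, and it assumes the file naming scheme described earlier in the thread.

```shell
# energy_table FILE...: print one tab-separated row per log file.
# Hypothetical wrapper; the awk program avoids single quotes so it can sit inside them.
energy_table() {
    awk '
    function flush_row(   e) {
        # Emit the row for the file just finished, if there was one.
        if (name == "") return
        e = (energy != "") ? energy : "not converged or error"
        printf "%s\t%s\t%s\t%s\n", compound, state, method, e
    }
    BEGIN    { print "Compound\tState\tMethod\tEnergy" }
    FNR == 1 {
        flush_row()                  # close out the previous file
        name = FILENAME
        sub(/.*\//, "", name)        # drop the directory part
        sub(/\.out$/, "", name)      # drop the extension
        n = split(name, T, "-")
        compound = T[1]
        state  = (n > 1 && T[n] == "t")  ? "triplet" : "singlet"
        method = (n > 2 && T[2] == "CC") ? "CC" : "DFT"
        found = 0; energy = ""       # reset the per-file state
    }
    /HURRAY/ { found = 1 }
    found && /^FINAL SINGLE POINT ENERGY/ { energy = $NF }   # later matches overwrite
    END      { flush_row() }         # close out the last file
    ' "$@"
}
```

It could be called as energy_table ./*/*.out from the folder that holds the compound subfolders. Empty files produce no row, because the FNR == 1 rule never fires for them.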