Reading a log file

raymondg · August 19, 2015, 12:40pm

Hi,

I'm trying to write a script that goes through a few folders, reads some log files, and makes a list from the data.

Each log file contains several steps, and each step has a final energy, but I need to read only the last final energy, the one that comes after the keyword "HURRAY".

So far I have come up with this code:

$MYDIR=$(pwd)
DIRS=`ls -l $MYDIR | egrep '^d' | awk '{print $9}'`

for DIR in $DIRS
do
echo  ${DIR}
cd ${DIR}
optimized=0
while IFS= read -r line; do
  case $line in
    *"HURRAY"*)
      optimized=1;#echo nice
      continue
      ;;
    "FINAL SINGLE POINT ENERGY      "*)
      [ "$optimized" = 1 ] || continue
      #final_energy=`awk '{ print 5 }' $line`
      #echo "Found optimized final energy: $final_energy"
      ;;
    "final energy")
  esac
done <input-s.out

#Read the coordinates
#n=`awk '{ print $0;exit }' job1.xyz`; #echo $n
#awk -v VAR="$n" 'NR>=3 && NR<=2+VAR'  job1.xyz

cd ..
done

The code almost works, but there are a few errors.
First, I get this error:

line 1: =/Users/ray/Documents/orca/bu: No such file or directory

(this is the current folder)

Second, awk is not able to split the variable; it expects a file. How can I tell it to split a variable?

Update: here's a link to an example log file:
https://onedrive.live.com/redir?resid=48E6DEE5D6109E04!168165&authkey=!AD5NVvvzCip7FBw&ithint=file%2Cout

Corona688 · August 19, 2015, 12:54pm

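The line 1 error is because of the leading $ in $MYDIR=$(pwd): the shell expands the unset $MYDIR to nothing and then tries to run =/Users/... as a command. Assign with MYDIR=$(pwd) instead. You should also put $DIR in double quotes, since unquoted it gets split on spaces into separate strings. But I don't think you need cd at all.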

It'd be easier to search for your log files and read from them than finding dirs and cd-ing into them.

Please show a sample of your input data, too. Without that, I'm just wild-guessing. You should either do the whole thing or none of it in awk; running awk on a single line at a time is pointlessly wasteful.

# create a file, /tmp/$$ where $$ is this script's process id.
# it will have lines like /path/to/input-s.out final-energy
find . -name 'input-s.out' | while IFS= read -r FILENAME
do
        awk '/HURRAY/ { opt=1 } ; opt && /^FINAL SINGLE POINT ENERGY/ { print FILENAME, $5 }' OFS="\t" "$FILENAME"
done > /tmp/$$

while IFS=$'\t' read -r FILE ENERGY
do
        # Do whatever you want to do with this energy data; printing it is just a placeholder
        printf '%s\t%s\n' "$FILE" "$ENERGY"
done < /tmp/$$

rm -f /tmp/$$
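
On the second question (splitting a variable instead of a file), here is a minimal sketch. It assumes a line shaped like the one your case pattern matches; the sample line and the names w1..w4, rest, energy_shell and energy_awk are made up for illustration.

```shell
# Assumed sample line, shaped like the "FINAL SINGLE POINT ENERGY" lines in the log.
line='FINAL SINGLE POINT ENERGY       -39.022584378179'

# Option 1: let the shell's read builtin split the variable into words.
# The fifth word is the energy; anything after it would land in "rest".
read -r w1 w2 w3 w4 energy_shell rest <<EOF
$line
EOF

# Option 2: hand the variable to awk on standard input.
# awk treats its non-option arguments as file names, which is why passing
# $line as an argument fails.
energy_awk=$(printf '%s\n' "$line" | awk '{ print $5 }')

printf '%s\n%s\n' "$energy_shell" "$energy_awk"
```

Both are fine for a single line, but calling awk once per line inside a shell loop is the wasteful pattern mentioned above; the read form avoids the extra process.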

raymondg · August 19, 2015, 1:41pm

Thanks so much, Corona688. I updated the post and added an example log file.

Corona688 · August 19, 2015, 1:46pm

Where? I don't see it.

[edit] Oh, you used a OneDrive link instead of pasting any text.

Corona688 · August 19, 2015, 1:50pm

Astonishingly, I seem to have guessed it right the first try, the awk bit at least. Does the rest work for you too?

raymondg · August 19, 2015, 3:37pm

OK, there's something more here; let me try to explain it logically.
Inside my home folder there are subfolders, and inside each subfolder there are log files.
Each subfolder is named after the formula of a compound. Each compound has four log files in its folder, and all their names start with the compound's formula, the same as the subfolder name.
For example:
CHSiH3 (subfolder) -> CHSiH3-t.out, CHSiH3-s.out, CHSiH3-CC-s.out, CHSiH3-CC-t.out

So, following what you said, searching from the current directory should find four log files in each subdirectory, all with the .out extension.

I want to classify the energies based on the names of the log files:
t -> triplet, s (or no t) -> singlet, CC -> coupled cluster, no CC -> DFT

In the end I would like tabular data, like a CSV file:

Compound  State    Method  Energy
CHSiH3    triplet  CC      10.3243
CHSiH3    singlet  DFT     9.9498
....

Do you think you can help me with this? I feel a bit lost here.

Corona688 · August 19, 2015, 4:02pm

So, even though there's four per folder, they can all be processed individually, as long as their name is considered? Or do you need them to be grouped in four?

I am also feeling a bit lost, because knowing there's four files per folder doesn't explain what you want done with them all.

raymondg · August 19, 2015, 4:12pm

I just want to read the energies from them and make a table like the one in my former post.
I can only classify the energies in that table if I have the file names.
The file names differ, so the files can only be found by their .out extension.

RudiC · August 19, 2015, 4:28pm

Specifications are a difficult thing to write; wild guessing led me to offer this as a zeroth approximation for the four files in one directory as posted above:

awk '
BEGIN   {print "Compound\tState\tMethod\tEnergy"}
FNR==1  {n=split(FILENAME, T, "-")
         printf "%s\t%s\t%s\n", T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", T[2]=="CC"?"CC":"DFT"
        }
' *.out
Compound    State    Method    Energy
CHSiH3    singlet    CC
CHSiH3    triplet    CC
CHSiH3    singlet    DFT
CHSiH3    triplet    DFT

I'm by no means in a position to fill in the energy column, as the sample input file you posted has 97 occurrences of the word "energy" in it, and even the "FINAL SINGLE POINT ENERGY" phrase that you try to match in your code snippet comes up with 6 different values in that file.

raymondg · August 19, 2015, 4:37pm

The energy that needs to be placed there is the "FINAL SINGLE POINT ENERGY" that shows up after the keyword "HURRAY".
I had written code in Mathematica to deal with it, but shell is so different and I'm mixed up!

RudiC · August 19, 2015, 4:39pm

Try

awk '
BEGIN           {print "Compound\tState\tMethod\tEnergy"}
FNR==1          {n=split(FILENAME, T, "-")
                 printf "%s\t%s\t%s\t", T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", T[2]=="CC"?"CC":"DFT"
                 FOUND=0
                }
/HURRAY/        {FOUND=1
                }
FOUND &&
/^FINAL.*ERGY/  {print $NF
                }
' *.out
Compound    State    Method    Energy
/tmp/input    singlet    DFT    -39.022584378179

And - try to be way more specific, precise, and detailed in your requests to follow!

raymondg · August 20, 2015, 6:22am

Sorry, I get a syntax error on this line:

printf "%s\t%s\t%s\t", T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", T[2]=="CC"?"CC":"DFT"

awk: syntax error at source line 4
 context is
	                 printf "%s\t%s\t%s\n", T[1], >>>  substr(T[n],1,1)== <<< 
awk: illegal statement at source line 4
awk: illegal statement at source line 4

Corona688 · August 20, 2015, 11:06am

Use nawk on Solaris.

raymondg · August 20, 2015, 11:20am

I'm on Mac OS X 10.10.

Corona688 · August 20, 2015, 11:34am

Hm.

Try replacing that statement with

printf("%s\t%s\t%s\t", T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", T[2]=="CC"?"CC":"DFT");
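
If it helps, here is the same classification as a small stand-alone sketch with every comparison parenthesized. As far as I can tell, POSIX does not promise that an unparenthesized == is accepted inside a print or printf argument list, and some awks reject it. classify is a hypothetical helper name, and the file names are the ones from the example earlier in the thread.

```shell
# Hypothetical helper: classify one log file name without reading the file.
classify() {
    awk -v name="$1" 'BEGIN {
        n = split(name, T, "-")
        # Parenthesized comparisons keep the ternaries unambiguous for any awk.
        state  = (substr(T[n], 1, 1) == "t") ? "triplet" : "singlet"
        method = (T[2] == "CC") ? "CC" : "DFT"
        printf("%s\t%s\t%s\n", T[1], state, method)
    }'
}

classify CHSiH3-CC-t.out
classify CHSiH3-s.out
```

Computing state and method into variables first also keeps the printf line short enough to read.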

raymondg · August 24, 2015, 9:35am

awk '
BEGIN           {print "Compound\tState\tMethod\tEnergy"}
FNR==1          {n=split(FILENAME, T, "-")
                 printf "%s\t%s\t%s\t", T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", T[2]=="CC"?"CC":"DFT"
                 FOUND=0
                }
/HURRAY/        {FOUND=1
                }
FOUND &&
/^FINAL.*ERGY/  {print $NF
                }
' *.out

One last question:

How can I ask the script to print "not converged or error" if the last condition was not met?

FOUND &&
/^FINAL.*ERGY/  {print $NF
                }

RudiC · August 24, 2015, 9:46am

Try adding

FOUND &&
/^FINAL.*ERGY/  {print $NF
                 CONV=1
                }
END             {if (!CONV) print "not converged or error"
                }
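
Be aware that END runs only once, after the last file, and CONV is never reset, so with several .out files the fragment above reports at most one message for the whole run. Below is a rough per-file sketch, not a tested solution for your real logs. It assumes only the last energy after HURRAY matters; energy_table and flush_row are hypothetical names, and the demo files and their energies are made up.

```shell
# Hypothetical helper: print one table row per log file given as an argument.
energy_table() {
    awk '
    function flush_row() {
        # Emit the row of the file just finished, with a fallback message.
        if (row != "")
            printf("%s\t%s\n", row, (energy != "") ? energy : "not converged or error")
    }
    BEGIN { print "Compound\tState\tMethod\tEnergy" }
    FNR == 1 {
        flush_row()                      # finish the previous file, if any
        name = FILENAME
        sub(/.*\//, "", name)            # drop any leading directory
        sub(/\.out$/, "", name)          # drop the extension
        n = split(name, T, "-")
        row = T[1] "\t" ((T[n] == "t") ? "triplet" : "singlet") "\t" ((T[2] == "CC") ? "CC" : "DFT")
        found = 0
        energy = ""
    }
    /HURRAY/ { found = 1 }
    found && /^FINAL SINGLE POINT ENERGY/ { energy = $NF }   # keep the last match
    END { flush_row() }
    ' "$@"
}

# Demo on made-up files: one converged log and one without HURRAY.
demo=$(mktemp -d)
mkdir "$demo/CHSiH3"
printf 'FINAL SINGLE POINT ENERGY   -1.0\n*** HURRAY ***\nFINAL SINGLE POINT ENERGY   -2.5\n' > "$demo/CHSiH3/CHSiH3-CC-t.out"
printf 'FINAL SINGLE POINT ENERGY   -1.0\n' > "$demo/CHSiH3/CHSiH3-s.out"
table=$(energy_table "$demo"/*/*.out)
printf '%s\n' "$table"
rm -rf "$demo"
```

On the real data this would be called from the top folder as energy_table */*.out. Empty log files produce no row at all, since FNR == 1 never fires for them.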