Shell Scripting

From the given input file I want the output shown below: for each structure, the Initial dG number followed by the closing-pair values from the statements starting with "Hairpin loop" and "Multi-loop". In the input file these statements are shown in bold. (The original input file is a large one, consisting of n structures with n statements in this form only.) Help with shell scripting is appreciated. Thanks in advance.

General format of output:

Initial dG no.
Hairpin loop:
Value1 - Value2
Value1 - Value2
---- ---- --------
Multi-loop
Value1 - Value2

Initial dG no.
Hairpin loop:
Value1 - Value2
Value1 - Value2
Multi-loop
Value1 - Value2

Output

-25.40

Hairpin loop: 
102-110
65-71
Multi-loop: 
37 - 115

-25.10
Hairpin loop: 
79-85
Multi-loop: 
37 - 93

Input file

Structure 1
798 mmu-miR 879 
Initial dG = -25.40

External loop: ddG = -0.40 12 ss bases & 2 closing helices.
Helix: ddG = -4.20 4 base pairs.
Multi-loop: ddG = -0.70 External closing pair is C( 37)-G( 115)
Stack: ddG = -2.40 External closing pair is G( 101)-C( 111)
Helix: ddG = -6.90 4 base pairs.
Hairpin loop: ddG = 5.80 Closing pair is A( 102)-U( 110)
Interior loop: ddG = 1.10 External closing pair is A( 61)-U( 75)
Helix: ddG = -3.10 3 base pairs.
Hairpin loop: ddG = 5.00 Closing pair is U( 65)-A( 71)
Stack: ddG = -0.60 External closing pair is U( 11)-G( 18)

Structure 2

798 mmu-miR-879 
Initial dG = -25.10

Interior loop: ddG = 0.90 External closing pair is C( 29)-G( 101)
Multi-loop: ddG = 1.90 External closing pair is C( 37)-G( 93)
Stack: ddG = -2.10 External closing pair is C( 78)-G( 86)
Helix: ddG = -10.30 6 base pairs.
Hairpin loop: ddG = 5.30 Closing pair is U( 79)-A( 85)
Stack: ddG = -3.40 External closing pair is G( 41)-C( 73)

Something approaching:
#!/bin/bash
PrintVal() { C=${C#*(}; C=${C%)*}; echo $C | sed 's/).*( /-/'; }
INFILE= # Put here the name of your file
while IFS=':=' read A B C
do
case "$A" in
"Initial dG ")
echo -e "\n\n$B\n"
Hairpins=0
;;
"Hairpin loop")
((Hairpins==0)) && echo "$A:"
PrintVal
((Hairpins++))
;;
"Multi-loop")
echo "$A:"
PrintVal
;;
esac
done <$INFILE

Result:

 -25.40

Multi-loop:
37-115
Hairpin loop:
102-110
65-71


 -25.10

Multi-loop:
37-93
Hairpin loop:
79-85
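
The case labels in the script above depend on exactly how read splits each line. The following is a minimal sketch of that splitting, not part of the original script; show_fields is a hypothetical helper name, and the sample lines are taken from the input file shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: show how `IFS=':=' read A B C` splits one line.
# IFS holds only ':' and '=', so blanks are NOT trimmed from the fields.
show_fields() {
    printf '%s\n' "$1" | {
        IFS=':=' read -r A B C
        printf "A='%s' B='%s' C='%s'\n" "$A" "$B" "$C"
    }
}

show_fields 'Initial dG = -25.40'
show_fields 'Multi-loop: ddG = -0.70 External closing pair is C( 37)-G( 115)'
```

For the "Initial dG" line, A keeps the blank in front of the "=" sign, which is why the label is written "Initial dG " with a trailing space. Any extra leading blank in the input would likewise end up inside A and stop the label from matching.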

On line 3 of the program I set the file name as

INFILE=  sai (here sai is my input file)

and ran the shell program with 'sh t1prog'.

It gives this error:
line 21: $INFILE: ambiguous redirect

try with ()

done <($INFILE)

... or back tick ``

I modified it as you suggested, but it gives an error like this:

t2prog: line 21: syntax error near unexpected token `('
t2prog: line 21: `      done <($INFILE)'

Modified program is given here

#!/bin/bash
PrintVal()  { C=${C#*(}; C=${C%)*}; echo $C | sed 's/).*( /-/'; }
        INFILE=  sai
        while IFS=':=' read A B C
        do
           case "$A" in
              "Initial dG ")
                 echo -e "\n\n$B\n"
                 Hairpins=0
              ;;
              "Hairpin loop")
                 ((Hairpins==0)) && echo "$A:"
                 PrintVal
                 ((Hairpins++))
              ;;
              "Multi-loop")
                 echo "$A:"
                 PrintVal
              ;;
           esac
        done <($INFILE)

---------- Post updated at 04:48 PM ---------- Previous update was at 03:55 PM ----------

The code is working (given below) and was run on the original data. A small correction is required. In the original data file the dG number statement is generated like this:

 Initial dG = -25.10

i.e. with one space at the beginning of the line, so the dG number does not appear in the output. The input can't be changed because it is generated automatically. Please modify the program so that the dG number is printed when the statement starts one space in from the beginning of the line. (If the statement starts at the very beginning of the line, the dG value is printed.)
Also, '-e' is printed in the output after each structure; that has to be removed (shown below).

The modified code is like this
--------------------------------

#!/bin/bash
PrintVal() { C=${C#*(}; C=${C%)*}; echo $C | sed 's/).*( /-/'; }
while IFS=':=' read A B C
do
case "$A" in
"Initial dG ")
echo -e "\n\n$B\n"
Hairpins=0
;;
"Hairpin loop")
((Hairpins==0)) && echo "$A:"
PrintVal
((Hairpins++))
;;
"Multi-loop")
echo "$A:"
PrintVal
;;
esac
done < sai

----------------------------------------
Output after running the program (partial output from the original data)

Multi-loop:
37-115
Hairpin loop:
102-110
65-71
12-17
-e

One thing to try is to make the match on A tolerant of surrounding spaces, with the following changes:

Remove the quotes around $A:

line 05: case "$A" in   becomes   case $A in

and in the three case labels:

line 06: "Initial dG ")   becomes   Initial*)

line 10: "Hairpin loop")   becomes   Hairpin*)

line 15: "Multi-loop")   becomes   Multi-loop)

Don't forget to put the * in the first two.

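<($INFILE) is process substitution: it means the output of the program $INFILE is presented as a file. Using backticks instead is command substitution: the output of the program $INFILE is substituted into the command line as a string. Neither is what is wanted here, because $INFILE is a file name, not a program.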

Correct would be:

done < "$INFILE"

---------- Post updated at 01:42 ---------- Previous update was at 01:39 ----------

You have a space between INFILE= and sai. There should be no space there.
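
To spell out why the space matters: the shell reads "INFILE=  sai" as an empty assignment to INFILE that applies only to a command named sai, so INFILE stays unset in the script, and the unquoted <$INFILE then expands to nothing, which bash reports as an ambiguous redirect. The following is a minimal sketch of the corrected assignment and a quoted redirect; first_line is a hypothetical helper and the temporary file only stands in for the real input.

```shell
#!/bin/sh
# Hypothetical helper: print the first line of the file named in $1.
first_line() {
    IFS= read -r line < "$1" && printf '%s\n' "$line"
}

tmp=$(mktemp)                     # stand-in for the real input file
printf ' Initial dG = -25.10\n' > "$tmp"

INFILE=$tmp                       # correct: nothing between '=' and the value
first_line "$INFILE"              # quoted, so an empty value fails clearly

rm -f "$tmp"
```

Quoting the variable in the redirect is a design choice: if INFILE is ever empty, the shell then complains about a missing file name instead of an ambiguous redirect.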

Oops... true !
I always mix those <() $() "" and `` just let's try them all...
One will work LoL 8)

awk '
function s(a) {gsub(/[a-zA-Z()]/,"",a);return a}
/^Initial dG/ {init=$NF;t=1} 
/^Multi-loop/ { m=(m=="")?X s($(NF-1)$NF):m RS s($(NF-1)$NF)}
/^Hairpin loop/ {h=(h=="")?X s($(NF-1)$NF):h RS s($(NF-1)$NF)}
/^Structure/&&t==1 {printf "%s\nHairpin loop:\n%s\nMulti-loop:\n%s\n\n",init,h,m ; init=m=h="";t=0}
END {printf "%s\nHairpin loop:\n%s\nMulti-loop:\n%s\n",init,h,m }
' infile

-25.40
Hairpin loop:
102-110
65-71
Multi-loop:
37-115

-25.10
Hairpin loop:
79-85
Multi-loop:
37-93

I made the corrections in the program as you suggested. It gives the output below: it does not print the dG number for any of the structures, and it does not print the Hairpin loop heading from the second structure onward.

Output given from the corrections:

Multi-loop:
37-115
Hairpin loop:
102-110
65-71
Multi-loop:
37-93
79-85

Actual output :

-25.40

Multi-loop:
37-115
Hairpin loop:
102-110
65-71

-25.10

Multi-loop:
37-93
Hairpin loop:
79-85

I saved the code, and this version works for me (with your sample):

#!/bin/bash
PrintVal() { C=${C#*(}; C=${C%)*}; echo $C | sed 's/).*( /-/'; }
while IFS=':=' read A B C
do
case $A in
Initial*)
echo -e "\n\n$B\n"
Hairpins=0
;;
Hairpin*)
((Hairpins==0)) && echo "$A:"
PrintVal
((Hairpins++))
;;
Multi-loop)
echo "$A:"
PrintVal
;;
esac
done <$INFILE

Gives the output:

 -25.40

Multi-loop:
37-115
Hairpin loop:
102-110
65-71


 -25.10

Multi-loop:
37-93
Hairpin loop:
79-85

What's the output of this:
#!/bin/bash
while IFS=':=' read A B C
do
echo "A='$A' B='$B' C='$C'"
done <$INFILE

---------- Post updated at 23:06 ---------- Previous update was at 21:52 ----------

The problem is leading and trailing tabs and spaces in A. To get rid of them, replace line 5:

case $A in

by

case $(echo $A) in
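
The reason this helps is that the unquoted $A is split into words on blanks before echo sees it, and echo joins the words back with single spaces, so leading and trailing blanks disappear. The following is a minimal sketch of the effect; classify is a hypothetical helper, and it assumes the default IFS and labels without wildcard characters (an unquoted expansion is also subject to filename globbing).

```shell
#!/bin/sh
# Hypothetical helper: classify a label the way the patched case statement does.
classify() {
    # $1 is left unquoted on purpose so that word splitting drops the blanks.
    case $(echo $1) in
        Initial*)   echo "dG" ;;
        Hairpin*)   echo "hairpin" ;;
        Multi-loop) echo "multi" ;;
        *)          echo "other" ;;
    esac
}

classify ' Initial dG '     # leading blank, as in the generated input
classify 'Multi-loop'
classify 'Helix'
```

Runs of blanks inside the label are also squeezed to one space, which does not matter for these three labels.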

I changed the script to use 'case $(echo $A) in'. The output is OK except for '-e', which has to be removed. It appears before every dG number, i.e. -25.40, -25.10, etc. I am confused about where it comes from. Thanks in advance.

-e 
-25.40
Multi-loop:
37-115
Hairpin loop:
102-110
65-71
12-17
-e 
-25.10
Multi-loop:
37-93
Hairpin loop:
79-85
57-62
-e 
-24.80
Multi-loop:
37-115
Hairpin loop:
102-110
65-71
5-10
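
A likely cause, assuming the script is still started with 'sh t1prog' as earlier in the thread: in some sh implementations the echo builtin does not recognise -e as an option and prints it literally. printf behaves the same in every POSIX shell, so it can replace echo -e. The following is a minimal sketch, not the original script; print_dg is a hypothetical helper that takes the B field from read.

```shell
#!/bin/sh
# Hypothetical helper: print a dG value such as " -25.40" on its own line,
# preceded by a blank line, without relying on `echo -e`.
print_dg() {
    # Strip any leading blanks left over from the IFS=':=' split.
    value=${1#"${1%%[! ]*}"}
    printf '\n%s\n' "$value"
}

print_dg ' -25.40'
```

Running the script as 'bash t1prog', or making it executable and calling it directly so that the #!/bin/bash line is honoured, may also avoid the stray -e.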

Try this POSIX version, which should be insensitive to leading spaces:
INFILE=infile # Put the name of your file here
hairpins=false
while read -r line
do
    case $line in
        *"Initial dG"*)
            # The dG value is the last blank-separated word on the line
            printf '\n%s\n\n' "${line##* }"
            hairpins=false
            ;;
        *"Hairpin loop"*)
            if ! $hairpins; then
                printf "Hairpin loop: \n"
            fi
            # Split on parentheses; $2 and $4 hold the two positions
            ( IFS="()"; set -- $line ; echo "${2# }-${4# }" )
            hairpins=true
            ;;
        *"Multi-loop"*)
            printf "Multi-loop: \n"
            ( IFS="()"; set -- $line ; echo "${2# }-${4# }" )
            ;;
    esac
done <"$INFILE"
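
The extraction step in the script above works by splitting the line on parentheses inside a subshell, so the changed IFS does not leak into the rest of the script. The following is a minimal sketch of that step on its own; closing_pair is a hypothetical helper, and it assumes at most one blank after each opening parenthesis, as in the sample input.

```shell
#!/bin/sh
# Hypothetical helper: extract "N-M" from a line such as
#   Hairpin loop: ddG = 5.80 Closing pair is A( 102)-U( 110)
closing_pair() {
    (
        IFS='()'        # split on parentheses only
        set -f          # no filename globbing during the unquoted expansion
        set -- $1       # $2 is " 102", $3 is "-U", $4 is " 110"
        printf '%s-%s\n' "${2# }" "${4# }"
    )
}

closing_pair 'Hairpin loop: ddG = 5.80 Closing pair is A( 102)-U( 110)'
closing_pair 'Multi-loop: ddG = -0.70 External closing pair is C( 37)-G( 115)'
```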