I wrote this shell script to check whether an input file is empty, print the number of non-empty and empty lines in it, and remove every line that starts with the ">" sign (without the quotation marks). The script then creates two output files, DNA.out and RNA.out.
First, I get an error message that says:
./script.sh: line 3: [: dna_input.txt: integer expression expected
but it gives me the results I want.
Here is the code:
#!/bin/bash
#check to see if there is an input file:
if [ $1 -lt 1 ]
then
echo "Usage: $0 file ..."
exit 1
fi
#Check if the file is empty or not
file=$1
if [[ -s $1 ]]
then
echo ""
echo "**** $file has data."
echo "Number of non-empty lines:"
grep -cve '^\s*$' $file
echo "Number of empty lines:"
grep -ce '^\s*$' $file
#grep -cvP '^\s*$' $file -- above line originally was like this one
echo ""
cat $1 | sed 's/>/\n>/g' > temp1.txt
cat temp1.txt | sed '/^>/ d' > temp2.txt
#Remove duplicate empty lines:
awk '!NF{if(++n <=1) print; next}; {n=0; print}' < temp2.txt > DNA.out
echo ""
#convert to mRNA and remove temp1, temp2 to avoid confusion
tr ACGT UGCA < DNA.out > RNA.out
rm temp1.txt temp2.txt
else
echo "**** $1 has no data, or file does not exist."
echo "**** done!"
echo ""
fi;
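For reference, the message reported for line 3 most likely comes from `[ $1 -lt 1 ]`: `-lt` is an integer comparison, and `$1` holds the file name (dna_input.txt) rather than a number. The test fails with an error, the `if` branch is skipped, and the rest of the script runs normally, which would explain why the results are still correct. Below is a minimal sketch of the check using the argument count `$#` instead, assuming that was the intent; the function name `require_file_arg` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only when at least one argument was passed.
require_file_arg() {
    if [ "$#" -lt 1 ]; then
        echo "Usage: $0 file ..." >&2
        return 1
    fi
}

# In the script itself this would be called as:
#   require_file_arg "$@" || exit 1
```

Quoting `"$#"` and `"$@"` keeps the check safe for file names that contain spaces.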
After I get the two output files (DNA.out and RNA.out), I use another script to convert the contents of RNA.out into amino acids. The conversion script is:
#!/bin/sh
while read rna;do
aawork=$(echo "${rna}" |sed -n -e 's/\(...\)/\1 /gp' | sed -f rna.sed)
echo "$aawork" | sed 's/ //g'
echo "$aawork" | tr ' ' '\012' | sort | sed '/^$/d' | uniq -c | sed 's/[ ]*\([0-9]*\) \(.*\)/\2: \1/'
done
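To make the two main steps of the loop easier to follow, here is a small sketch that isolates them: splitting a line into three-letter codons, and counting how often each name occurs. The function names `split_codons` and `count_residues` are hypothetical, and the sample strings are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helpers illustrating two steps of conversion.sh.

# Insert a space after every complete group of three characters.
split_codons() {
    printf '%s\n' "$1" | sed 's/\(...\)/\1 /g'
}

# Count how often each space-separated name occurs, printed as "name: count".
count_residues() {
    printf '%s\n' "$1" | tr ' ' '\012' | sed '/^$/d' | sort | uniq -c |
        sed 's/[ ]*\([0-9]*\) \(.*\)/\2: \1/'
}

split_codons AUGGGCGGCUAA
count_residues "Met Gly Gly STOP"
```

Because the split happens line by line, any one or two leftover bases at the end of a line are left untranslated, and the next line starts a fresh group of three.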
This is how I use it:
./conversion.sh < RNA.out
where rna.sed is:
s/UUU /Phe /g
s/UUC /Phe /g
s/UUA /Leu /g
s/UUG /Leu /g
s/UCU /Ser /g
s/UCC /Ser /g
s/UCA /Ser /g
s/UCG /Ser /g
s/UAU /Tyr /g
s/UAC /Tyr /g
s/UAA /STOP /g
s/UAG /STOP /g
s/UGU /Cys /g
s/UGC /Cys /g
s/UGA /STOP /g
s/UGG /Trp /g
s/CUU /Leu /g
s/CUC /Leu /g
s/CUA /Leu /g
s/CUG /Leu /g
s/CCU /Pro /g
s/CCC /Pro /g
s/CCA /Pro /g
s/CCG /Pro /g
s/CAU /His /g
s/CAC /His /g
s/CAA /Gln /g
s/CAG /Gln /g
s/CGU /Arg /g
s/CGC /Arg /g
s/CGA /Arg /g
s/CGG /Arg /g
s/AUU /Ile /g
s/AUC /Ile /g
s/AUA /Ile /g
s/AUG /Met /g
s/ACU /Thr /g
s/ACC /Thr /g
s/ACA /Thr /g
s/ACG /Thr /g
s/AAU /Asn /g
s/AAC /Asn /g
s/AAA /Lys /g
s/AAG /Lys /g
s/AGU /Ser /g
s/AGC /Ser /g
s/AGA /Arg /g
s/AGG /Arg /g
s/GUU /Val /g
s/GUC /Val /g
s/GUA /Val /g
s/GUG /Val /g
s/GCU /Ala /g
s/GCC /Ala /g
s/GCA /Ala /g
s/GCG /Ala /g
s/GAU /Asp /g
s/GAC /Asp /g
s/GAA /Glu /g
s/GAG /Glu /g
s/GGU /Gly /g
s/GGC /Gly /g
s/GGA /Gly /g
s/GGG /Gly /g
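A table this long is easy to mistype, so a quick sanity check may help. The sketch below lists the distinct replacement names found in a rules file, assuming every line has the form shown above (`s/XXX /Name /g`); the function name `list_residues` is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: list the distinct replacement names used in a rules file
# whose lines look like "s/UUU /Phe /g".
list_residues() {
    sed -n 's#^s/[ACGU][ACGU][ACGU] /\([A-Za-z]*\) /g$#\1#p' "$1" | sort -u
}
```

Running `list_residues rna.sed` should print 21 names for the standard table (20 amino acids plus STOP); a misspelled name shows up as an extra entry.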
Now I would like to know whether I can combine these two scripts into one file and, if possible, clean up the code.
Here is a sample input file for the first script (script.sh):
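One possible shape for a combined script is sketched below. It is only an outline under some assumptions: ">" appears only at the start of header lines, rna.sed is in the current directory unless another path is passed, and all function names are hypothetical. It prints the translated sequence for each line and leaves out the per-line counts to stay short.

```shell
#!/bin/bash
# Sketch: both steps in one file. Function names are hypothetical.

# Blank out header lines (those starting with ">"), then squeeze runs of
# blank lines using the same awk program as the original script.
clean_dna() {
    sed 's/^>.*//' "$1" | awk '!NF { if (++n <= 1) print; next } { n = 0; print }'
}

# Same base mapping as the original tr call.
to_rna() {
    tr ACGT UGCA
}

# Split each line into codons, apply the substitution rules, drop the spaces.
to_amino() {
    sed -n 's/\(...\)/\1 /gp' | sed -f "${1:-rna.sed}" | sed 's/ //g'
}

# Run the whole pipeline on one input file; writes DNA.out and RNA.out.
translate_file() {
    input=$1
    rules=${2:-rna.sed}
    if [ ! -s "$input" ]; then
        echo "**** $input has no data, or file does not exist." >&2
        return 1
    fi
    clean_dna "$input" > DNA.out
    to_rna < DNA.out > RNA.out
    to_amino "$rules" < RNA.out
}
```

With these functions in place, the script body could end with something like `translate_file "$1"`, so that a single command replaces the two separate invocations and the temporary files are no longer needed.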
>Header_Sequence_1
GTACGACGGAGTGTTATAAGATGGGAAATCGGATACCAGATGAAATTGTGGATCGGTGCAAAA
GTCGGCAGATATCGTTGAAGTCATAGGTGATTATGTTCAATTAAAGAAGCAAGGCCGAAACTAC
TTTGGACTCTGTCCTTTTCATGGAGAAAGCACACCTTCGTTTTCCGTATCGCCCGACAAACAGAT
TTTTCATTGCTTTGGCTGCGGAGCGGGCGGCAATGTTTTCTCTTTTTTAAGGCAGATGGAAGGCT
ATTCTTTTGCCGAGTCGGTTTCTCACCTTGCTGACAAATACCAAATTGATTTTCCAGATGATATAA
CAGTCCATTCCGGAGCCCGGCCAGAG
>Header_Sequence_2
TCTTCTGGAGAACAAAAAATGGCTGAGGCACATGAGCTCCTGAAGAAATTTTACCATCATTTGT
TAATAAATACAAAAGAAGGTCAAGAGGCACTGGATTATCTGCTTTCTAGGGGCTTTACGAAAGA
GCTGATTAATGAATTTCAGATTGGCTATGCTCTTGATTCTTGGGACTTTATCACGAAATTCCTTGT
AAAGAGGGGATTTAGTGAGGCGCAAATGGAAAAAGCGGGTCTCCTGATCAGACGCGAAGACGGAAGCGGATATTTCGACCGCTTCAGAAACC
GTGTCATGTTTCCGATCCATGATCATCACGGGGCTGTTGTTGCTTTCTCAGGCAGGGCTCTTGG
>Header_Sequence_3
CCGCTGTATTCTCAGCCAAGCGGTATAGTCTCCGCTGTATTCTCAGCCCCAGCCGTTCCACTCAG
AGGAACTTTAAAGGATGTTCCTGTTGAGGGCTCATCATCGTCATCGTCATCATCATCATCATCAT
CATCATCATCATCATCAACATCAACCGTCGCACCAGCAAATAAGGCAAGAACTGGAGAAGACGC
AGAAGGCAGTCAAGATTCTAGTGGTACTGAAGCTTCTGGTAGCCAGGGTTCTGAAGAGGAAGG
TAGTGAAGACGATGGCCAAACTAGTGCTGCTTCCCAACCCACTACTCCAGCTCAAAGTGAAGGC
GCAACTACCGAAACCATAGAAGCTACTCCAAAAGAAGAATGCGGCACTTCATTTGTAATGTGGT
TCGGAGAAGGTACCCCAGCTGCGACATTGAAGTGTGGTGCCTACACTATCGTCTATGCACCTAT
AAAAGACCAAACAGATCCCGCACCAAGATATATCTCTGGTGAAGTTACATCTGTAACCTTTGAA
AAGAGTGATAATACAGTTAAAATCAAGGTTAACGGTCAGGATTTCAGCACTCTCTCTGCTAATTC
AAGTAGTCCAACTGAAAATGGCGGATCTGCGGGTCAGGCTTCATCAAGATCAAGAAGATCACT
CTCAGAGGAAACCAGTGAAGCTGCTGCAACCGTCGATTTGTTTGCCTTTACCCTTGATGGTGGT
AAAAGAATTGAAGTGGCTGTACCAAACGTCGAAGATGCATCTAAAAGAGACAAGTACAGTTTG
GTTGCAGACGATAAACCTTTCTATACCGGCGCAAACAGCGGCACTACCAATGGTGTCTACAGGT
TGAATGAGAACGGAGACTTGGTTGATAAGGACAACACAGT
To sum up, this is what I do:
./script.sh input.txt
This generates DNA.out and RNA.out (with the error I mentioned). Then I run:
./conversion.sh < RNA.out
where conversion.sh uses rna.sed.
I hope I have made my questions clear, and I appreciate your help.