have been trying for a few weeks not to get this program running. I am newer to programming and it has definitely been a challenge. I think my problem arises with my if statement. I can get it to append the name to the new file, but it simply appends the whole sequence to the file rather than counting it. I am working with a fasta file that contains multiple sequences, the name starting with '>' and the sequence on one line below it. Here is my code. Please help, and thank you so much in advance!!
#! /bin/bash
#exit program with error if user does not specify input on command line
if [ $# != 1 ]; then
echo "Please specify fasta input on command line and rerun"
exit
else echo "Begining count"
fi
#collect input from user each time they run the program
input=`cat $1`
#seperate the sequence from the sequence name
name=`grep '>' $1`
sequence=`grep -v '>' $1`
#if name, if sequence
IFS=$'\n'
set -f
for i in $(cat "$1");
do
if [ $i=">" ]; then
echo "$i" >> GCcontent.txt
else
#count number of occurence of motif ATGC in fasta sequence
countG=`echo $i | grep -o "G" | wc -l`
countC=`echo $i | grep -o "C" | wc -l`
total=`echo $i | wc -m`
count=`echo "scale=2" ; ($countG+$countC) | bc`
#calculate percent over total divided by 3bp
percent=`echo "scale=2 ; ($count/$total*100)" | bc`
#print output name and percent to file
echo "$percent" >> GCcontent.txt
fi
done
echo "Exiting"
exit
Edit:
The input file has multiple sequence within it, all with respective titles. they look something like this:
>gi|226451773|gb|FJ846591.1
CATTATAGACTGCGTGGTCCGTATTCCCAAGGAGCAGGGAGTTCTGTCCTTCTGGCGCGGTAACCTGGCCAATGTCATCAGATACTTCCCCACCCAGGCTCTTAACTTCGCCTTCAAAGATAAATACAAGCAGATCTTCCTAGGTGGTGTGGACAAGAGGACCCAGTTTTGGCGCTACTTTGCAGGGAATCTGGCATCAGGTGGTGCCGCAGGGGCCACATCCCTGTGTTTTGTGTACCCTCTTGATTTTGCCCGTACCCGTCTAGCAGCTGATGTGGGTAAAGCTGGAGCTGAAAGGGAATTCCGAGGCCTCGGTGACTGCCTGGTTAAGATCTACAAATCTGATGGGATTAAGGGCCTGTACCAAGGCTTTAACGTGTCTGTGCAGGGTATTATCATCTACCGAGCCGCCTACTTCGGTATCTATGACACTGCAAAGGGTAAGTTTGCTGTGGGCTTTAAAGTTGTGTTCTTAGGAGACAATTTAAAAGAGCGTTGTACCAACCTAACATTCCAAGAGCTAGAGAGTTTTTTTAATTGCTGAAGGAAGCCAAGATCATCCAGTGCGACCCTCATGCACAGATGACATGTTTAGGGGATGTGGGGAAAGGAAGTCAGTAAAACTCTACTTTTTGGTAAAAGCATCTCTTTCCTATTCCCAGGAATGCTTCCGGATCCCAAAAACACTCACATCGTCATCAGCTGGATGATCGCACAGACTGTCACTGCTGTTGCTGGGTTGACTTCCTATCCATTTGA
The output i would like to contain the name and the percent of Gs an Cs (totaled together)
>Name of the file
percent of GCs
My idea for the program was to have the user input the file, then the loop either append the line that contains the title to the file GCcontent.txt or to run through the counter i have set up and append it to the file GCcontent.txt