I am trying to reverse and complement my DNA sequences. The file format is FASTA, something like this:
Now, to reverse the sequence, I should start reading from right to left. At the same should be complemented. Thus, "A" should be read as "T"; "C" should be read as "G"; "T" should be converted into "A"; and "G" must be seen as "C" in the reversed-complemented sequence. From my example, the resulting file should look something like this:
You can use sth like that to reverse and complement dna chain (given as parameter to the script):
#!/bin/bash
# get chain as parameter
dna=$1
function rrev() {
case $1 in
"T")
echo -n A
;;
"A")
echo -n T
;;
"G")
echo -n C
;;
"C")
echo -n G
;;
esac
}
reversed=`echo $dna | rev`
for i in $(seq 0 $((${#dna}-1))); do
rrev ${reversed:$i:1};
done
echo
#!/bin/bash
file=$1
function rrev() {
case $1 in
"T")
echo -n A
;;
"A")
echo -n T
;;
"G")
echo -n C
;;
"C")
echo -n G
;;
esac
}
while read line; do
if [ `echo $line | grep -c "^>"` -eq 0 ]; then
orig_revd=`echo $line | rev`
for i in $(seq 0 $((${#line}-1))); do
rrev ${orig_revd:$i:1};
done
echo
else
echo $line
fi
done < $file
awk 'BEGIN {
j = n = split("A C G T", t)
for (i = 0; ++i <= n;)
map[t] = t[j--]
}
{
if (/^>/) print
else {
for (i = length; i; i--)
printf "%s", map[substr($0, i, 1)]
print x
}
}' infile
That's great! The only part that is missing is the headers (>Sequence ID) for each sequence. How can I modify your perl script so it can include the headers for each and every sequence?
Thanks!