I have a small problem with the following Perl/awk one-liner, which is supposed to remove gap columns (dashes) from a sequence alignment, i.e. every column in which more than half of the sequences have a gap:
perl -nla -F"" -e 'if (!/^>/){$n++;for ($i=0;$i<=$#F;$i++){$a{$i}{$F[$i]}++}}END{for ($i=0;$i<=$#F;$i++){if ($a{$i}{"-"}/$n>0.5){print $i}}print "-1"}' infile | awk -vFS="" -vOFS="" 'NR==FNR{a[$0+1]++}{for (i=1;i<=NF;i++) if (i in a) $i=""}FNR!=NR' - infile > outfile
The problem is that when the gap is at the beginning of the sequence, the script removes it along with the character at the same position in the sequence ID (header) line, for example:
Unfortunately, that is enough to break the FASTA format, which prevents me from doing any further analysis.
Any help will be greatly appreciated!