Nothing was print out.
I noticed the DNA sequence line are 60bp wide, if my i is less than 60, only the header (the lines with ">" as separator) and the first sequence line were printed, but I am expecting all sequence lines under each header. If the i is bigger than 60, nothing was printed. For sure most of the entries in the infile.fasta are longer than 60bp. Did I miss anything?
Thanks a lot!
I *think* this is what you want. awk -v len=200 -f yi.awk myFile
yi.awk:
function doPrint(r,h, n)
{
n=gsub(RS,RS,r)
if (length(r)-n>len)
print h RS r
}
/^>S/ {
if (h) {
doPrint(r,h)
r=""
}
h=$0
next
}
{r=(r)?r RS $0:$0}
END {
doPrint(r,h)
}
The interesting part is the script works with another fasta file in which each sequence line/row has more 120bp wide.
>S94D_ctg_8004 Average coverage: 402.95
ATAATGCCTGTGAATATGACATGTGTTCCTGTTTCTACATCAGACTACTATTCTTGCATATCCCGCATCAGTCACTTATGATAAGGCACCTTTAAGTCGCCACTGTGATATTGGTTGTGT
CTTGCATATGATTTCTAGCATGTACTGCTTCTAATTTGTTGAAACGCCATACAGTTTGAACTTGGCAGAGTGATATTGGTTGCATCTTTCATATGGTGTCTAGCTTGCAGTGCTTCTAAT
TT
>S94D_ctg_8022 Average coverage: 212.74
CGTTGTCTGTGTGACATTTAAGAGTGTTTTGTGCAGTGCAACAAAAGGTCACAATGGCAACGAGGTGAAGTAGATAACCCTGTTTACACAGAGGTGCAAAGGAAACAACTAGTGGTCAAT
AACCCTGTTTACCAGCACCATGTGCATCTACCGATGTACTGGGGTGATGCACAGTCTGTTG
>S94D_ctg_8062 Average coverage: 710.87
ATGAATTTCAGACGAGTTTTGGATTTACTAGAATTTAAAAACCAGGCATCTCAATGTTTTGCCGGCAATCAACGGTGCCCTGGTGTTTGAAATTCATTCCCATTTCTTGCATGGGACCTA
AGCATGCACCCAAGGACA
>S94D_ctg_9034 Average coverage: 37.84
AGATTATTTTGTCTGCCATGTATAATTTTGGTTGATGTTTAGCCTGTTGTGCTTAACATGCTTCTCGACGTACCTACACAGGACAATTTGGGAACGACTGCTGTTTTCCATCGAGGTTAG
TTTCATCCCATGGCTTATATCTGCTCAATGTTCAGGATATCGGTAGCCGGTACCATATAGGCCGGCGGCTGATAGGAGACTAATCGGTGAATCGGACT
I thought the script may be related to the newline, but I did check and could not find any abnormal. Not sure the EOL is really the same of the two files. I will check it. Thank you both!
A program which doesn't do what you want really doesn't help explain what you do want, and you still haven't showed output demonstrating what you actually want. So we're still a bit in the dark, here.
Thanks! All worked like charm!
Can I ask what was wrong with my own script? I thought it should work. Actually it did sometime, yet sometime did not!
By the way, I believe the scripts will be used by a lot of "bioinformaticians" for this routine job with DNA sequences!
Thank you all again!
I couldn't tell what your original script was even trying to do.
getline profoundly doesn't do what what you thought. It reads lines, but doesn't squeeze them together.
Your control logic was also kind of odd. That 'next' guarantees no lines containing > would ever be printed unless you read them manually with getline. And when you didn't see a >, 'getline' ran every single line -- so you were reading double-speed, then throwing half your lines away. Doing a manual getline bypasses your /^>S/ completely, which is the only way any of those lines could have been printed at all.
I can only guess that some of your input had very long lines, and that work only meant "work". Or perhaps you weren't reading the output you thought you were.
Thanks Corona!
That's the part I was afraid of, which can lead to a totally wrong result! I am trying to catch more details about awk and hope use it correctly.
Thank you again!