Hi all,
I have a script that reads a tab-delimited file called FILEA.txt, which has several columns. The most important ones are $name, $start and $stop. Currently the script takes the values between the start and stop (inside) using a program called fastacmd. What I want instead is to take the values flanking the start and stop (outside), say the 200 values on either side. What modifications do I have to make to my current script? I've commented it in detail to show what I've done.
Here it is:
use strict;
use warnings;
### open the FILEA
open(my $FILEA,"<","FILEA.txt") or die "could not open FILEA file: $!";
### open an output file
open(my $OUT,">","FILEB.txt") or die "could not open output file: $!";
### go through the file, line by line
while(<$FILEA>)
{
chomp; ### remove the trailing newline from the end of each line
#### contents of each line are stored in a variable called $_ , which does not need to be indicated for many of the function calls below
next if(/^#/); #### skips any lines that begin with #
my @f=split /\t/; #### splits the line on the tab characters, storing each part in an array called @f with n elements numbered 0 to n-1
my $chrom=$f[0]; #### chromosome is the 0th element
my $start=$f[3]; #### start is the 3rd element
my $stop=$f[4]; #### stop is the 4th element
my $strand=$f[6]; #### strand is the 6th element
my $name=$f[8]; #### name is the 8th element
print "$name\n"; ### print the name to the terminal window (STDOUT)
#### will use the program "fastacmd" to extract values from the database "NMR_DATA" stored in a subdirectory here
#### the program fastacmd should be in the same directory as this script
### fastacmd refers to strand as either 1 or 2 (+ or -).
### this line checks if strand is "+". if it is then strandnum is set to 1. if it is not then strandnum is set to 2
my $strandnum=($strand eq "+") ? 1 : 2;
#### prepare the command
#### arguments to fastacmd
#### -d database
#### -p Type, F=nucleotide
#### -s search string in header, here we use the chromosome in the form chrX
#### -L start,stop positions
#### -S strandnum
my $command="fastacmd.exe -d NMR_DATA/NMR_DATA -p F -s $chrom -L $start,$stop -S $strandnum";
### show the command that is being run in the terminal window (comment this line out to silence it)
print "$command\n";
### run the command and capture the results, which should be a fasta sequence record
my $seqrec=`$command`;
my @seqreclines=split /\n/,$seqrec; #### split the sequence record into lines, store in array @seqreclines
my $defline=shift @seqreclines; #### the 0th element is the defline, remove it using shift and capture in a variable $defline
my $seq = join("",@seqreclines); #### the remaining elements are sequence, join them all together into a single line
#### print out to the file
print $OUT "$name\t$chrom\t$start\t$stop\t$strand\t$defline\t$seq\n";
}
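
For the flanking part, one possible starting point is to work out the two outside coordinate ranges first and then pass each of them to -L in place of $start,$stop. This is only a sketch: flank_ranges is a hypothetical helper that is not part of the script above, it assumes the coordinates are 1-based and inclusive, and it does not clamp the right-hand range because FILEA.txt does not give the sequence length.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): given 1-based,
# inclusive start/stop coordinates, return the coordinate ranges that lie
# immediately outside them.
sub flank_ranges {
    my ($start, $stop, $flank) = @_;

    # Left flank: the $flank positions just before $start, clamped at 1.
    my $left_start = $start - $flank;
    $left_start = 1 if $left_start < 1;
    my $left_end = $start - 1;

    # There is no left flank when the feature starts at position 1.
    my $left = $left_end >= 1 ? [$left_start, $left_end] : undef;

    # Right flank: the $flank positions just after $stop.
    # Assumption: not clamped, since the sequence length is unknown here.
    my $right = [$stop + 1, $stop + $flank];

    return ($left, $right);
}

my ($left, $right) = flank_ranges(1000, 1500, 200);
print "left:  $left->[0],$left->[1]\n";     # left:  800,999
print "right: $right->[0],$right->[1]\n";   # right: 1501,1700
```

Inside the while loop, the same fastacmd command could then be built twice, once with -L $left->[0],$left->[1] and once with -L $right->[0],$right->[1], skipping the left one when $left is undef. Which side counts as upstream would depend on $strand, so that is worth checking for "-" entries.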