I would like to determine the maximum and minimum bp for each chromosome for each sample, and then output the distance between the min and max bp for each chromosome for each sample.
As I am super new to all this, and I have no one to help me, I was wondering if someone here could help me out?
I have so far written this:
for sample in `listofsamples.txt`
do
awk 'chr=$2; for (chr=1; chr<=25; chr++) {
NR==1 { min=$3; max=$3; length=0; next }
{ max < $3 {max=$3} min > $3 {min=$3} }
END { roh=(max-min)/1000000; print "sample", "chr", "min", "max", "length"; print $sample, chr, min, max, length }}' awktestfile.txt
done
And I keep getting the following syntax error message:
'/file: line 2: syntax error near unexpected token `do
'/file: line 2: `do
I have no idea what it means - please help? Any advice would be greatly appreciated.
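A few observations on the script above, offered as a starting point rather than a definitive fix. The stray ' at the start of the error message usually means the script was saved with Windows (CRLF) line endings, so bash sees "do" followed by a carriage return. Separately, the backticks around listofsamples.txt make the shell try to run that file as a command, $sample is not expanded inside a single-quoted awk program, and length is a built-in awk function name, so it cannot be used as a variable. The sketch below sidesteps the shell loop entirely and does one awk pass. It assumes whitespace-separated columns in the order sample, chromosome, bp with no header line (only columns 2 and 3 are confirmed by the original script); the file name awk_demo_input.txt, its contents, and the output column name length_Mb are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo data: sample, chromosome, bp (assumed column order).
cat > awk_demo_input.txt <<'EOF'
S1 1 1500000
S1 1 4500000
S1 2 2000000
S2 1 3000000
S2 1 1000000
S2 1 7000000
EOF

# If the script itself has CRLF line endings, they can be stripped with:
#   tr -d '\r' < script_with_crlf > script_unix
# (both file names here are placeholders)

# One pass over the data, tracking min and max per (sample, chromosome).
result=$(awk '
{
    key = $1 SUBSEP $2                  # one key per sample/chromosome pair
    if (!(key in min)) {                # first row seen for this pair
        min[key] = $3
        max[key] = $3
        order[++n] = key                # remember first-seen order for output
    }
    if ($3 + 0 < min[key] + 0) min[key] = $3   # "+ 0" forces numeric comparison
    if ($3 + 0 > max[key] + 0) max[key] = $3
}
END {
    print "sample", "chr", "min", "max", "length_Mb"
    for (i = 1; i <= n; i++) {
        k = order[i]
        split(k, part, SUBSEP)          # part[1] = sample, part[2] = chromosome
        print part[1], part[2], min[k], max[k], (max[k] - min[k]) / 1000000
    }
}' awk_demo_input.txt)

printf '%s\n' "$result"
```

Because every sample/chromosome pair gets its own array entry, the per-sample loop and the chr=1..25 loop from the original attempt are not needed, which should also matter for speed with over 3000 samples in one file.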
Thanks Don. I have a single data file that lists all base pairs for all chromosomes for all samples, one under the other, so the file is quite large (there are over 3000 samples). Ideally, I would like to output, for each sample, the max and min bp for each chromosome, e.g.
Thanks to everyone who helped me out in the past with this problem. I seem to have found another issue, not in the script kindly provided, but in my own approach: in hindsight I should have thought this through better. My apologies to all for re-opening this thread; please let me know if I need to open a new thread.
I had not considered that my regions of interest might not form one continuous block, but could actually be broken up. An example is shown below: