The bash below executes and seems to work fine on those files in which an additional CNV is detected. However, on those files where there is no additional CNV detected, the "No Additional CNV Detected" line repeats multiple times
instead of only once. I tried adding an END, as all lines are printed, but that doesn't help. I cannot seem to solve this without encountering new issues. Thank you :).
for f in /home/cmccabe/Desktop/oca/*.tsv ; do # loop through all files in directory and start processing
    echo "Start check for cnv creation: $(date) - file: $f" # log start
    bname=$(basename "$f") # strip off path
    pref=${bname%%_*.tsv} # strip off everything from the first underscore on, including the extension
    awk ' # call awk script
        # capture CNV gain and loss in 26 CDS genes
        NR==FNR { a[$1]; next }
        $2=="CNV" {
            c=split($12, b, "[,:]")
            if (b[2]>=4.0 || (b[2]<=1.0 && b[c]<=1.9 && ($14 in a))) {
                if (!wasfound) {
                    print "Additional CNV Detected:"
                    wasfound=1
                }
                print
            }
        }
        # END is a top-level rule; it cannot sit inside the $2=="CNV" action
        END {
            if (!wasfound) { print "No Additional CNV Detected" }
        }' /home/cmccabe/Desktop/oca/gene FS='\t' "$f" >> /home/cmccabe/Desktop/oca/${pref}_oca.txt
    echo "End check for CNV creation: $(date) - file: $f" # log end
done
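For anyone reading along, here is a small sketch of what the split() call in the script does to a column 12 value. The value is copied from the posted output and fed in by hand, so this only illustrates the parsing, not the real files.

```shell
# Sample value copied from the posted output; not read from a real input file.
value='5%:4.52,95%:2.93'

# Splitting on the character class [,:] gives four pieces: 5% | 4.52 | 95% | 2.93
parts=$(printf '%s\n' "$value" | awk '{
    c = split($0, b, "[,:]")            # c is the number of pieces
    printf "%d %s %s\n", c, b[2], b[c]  # b[2] is the 5% value, b[c] the 95% value
}')

echo "$parts"
```

So b[2] is the number after "5%:" and b[c] is the number after "95%:", which are the two values the if statement compares.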
Output for a file with a CNV detected (correct):
5 Expression controls detected
13 NOCALL detected
2178 REF detected
3 ASSAYS_5P_3P absent controls detected
1 ASSAYS_5P_3P NoCall controls detected
No Oncomine Drivers Detected
No Additional Clinvar Detected
No Additional Function Detected
No Additional Fusion Detected
No Additional Hotspots Detected
Additional CNV Detected:
chr1:11184539 CNV 32772 1.0E-10 1p36.22(11184539-11217311)x2.03333 5%:4.52,95%:2.93 MTOR
chr16:68771250 CNV 96180 1.0E-10 16q22.1(68771250-68867430)x1.02222 5%:0.9,95%:1.16 CDH1 esv25425:esv29196:nsv817735:esv2714658:nsv833267:nsv103068:nsv457515:esv2661913
Output for a file with no additional CNV (the last line repeats):
5 Expression controls detected
17 NOCALL detected
2174 REF detected
3 ASSAYS_5P_3P absent controls detected
1 ASSAYS_5P_3P NoCall controls detected
No Oncomine Drivers Detected
No Additional Clinvar Detected
No Additional Function Detected
No Additional Fusion Detected
No Additional Hotspots Detected
No Additional CNV Detected
No Additional CNV Detected
No Additional CNV Detected
No Additional CNV Detected
The output you have shown us might have come from the code you have shown us, run on input files you have not shown us, or it might be totally unrelated to that code. We have no way to determine which.
You definitely have not shown us the output produced by the echo statements in the code you have shown us.
We have no idea what your input files look like.
We have no idea what the names of the input files you are processing look like.
We have no idea what operating system you're using.
We don't know what output you're hoping to get.
Under these conditions, how do you expect us to help you?
Each input file is a tsv of 40 columns. Below is an example of multiple lines (I only show 4 columns, as all 40 are the same or close to it and the script does produce the desired output). The problem I am having is that if there is an Additional CNV detected, as in output 1, then the script works, printing Additional CNV detected followed by the line or lines.
If there are No Additional CNV detected, as in output 2, that line prints multiple times (presumably 2800 times, because that is the total number of lines).
I am using Ubuntu 14.04 as my OS.
Post 1 is actual output produced with the complete files, but to keep the post easier to read I only used several lines.
The output is close as is, I just can't seem to solve why No Additional CNV detected repeats. Thank you very much :).
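One possible cause, and this is only an assumption since the version of the script that produced the repeats is not shown, is that the message is printed from a rule that runs for every input line rather than from END. A minimal sketch with three made-up lines shows the difference:

```shell
# Made-up input: three tab-separated lines, none of which is a CNV.
input=$(printf 'chr1:1\tREF\nchr1:2\tSNV\nchr1:3\tINDEL\n')

# Per-line rule: the message is printed once for every line that is not a CNV.
per_line=$(printf '%s\n' "$input" |
    awk -F'\t' '$2 != "CNV" { print "No Additional CNV Detected" }')

# END rule: the message is printed once, after the last line has been read.
at_end=$(printf '%s\n' "$input" |
    awk -F'\t' '$2 == "CNV" { found = 1 } END { if (!found) print "No Additional CNV Detected" }')

# Count how many times each version printed the message.
printf '%s\n' "$per_line" | grep -c 'No Additional CNV Detected'
printf '%s\n' "$at_end" | grep -c 'No Additional CNV Detected'
```

Another thing worth ruling out: the loop writes with >>, which appends, so every rerun adds its output to whatever is already in the _oca.txt file.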
Sample input file (only 4 of the 40 columns shown):
chr1:11184539 CNV 5%:5.5,95%:2.68 Name
chr1:11184539 REF
chr1:11184539 SNV A
chr1:11184555 CNV 5%:0.9,95%:1.9 BRCA1
chr1:11184539 FUSION
chr1:11184539 INDEL G
chr1:11184555 CNV 5%:2.5,95%:2.68 Name2
chr1:11184555 CNV 5%:1.1,95%:1.8 BRCA2
Yes, the code you have shown us is easy to understand. And, with or without sample input files, we can easily say that most of the output you have shown did not come from the code you have shown us.
We have no reason not to believe that the code you have not shown us is what is producing the extra output that you don't want.
I asked what files were being processed. You didn't answer. For all we know, there are hundreds of files being processed by your loop with many of them adding a line to the output you say you don't want.
I asked for sample input files and you showed us a sample with at most 4 input fields that is being fed into code you showed us that is evaluating data found in fields 12 and 14.
You have made it very clear that you want us to explain why code you won't show us won't work with data you won't show us using filenames you won't show us. I wish you luck, but I can't help you under these conditions.
I am sorry I did not understand what you were asking fully until now.
I am only testing on two tsv files that get converted and processed by the loop, and the output is 2 text files.
I will try again tomorrow. I apologize, I can only post sample input as the file is not fully usable. I work in healthcare and am somewhat limited. That being said, I do not mean to frustrate or be difficult. My posts are not always as clear as they should be, but I try to include important pieces. Thank you :).
Don't apologize, just work with us. Create a stripped-down sample file and stripped-down code file that still show the same problem. Until then, good luck. Without that we can't help you.
(And if it doesn't show the same problem? That's a giant clue to whatever the problem was.)
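To make that concrete, a stripped-down test could look something like the sketch below. Everything in it is made up for illustration: the temporary directory, the file names hit.tsv and miss.tsv, the helper functions row and check, and the two-gene list. The rows have 14 columns with only columns 1, 2, 12 and 14 filled in, since those are the ones the posted awk reads. The awk program places END at the top level.

```shell
# Hypothetical scratch area; nothing here touches the real data.
work=$(mktemp -d)

# Gene list: one gene name per line (read as column 1 by the awk program).
printf 'BRCA1\nBRCA2\n' > "$work/gene"

# row LOC TYPE RATIO GENE: emit one tab-separated line with 14 columns,
# filling only columns 1, 2, 12 and 14.
row() {
    printf '%s\t%s\t\t\t\t\t\t\t\t\t\t%s\t\t%s\n' "$1" "$2" "$3" "$4"
}

# hit.tsv: one CNV passes the gain test (4.52 >= 4.0).
{ row chr1:11184539 CNV '5%:4.52,95%:2.93' MTOR
  row chr1:11184539 REF '' ''; } > "$work/hit.tsv"

# miss.tsv: has a CNV line, but it passes neither the gain nor the loss test.
{ row chr1:11184555 CNV '5%:2.5,95%:2.68' Name2
  row chr1:11184539 SNV '' ''; } > "$work/miss.tsv"

# check FILE: same logic as the posted script, with END as a top-level rule.
check() {
    awk '
        NR == FNR { a[$1]; next }
        $2 == "CNV" {
            c = split($12, b, "[,:]")
            if (b[2] >= 4.0 || (b[2] <= 1.0 && b[c] <= 1.9 && ($14 in a))) {
                if (!wasfound) { print "Additional CNV Detected:"; wasfound = 1 }
                print
            }
        }
        END { if (!wasfound) print "No Additional CNV Detected" }
    ' "$work/gene" FS='\t' "$1"
}

hit_out=$(check "$work/hit.tsv")
miss_out=$(check "$work/miss.tsv")
printf '%s\n---\n%s\n' "$hit_out" "$miss_out"

# Remove the scratch directory when finished: rm -rf "$work"
```

If a test like this prints the message once per file but the real run still repeats it, the difference between the two (extra code, reruns appending to the same output file, or the input itself) is where to look next.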