Bash repeating lines for some files but not all

The bash below executes and seems to work fine on those files in which an additional CNV is detected. However, on those files where there is no additional CNV detected, that line repeats multiple times
instead of only once. I tried adding an END as all lines are printed but that doesn't help. I can not seem to solve this without encountering new issues. Thank you :).

for f in /home/cmccabe/Desktop/oca/*.tsv ; do # loop through all files in directory and start processing
     echo "Start check for cnv creation: $(date) - file: $f" # log start
     bname=`basename $f` # strip off path
     pref=${bname%%_*.tsv} # strip off extension
     awk ' # call awk script
             # capture CNV gain and loss in 26 CDS genes
             NR==FNR { a[$1]; next }
               $2=="CNV" {
                 c=split($12, b, "[,:]")
               if (b[2]>=4.0 || (b[2]<=1.0 && b[c]<=1.9 && ($14 in a))) {
                  if (!wasfound) {
      print "Additional CNV Detected:"
      wasfound=1
    }
    print
  }
   END {
    if (!wasfound) { print "No Additional CNV Detected" }
    }
  }' /home/cmccabe/Desktop/oca/gene FS='\t' $f >> /home/cmccabe/Desktop/oca/${pref}_oca.txt
     echo "End check for CNV creation: $(date) - file: $f" # log end
done 

file with CNV detected (correct)

5 Expression controls detected
13 NOCALL detected
2178 REF detected
3 ASSAYS_5P_3P absent controls detected
1 ASSAYS_5P_3P NoCall controls detected
No Oncomine Drivers Detected
No Additional Clinvar Detected
No Additional Function Detected
No Additional Fusion Detected
No Additional Hotspots Detected
Additional CNV Detected:
chr1:11184539	CNV		32772		1.0E-10					1p36.22(11184539-11217311)x2.03333	5%:4.52,95%:2.93		MTOR																						
chr16:68771250	CNV		96180		1.0E-10					16q22.1(68771250-68867430)x1.02222	5%:0.9,95%:1.16		CDH1																esv25425:esv29196:nsv817735:esv2714658:nsv833267:nsv103068:nsv457515:esv2661913

No additional CNV detected repeats

5 Expression controls detected
17 NOCALL detected
2174 REF detected
3 ASSAYS_5P_3P absent controls detected
1 ASSAYS_5P_3P NoCall controls detected
No Oncomine Drivers Detected
No Additional Clinvar Detected
No Additional Function Detected
No Additional Fusion Detected
No Additional Hotspots Detected
No Additional CNV Detected
No Additional CNV Detected
No Additional CNV Detected
No Additional CNV Detected
  1. The output you have shown us might have come from the code you have shown us, run on input files you have not shown us, or it might be totally unrelated to that code. We have no way to determine which.
  2. You definitely have not shown us the output produced by the echo statements in the code you have shown us.
  3. We have no idea what your input files look like.
  4. We have no idea what the names of the input files you are processing look like.
  5. We have no idea what operating system you're using.
  6. We don't know what output you're hoping to get.

Under these conditions, how do you expect us to help you?

I apologize and hopefully the below will help:

Each input file is a tsv of 40 columns. The below is an example of multiple lines (I only show 4 columns, as all 40 are the same or close to it and the script does produce the desired output). The problem I am having is that if there is an Additional CNV detected, as in output 1, then the script works, printing Additional CNV detected followed by the line or lines.
If there are No Additional CNV detected as in output 2, that line prints multiple times (presumably 2800 because that is the total lines).
I am using ubuntu 14.04 as my OS.

Post 1 is actual output produced with the complete files, but to keep the post easier to read I only used several lines.

The output is close as is, I just can't seem to solve why No Additional CNV detected repeats. Thank you very much :).

file

chr1:11184539	CNV	5%:5.5,95%:2.68	Name
chr1:11184539	REF
chr1:11184539	SNV		A
chr1:11184555	CNV	5%:0.9,95%:1.9	BRCA1
chr1:11184539	FUSION
chr1:11184539	INDEL	G
chr1:11184555	CNV	5%:2.5,95%:2.68	Name2
chr1:11184555	CNV	5%:1.1,95%:1.8	BRCA2

desired output 1 ---- if detected

Additional CNV detected:
chr1:11184539	CNV	5%:5.5,95%:2.68	Name
chr1:11184555	CNV	5%:0.9,95%:1.9	BRCA1

desired output 2 --- if not detected

No Additional CNV detected

Yes, the code you have shown us is easy to understand. And, with or without sample input files, we can easily say that most of the output you have shown did not come from the code you have shown us.

We have no reason not to believe that the code you have not shown us is what is producing the extra output that you don't want.

I asked what files were being processed. You didn't answer. For all we know, there are hundreds of files being processed by your loop with many of them adding a line to the output you say you don't want.

I asked for sample input files and you showed us a sample with at most 4 input fields that is being fed into code you showed us that is evaluating data found in fields 12 and 14.

You have made it very clear that you want us to explain why code you won't show us won't work with data you won't show us using filenames you won't show us. I wish you luck, but I can't help you under these conditions.
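
One quick way to check the mismatch between a 4-column sample and code that reads fields 12 and 14 is to count how many tab-separated fields each line really has. The sketch below is only an illustration: the temporary file and its two lines are made up for the example, not taken from the real data.

```shell
# Hypothetical sample: two tab-separated lines with 4 and 2 fields.
sample=$(mktemp)
printf 'chr1:11184539\tCNV\t5%%:5.5,95%%:2.68\tName\n' >  "$sample"
printf 'chr1:11184539\tREF\n'                          >> "$sample"

# Tally lines by their number of tab-separated fields (NF),
# printing "field_count line_count" pairs sorted by field count.
field_counts=$(awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$sample" | sort -n)
echo "$field_counts"

rm -f "$sample"
```

Run against a real input file, lines with fewer than 14 fields would leave $12 and $14 empty in the posted awk code.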

I am sorry I did not understand what you were asking fully until now.

I am only testing on two tsv files that get converted and then processed by the loop, and the output is 2 text files.

I will try again tomorrow. I apologize, I can only post sample input as the file is not fully usable. I work in healthcare and am somewhat limited. That being said, I do not mean to frustrate or be difficult. My posts are not always as clear as they should be, but I try to include important pieces. Thank you :).

Don't apologize, just work with us. Create a stripped-down sample file and stripped-down code file that still show the same problem. Until then, good luck. Without that we can't help you.

(And if it doesn't show the same problem? That's a giant clue to whatever the problem was.)
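
A stripped-down test case along those lines might look like the sketch below. It is built on assumptions, not on the real files: the temporary directory, the gene list contents, and the file names hit.tsv and miss.tsv are all hypothetical, and the 4-column sample layout from the thread is used, so the code reads fields 3 and 4 where the posted code reads 12 and 14. Two things differ from the posted loop: the END rule sits at the top level of the awk program, where it runs once per awk invocation, and the output is written with > instead of >>, so that lines from earlier runs can't pile up in the output file. Whether either of those is the cause of the repeats can't be told from what has been posted.

```shell
workdir=$(mktemp -d)

# Hypothetical gene list, one gene name per line.
printf 'BRCA1\nBRCA2\n' > "$workdir/gene"

# Stripped-down input with qualifying CNV lines (4 columns, per the thread sample).
printf 'chr1:11184539\tCNV\t5%%:5.5,95%%:2.68\tName\n'  >  "$workdir/hit.tsv"
printf 'chr1:11184539\tREF\n'                           >> "$workdir/hit.tsv"
printf 'chr1:11184555\tCNV\t5%%:0.9,95%%:1.9\tBRCA1\n'  >> "$workdir/hit.tsv"
printf 'chr1:11184555\tCNV\t5%%:2.5,95%%:2.68\tName2\n' >> "$workdir/hit.tsv"

# Stripped-down input with no qualifying CNV line.
printf 'chr1:11184539\tREF\n'                           >  "$workdir/miss.tsv"
printf 'chr1:11184555\tCNV\t5%%:1.1,95%%:1.8\tBRCA2\n'  >> "$workdir/miss.tsv"

for f in "$workdir"/*.tsv; do
    awk '
        NR==FNR { a[$1]; next }      # first file: remember the gene names
        $2=="CNV" {
            c = split($3, b, "[,:]") # b[2] is the 5% value, b[c] the 95% value
            if (b[2]>=4.0 || (b[2]<=1.0 && b[c]<=1.9 && ($4 in a))) {
                if (!wasfound) { print "Additional CNV Detected:"; wasfound=1 }
                print
            }
        }
        END { if (!wasfound) print "No Additional CNV Detected" }  # top-level rule
    ' "$workdir/gene" FS='\t' "$f" > "${f%.tsv}_oca.txt"
done

cat "$workdir/hit_oca.txt" "$workdir/miss_oca.txt"
# Clean up afterwards with: rm -rf "$workdir"
```

If this behaves as wanted but the full script does not, the difference between the two (the real input, the code that writes the earlier lines of the output file, or the appending redirect) is where to look next.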