Error removed from file

Below is a flowchart of a program. Almost everything works as expected, but there are a couple of issues that I need some expert help on. The check function was initially set up for a single user input. The input has since been modified to allow multiple inputs, so the code below does not work. My question: if two variants are entered and an error is found in one of them, can the specific variant that caused the error be removed from the file saved here: c:/Users/cmccabe/Desktop/Python27/${id}.txt ? Thank you :).

I have attached a screenshot (there is also a file saved in that same directory, c:/Users/cmccabe/Desktop/Python27/${id}_name.txt) of two variants and the file saved. In the example

 check() {
    printf "\n\n"
	awk 'NR>1 { if ($2 ~ /^\(/ ) {$1=""; print "Found error: ", $0} else { sub(/.*:/, "", $1); sub(/.*:/, "", $7); print "No error: " $1 "," $7}}' C:/Users/cmccabe/Desktop/annovar/${id}_name.txt
	printf "Is the variant correct?  Y/N "; read match_choice
	
    case "$match_choice" in
        [yY]) id="${id}"; position ;; 
        [nN]) id=""; gjb2 ;;  
    esac
}
user choice 
|
v 
menu +> gjb2 -> -> -> -> gjb2name -> -> check -> -> -> -> position -> parse -> add2text -> additional? -> annovar ------+
  +  |     ^                 ^            ^                   +         ^          ^            ^           |         ^
  |  |     |                 |            |                   |         |          |            |           |         |
  |  - input id/variant     python      verify             python  (awk/perl       |           more         |       +--Y/N select---+
  |    add NM_                       +--Y/N select-----------+      conditional    |      +--Y/N select-----+         menu         exit
  |    merge input/save                     clear text                   parse)    |            |                     | 
  |    add2text                                                         |          |            |                     |
  |        |                                                            |          |            |                     |
  |        |                      	                                    |          |            |                     |
  |        |                                                            |          |            |                     |
  |	       |                                                            |          |            |                     |
  |        +<-----------------------------------------------------------+<<--------<+-----------<<+-------------------|                        
  |        |     
1. user menu - (4 choices: gjb2, mecp2, phox2b, exit)
2. gjb2 - (user inputs the id and variant, the unique NM_ is added, the input, variant, and NM_ are combined and saved for variant syntax check, id written to text file in separate directory)
3. gjb2name - python script to verify variant syntax
4. check - Y/N select (If "Y" then goto [position] function, if "N" then clear the variant text from the file in which the error was found)
5. position - python script to convert input to coordinates
6. parse - working on
7. add2text - create list of all ids
8. additional - Y/N select (If "Y" then goto [menu] function, if "N" then goto [annovar] function)
9. annovar - perl script to annotate file
10. end - user prompt with Y/N select (If "Y" then goto [menu] function, if "N" {exit program})

When I open the screenshot.doc in AbiWord I see nothing but an empty page.

On the other hand, I'm not sure I understand you correctly; if I do, this might help:

while read id;do check ;done<GJB2.txt

Or in multiline:

while read id
do  check
done<GJB2.txt
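
One thing to watch with that loop: check itself calls read for the Y/N answer, and inside "while read ... done<GJB2.txt" that inner read would take its answer from GJB2.txt rather than from the user. A minimal sketch of one way around it, reading the id list on a separate file descriptor. The check below is a hypothetical stand-in (not the real function), the ids are made up, and the answers are piped in only so the example runs unattended:

```shell
# Hypothetical stand-in for the real check(): like the original, it reads
# the Y/N answer from stdin, then just reports what it saw.
check() {
    read -r match_choice
    printf '%s:%s\n' "$id" "$match_choice"
}

idfile=$(mktemp)
printf '%s\n' 12 34 > "$idfile"      # made-up ids, one per line

# The id list arrives on file descriptor 3, so stdin stays free for the
# answers. IFS= and -r keep each id line exactly as written.
result=$(printf 'y\nn\n' | while IFS= read -r id <&3; do check; done 3< "$idfile")
printf '%s\n' "$result"
rm -f "$idfile"
```

In the real script the answers would come from the keyboard, so the printf pipe would simply be dropped.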

hth

The screenshot is also saved as a text file, ${id}_name, in that same directory. If there is an error in a variant, $3 of that file will contain (variantchecker): and $1 of that file will match $1 of ${id}.txt. That is the variant to remove. Is this possible? Thank you :).

 check() {
    printf "\n\n"
	awk 'NR>1 { if ($2 ~ /^\(/ ) {$1=""; print "Found error: ", $0} else { sub(/.*:/, "", $1); sub(/.*:/, "", $7); print "No error: " $1 "," $7}}' C:/Users/cmccabe/Desktop/Python27/${id}_name.txt
	printf "Is the variant correct?  Y/N "; read match_choice
	
    case "$match_choice" in
        [yY]) id="${id}"; position ;; 
        [nN]) id=""; while read id;do check "$id";done<GJB2.txt; gjb2 ;;  
    esac
}

Does the variantchecker give an error exit code? Or: how about grep -v error on your id.txt.

The text "variantchecker" will only be there in that file if a variant has an error in it. Thank you :).

---------- Post updated at 03:39 PM ---------- Previous update was at 02:25 PM ----------

I added a warning to the menu:

If the user says "N" then it goes to the warning function, the variant is entered again, and the process continues as before. When the warning function is called I am trying to do what I described in post 3 (old variant removed and replaced with new variant). Thank you :).

 check() {
    printf "\n\n"
	awk 'NR>1 { if ($2 ~ /^\(/ ) {$1=""; print "Found error: ", $0} else { sub(/.*:/, "", $1); sub(/.*:/, "", $7); print "No error: " $1 "," $7}}' C:/Users/cmccabe/Desktop/Python27/${id}_name.txt
	printf "Is the variant correct?  Y/N "; read match_choice
	
    case "$match_choice" in
        [yY]) id="${id}"; position ;; 
        [nN]) awk 'FNR>1 FNR==NR {a[$i]; next}; !($1 in a)' ${id}_name.txt ${id}.txt; warning ;;  
    esac
}

warning() {
    printf "\n\n"
	printf "Please re-enter variant(s): "; IFS="," read -a variants
        
        [ -z "$id" ] && printf "\n No ID supplied. Leaving match function." && sleep 2 && return
        [ "$id" = "end" ] && printf "\n Leaving match function." && sleep 2 && return
		
		for ((i=0; i<${#variants[@]}; i++))
                do printf "NM_004004.5:%s\n" ${variants[$i]} >> c:/Users/cmccabe/Desktop/Python27/$id.txt
		done
	gjb2name
}

---------- Post updated 03-21-15 at 09:55 AM ---------- Previous update was 03-20-15 at 03:39 PM ----------

I made an edit to the awk that is underlined but I don't think it's correct. Thank you :).

What do you want to achieve with that awk ?

FNR>1 FNR==NR is not a valid awk conditional expression. If you're trying to select all lines in the 1st input file except for the 1st line in that file, you want something like: FNR>1 && FNR==NR .
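
A small sketch of that condition on two throwaway files (the file names and contents are invented for the demo). FNR restarts at 1 for every input file while NR keeps counting across files, so FNR==NR is true only while awk is reading the first file:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'header' 'a1' 'a2' > "$tmp/first.txt"    # invented sample data
printf '%s\n' 'b1' 'b2'          > "$tmp/second.txt"

# Print every line of the first file except its first line;
# lines of the second file never satisfy FNR==NR.
picked=$(awk 'FNR>1 && FNR==NR' "$tmp/first.txt" "$tmp/second.txt")
printf '%s\n' "$picked"
rm -rf "$tmp"
```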

The user inputs the variant(s), those inputs are checked, and the output is written to a file, ${id}_name.txt. In that file, after the header row is skipped, if $3 contains the text (variantchecker): then an error has occurred, and $1 will match a line in the file ${id}.txt. When a match is found, it is removed from that file. Thank you :).

So using the attached files from post 1 as an example:

The header of ${id}_name is skipped, and there is an error in line 2 (indicated by the (variantchecker): in $3) in the first variant. Since $1 (NM_004004.5:c.74G>C) matches $1 of ${id}.txt, that variant is removed from the ${id}.txt file.

I am trying to achieve this and the awk was my attempt.

Maybe this awk ?

 awk '{if (f==1) { r[$3] } else if (! ($3 in r)) { print $1 } } ' f=1 ${id}_name.txt f=2 ${id}.txt

Thank you :).

I don't think there's a $3 in the second file.

How about

awk '/variantchecker/ {r[$1]} FNR==NR {next} !($1 in r)' /tmp/GJB2_name.txt /tmp/GJB2.txt
NM_004004.5:c.283G>T
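
In case it helps to see it step by step, here is the same one-liner on two small stand-in files. Their layout and the error text are assumptions made for this demo, not copies of the attachments. While awk reads the first file, any line containing variantchecker stores its $1 as a key in r, and next skips all first-file lines; for the second file only lines whose $1 is not in r are printed:

```shell
tmp=$(mktemp -d)
# Assumed layout: a header row, then the variant in $1 and the checker result after it.
printf '%s\n' \
    'Input Result' \
    'NM_004004.5:c.74G>C (variantchecker): sample error text' \
    'NM_004004.5:c.283G>T ok' > "$tmp/GJB2_name.txt"
printf '%s\n' 'NM_004004.5:c.74G>C' 'NM_004004.5:c.283G>T' > "$tmp/GJB2.txt"

# First file: remember variants flagged by variantchecker, print nothing.
# Second file: print only variants that were not flagged.
kept=$(awk '/variantchecker/ {r[$1]} FNR==NR {next} !($1 in r)' \
    "$tmp/GJB2_name.txt" "$tmp/GJB2.txt")
printf '%s\n' "$kept"
rm -rf "$tmp"
```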

The code below runs, but the variant that had an error in it is still in the file:

 check() {
    printf "\n\n"
	awk 'NR>1 { if ($2 ~ /^\(/ ) {$1=""; print "Found error: ", $0} else { sub(/.*:/, "", $1); sub(/.*:/, "", $7); print "No error: " $1 "," $7}}' C:/Users/cmccabe/Desktop/Python27/${id}_name.txt
	printf "Is the variant correct?  Y/N "; read match_choice
	
    case "$match_choice" in
        [yY]) id="${id}"; additionalg ;; 
        [nN]) cd 'C:/Users/cmccabe/Desktop/Python27/' awk '/variantchecker/ {r[$1]} FNR==NR {next} !($1 in r)' ${id}_name.txt ${id}.txt; gjb2 ;;  
    esac
} 
 
Found error:   (variantchecker): G not found at position 289, found T instead.
No error: c.79G>A,p.(Val27Ile)
Is the variant correct?  Y/N 

Thank you :).

Where's the semicolon between cd and awk ?

Tried with and w/o " " and got the same error, but I can see the file in the directory where it says it is not. Thank you :).

 check() {
    printf "\n\n"
	awk 'NR>1 { if ($2 ~ /^\(/ ) {$1=""; print "Found error: ", $0} else { sub(/.*:/, "", $1); sub(/.*:/, "", $7); print "No error: " $1 "," $7}}' C:/Users/cmccabe/Desktop/Python27/${id}_name.txt
	printf "Is the variant correct?  Y/N "; read match_choice
	
    case "$match_choice" in
        [yY]) id="${id}"; additionalg ;; 
        [nN]) cd 'C:' C:/Users/cmccabe/Desktop/Python27/; awk '/variantchecker/ {r[$1]} FNR==NR {next} !($1 in r)' "${id}"_name.txt "${id}".txt; gjb2 ;;  
    esac
} 
 
Found error:   (variantchecker): G not found at position 289, found T instead.
Is the variant correct?  Y/N n
awk: fatal: cannot open file `gj_name.txt' for reading (No such file or directory)

The command runs now, it just needed the path, however both variants are in the file.

For example, in the attached output NM_004004.5:c.74G>A is an error and remains in the file even after the user inputs the correct variant, NM_004004.5:c.79G>A.

Here is the command, though I'm sure it can be written better (I see some things I will change tomorrow). I'm reading the awk manual as you suggested.

 check() {
    printf "\n\n"
	awk 'NR>1 { if ($2 ~ /^\(/ ) {$1=""; print "Found error: ", $0} else { sub(/.*:/, "", $1); sub(/.*:/, "", $7); print "No error: " $1 "," $7}}' C:/Users/cmccabe/Desktop/Python27/${id}_name.txt
	printf "Is the variant correct?  Y/N "; read match_choice
	
    case "$match_choice" in
        [yY]) id="${id}"; additionalg ;; 
        [nN]) id="$id"; cd 'C:' C:/Users/cmccabe/Desktop/Python27/; awk '/variantchecker/ {r[$1]} FNR==NR {next} !($1 in r)' C:/Users/cmccabe/Desktop/Python27/${id}_name.txt C:/Users/cmccabe/Desktop/Python27/${id}.txt; gjb2 ;;
    esac
} 

Thank you for all your help:).

When I run my proposal on your two attachments, I think it's OK:

awk '/variantchecker/ {r[$1]} FNR==NR {next} !($1 in r)' /tmp/12_name.txt /tmp/12.txt
NM_004004.5:c.79G>A

You don't capture the output in a file; mayhap you're working on the original unaltered file?
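
awk only prints the filtered list to the screen; the file on disk stays as it was unless the output is written somewhere. A rough sketch of one way to do that, using a temporary directory as a stand-in for the Python27 folder and invented file contents. Redirecting straight back onto ${id}.txt would truncate it before awk reads it, hence the intermediate .tmp file (that name is my choice, nothing special):

```shell
dir=$(mktemp -d)      # stand-in for the real Python27 directory
id=12                 # example id
# Invented sample contents: header row, one flagged variant, one good variant.
printf '%s\n' \
    'Input Result' \
    'NM_004004.5:c.74G>A (variantchecker): sample error text' \
    'NM_004004.5:c.79G>A ok' > "$dir/${id}_name.txt"
printf '%s\n' 'NM_004004.5:c.74G>A' 'NM_004004.5:c.79G>A' > "$dir/${id}.txt"

# Write the filtered list to a temporary file, then replace the original
# only if awk finished without error.
awk '/variantchecker/ {r[$1]} FNR==NR {next} !($1 in r)' \
    "$dir/${id}_name.txt" "$dir/${id}.txt" > "$dir/${id}.tmp" &&
    mv "$dir/${id}.tmp" "$dir/${id}.txt"

remaining=$(cat "$dir/${id}.txt")
printf '%s\n' "$remaining"
rm -rf "$dir"
```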

works perfect.... thank you :).