Improve awk code that has three separate parts

I have a very inefficient awk script below that I need some help improving. It has three parts that, ideally, could be combined into one search and one output file. Thank you :).

Part 1:
Check whether the user-supplied string contains + or -; if it does, the input is written to a file "input" and displayed on screen.

awk '/+/{print $0,"is an intronic variant"}' c:/Users/cmccabe/Desktop/Python27/input.txt
awk '/-/{print $0,"is an intronic variant"}' c:/Users/cmccabe/Desktop/Python27/input.txt 

Part 2:
Looks within that "input" file for any lines with a + or - and if found writes them to a new file (temp+ or temp-)

awk '/+/' c:/Users/cmccabe/Desktop/Python27/input.txt > c:/Users/cmccabe/Desktop/Python27/temp+.txt 
awk '/-/' c:/Users/cmccabe/Desktop/Python27/input.txt > c:/Users/cmccabe/Desktop/Python27/temp-.txt 

Part 3:
Removes the + or - lines from the "input" file.

sed -i '/+/d' C:/Users/cmccabe/Desktop/Python27/input.txt
sed -i '/-/d' C:/Users/cmccabe/Desktop/Python27/input.txt

You say you only want one output file, but your current procedures are producing three output files (plus the text written to standard output). Which one output file do you want? Is it C:/Users/cmccabe/Desktop/Python27/input.txt , c:/Users/cmccabe/Desktop/Python27/temp+.txt , or c:/Users/cmccabe/Desktop/Python27/temp-.txt ?


Currently, there are three output files because the awk searches for the + and writes it to the "input" file, then searches for the - and writes it to the "input" file. The sed then outputs a separate file for each: one for +, one for -, and the original input. Only the original "input" file is needed, but I am not sure how to search for both + and - at the same time. Then both, if found, would end up in the "input" file. I hope this helps and thank you :).

No...

If we go back and look at the steps in your 1st post:

Part 1 copies lines found in the file input.txt (ignoring the directory part) that contain a + or a - and writes those lines to your terminal (or to whatever file you might redirect the output). Nothing is read from a user-supplied string and nothing is written to any file (unless you redirect the output from the above commands to another file).

This can be simplified to:

sed -n '/[-+]/s/$/ is an intronic variant/p' c:/Users/cmccabe/Desktop/Python27/input.txt

if you don't mind having the output from lines containing + and - characters mixed together instead of having all + printed before any lines containing - and if you don't mind just getting one output line if a line in input.txt contains both a + and a - .
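To see that behaviour concretely, here is a small sketch on made-up data. The scratch directory, the file name sample.txt, and the four sample lines are hypothetical and not taken from the original post; the last line contains both a + and a - and should appear only once in the output.

```shell
# Hypothetical sample data in a scratch directory; not from the original post.
tmpdir=$(mktemp -d)
printf '%s\n' 'c.79+1G>A' 'c.100-2A>G' 'c.35G>A' 'c.10+5_12-3del' > "$tmpdir/sample.txt"

# -n suppresses the default output; on lines matching [-+] the text is
# appended at end of line and the p flag prints the result once.
result=$(sed -n '/[-+]/s/$/ is an intronic variant/p' "$tmpdir/sample.txt")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The matching lines come out in file order, so + and - lines are interleaved rather than grouped.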

Again, Part 2 looks at the same input file (not at any user-supplied string and not at the output produced by Part 1). It copies lines from that input file (without the text added to the ends of the selected lines by Part 1) that contain + to temp+.txt and lines that contain - to temp-.txt . Part 1 and Part 2 can be combined into the single script:

cd c:/Users/cmccabe/Desktop/Python27
awk '
/[-+]/{	print $0, "is an intronic variant"}
/-/{	print > "temp-.txt"}
/[+]/{	print > "temp+.txt"}
' input.txt

Yes, that is what Part 3 does (assuming that you are using a system that has a sed that supports the -i option and that no errors occur while either of those sed commands is running).
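
If your sed does not support -i, one possible equivalent is to write the surviving lines to a temporary file and then replace the original. This is only a sketch run on a throwaway copy: the scratch directory, the name input.tmp, and the sample lines are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory; not from the original post.
workdir=$(mktemp -d)
printf '%s\n' 'c.79+1G>A' 'c.100-2A>G' 'c.35G>A' > "$workdir/input.txt"

# Delete lines containing + or - without relying on sed -i:
# write the kept lines to a temporary file (input.tmp is an assumed name),
# and replace the original only if sed succeeded.
sed '/[-+]/d' "$workdir/input.txt" > "$workdir/input.tmp" &&
    mv "$workdir/input.tmp" "$workdir/input.txt"

remaining=$(cat "$workdir/input.txt")
printf '%s\n' "$remaining"

rm -rf "$workdir"
```

A single bracket expression handles both characters, so the file is only rewritten once instead of twice.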

So, if what you want to do is:

  1. Copy lines from input.txt that contain + to temp+.txt ,
  2. copy lines from input.txt that contain - to temp-.txt ,
  3. print lines from input.txt that contain + or - or both to standard output with the added text "is an intronic variant", and
  4. remove lines from input.txt that contain a + or a - or both.

you could try just using:

cd c:/Users/cmccabe/Desktop/Python27
tmpf="input$$.txt"
awk -v tmpf="$tmpf" '
/[-+]/{	print $0, "is an intronic variant"}
/-/{	print > "temp-.txt"}
/[+]/{	print > "temp+.txt"}
!/[-+]/{print > tmpf}
' input.txt && cp "$tmpf" "input.txt" && rm -f "$tmpf"
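
As a quick check of the four steps, here is the same approach run on made-up data in a scratch directory. The sample lines and the directory are hypothetical, the printed text is captured in a variable only so it can be inspected, and /[+]/ is written as a bracket expression as a precaution, since POSIX leaves a leading + in an extended regular expression undefined.

```shell
# Hypothetical sample data in a scratch directory; not from the original post.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'c.79+1G>A' 'c.100-2A>G' 'c.35G>A' 'c.10+5_12-3del' > input.txt

tmpf="input$$.txt"
shown=$(awk -v tmpf="$tmpf" '
/[-+]/  { print $0, "is an intronic variant" }  # step 3: report on standard output
/-/     { print > "temp-.txt" }                 # step 2: copy - lines
/[+]/   { print > "temp+.txt" }                 # step 1: copy + lines
!/[-+]/ { print > tmpf }                        # step 4: keep only the other lines
' input.txt) && cp "$tmpf" input.txt && rm -f "$tmpf"

printf '%s\n' "$shown"
# The scratch directory is left in place for inspection; remove it when done.
```

With this data, input.txt should end up holding only the line without + or -, and the line containing both characters lands in both temp files. Note that if every line contains + or -, awk never creates the temporary file and the cp step fails, leaving input.txt unchanged; adding BEGIN { printf "" > tmpf } to the awk program is one way to guard against that.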

I will try it out today. If the search of the input file finds both + and -, then another command would remove any lines with a + or - from the input file and copy those lines to a temp file. Thank you very much :).