Perl command in bash

cmccabe · March 27, 2015, 5:24pm

I have the below perl script that runs (I think):

 'C:\Users\cmccabe\Desktop\annovar\matrix.pl' < "${id}".txt.hg19_multianno.txt > "L:\NGS\3_BUSINESS\Matrix\Torrent\matrix_${id}.txt"

The problem is that I get an error saying ${id}.txt does not exist. I have tried "${id}.txt" as well. There is also a file (attached) with the id in it. Can sed or awk be used to extract this id from the file and put it in the perl command?

For example, snp.txt is the first line of target.txt. Can that value be extracted and substituted for ${id}.txt in the perl command? Also, if there are multiple lines, can it loop through them one at a time? Thank you :).
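
One way the extraction and loop asked about here might look is sketched below. It assumes target.txt holds one name per line (such as snp.txt) and that the id is that name without its .txt suffix. The function name ids_from_list and the variable matrix_pl are hypothetical, and the perl call is left commented out.

```shell
#!/bin/sh
# Print one id per line from a list file such as target.txt.
# Assumption: each line looks like "snp.txt"; the id is the part before ".txt".
ids_from_list() {
    # tr drops Windows carriage returns; the -n test keeps a last line
    # that has no trailing newline.
    tr -d '\r' < "$1" | while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue      # skip blank lines
        printf '%s\n' "${name%.txt}"
    done
}

# Usage sketch: run the script once per id (matrix_pl is a hypothetical path).
# ids_from_list target.txt | while IFS= read -r id; do
#     perl "$matrix_pl" < "${id}.txt.hg19_multianno.txt" > "matrix_${id}.txt"
# done
```

Reading the list line by line, rather than word-splitting it, keeps names intact even if they ever contain spaces.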

Corona688 · March 27, 2015, 5:32pm

I suspect that, when it says file not found, it means what it says, file not found.

Check for things like typos in the filename, unwanted spaces or carriage returns in the id variable, etc.
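
A minimal sketch of that check, assuming the id was read from a file saved with Windows line endings (the value below is made up for illustration). Sending the variable through od -c tends to reveal a stray carriage return or space that would otherwise be invisible.

```shell
#!/bin/sh
# Hypothetical value: an id that picked up a trailing carriage return and space.
id=$(printf 'snp\r ')

# Show every byte; a "\r" in the od output reveals the hidden character.
printf '%s' "$id" | od -c

# Remove all whitespace (including carriage returns), then check the file.
clean_id=$(printf '%s' "$id" | tr -d '[:space:]')
input="${clean_id}.txt.hg19_multianno.txt"
if [ -f "$input" ]; then
    echo "found: $input"
else
    echo "not found: $input" >&2
fi
```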

agent.kgb · March 27, 2015, 5:41pm

It seems that you run perl on Windows, not on Unix or Linux. Which POSIX environment do you use? Which shell?

vgersh99 · March 27, 2015, 6:12pm

I suspect it might be Cygwin, but let's hear it from the OP.

cmccabe · March 28, 2015, 9:00am

Yes, the posix environment is cygwin and I am using

 bash

to run the perl script. Thank you :).

derekludwig · March 28, 2015, 9:08am

And what is the value of $id ?

RudiC · March 28, 2015, 9:13am

Why don't you concatenate all your threads (e.g. this) about that one topic into one?

cmccabe · March 28, 2015, 12:28pm

I use a perl script to add specific text to specific columns in a text file. A new file gets created in the location listed below but it is empty. So I am not sure if the script itself is incorrect or if the input file is not being used. Below is how I call the perl script:

I provide the full path to the .pl, the input, and the output:

 perl 'C:\Users\cmccabe\Desktop\annovar\matrix.pl' < "${id}".txt.hg19_multianno.txt > "L:\NGS\3_BUSINESS\Matrix\Torrent\matrix_"${id}".txt"

The overall goal is to use ${id}.txt.hg19_multianno.txt as the input file. After the perl script is run on that file (which adds the columns and text), the new file is saved to the path L:\NGS\3_BUSINESS\Matrix\Torrent\matrix_${id}.txt . That path is static and will never change. There is probably a better way, but I am not expert enough to know. I attached the .pl as well. Thank you.

durden_tyler · March 28, 2015, 9:06pm

Your Perl program is broken. The first 49 "if" statements do not have their closing braces. A part of your script is shown below with the problem:

  ...
  ...
  if ( $. == 1 ) {  <== this opening brace does not have a corresponding closing brace
    s/$/ Index/;
  if ( $. == 2 ) { <== this opening brace does not have a corresponding closing brace
    s/$/ Chromosome Position/;
  ...
  ...

As a result the Perl script, when run, will fail and abort with a compile time error similar to the following (I've put the 3 dots below (...), but you'll see some information instead of the dots):

Missing right curly or square bracket at ...
syntax error at ...
Execution of ... aborted due to compilation errors.

(1) The file appears in your L:\NGS location because your shell creates it. When you use a redirection operator such as "> some_file", the shell creates a 0-byte file (or truncates an existing one) at that location before the command even runs.

(2) And the file at L:\NGS stays empty because the Perl program does not work. It fails with the error and prints nothing to standard output (STDOUT) that could go to the L:\NGS file.
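
The behaviour in (1) and (2) can be reproduced with a small sketch. It assumes nothing about the real script: "false" stands in for a program that aborts without printing anything, and the temporary file name is made up.

```shell
#!/bin/sh
# The shell opens (and truncates) the output file before the command starts,
# so the file exists even if the command fails and prints nothing.
out=$(mktemp -d)/matrix_demo.txt

# "false" stands in for a program that aborts without writing to stdout.
status=0
false > "$out" || status=$?

ls -l "$out"                     # the file is there
size=$(wc -c < "$out")           # and it holds 0 bytes
echo "exit status: $status, size: $size bytes"
```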

Now, you should be able to see the error message thrown by Perl. I'm not sure why you're not.

Maybe you are calling your Bash shell script like so?

my_shell_script.sh 2>/dev/null

That will redirect all error messages to the null device, so you won't see the error message printed by Perl.
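
A small sketch of that effect, using a made-up shell function in place of the failing Perl program (failing_program and its message are hypothetical):

```shell
#!/bin/sh
# Stand-in for a failing program: writes an error to stderr, nothing to stdout.
failing_program() {
    echo "syntax error at demo line 1" >&2
    return 255
}

# Inner 2>/dev/null discards the message, so nothing is left to capture.
hidden=$( { failing_program 2>/dev/null; } 2>&1 ) || true

# Without it, the message reaches the terminal (captured here for display).
shown=$( { failing_program; } 2>&1 ) || true

echo "with 2>/dev/null: [$hidden]"
echo "without:          [$shown]"
```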

To fix the problems:

(1) Ensure that the Perl program is correct syntax-wise. All opening braces must have corresponding closing braces.

(2) Replace this line in your shell script:

perl 'C:\Users\cmccabe\Desktop\annovar\matrix.pl' < "${id}".txt.hg19_multianno.txt > "L:\NGS\3_BUSINESS\Matrix\Torrent\matrix_"${id}".txt"

by this:

perl 'C:\Users\cmccabe\Desktop\annovar\matrix.pl' < "${id}".txt.hg19_multianno.txt

and run your shell script without the "2>/dev/null" (if it is there now).
That should print the output to your screen (terminal). If it looks satisfactory, then add the "> L:\NGS\..." part back.

cmccabe · March 30, 2015, 2:22pm

Thank you, the perl script runs but the results are not as expected. The attached .pl is supposed to add the text to the specific columns, but it does not, and I'm not sure why or whether there is a better way.

Basically, ${id}.txt.hg19_multianno.txt is the input file. After the perl script is run on that file (which adds the columns and text), the new file is saved to the path L:\NGS\3_BUSINESS\Matrix\Torrent\matrix_${id}.txt

I also attached the desired results as an excel with the yellow color being the text/fields added by the script and the green are the text/fields in the original file. Thank you :).

Skrynesaver · March 30, 2015, 3:00pm

In the Perl script you add strings to the end of specific lines.

$. is the line number

$ in a substitution is the end of the current line.

So you're adding strings to the end of some lines rather than amending columns.
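
To see this effect without the full script, the same kind of substitution can be tried with sed, whose $ anchor behaves the same way here. This is a stand-in for illustration, not the original matrix.pl, and the two sample lines are made up.

```shell
#!/bin/sh
# Two made-up, tab-separated lines standing in for the input file.
sample=$(printf 'Chr\tStart\nchr1\t100')

# Append " Index" at the end of line 1 only, which is what
# "s/$/ Index/" guarded by "$. == 1" does in Perl.
result=$(printf '%s\n' "$sample" | sed '1s/$/ Index/')
printf '%s\n' "$result"
```

The text lands after the last existing column of line 1, so no new column appears on the left and the other lines are untouched.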

durden_tyler · March 30, 2015, 6:54pm

Referring to your "output.xls" file in the previous post, are the following assumptions correct?

(1) The data in the range S1:AP87 in the "Test" worksheet is in your "${id}.txt.hg19_multianno.txt" file already.

(2) You want to add the data in the range A1:R87 to the left of the existing data.

(3) You want to add the data in the range AQ1:AX87 to the right of the existing data.

If the assumptions above are correct, then:

(a) From where do you get the data in the range A2:R87?
(b) And from where do you get the data in the range AQ2:AX87?

cmccabe · March 30, 2015, 7:52pm

Only the header text is needed. The attempted perl script was only going to add these columns.

So basically,

 A1 = Index, B1 = Chromosomal Position, C1 = etc...

The only columns with data come from "${id}.txt.hg19_multianno.txt" . Thank you :).

durden_tyler · March 30, 2015, 9:46pm

I don't think I understood this statement:

Do you mean:

(a) the "${id}.txt.hg19_multianno.txt" file has only one line similar to this?

Chr Start ... clinvarsubmit clinvarreference

And you want to change it to this?

Index Chromosome ... Amino Acid Change Chr Start ... clinvarsubmit clinvarreference HP SPLICE ... Sanger References

Or

(b) the "${id}.txt.hg19_multianno.txt" file has a header line like this?

Chr Start ... clinvarsubmit clinvarreference

and data (from line no. 2 onwards) similar to the range A2:AX87 in the "Test" worksheet of your Excel workbook "output.xls".
And you want to convert the header to this:

Index Chromosome ... Amino Acid Change Chr Start ... clinvarsubmit clinvarreference HP SPLICE ... Sanger References

and keep the data as it is (i.e. just iterate from line no. 2 onwards till end of file without making any changes)?

cmccabe · March 30, 2015, 10:11pm

In "${id}.txt.hg19_multianno.txt", the green highlight (S1 - AP1) is the existing header row, and those columns also contain data (down to row 87 in this case). A1 - R1 and AQ1 - AX1 are headers only. No data is needed for those columns (only the headers). I have a SQL import that is expecting all those columns in one file.

Index Chromosome ... Amino Acid Change Chr Start ... clinvarsubmit clinvarreference HP SPLICE ... Sanger References

So, the headers on the left (Index Chromosome ... Amino Acid Change) and on the right (HP SPLICE ... Sanger References) are added to the "${id}.txt.hg19_multianno.txt" and a new file with the combined columns is saved in the path. I hope this helps and apologize for the confusion. Thank you for all your help :).

durden_tyler · March 30, 2015, 10:57pm

Could you attach the file: "${id}.txt.hg19_multianno.txt" over here?

cmccabe · March 30, 2015, 11:29pm

Sure. I will post it tomorrow morning. Thank you :).

cmccabe · March 31, 2015, 9:36am

I attached the "${id}.txt.hg19_multianno.txt". Thank you :).

durden_tyler · March 31, 2015, 10:08am

$
$ cat set_header.pl
#!/usr/bin/perl
use strict;

# Accept the input and output files as parameters
my $input_file = $ARGV[0];
my $output_file = $ARGV[1];

# Set the header columns to be added to the left
# and to the right of the header in the input file
my @left = (
    "Index",
    "Chromosome Position",
    "Gene",
    "Inheritance",
    "RNA Accession",
    "Chr",
    "Coverage",
    "Score",
    "A(#F,#R)",
    "C(#F,#R)",
    "G(#F,#R)",
    "T(#F,#R)",
    "Ins(#F,#R)",
    "Del(#F,#R)",
    "SNP db_xref",
    "Mutation Call",
    "Mutant Allele Frequency",
    "Amino Acid Change"
);
my @right = (
    "HP",
    "SPLICE",
    "Pseudogene",
    "Classification",
    "HGMD",
    "Disease",
    "Sanger",
    "References"
);

# Now open the input file, read the header line and sandwich it
# between @left and @right arrays
my $final_header;
open (FH, "<", $input_file) or die "Can't open $input_file: $!";
while (<FH>) {
    chomp;
    if ($. == 1) {
        $final_header = sprintf("%s\t%s\t%s\n", join("\t", @left), $_, join("\t", @right));
        last;
    }
}
close (FH) or die "Can't close $input_file: $!";

# Once the final header is set, print it to the output file
open (FH, ">", $output_file) or die "Can't open $output_file: $!";
print FH $final_header;
close (FH) or die "Can't close $output_file: $!";
$
$ perl set_header.pl del.txt.hg19_multianno.txt my_output.txt
$
$ sed 's/\t/\n/g' my_output.txt
Index
Chromosome Position
Gene
Inheritance
RNA Accession
Chr
Coverage
Score
A(#F,#R)
C(#F,#R)
G(#F,#R)
T(#F,#R)
Ins(#F,#R)
Del(#F,#R)
SNP db_xref
Mutation Call
Mutant Allele Frequency
Amino Acid Change
Chr
Start
End
Ref
Alt
Func.refGene
Gene.refGene
GeneDetail.refGene
ExonicFunc.refGene
AAChange.refGene
PopFreqMax
1000G2012APR_ALL
1000G2012APR_AFR
1000G2012APR_AMR
1000G2012APR_ASN
1000G2012APR_EUR
ESP6500si_ALL
ESP6500si_AA
ESP6500si_EA
CG46
common
clinvar
clinvarsubmit
clinvarreference
Otherinfo
HP
SPLICE
Pseudogene
Classification
HGMD
Disease
Sanger
References
$
$
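
The script above writes only the combined header line to the output file. If the data rows also need to be carried over with empty cells under the new headers (an assumption based on the earlier posts about the SQL import, not something confirmed in the thread), a sketch along these lines might do it. The function name add_columns is hypothetical, and the column lists in the usage note are shortened placeholders.

```shell
#!/bin/sh
# Sandwich the header between extra column names and pad each data row with
# empty fields, so every row keeps the same number of tab-separated columns.
# $1 = left header names, $2 = right header names (both tab-separated).
add_columns() {
    awk -v left="$1" -v right="$2" '
        BEGIN {
            FS = OFS = "\t"
            nleft  = split(left,  unused_l, "\t")
            nright = split(right, unused_r, "\t")
            for (i = 0; i < nleft;  i++) lpad = lpad "\t"
            for (i = 0; i < nright; i++) rpad = rpad "\t"
        }
        { sub(/\r$/, "") }                        # drop a Windows carriage return
        NR == 1 { print left, $0, right; next }   # header line
        { print lpad $0 rpad }                    # data rows, padded both sides
    '
}
```

It reads standard input and writes standard output, for example: add_columns "$left" "$right" < input.txt > output.txt, with left and right built by something like left=$(printf 'Index\tGene').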

cmccabe · March 31, 2015, 1:19pm

I am getting the below error:

Removing old files, please wait
 Old files removed, formatting for matrix Can't open : No such file or directory
 at C:\Users\cmccabe\Desktop\annovar\matrix.pl line 42.

line 42 in the .pl
open (FH, "<", $input_file) or die "Can't open $input_file: $!";

Thank you
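
The blank between "Can't open" and the colon in that message shows that $input_file was empty when the script ran. That would happen if the script was started without file names on the command line, for example with the earlier "< input > output" redirection instead of two arguments. This is an inference from the message, not something confirmed in the thread. A hedged sketch of a wrapper that passes both names as arguments and checks them first follows; run_set_header and SCRIPT are hypothetical names.

```shell
#!/bin/sh
# Hypothetical wrapper: pass input and output names as arguments, not via < and >.
SCRIPT=${SCRIPT:-set_header.pl}    # assumed location of the Perl script

run_set_header() {
    if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
        echo "usage: run_set_header INPUT OUTPUT" >&2
        return 2
    fi
    if [ ! -r "$1" ]; then
        echo "cannot read input file: $1" >&2
        return 1
    fi
    perl "$SCRIPT" "$1" "$2"
}

# Example call (commented out; needs a real id and real files):
# run_set_header "${id}.txt.hg19_multianno.txt" "matrix_${id}.txt"
```

Checking the arguments in the shell first means an empty or misspelled name is reported before Perl is started.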