In the perl
below, which does execute, I am having trouble with the else
in Rule 3. The digit in f{8} is extracted and used to update f[55] accordinly along with the value in f[13].
There can be either - * or +
before the number that is extracted but the same logic applies, that is if the value is greater than 10 and f[13] is greater than 0.01 f[55] is Likely Benign, if the value is less than 10
and f[13] is less than 0.01 f[55] is VUS in. It is possible for f[13] to be .
(dot) but that is the same as zero. However, currently the value does not seem to be extracted by the perl
below and I
get different output then the desired. If I comment this rule out it fixes some of the lines but causes other lines to be incorrect. I am not sure what is causing the issue or how to fix it. Thank you :).
I added a description to explain each line of the output and what should be happening, which currently is not :).
file
R_Index Chr Start End Ref Alt Func.refGene Gene.refGene GeneDetail.refGene Inheritence ExonicFunc.refGene AAChange.refGene avsnp147 PopFreqMax 1000G_ALL 1000G_AFR 1000G_AMR 1000G_EAS 1000G_EUR 1000G_SAS ExAC_ALL ExAC_AFR ExAC_AMR ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS ESP6500siv2_ALL ESP6500siv2_AA ESP6500siv2_EA CG46 SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred dpsi_max_tissue dpsi_zscore CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID Quality Reads Zygosity Score Classification HGMD Sanger
47 chr16 10032451 10032451 T C splicing GRIN2A NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G . . . rs760077182 0.0004 . . . . . . 0.0001 . . . . . . 0.0004 . . . . . . . . . . . . . . . . 0.7699 1.365 . . . . . GOOD 93 het 9 . . .
77 chr14 76427420 76427420 C G splicing TGFB3 NM_003239:exon6:c.927-1G>C . . . rs767548724 0.0001 . . . . . . . . 0.0001 . . . . . . . . . . . . . . . . . 1 D . . -3.9123 -2.334 . . . . . GOOD 190 het 13 . . .
160 chr3 186944313 186944313 G C splicing MASP1 NM_001879:exon12:c.1442-5C>G . . . rs138989954 0.0055 0.0018 . . . 0.005 0.0041 0.0036 0.0008 0.0032 . 0.0007 0.0043 0.0037 0.0055 0.0025 0.0005 0.0036 . . . . . . . . . . . . . -0.8646 -1.388 . . . . . GOOD 265 het 13 . . .
desired output tab-delimeted
R_Index Chr Start End Ref Alt Func.refGene Gene.refGene GeneDetail.refGene Inheritence ExonicFunc.refGene AAChange.refGene avsnp147 PopFreqMax 1000G_ALL 1000G_AFR 1000G_AMR 1000G_EAS 1000G_EUR 1000G_SAS ExAC_ALL ExAC_AFR ExAC_AMR ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS ESP6500siv2_ALL ESP6500siv2_AA ESP6500siv2_EA CG46 SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred dpsi_max_tissue dpsi_zscore CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID Quality Reads Zygosity Score Classification HGMD Sanger
47 chr16 10032451 10032451 T C splicing GRIN2A NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G . . . rs760077182 0.0004 . . . . . . 0.0001 . . . . . . 0.0004 . . . . . . . . . . . . . . . . 0.7699 1.365 . . . . . GOOD 93 het 9 Likely Benign . .
77 chr14 76427420 76427420 C G splicing TGFB3 NM_003239:exon6:c.927-1G>C . . . rs767548724 0.0001 . . . . . . . . 0.0001 . . . . . . . . . . . . . . . . . 1 D . . -3.9123 -2.334 . . . . . GOOD 190 het 13 VUS . .
160 chr3 186944313 186944313 G C splicing MASP1 NM_001879:exon12:c.1442-5C>G . . . rs138989954 0.0055 0.0018 . . . 0.005 0.0041 0.0036 0.0008 0.0032 . 0.0007 0.0043 0.0037 0.0055 0.0025 0.0005 0.0036 . . . . . . . . . . . . . -0.8646 -1.388 . . . . . GOOD 265 het 13 VUS . .
description
line 1 is Likely Benign in f[55] because in NM_001134408:exon3:c.415-43A>G in f[8] the 43 after the minus is extracted and since that value is greater than 10 f[55] is Likely Benign
line 2 is VUS in f[55] because in NM_003239:exon6:c.927-1G>C in f[8] the 1 after the minus is extracted and since that value is less than 10 and f[13] is less than 0.01 f[55] is VUS
line 3 is VUS in f[55] because in NM_001879:exon12:c.1442-5C>G in f[8] the 1 after the minus is extracted and since that value is less than 10 and f[13] is less than 0.01 f[55] is VUS
perl
# disables certain Perl expressions that could behave unexpectedly or are difficult to debug, turning them into errors
use strict;
use warnings;
# Read and display the first line of the file passed at command line.
print scalar <>;
# Read line by line the file given at the command line, could be the stdin if no file is give as argument.
while (<>)
{ # start conditional block
# Make tokens out of the line, using the tab as separator.
my @f = split /\t/;
# Select 4 tokens and define @f.
my ($FuncrefGene,$PopFreqMax,$GeneDetailrefGene,$ClinSig) = @f[6,13,8,46];
# Rule1: Default classification VUS.
my $classification = 'VUS';
# map to 0 if it doesn't have numeric meaning.
$GeneDetailrefGene = 0 if $GeneDetailrefGene eq '.';
$PopFreqMax = 0 if $PopFreqMax eq '.';
# Rule2: Change to Likely Benign if conditions occurs.
if ($PopFreqMax > 0.011) {
$classification = 'Likely Benign';
}
#Rule3: GeneDetail condition
if ($FuncrefGene !~ /exonic/i && $GeneDetailrefGene=~/^\D(\d+)$/) { # capture the digits after any non-digit into $1
if ($1 > 10) { #
$1 //= 0; # Give it a value of zero if no numeric value was found.
$classification = 'Likely Benign'; # Reclassify intronic variants (with distance only) based on distance to exon > 10 to Likely Benign
}
}
else {
if ($FuncrefGene !~ /exonic/i) {
my ($transcript) = ($GeneDetailrefGene) =~ /(?:\.\d+[+*-]|\D)(\d+)/; # Get a numeric value if exists using (.) and (+/*/-) and capture digits into $transcript.
$transcript //= 0; # Give it a value of zero if no numeric value was found.
$classification = 'Likely Benign' if $transcript > 10; # Reclassify intronic variants (following c. nomenclature) to Likely Benign if distance greater than 10
}
}
# Rule4: ClinVar condition for ACMG
if ($ClinSig =~ /Pathogenic|Likely pathogenic/i) {
$classification = 'VUS';
}
# token 55 is classification.
$f[55] = $classification;
# display results and update @f.
print join "\t", @f;
} # end conditional block
current output
R_Index Chr Start End Ref Alt Func.refGene Gene.refGene GeneDetail.refGene Inheritence ExonicFunc.refGene AAChange.refGene avsnp147 PopFreqMax 1000G_ALL 1000G_AFR 1000G_AMR 1000G_EAS 1000G_EUR 1000G_SAS ExAC_ALL ExAC_AFR ExAC_AMR ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS ESP6500siv2_ALL ESP6500siv2_AA ESP6500siv2_EA CG46 SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred dpsi_max_tissue dpsi_zscore CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID Quality Reads Zygosity Score Classification HGMD Sanger
47 chr16 10032451 10032451 T C splicing GRIN2A NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G . . . rs760077182 0.0004 . . . . . . 0.0001 . . . . . . 0.0004 . . . . . . . . . . . . . . . . 0.7699 1.365 . . . . . GOOD 93 het 9 Likely Benign n .
77 chr14 76427420 76427420 C G splicing TGFB3 NM_003239:exon6:c.927-1G>C . . . rs767548724 0.0001 . . . . . . . . 0.0001 . . . . . . . . . . . . . . . . . 1 D . . -3.9123 -2.334 . . . . . GOOD 190 het 13 Likely Benign . .
160 chr3 186944313 186944313 G C splicing MASP1 NM_001879:exon12:c.1442-5C>G . . . rs138989954 0.0055 0.0018 . . . 0.005 0.0041 0.0036 0.0008 0.0032 . 0.0007 0.0043 0.0037 0.0055 0.0025 0.0005 0.0036 . . . . . . . . . . . . . -0.8646 -1.388 . . . . . GOOD 265 het 13 Likely Benign . .