In the awk below, using the tab-delimited input, I am trying to count the - symbol in $5 and output the count under the renamed condition ins . I am also trying to count the - symbol in $6 and output that count under the renamed condition del . Finally, I want to count the times that $5 and $6 both contain letters, and output that count under the renamed condition snp .
input
Index Mutation Call Start End Ref Alt Func.refGene Gene.refGene ExonicFunc.refGene Sanger
13 c.[1035-3T>C]+[1035-3T>C] 166170127 166170127 T C intronic SCN2A
16 c.[2994C>T]+[=] 166210776 166210776 C T exonic SCN2A synonymous SNV
19 c.[4914T>A]+[4914T>A] 166245230 166245230 T A exonic SCN2A synonymous SNV
20 c.[5109C>T]+[=] 166245425 166245425 C T exonic SCN2A synonymous SNV
21 c.[5139C>T]+[=] 166848646 166848646 G A exonic SCN1A synonymous SNV
22 c.3152_3153insAACCACT 166892841 166892841 - AGTGGTT exonic SCN1A frameshift insertion TP
23 c.2044-5delT 166898947 166898947 A - intronic SCN1A
25 c.1530_1531insA 166901684 166901684 - T exonic SCN1A frameshift insertion FP
Sorry to say, but I am not able to understand it. Here are some questions:
i- What do you mean by renamed ins and del here?
ii- Are you trying to fill any field with the above-mentioned keywords?
iii- I can see the strings del and ins on the 23rd and 25th lines respectively, so is it related to that? They appear in the second column, though (assuming the field separator is a space or tab).
Please post more meaningful sample data and sample output, so that we can try to help you.
awk -F'\t' '$5=="-"{count++} # check for - in $5
$6=="-"{count++} # check for - in $6
END{print "Category","Count";
print "indel",count+0 # count+0 prints 0 instead of null
}' out |
column -t > count # column -t aligns the columns with spaces
i- Since I am just counting - , I am renaming the count based on which field was used.
For example:
if $5 was used to count the - , then the count is printed as ins
if $6 was used to count the - , then the count is printed as del
if $5 and $6 both had letters in them, then the count is printed as snp
ii- I am not filling the fields with data, but rather using the data already there to output the result.
iii- Those keywords happen to be in $2 in this example, but that is not always the case.
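Going by that description, something along these lines might work. It is only a sketch under a few assumptions: sample.tsv is a made-up file name, the rows are the ones posted above trimmed to the first six columns, the first line is a header that should be skipped, and a row with - in $5 is counted as ins only (never also as del).

```shell
# Build a small tab-delimited test file (hypothetical name: sample.tsv).
# printf reuses the format for every group of six arguments.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  Index 'Mutation Call' Start End Ref Alt \
  13 'c.[1035-3T>C]+[1035-3T>C]' 166170127 166170127 T C \
  16 'c.[2994C>T]+[=]' 166210776 166210776 C T \
  19 'c.[4914T>A]+[4914T>A]' 166245230 166245230 T A \
  20 'c.[5109C>T]+[=]' 166245425 166245425 C T \
  21 'c.[5139C>T]+[=]' 166848646 166848646 G A \
  22 'c.3152_3153insAACCACT' 166892841 166892841 - AGTGGTT \
  23 'c.2044-5delT' 166898947 166898947 A - \
  25 'c.1530_1531insA' 166901684 166901684 - T > sample.tsv

result=$(awk -F'\t' -v OFS='\t' '
NR == 1   { next }            # skip the header line
$5 == "-" { ins++; next }     # - in $5 is counted as ins
$6 == "-" { del++; next }     # - in $6 is counted as del
$5 ~ /^[A-Za-z]+$/ && $6 ~ /^[A-Za-z]+$/ { snp++ }   # letters in both fields
END {
    print "Category", "Count"
    print "ins", ins + 0      # + 0 prints 0 rather than an empty string
    print "del", del + 0
    print "snp", snp + 0
}' sample.tsv)

printf '%s\n' "$result"
```

On the posted rows this should give ins 2, del 1 and snp 5. The output here is tab-separated because of OFS; piping it through column -t would align it with spaces instead.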