Hi all,
I'm working on a bash script that merges the elements of a set of files and counts how many times each element is present. The last field of each line is the file name.
Sample files:
head -5 *.tab
==> 3J373_P15Ac1y2_01_LS.tab <==
chr1 1956362 1956362 G A hom 3J373_P15Ac1y2_01_LS.tab
chr1 1957037 1957037 T C hom 3J373_P15Ac1y2_01_LS.tab
chr1 1960926 1960926 T C hom 3J373_P15Ac1y2_01_LS.tab
chr1 17359676 17359676 C A hom 3J373_P15Ac1y2_01_LS.tab
chr1 17371152 17371152 T C het 3J373_P15Ac1y2_01_LS.tab
==> 7D300_P15Ac1y2_01_GATK.tab <==
chr1 1956362 1956362 G A het 7D300_P15Ac1y2_01_GATK.tab
chr1 1957037 1957037 T C het 7D300_P15Ac1y2_01_GATK.tab
chr1 1959107 1959107 G C het 7D300_P15Ac1y2_01_GATK.tab
chr1 1959699 1959699 G A het 7D300_P15Ac1y2_01_GATK.tab
chr1 17359676 17359676 C A hom 7D300_P15Ac1y2_01_GATK.tab
.
.
.
Up to several dozen files...
Here is my code:
cat *.tab \
| awk 'BEGIN {FS="\t";OFS="\t"} {s[$1":"$2"-"$3";"$4"/"$5]=$0; c[$1":"$2"-"$3";"$4"/"$5]++} END {for (i in s) print i,c[i],$7}' \
| sort -V \
> CommonVariants.bed
Output file:
cat CommonVariants.bed
chr1:1956362-1956362;G/A 36 7D300_P15Ac1y2_01_LS.tab
chr1:1957037-1957037;T/C 36 7D300_P15Ac1y2_01_LS.tab
chr1:1957112-1957112;C/T 2 7D300_P15Ac1y2_01_LS.tab
chr1:1959107-1959107;G/C 2 7D300_P15Ac1y2_01_LS.tab
chr1:1959138-1959138;G/C 2 7D300_P15Ac1y2_01_LS.tab
chr1:1959549-1959549;G/A 2 7D300_P15Ac1y2_01_LS.tab
chr1:1959699-1959699;G/A 4 7D300_P15Ac1y2_01_LS.tab
chr1:1959789-1959789;A/G 3 7D300_P15Ac1y2_01_LS.tab
chr1:1960674-1960674;C/T 6 7D300_P15Ac1y2_01_LS.tab
chr1:1960926-1960926;T/C 18 7D300_P15Ac1y2_01_LS.tab
chr1:1961144-1961144;C/T 2 7D300_P15Ac1y2_01_LS.tab
chr1:1961408-1961408;C/T 6 7D300_P15Ac1y2_01_LS.tab
chr1:1961466-1961466;C/T 2 7D300_P15Ac1y2_01_LS.tab
chr1:17359676-17359676;C/A 36 7D300_P15Ac1y2_01_LS.tab
I can create the index and count the lines. However, I can't figure out how to list the file names for each element in the last column (the $7 field of the input).
I guess I have to replace "$7" with an array in the awk statement, but this is too much for me.
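A minimal sketch of that array idea, offered as an assumption rather than a tested solution for the real data: a second array, here called f (a hypothetical name), collects the $7 values for each key while c keeps the count. In the END block, $7 only refers to the last line read, which would explain why the same file name appears on every output line. The sketch assumes tab-separated input, that field 7 always holds the file name, and that a comma-separated list is acceptable in the last column. The demo_variants directory and the A.tab and B.tab files are made up for illustration.

```shell
# Hypothetical demo data: two tiny tab-separated files in a scratch directory.
mkdir -p demo_variants
printf 'chr1\t1956362\t1956362\tG\tA\thom\tA.tab\n' >  demo_variants/A.tab
printf 'chr1\t1960926\t1960926\tT\tC\thom\tA.tab\n' >> demo_variants/A.tab
printf 'chr1\t1956362\t1956362\tG\tA\thet\tB.tab\n' >  demo_variants/B.tab

awk 'BEGIN {FS="\t"; OFS="\t"}
     {
       key = $1":"$2"-"$3";"$4"/"$5
       # f[key] accumulates the file names (field 7), comma-separated (assumed format).
       if (key in c) f[key] = f[key] "," $7
       else          f[key] = $7
       # c[key] counts how many lines share this key.
       c[key]++
     }
     END {for (key in c) print key, c[key], f[key]}' demo_variants/*.tab \
| sort -V \
> demo_variants/CommonVariants.bed

cat demo_variants/CommonVariants.bed
```

With the demo data, the first key should be reported with a count of 2 and the list A.tab,B.tab, and the second with a count of 1 and A.tab. If a file can contain the same variant more than once, its name will be repeated in the list, so some de-duplication may be needed.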
I would really appreciate any help.
Thank you in advance.