In the bash script below, the unique header of each vcf.gz is stored in a text file with the same base name. That is, if 16-0000-file.vcf.gz was used, the header text file would be 16-0000-file_header.txt.
There can be multiple vcf.gz files in a directory, usually 3, and I need to fix the header of each file before processing it further. My question is: how can I match each text file with its vcf.gz and pass both file names to the reheader code?
In the script below I take the part of each text file name before the first underscore (which carries the unique numerical prefix, e.g. 16-0000) and look for the vcf.gz with the same prefix, but I am not sure how to match the two files:
IAm=${0##*/}
InDir1='/home/cmccabe/Desktop/NGS/test'
InDir2='/home/cmccabe/Desktop/NGS/test'
OutDir='/home/cmccabe/Desktop/NGS/test'
cd "$InDir1"
for file1 in *.txt
do
    # Grab file prefix (everything before the first underscore).
    p=${file1%%_*}
    # Find matching file2 (the vcf.gz has no underscore after the prefix).
    file2=$InDir2/$p.vcf.gz
    if [ ! -f "$file2" ]
    then
        printf '%s: No vcf.gz matching %s found.\n' "$IAm" "$file1" >&2
        continue
    fi
    # store matches
    out="${file1##*/} ${file2##*/}"
done
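One possible way to do the pairing, shown only as a minimal sketch: it assumes every header file is named NAME_header.txt and sits in the same directory as NAME.vcf.gz, as in the examples in this post. The function name `pair_headers` is made up for illustration. It swaps the `_header.txt` suffix for `.vcf.gz` with parameter expansion, so no globbing is needed to find the partner file.

```shell
#!/bin/bash
# Sketch: list each header file alongside the vcf.gz it belongs to.
# Assumes NAME_header.txt pairs with NAME.vcf.gz (naming taken from this post).
# pair_headers is a hypothetical helper name.
pair_headers() {
    local dir=$1 header vcf
    for header in "$dir"/*_header.txt; do
        [ -f "$header" ] || continue        # skip when the glob matches nothing
        vcf=${header%_header.txt}.vcf.gz    # swap the suffix to get the vcf.gz name
        if [ -f "$vcf" ]; then
            # Print "header<TAB>vcf" using base names only.
            printf '%s\t%s\n' "${header##*/}" "${vcf##*/}"
        else
            printf 'No vcf.gz found for %s\n' "${header##*/}" >&2
        fi
    done
}
```

Calling `pair_headers /home/cmccabe/Desktop/NGS/test` would print one tab-separated pair per line, and report any header without a partner on stderr.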
vcf.gz files in directory (file2):
16-0000-file1.vcf.gz
16-0001-file2.vcf.gz
16-0002-file3.vcf.gz
Matching text files in directory (file1):
16-0000-file1_header.txt
16-0001-file2_header.txt
16-0002-file3_header.txt
So the contents of 16-0000-file1_header.txt would be used to update the header of 16-0000-file1.vcf.gz using the code below.
reheader code:
# edit the header
logfile=/home/cmccabe/Desktop/NGS/test/process.log
for f in /home/cmccabe/Desktop/NGS/test/*.vcf.gz ; do
    echo "Start vcf add header creation: $(date) - file: $f"
    bname=$(basename "$f")
    pref=${bname%%.vcf.gz}
    # $file1 is the header text file, $file2 the matching vcf.gz
    bcftools reheader -h "$file1" "$file2" > "${pref}_fixed.vcf.gz"
    echo "End add header creation: $(date) - file: $f"
done >> "$logfile"
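The two steps could also be combined into a single loop. The following is a rough sketch, not a tested pipeline: it assumes the NAME_header.txt / NAME.vcf.gz naming shown in this post, and the function name `reheader_dir` is hypothetical. It loops over the header files rather than over `*.vcf.gz`, so the `_fixed.vcf.gz` outputs written into the same directory are not picked up again on a second run.

```shell
#!/bin/bash
# Sketch: reheader every vcf.gz in a directory using its matching header file.
# Assumes NAME_header.txt sits next to NAME.vcf.gz (naming taken from this post).
# reheader_dir is a hypothetical helper name.
reheader_dir() {
    local dir=$1 header vcf pref
    for header in "$dir"/*_header.txt; do
        [ -f "$header" ] || continue          # glob matched nothing
        vcf=${header%_header.txt}.vcf.gz      # derive the paired vcf.gz
        if [ ! -f "$vcf" ]; then
            printf 'No vcf.gz found for %s\n' "$header" >&2
            continue
        fi
        pref=${vcf%.vcf.gz}                   # path without the .vcf.gz suffix
        echo "Start vcf add header creation: $(date) - file: $vcf"
        bcftools reheader -h "$header" "$vcf" > "${pref}_fixed.vcf.gz"
        echo "End add header creation: $(date) - file: $vcf"
    done
}
```

It could be called as `reheader_dir /home/cmccabe/Desktop/NGS/test >> /home/cmccabe/Desktop/NGS/test/process.log`, which keeps the start and end messages in the same log file as before.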