Below are two generic .vcf files (genome.S1.vcf and genome.S2.vcf) in a directory. There won't always be two generic files, but I am trying to use bash to rename each of them with the specific text (a unique identifier) found inside each .vcf. The text will always be different, but it will always be in the same position (after the word FORMAT) on the same line (the header line that starts with #CHROM). Each .vcf is tab-delimited. I'm not sure my attempt is the best way, but hopefully it helps. Thank you :).
genome.S1.vcf
...
...
...
##FILTER=<ID=NotGenotyped,Description="Locus contains forcedGT input alleles which could not be genotyped">
##FILTER=<ID=PloidyConflict,Description="Genotype call from variant caller not consistent with chromosome ploidy">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NAME1_S1
chr10 323215 . A . . LowGQX END=323313;BLOCKAVG_min30p3a GT:GQX:DP:DPF:MIN_DP .:.:0:0:0
chr10 323314 . C . . LowGQX;LowDepth END=323397;BLOCKAVG_min30p3a GT:GQX:DP:DPF:MIN_DP 0/0:3:1:0:1
genome.S2.vcf
...
...
...
##FILTER=<ID=NotGenotyped,Description="Locus contains forcedGT input alleles which could not be genotyped">
##FILTER=<ID=PloidyConflict,Description="Genotype call from variant caller not consistent with chromosome ploidy">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 11-1111-ID_S5
chr10 323215 . A . . LowGQX END=323313;BLOCKAVG_min30p3a GT:GQX:DP:DPF:MIN_DP .:.:0:0:0
chr10 323314 . C . . LowGQX;LowDepth END=323385;BLOCKAVG_min30p3a GT:GQX:DP:DPF:MIN_DP .:.:0:0:0
desired (each vcf in directory renamed with unique identifier)
NAME1_S1.vcf
...
...
...
##FILTER=<ID=NotGenotyped,Description="Locus contains forcedGT input alleles which could not be genotyped">
##FILTER=<ID=PloidyConflict,Description="Genotype call from variant caller not consistent with chromosome ploidy">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NAME1_S1
chr10 323215 . A . . LowGQX END=323313;BLOCKAVG_min30p3a GT:GQX:DP:DPF:MIN_DP .:.:0:0:0
chr10 323314 . C . . LowGQX;LowDepth END=323397;BLOCKAVG_min30p3a GT:GQX:DP:DPF:MIN_DP 0/0:3:1:0:1
11-1111-ID_S5.vcf
...
...
...
##FILTER=<ID=NotGenotyped,Description="Locus contains forcedGT input alleles which could not be genotyped">
##FILTER=<ID=PloidyConflict,Description="Genotype call from variant caller not consistent with chromosome ploidy">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 11-1111-ID_S5
chr10 323215 . A . . LowGQX END=323313;BLOCKAVG_min30p3a GT:GQX:DP:DPF:MIN_DP .:.:0:0:0
chr10 323314 . C . . LowGQX;LowDepth END=323385;BLOCKAVG_min30p3a GT:GQX:DP:DPF:MIN_DP .:.:0:0:0
bash
cd /path/to/files
for f in *.vcf ; do # loop through all vcf files
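new="$(awk '/^#CHROM/ {print $10; exit}' "$f").vcf" # store field 10 of the #CHROM header line in new (head -1 would only see the first ## line)
if [ "$new" != ".vcf" ] && [ ! -e "$new" ]; then # only if a name was found and the target does not exist yet
echo "Renaming $f to $new" # log rename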
mv "$f" "$new" # rename original to new
fi # close if
done # close loop
Note: the -n (no-clobber) option of the mv command is a non-standard extension to the POSIX standard. Alternatively, try -i for interactive use (some systems ignore -i when run non-interactively, so test this as well) or, better, test for file existence beforehand.
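To try the "test for existence beforehand" idea without touching real data, here is a minimal sketch. The function name rename_vcf and the two throwaway files are made up for illustration, and it assumes the sample name is field 10 of the #CHROM line and contains no whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename one VCF after the sample name on its #CHROM line,
# refusing to overwrite an existing file.
rename_vcf() {
    local f=$1 id new
    # Field 10 of the #CHROM header line is the first sample name
    id=$(awk '/^#CHROM/ {print $10; exit}' "$f")
    [ -n "$id" ] || { echo "No sample name found in $f" >&2; return 1; }
    new="$(dirname "$f")/$id.vcf"
    if [ -e "$new" ]; then   # existence test done before mv, so -n / -i are not needed
        echo "Skipping $f: $new already exists" >&2
        return 1
    fi
    echo "Renaming $f to $new"
    mv -- "$f" "$new"
}

# Demo on throwaway files (contents are made up for illustration)
demo=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNAME1_S1\n' > "$demo/genome.S1.vcf"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t11-1111-ID_S5\n' > "$demo/genome.S2.vcf"
for f in "$demo"/*.vcf; do
    rename_vcf "$f"
done
ls "$demo"
```

Running the loop a second time should leave everything alone, because each file is then already named after its own sample and the existence test fails.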
Regarding this solution: IMHO we should avoid renaming an Input_file with system while that same Input_file is still being read, since renaming a file in the middle of reading it could cause issues.
IMHO, I would go with an approach that checks for the string FORMAT in the line and prints the rename shell command. (I am assuming each Input_file needs only one rename, because once the Input_file being read has been renamed, it can no longer be found under its old name.)
So what I am doing here is printing the shell commands, using the same condition as in your provided code, instead of running them while the files are being read.
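A rough sketch of that "print the commands first" idea, since the original snippet is not shown here. The script name rename_cmds.sh and the demo files are made up, and the quoting assumes file and sample names contain no single quotes.

```shell
#!/usr/bin/env bash
# Throwaway demo directory with two made-up VCF headers
demo=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNAME1_S1\n' > "$demo/genome.S1.vcf"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t11-1111-ID_S5\n' > "$demo/genome.S2.vcf"

(
    cd "$demo" || exit 1
    # Print one mv command per file; nothing is renamed at this point.
    # "seen" is reset at the start of each file so only the first match is used.
    awk 'FNR == 1 { seen = 0 }
    !seen && /^#CHROM/ && $9 == "FORMAT" {
        printf "mv -- \047%s\047 \047%s.vcf\047\n", FILENAME, $10
        seen = 1
    }' *.vcf > rename_cmds.sh

    cat rename_cmds.sh   # review the generated commands first
    sh rename_cmds.sh    # then run them once awk has finished reading
)
```

Keeping generation and execution separate means the commands can be inspected, or edited, before any file is touched.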
As long as a mv operation is performed on the same file system - as is the case here - that should not pose a problem, since mv then only manipulates directory data: A file name is nothing more than a directory entry, a pointer (a hard link) to the file itself.
When a process opens a file for reading, the operating system creates an entry (a file descriptor) to represent that file and keeps the information about the opened file in memory. From then on the directory entry is no longer used.
The mv operation is thus free to manipulate the directory entry.
So for the process that has opened and is reading the file, nothing changes as the directory data is being changed.
When it is done reading it just closes the file descriptor.
Also, the file list expanded by the glob is expanded before being passed to the awk script, so new file names are not passed to the script.
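The point about open files and directory entries can be checked with a small experiment. This is only a sketch on a throwaway file (old.txt and new.txt are made-up names): the file is opened on descriptor 3, renamed, and then read through the descriptor that was opened under the old name.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
printf 'line1\nline2\n' > "$demo/old.txt"

exec 3< "$demo/old.txt"               # open the file for reading on descriptor 3
IFS= read -r first <&3                # read the first line under the old name
mv "$demo/old.txt" "$demo/new.txt"    # rename while the descriptor is still open
IFS= read -r second <&3               # keep reading after the rename
exec 3<&-                             # close the descriptor

echo "$first $second"                 # prints "line1 line2"
```

Both reads succeed because the descriptor refers to the file itself, not to the name it was opened under.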
Even if the file is moved between file systems, there should not be a problem as long as the file is kept open. Even though the file and its contents ARE moved by the mv command, the OS keeps the original readable until it has been both unlinked and closed. See below, using nezabudka's one-liner extended by mv's -v (--verbose) option.
Even though marked "deleted", the file's contents are still available and readable. Of course, once unlinked, the file can't be reopened or reused at its original location.
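The same kind of experiment works for the unlinked case. In this sketch (data.txt is a made-up throwaway file) the directory entry is removed with rm while descriptor 4 is still open, and the contents are then read through that descriptor.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
printf 'still here\n' > "$demo/data.txt"

exec 4< "$demo/data.txt"      # open for reading on descriptor 4
rm "$demo/data.txt"           # unlink: the directory entry disappears

if [ -e "$demo/data.txt" ]; then
    gone=no
else
    gone=yes                  # the name can no longer be opened
fi

IFS= read -r content <&4      # the data is still reachable through the descriptor
exec 4<&-                     # closing the last descriptor lets the OS free the data

echo "entry gone: $gone, content: $content"
```

On Linux, listing /proc/PID/fd for such a process typically shows the target with a "(deleted)" suffix, which is presumably what the "deleted" attribute above refers to.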