Find matching file in bash with variable file names but consistent prefixes

As part of a bash script, the lines below strip the prefix from each filename in directory 1 so that it can be searched for in directory 2.

for file in /home/cmccabe/Desktop/comparison/missing/*.txt
do
    file1=${file##*/}    # Strip off directory
    getprefix=${file1%%_*.txt}
    file1="/home/cmccabe/Desktop/comparison/final/${file1%%_*.txt}_*.vcf"  # look for matching file
    if [[ -f "$file1" ]]

Files with the prefix (directory 1):

F113.txt
H456.txt

Files searched for the prefix (directory 2):

F113_epilepsy.vcf
H456_marfan.vcf

In the files above, F113.txt matches F113_epilepsy.vcf and H456.txt matches H456_marfan.vcf.

Currently I am not getting a match, presumably because the portion after the _ varies, but I am not sure whether the bash above will find the file using the *. Thank you :)

Filename expansion doesn't happen inside double-quotes or on assignments. Try changing:

    file1="/home/cmccabe/Desktop/comparison/final/${file1%%_*.txt}_*.vcf"  # look for matching file

to:

    file1=$(printf '%s\n' "/home/cmccabe/Desktop/comparison/final/${file1%.txt}_"*.vcf) # F113.txt -> look for F113_*.vcf

Note that the asterisk that is being used as a filename matching character has been moved out of the quoted portion of the string, and the expansion is done in an argument to a utility (printf in this case) instead of in a string being assigned to a variable. Without the command substitution, you would be assigning a literal asterisk to the variable instead of expanding it to the pathname of an existing file (or files).
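To see the difference for yourself, here is a small bash sketch. It assumes a throwaway directory from `mktemp -d` in place of the real .../comparison/final directory, and the variable names are made up for the demo. It assigns the same pattern three ways and prints what each variable actually holds.

```shell
#!/usr/bin/env bash
# Sketch: where does bash expand a glob? A temp directory stands in
# for /home/cmccabe/Desktop/comparison/final (assumption for the demo).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/F113_epilepsy.vcf"

quoted="$dir/F113_*.vcf"      # inside double quotes: * stays literal
unquoted=$dir/F113_*.vcf      # plain assignment: still no pathname expansion
expanded=$(printf '%s\n' "$dir/F113_"*.vcf)  # unquoted * in an argument: expanded

printf '%s\n' "$quoted" "$unquoted" "$expanded"
```

The first two lines printed end in the literal `F113_*.vcf`; only the third is the real pathname, so only that value will pass `[[ -f ... ]]`.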


What would your code need to do if there was more than one file that matches your pattern? The value of file1 could be a list. Would that be a problem for you?

Robin


The wildcard globbing can match none, one, or several files.
Perhaps you don't need the wildcard globbing at all? The following simply strips off the .txt and appends .vcf:

    file1="/home/cmccabe/Desktop/comparison/final/${file1%%.txt}.vcf"  # look for matching file
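The parameter expansions used in this thread give quite different results for the same input. A quick bash check using the sample name F113.txt from the question (the variable names a, b and c are just for the demo):

```shell
#!/usr/bin/env bash
# Sketch: what each parameter expansion seen in this thread yields
# for one sample file name from directory 1.
file1=F113.txt

a=${file1%%_*.txt}     # suffix must start with "_"; F113.txt has none, so nothing is removed
b=${file1%.txt}        # removes the trailing .txt
c=${file1%%.txt}.vcf   # removes .txt, then appends .vcf

printf '%s\n' "$a" "$b" "$c"
```

This prints F113.txt, F113 and F113.vcf. Only `${file1%.txt}` yields the bare prefix; F113.vcf is not one of the files in directory 2, and `${file1%%_*.txt}` leaves the name unchanged because there is no underscore in it to match.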

Take a look at post #1 in this thread again. The above code won't let the script take the filenames:

F113.txt
H456.txt

in one directory and find the corresponding files:

F113_epilepsy.vcf
H456_marfan.vcf

in the other directory.

Using the code suggested in post #2 will have the if statement:

    if [[ -f "$file1" ]]

take the then branch if there is exactly one matching file, and take the else branch (if there is one) if there are no matching files or more than one matching file. But, of course, we weren't shown the then or else branches, so we can't make an educated guess at whether matching multiple files is an issue that the code is prepared to handle.
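If the number of matches matters, one option is to glob into a bash array and count the elements instead of relying on `[[ -f ... ]]`. A rough sketch, assuming bash's `nullglob` option; the helper `check_prefix`, the temp directory and the extra file H456_other.vcf are all hypothetical, added only to produce the zero, one and many cases:

```shell
#!/usr/bin/env bash
# Sketch: count glob matches explicitly. check_prefix and the file
# H456_other.vcf are hypothetical, used only to show the three cases.
shopt -s nullglob   # an unmatched glob expands to nothing, not to itself

final=$(mktemp -d)
trap 'rm -rf "$final"' EXIT
touch "$final/F113_epilepsy.vcf" "$final/H456_marfan.vcf" "$final/H456_other.vcf"

check_prefix() {
    local prefix=$1
    local -a matches
    matches=("$final/${prefix}_"*.vcf)   # unquoted * expands into array elements
    case ${#matches[@]} in
        0) echo "$prefix: no match" ;;
        1) echo "$prefix: ${matches[0]##*/}" ;;
        *) echo "$prefix: ${#matches[@]} matches" ;;
    esac
}

check_prefix F113   # exactly one file
check_prefix H456   # two files
check_prefix Z999   # none
```

Without `nullglob`, a pattern with no match would leave one element holding the literal pattern, so the zero case would be miscounted as one.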


Thank you all :)

Each filename prefix is unique and will only occur once. :)
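With unique prefixes, the whole loop could look roughly like the bash sketch below. It is an assumption-laden illustration, not the original script: temp directories stand in for .../comparison/missing and .../comparison/final, the echo lines stand in for the real then/else branches (which weren't shown), and the `found` array is only there to collect the results.

```shell
#!/usr/bin/env bash
# Sketch of the full loop, assuming each prefix matches at most one .vcf.
# Temp dirs replace the real missing/ and final/ directories.
shopt -s nullglob

missing=$(mktemp -d)
final=$(mktemp -d)
trap 'rm -rf "$missing" "$final"' EXIT
touch "$missing/F113.txt" "$missing/H456.txt"
touch "$final/F113_epilepsy.vcf" "$final/H456_marfan.vcf"

found=()   # hypothetical: collects the matched .vcf names
for file in "$missing"/*.txt; do
    name=${file##*/}      # strip off directory: F113.txt
    prefix=${name%.txt}   # strip off extension: F113
    matches=("$final/${prefix}_"*.vcf)
    if (( ${#matches[@]} == 1 )); then
        echo "$name -> ${matches[0]##*/}"
        found+=("${matches[0]##*/}")
    else
        echo "$name -> no unique match" >&2
    fi
done
```

For the sample files this should report F113.txt -> F113_epilepsy.vcf and H456.txt -> H456_marfan.vcf.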