Rename text file with a specific pattern in directory

cmccabe · February 9, 2017, 1:09pm

I am trying to rename all text files in a directory that match a pattern. The current command below seems to be using the directory path in the name and, since it already exists, will not do the rename. I am not sure what I am missing. Thank you :).

Files to rename in /home/cmccabe/Desktop/test/vcf/overall/annovar

16-0000_File-A_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt
16-0002_File-B_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt
16-0005_File-C_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt

desired output

16-0000_File-A_hg19multianno.txt
16-0002_File-B_hg19multianno.txt
16-0005_File-C_hg19multianno.txt

rename 's/(.*?_[^_]+).*/${1}_hg19multianno.txt/g' /home/cmccabe/Desktop/test/vcf/overall/annovar/*_classify.txt

drysdalk · February 10, 2017, 11:20am

Hi,

I've come up with the following script which does what you need, I think:

#!/bin/bash

pattern1="variant_strandbias_readcount.vcf."
pattern2_old="hg19_multianno"
pattern2_new="hg19multianno"
pattern3="_dbremoved_removed_final_index_inheritence_import_classify"

for file in `/bin/ls *.txt`
do
        newname=`echo $file | /bin/sed s/$pattern1//g | /bin/sed s/$pattern2_old/$pattern2_new/g | /bin/sed s/$pattern3//g`
        /bin/mv -fv $file $newname
done

If I run it, this is what I get:

$ ls -1
16-0000_File-A_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt
16-0002_File-B_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt
16-0005_File-C_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt
script.sh
$ ./script.sh 
'16-0000_File-A_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt' -> '16-0000_File-A_hg19multianno.txt'
'16-0002_File-B_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt' -> '16-0002_File-B_hg19multianno.txt'
'16-0005_File-C_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt' -> '16-0005_File-C_hg19multianno.txt'
$ ls -1
16-0000_File-A_hg19multianno.txt
16-0002_File-B_hg19multianno.txt
16-0005_File-C_hg19multianno.txt
script.sh
$

Hope this helps.

Don_Cragun · February 10, 2017, 6:39pm

Note that several shortcuts can be used to speed up the above script and reduce the chance of failures due to system limits on the size of argument lists allowed when exec'ing a file...

The command:

for file in `/bin/ls *.txt`

produces exactly the same list of files to be processed as:

for file in *.txt

as long as three conditions hold. First, you don't have any directories in the current working directory with names ending in .txt (which, although possible, would be unconventional). Second, none of the selected files have names containing any <space>, <tab>, or <newline> characters; those cause "file not found" errors when using ls *.txt, but work correctly when just using *.txt. Third, there are not too many files: /bin/ls *.txt can fail if the shell runs out of memory producing the list of filenames matching *.txt or if the list exceeds the ARG_MAX system limit, while just using *.txt will only fail if the shell runs out of memory producing the list of filenames. In cases where the given pattern does match a directory name, /bin/ls pattern will give you a list of the unhidden files in directories matching pattern, while just using pattern will give you the names of the directories instead of the names of the files in the directories.
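To make the difference with whitespace in filenames concrete, here is a small sketch that counts loop iterations both ways. It is only an illustration: the temporary directory, the two filenames, and the counter variables are made up for the demonstration, and mktemp -d is assumed to be available (it is common, but not required by POSIX).

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
# (mktemp -d is an assumption; it is widely available but not in POSIX.)
olddir=$PWD
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1

# Two hypothetical files, one with a <space> in its name.
: > 'plain.txt'
: > 'has space.txt'

# The output of ls is split on whitespace, so "has space.txt"
# reaches the loop as two separate words.
ls_count=0
for file in `ls *.txt`
do
    ls_count=$((ls_count + 1))
done

# The glob hands each filename to the loop as a single word.
glob_count=0
for file in *.txt
do
    glob_count=$((glob_count + 1))
done

echo "ls loop: $ls_count iterations, glob loop: $glob_count iterations"

# Clean up.
cd "$olddir" || exit 1
rm -rf "$tmpdir"
```

With the two files above this should report 3 iterations for the ls loop and 2 for the glob loop, since the ls version sees "has" and "space.txt" as separate names.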

The command:

        newname=`echo $file | /bin/sed s/$pattern1//g | /bin/sed s/$pattern2_old/$pattern2_new/g | /bin/sed s/$pattern3//g`

invokes three copies of /bin/sed when only one is needed. That takes more time, more swap space, more memory, more ... Try the following instead:

        newname=`echo $file | /bin/sed -e s/$pattern1//g -e s/$pattern2_old/$pattern2_new/g -e s/$pattern3//g`

to get exactly the same results taking less time, less swap space, less memory, less ... Note also that there is no need for the g flag on these substitutions since you are only trying to remove one copy of each of these patterns. In addition, because the sed arguments are not quoted, each of these substitutions will fail if any of the variables being used in them contains any <space>, <tab>, or <newline> characters. And, if any of the filenames being modified starts with a <hyphen> or contains any <space>, <tab>, <newline>, or <backslash> characters, echo might not produce the results you want. Therefore, I would suggest using:

        newname=`printf '%s\n' "$file" | /bin/sed -e "s/$pattern1//" -e "s/$pattern2_old/$pattern2_new/" -e "s/$pattern3//"`

instead. Note also that with most modern shells all of these changes could be performed in the shell with various variable substitutions instead of using command substitution to invoke sed. But, since we don't know what operating system or shell is being used by the submitter of this thread, I won't go there.
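For readers who do want to see the in-shell route, the following is a rough sketch using only POSIX parameter expansions, so it should not depend on bash. It is not a drop-in equivalent of the sed command: rather than deleting each pattern in turn, it keeps whatever precedes pattern1 and appends the new suffix and the original extension. The function name new_name and the variables prefix and ext are made up for this example.

```shell
#!/bin/sh
pattern1="variant_strandbias_readcount.vcf."
pattern2_new="hg19multianno"

# Print the new name for the filename given as $1, or return 1 if
# pattern1 does not occur in it. (new_name is a hypothetical helper.)
new_name() {
    case $1 in
        *"$pattern1"*) ;;
        *) return 1 ;;
    esac
    # Drop the longest suffix that starts at pattern1; quoting the
    # variable makes the shell treat it as a literal string.
    prefix=${1%%"$pattern1"*}
    # Keep only what follows the last <period>.
    ext=${1##*.}
    printf '%s\n' "${prefix}${pattern2_new}.${ext}"
}

new_name "16-0000_File-A_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import_classify.txt"
```

For the sample name this prints 16-0000_File-A_hg19multianno.txt. Shells such as bash and ksh93 also offer ${variable/pattern/replacement}, which maps more directly onto the three sed substitutions, but that form is not in POSIX.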

And, for safety in case some of the characters listed above appear in filenames, I would also change:

        /bin/mv -fv $file $newname

to:

        /bin/mv -fv "$file" "$newname"
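Putting the suggestions above together, the script might end up looking something like the sketch below. A few things here are additions rather than part of the original script: the loop is wrapped in a function (rename_txt_files, a made-up name), there are two guard tests that skip non-files and names that would not change, sed and mv are found through PATH instead of /bin, and mv is given -- in place of -v so that a name starting with a <hyphen> is not taken as an option.

```shell
#!/bin/sh
pattern1="variant_strandbias_readcount.vcf."
pattern2_old="hg19_multianno"
pattern2_new="hg19multianno"
pattern3="_dbremoved_removed_final_index_inheritence_import_classify"

# Rename every matching .txt file in the current directory.
# (rename_txt_files is a hypothetical name used for this sketch.)
rename_txt_files() {
    for file in *.txt
    do
        # Skip directories, and the literal "*.txt" left when nothing matches.
        [ -f "$file" ] || continue
        # One sed invocation, quoted arguments, printf instead of echo.
        newname=`printf '%s\n' "$file" | sed -e "s/$pattern1//" -e "s/$pattern2_old/$pattern2_new/" -e "s/$pattern3//"`
        # Leave the file alone if none of the patterns applied.
        [ "$newname" = "$file" ] && continue
        # "--" stops option parsing in case a name starts with a <hyphen>.
        mv -f -- "$file" "$newname"
    done
}

rename_txt_files
```

Run from the directory holding the files, this should produce the same three names shown in the earlier output, while leaving script.sh and any unrelated .txt files untouched.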

cmccabe · February 21, 2017, 8:35am

Thank you both, works perfectly. Sorry for the delay, I was out of the country. Just out of curiosity, why did the rename not work as expected? Thank you :).