How best to remove certain characters from filenames and folders recursively

Hello,

I'm trying to figure out which tool is best for recursively renaming any files or folders that have the characters \/*?"<>| in their names. I've tried many examples that use Bash, Python and Perl, but I'm not much of a programmer and I seem to have hit a roadblock.

Does anyone have any examples of how best to do this?

Thanks in advance!

Ryan

First, I'm pretty sure a / cannot be in a filename. Second, what would you replace the * and ? and " with?

I would approach that like this (GNU tools specific):

find /targetdir -type f -print0 | xargs -r0 -n1 rename_bad_filename

where rename_bad_filename is a perl script that is executable and in your PATH...

#!/usr/bin/perl
$newf = $oldf=$ARGV[0];
# replace any non alpha-numeric characters, excluding / . and - with an underscore
$newf =~ s/[^a-zA-Z0-9.\/-]/_/g;
exit 0 if $newf eq $oldf;
print "Renaming \"$oldf\" to \"$newf\"...";
# rename returns false on failure; only then is $! meaningful.
if (rename $oldf, $newf) { print "OK\n" } else { print "Error: $!\n" }

Note: no error checking; existing files are clobbered.

Example output:

$ rename_bad_filename  2l4kj2l^%.
Renaming "2l4kj2l^%." to "2l4kj2l__."...Error: No such file or directory
$ rename_bad_filename  2l4kj2l.
$ rename_bad_filename  2l4kj2l.\!
Renaming "2l4kj2l.!" to "2l4kj2l._"...Error: No such file or directory
$ rename_bad_filename  \[2-7\]-ldap-reordered
Renaming "[2-7]-ldap-reordered" to "_2-7_-ldap-reordered"...OK

Hey Otheus,

Thanks for the response and the help. I'm unable to get your script working for some reason. Don't I need to specify perl on the command line? I found a script that seems to be kind of working; I'm just not sure how to define which characters to replace. The characters that Box doesn't support are \/*?"<>| Like you, I don't think I've ever seen a forward slash in a file name, but Box said to make sure it's not there.

#!/usr/bin/perl

use strict;

sub processDir
{
    my $dir=shift;
    opendir(DIR, $dir);
    my @files=grep {! /^\.\.?$/} readdir(DIR);
    closedir(DIR);
    foreach my $file(@files)
    {
        if(-d "$dir/$file") 
        {
            processDir("$dir/$file");
        }
        my $newfile=$file;
        $newfile=~ s/[,& '\(\)]/_/g; # Replace ',', '&', space, "'", '(' and ')' in names with underscores.
        if( $newfile ne $file )
        {
            print "Renaming \"$dir/$file\" to \"$dir/$newfile\"\n";
            rename "$dir/$file","$dir/$newfile" or warn("Problems renaming $dir/$file --> $dir/$newfile: $!\n"); 
        }
    }
}

my $dir=shift;
if(!defined($dir))
{
    $dir=".";
}
processDir($dir);

At the command prompt I entered perl rename2.pl Desktop/BoxMigraiton/FILESHARE

The problem with renaming files and folders recursively is that if you start by renaming folders, the paths you're looking inside change as you go. So you need to process the contents first.

#!/bin/bash


find . -depth -name "*[,&<>*?|\":'()]*" |     # Find all files or folders containing 'bad' characters.
while IFS= read -r FILEDIR                    # Read them line-by-line, keeping backslashes.
do
        DIR="${FILEDIR%/*}"                   # Get the folder it's inside.
        FILE="${FILEDIR##*/}"                 # Get the plain name.
        NEWFILE="${FILE//[,&<>*?|\":\'()]/_}" # Substitute _ for bad things.
        echo mv "$DIR/$FILE" "$DIR/$NEWFILE"  # Rename it.
done

Remove the 'echo' once you're sure it does what you want.
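
For names that contain newlines or other oddities, a NUL-delimited variant of the same loop may be safer. This is only a minimal sketch, assuming bash and a find that supports -print0 (GNU and BSD find do); rename_bad_names is a made-up helper name, not something from the script above, and it renames for real rather than echoing.

```bash
#!/usr/bin/env bash
# Sketch: rename depth-first so a folder is renamed only after its contents.
# rename_bad_names is a hypothetical helper name used for illustration.
rename_bad_names() {
    local root=$1 path dir name new
    local bad=",&<>*?|\":'()"    # same 'bad' characters as the script above

    # -depth lists a folder's contents before the folder itself;
    # -print0 with read -d '' keeps newlines and backslashes in names intact.
    find "$root" -depth -name "*[$bad]*" -print0 |
    while IFS= read -r -d '' path; do
        dir=${path%/*}             # containing folder
        name=${path##*/}           # last path component
        new=${name//[$bad]/_}      # substitute _ for bad characters
        [ "$name" = "$new" ] && continue
        mv -- "$path" "$dir/$new"
    done
}
```

Calling it as rename_bad_names . covers the current directory. Putting echo in front of mv gives the same dry run as the original loop.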


This shell script looks awesome. Just what I was looking for. Thanks again.

So if I wanted to only replace these characters:

\/*?"<>|

which line would I define that in?

There are two expressions, actually. Look for the comments with 'bad' in them.

The one in 'find' has * at the beginning and end because the expression has to match the entire file/dir name for find to print it. The bash expression, on the other hand, only matches the part you want to change.
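
The difference can be tried out without touching any files. A small sketch, assuming bash: case does the same kind of whole-string glob match that find's -name applies to a name, while the ${...//...} substitution only looks for the part to replace. The sample name and the matches helper are invented for illustration.

```bash
#!/usr/bin/env bash
name='report<final>.txt'   # made-up example name

# Whole-string match, like find -name: the pattern must cover the entire name.
matches() {
    case $2 in
        $1) echo yes ;;
        *)  echo no ;;
    esac
}

matches '[<>]'   "$name"   # no: the name is longer than one character
matches '*[<>]*' "$name"   # yes: the stars absorb everything around the match

# Substitution only needs the part to replace, so no stars here.
echo "${name//[<>]/_}"     # report_final_.txt
```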

I think I understand. So both lines, should look like:

find . -depth -name "*\/*?\"<>|*" |

and

NEWFILE="${FILE//[\/*?\"<>|]/_}"

Did I get that right?

Then I should run the following, correct?

sh renamefiles.sh BoxMigraiton/FILESHAREWITHBADFILENAMES

You removed the [] from the find expression, which will break it. It needs to be a bracketed character set, just like the bash one.

Why not try it, with the echo intact, to see what it does?

Sorry about missing the brackets. This is why I can never be a coder.

I'm finally getting somewhere though!

When I ran the script, it seems like it's replacing more than I want it to. For instance, I output the results of the script to a file, and I'm seeing lots of entries like:

mv ./DESTIN LAB/Data/Collaborations/Daphna/619/PsychScience/Two Study/Destin_Oyserman_Feared_Selves.doc ./DESTIN LAB/Data/Collaborations/Daphna/619/PsychScience/Two Study/_estin_Oyserman_Feared__elves.doc

It's replacing the first D for some reason, as well as adding a double underscore near the end.

Here is the script so far:

#!/bin/bash

# Find all files or folders containing 'bad' characters.
find . -depth -name "*[~$*\/*?\"<>|]*" |
# Read them line-by-line.
while read FILEDIR
do
       # Get the folder it's inside
       DIR="${FILEDIR%/*}"
       # Get the plain name.
       FILE="${FILEDIR/*\/}"
       # Substitute _ for bad things.
       NEWFILE="${FILE//[\/~$*?\"<>|]/_}"
       # Rename it.
       echo mv "$DIR/$FILE" "$DIR/$NEWFILE"
done

I'm not sure, but I think you might need to escape (prefix with \) the ~ and the $. Also, I'm quite sure you cannot have filenames with / in them.
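
On the $: inside double quotes the shell reads $* as the script's own arguments, so with no arguments it simply vanishes from the pattern. One way to sidestep the escaping question, sketched here assuming bash, is to keep the character set in a single-quoted variable, where nothing gets expanded. The names bad and sanitize are hypothetical.

```bash
#!/usr/bin/env bash
# Single quotes keep every character literal, including $ and ~.
# The doubled backslash stands for a literal backslash in the set.
bad='\\/*?"<>|~$'

sanitize() {
    # Replace every character from the set with an underscore.
    printf '%s\n' "${1//[$bad]/_}"
}

sanitize 'Destin_Oyserman_Feared_Selves.doc'   # unchanged
sanitize 'what?*.txt'                          # what__.txt
```

The same variable can be dropped into the find pattern as "*[$bad]*".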

Also, I'd go with the more thorough scrubbing. Replace the stuff inside both [...] with:

[^a-zA-Z0-9._-]

and in the find command, drop the "*" before the open bracket ([).

Awesome!

[^a-zA-Z0-9._-]

worked terrifically. If it's OK to leave the spaces in the names, what would I add to or remove from that line to make that happen?

Thank you both again for your help!

To tell it that spaces are okay, add a space to the expression before the ].

Is that in both lines? I tried it in both and it didn't work.

#!/bin/bash

# Find all files or folders containing 'bad' characters.
find . -depth -name "*[^a-zA-Z0-9._- ]*" |
# Read them line-by-line.
while read FILEDIR
do
       # Get the folder it's inside
       DIR="${FILEDIR%/*}"
       # Get the plain name.
       FILE="${FILEDIR/*\/}"
       # Substitute _ for bad things.
       NEWFILE="${FILE//[^a-zA-Z0-9._- ]/_}"
       # Rename it.
       mv "$DIR/$FILE" "$DIR/$NEWFILE"
done

- is a special character inside [ ]. If you mean it to match a literal -, give it \-

Backslash before dash works? Cool. I put it at the end, which also works.
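
To make the dash rule concrete, here is a short sketch (assuming bash) that keeps letters, digits, dot, underscore, space and dash. The dash sits last in the set so it cannot be read as a range; keep_safe is a made-up name.

```bash
#!/usr/bin/env bash
keep_safe() {
    # A leading ^ negates the set; the trailing - is a literal dash, not a range.
    printf '%s\n' "${1//[^a-zA-Z0-9._ -]/_}"
}

keep_safe 'Two Study (draft)-v2.doc'   # Two Study _draft_-v2.doc
```

The same set goes into the find pattern, with the stars on both sides kept.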

Try this (where filename is a file containing one name per line):

awk '{a=$0;gsub(/[^[:alnum:]]/,"_");print "mv " a " " $0;}' filename | sh