Need help for a Shell script to rename multiple files

Hi!

I need help to create a shell script to search inside a file and then copy a portion of the search result as the new file name.

Basically, I was hacked over the weekend and the genius wiped the drive on my server. I was able to recover a lot of files, but the biggest problem now is that the files have no names, which makes them impossible to sort.

I have found the filename inside the files themselves and need to search for it and then rename each file with the result.
Approx 12 000 files to be done.

Here is a sample of the source of the files I'm talking about:

%!PS-Adobe-3.1 EPSF-3.0
%ADO_DSC_Encoding: MacOS Roman
%%Title: carte_finale_GestionPhocusout.eps
%%Creator: Adobe Illustrator(R) 14.0
%%For: Porky
%%CreationDate: 09-11-09
%%BoundingBox: 0 0 360 489
%%HiResBoundingBox: 0 0 359.9981 488.5669
%%CropBox: 0 0 359.9981 488.5669
%%LanguageLevel: 3

This is the filename inside the file itself right after the "%%Title: "

%%Title: carte_finale_GestionPhocusout.eps

Now with this info, I want to rename the file itself to exactly this:

carte_finale_GestionPhocusout.eps

Can it be done?

Many thanks for taking the time to read and help me with this!

What are the files currently called?

# assuming the files are all called file_ something
for fn in `ls file*`
  do
# display first ten lines
  head -10 $fn
# or, more of what you need
  newnam=`head -10 $fn | grep "%%Title" | cut -d" " -f2`
# now rename it 
  mv $fn $newnam
done
Illustrator 9-10-04003
Illustrator 9-10-04004
Illustrator 9-10-04005
Illustrator 9-10-04006
and so on

---------- Post updated at 04:47 PM ---------- Previous update was at 04:38 PM ----------

Here is the output of your script:

 for fn in `ls *`
>   do
> # display first ten lines
>   head -10 $fn
> # or, more of what you need
>   newnam=`head -10 $fn | grep "%%Title" | cut -d" " -f2"`
> # now rename it 
>   mv $fn $newnam
> done
head: Illustrator: No such file or directory
-sh: command substitution: line 1: unexpected EOF while looking for matching `"'
-sh: command substitution: line 2: syntax error: unexpected end of file
usage: mv [-f | -i | -n] [-v] source target
       mv [-f | -i | -n] [-v] source ... directory
head: 9-10-04003.eps: No such file or directory
-sh: command substitution: line 1: unexpected EOF while looking for matching `"'
-sh: command substitution: line 2: syntax error: unexpected end of file
usage: mv [-f | -i | -n] [-v] source target
       mv [-f | -i | -n] [-v] source ... directory

borrowing from joey:

# assuming the files are all called file_ something
for fn in `ls file*`
  do
# display first ten lines
  head -10 "$fn"
# or, more of what you need
  newnam=`head -10 "$fn" | grep "%%Title" | cut -d" " -f2`
# now rename it 
  mv "$fn" "$newnam"
done

Here is the output

root# for fn in `ls illustrator*`;   do   head -10 "$fn"   newnam=`head -10 "$fn" | grep "%%Title" | cut -d" " -f2"`   mv "$fn" "$newnam"; done
ls: illustrator*: No such file or directory
root# ls
.DS_Store			Illustrator 9-10-04003.eps

I think both forms of the loop are choking because of the whitespace in the file names. The shell uses the value of IFS to split fields, and by default that is whitespace (space, tab, and newline). Using a shell 'for' loop in the form of

for file in $( ls Illustrator* ) #modern form of: for file in `ls Illustrator*`

will break all the file names apart on white space (or, more accurately, on the characters in $IFS). You will get an error of the form "[utility]: [file fragment]: No such file or directory"

the other 'for' loop form of:

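for file in "$( ls Illustrator* )" #or for file in "`ls Illustrator*`"

also does not work, because the quoted command substitution expands to a single word that contains ALL the file names, so the loop runs once with one monolithic value. (With single quotes instead of double quotes the command is never run at all; the loop just receives the literal text.)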
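
One quick way to see both failure modes is to count how many words each loop form would actually receive. This is only a sketch: the scratch directory and the two file names are made up, and count_words is a hypothetical helper, not something from the posts above.

```shell
#!/bin/bash
# Sketch: count the words a for loop would iterate over in each form.
# The scratch directory and file names are made up for illustration.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch 'Illustrator 1.eps' 'Illustrator 2.eps'

# count_words is a hypothetical helper: it prints how many arguments it got.
count_words() { echo "$#"; }

unquoted=$(count_words $(ls Illustrator*))   # split on IFS into fragments
quoted=$(count_words "$(ls Illustrator*)")   # one word holding every name
globbed=$(count_words Illustrator*)          # one word per file

echo "unquoted=$unquoted quoted=$quoted globbed=$globbed"

# Clean up the scratch directory.
cd / && rm -rf "$tmpdir"
```

With two files whose names each contain one space, the unquoted form yields four fragments, the quoted form yields a single word, and the plain glob yields two.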

You can a) rename all the files so they contain no white space, b) use a form of loop that reads ASCIZ (NUL-terminated) strings, or c) use Perl.

The general form of loop 'b' that can use ASCIZ strings in the loop is like this:

find [directory] -type f [-maxdepth X] -print0 | while read -d $'\0' file
do
   #rename the files
done

Note the "-print0" and the "while read -d $'\0'". The first makes find write each file name terminated by an ASCII NUL; the second sets up a while loop that reads that NUL-terminated list...

Good luck!

What would the script look like?

---------- Post updated at 05:47 PM ---------- Previous update was at 05:41 PM ----------

# assuming the files are all called file_ something
find [directory] -type f [-maxdepth X] -print0 | while read -d $'\0' file
do
# display first ten lines
  head -10 "$fn"
# or, more of what you need
  newnam=`head -10 "$fn" | grep "%%Title" | cut -d" " -f2"`
# now rename it 
  mv "$fn" "$newnam"
done

Gives me:
find: [-maxdepth: unknown option

Hello, all:

I see several solutions using the following snippet...

for fn in `ls file*`

... and just wanted to suggest not to do that. After the command substitution, filenames will undergo word splitting. If any of them include whitespace (assuming a default IFS value), the value of fn will not be set correctly for each filename (such filenames will be assigned to fn piecemeal over multiple loop iterations).

for fn in file*

... will accomplish the task without word splitting issues.

Illustrative example:

$ touch without_space with\ space

#Incorrect
$ for i in `ls w*`; do echo "$i"; done
with
space
without_space

#Correct
$ for i in w*; do echo "$i"; done
with space
without_space

Regards,
Alister

So sorry -- I was assuming that you knew how to use find.

When I put something in brackets, that is an optional thing. Are all the files in a single directory? If so, you would use "-maxdepth 1" so that find won't descend into subdirectories of the starting directory.

Here is what I am guessing: you are on a Mac, you used an unerase tool that did not recover the names, and all the files are in a single directory in the form of:

Illustrator 9-10-04003
Illustrator 9-10-04004
Illustrator 9-10-04005
Illustrator 9-10-04006

If so, the loop might look like this:

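#!/bin/bash

#hacker_renamer

cd [PUT THE DIRECTORY HERE]

find . -type f -maxdepth 1 -print0 | while read -d $'\0' file
do
  # Uncomment to inspect each header. Keep it disabled when piping
  # the output to bash, or the header lines would be run as commands.
  # head -10 "$file"

  # take the second space-separated field of the %%Title line
  newnam=`head -10 "$file" | grep "%%Title" | cut -d" " -f2`

  # %q (a bash printf extension) quotes names that contain spaces
  printf "mv -n %q %q\n" "$file" "$newnam"
done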

Run it and check that the individual "mv" statements look right.

Once you are satisfied, pipe the output back to shell like so:

./hacker_renamer | /bin/bash

That error means exactly what it says. You are trying to loop over a list of file names that begin with the word "illustrator" but there are none (at least not in the current working directory).

Try "Illustrator*" instead. UNIX is case sensitive: "illustrator*" is not the same as "Illustrator*". If that was the issue, the following should print out a paged list of the filenames:

for f in Illustrator*; do echo "$f"; done | more

If that seems to work, you can try:

for f in Illustrator*; do name=$(sed -n '/^\(%%Title: \)/{s///p;q;}' "$f"); [ "$name" ] && echo mv "$f" "$name"; done

... and if the list of mv commands looks good, you can remove the "echo". Execute at your own risk :slight_smile:
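
For anyone who prefers something longer than a one-liner, the same idea can be spread out as a rough sketch. The names rename_by_title and DRY_RUN are invented for this example, the checks for an empty title, a slash in the title, and an existing target are extra precautions I am assuming you would want, and LC_ALL=C is there only so sed does not trip over non-text bytes.

```shell
#!/bin/bash
# Sketch of a dry-run-first rename. rename_by_title and DRY_RUN are
# hypothetical names chosen for this example.

# Print (or perform) the rename for one file, based on its %%Title line.
rename_by_title() {
  local f=$1 name
  # Print what follows "%%Title: " on the first matching line, then quit.
  name=$(LC_ALL=C sed -n '/^%%Title: /{s///p;q;}' "$f")
  if [ -z "$name" ]; then
    echo "skip (no title): $f" >&2
    return 0
  fi
  case $name in
    */*) echo "skip (title contains a slash): $f" >&2; return 0 ;;
  esac
  if [ -e "$name" ]; then
    echo "skip (target exists): $f -> $name" >&2
    return 0
  fi
  if [ "${DRY_RUN:-1}" = 1 ]; then
    # Default: only show the command, quoted so it could be pasted back.
    printf 'mv -n -- %q %q\n' "$f" "$name"
  else
    mv -n -- "$f" "$name"
  fi
}

# The glob gives one word per file, even when names contain spaces.
for f in Illustrator*; do
  if [ -f "$f" ]; then
    rename_by_title "$f"
  fi
done
```

Run as is, it only prints the mv commands; running it with DRY_RUN=0 in the environment performs the renames.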

Alister

You are only partially correct. It is deceptive, but your second form is also incorrect in this case.

Try this:

for i in `ls`; do ls -l "$i"; done

that fixes the space problem, but breaks with file names with other characters, such as common ones of single or double quotes or mean ones like tab or CR.

The other problem is if you use more complex command substitution, such as this:

for i in "`ls | grep '^D'`"; do  echo "$i"; done

the return is:

Desktop
Documents
Downloads
Drive Util

which seems correct. However, now run:

for i in "`ls | grep '^D'`"; do wc -l "$i"; done

The return now is:

wc: Desktop\nDocuments\nDownloads\nDrive Util: open: No such file or directory

This is because BASH has passed all the file names with carriage returns to wc and wc thinks it is a single file name. Bad result especially with mv etc...

The only robust way I have found is with an ASCIZ termination, and only while loops in bash support that...

---------- Post updated at 03:35 PM ---------- Previous update was at 03:30 PM ----------

He is on a Mac, and the default is NOT case sensitive. Yours is better form however. On Mac BASH you can have case sensitive string comparisons tell you one thing and the file system do something disastrously different...

This seems to work. I will try it on a small batch and keep you guys posted!

---------- Post updated at 07:09 PM ---------- Previous update was at 06:52 PM ----------

OK, I did a small test and it works well, but I just had to be French (lol) and need to support all the French special (accented) characters. One of the files contained one of them, and it came out as "%8" instead! Can this be resolved?

Hi, drewk:

Nope, it's not. Everything that I said in that post is 100% correct.

That does not fix any whitespace problems. If anything, that code is terribly broken and suffers from the shortcomings (unwanted word splitting breaking filenames into pieces) that I spoke of earlier.

Proof:

#Create two files in an empty directory, each of whose name contains a space
$ touch '1 2' '3 4'

#Your code that supposedly fixes the space problem
$ for i in `ls`; do ls -l "$i"; done
ls: 1: No such file or directory
ls: 2: No such file or directory
ls: 3: No such file or directory
ls: 4: No such file or directory

#The correct way to do that
$ for i in *; do ls -l "$i"; done
-rw-r--r--   1 xxxx  xxxx  0 Mar  4 12:16 1 2
-rw-r--r--   1 xxxx  xxxx  0 Mar  4 12:16 3 4

Regarding...

...the issues you present have absolutely nothing to do with injected carriage returns, but with the fact that the double quoted command substitution will ALWAYS evaluate to a single word regardless of how many files are in the directory. Those for loops are pointless and equivalent to:

echo "`ls | grep '^D'`"
wc -l "`ls | grep '^D'`"

If you can't see that and you still think I'm mistaken, perhaps the 'Useless Use of' links @ http://www.unix.com/shell-programming-scripting/131346-compiling-multiple-c-files-starting-xxx.html#post302400942 will help. If that still doesn't do it, read the sh man page and/or the POSIX sh documentation (particularly the sections on word splitting, quoting, and command substitution) and experiment until you see it.

Cheers,
Alister

P.S. By the way, since the tone of a message can easily be misinterpreted online, I just wanted to make it clear that I responded to your post in detail to help you understand and to ensure that no one else who has read this thread makes the same mistakes. It was not intended as an "i must win this Internet argument" type of response. :wink: I hope it helped.

You are correct, I typed faster than I was thinking, and I certainly did not mean to offend in any way. I appreciate your detailed response. My examples of the for loop were not good -- granted and I acknowledge that your examples work as advertised.

You did not comment on the ASCIZ string form that I stated was an alternative. I do think that ASCIZ strings are better in many cases, especially when using find in the form of:

find . -type f -maxdepth X -print0 | while read -d $'\0' file
do
   do stuff
done

Hi, drewk:

No offense taken. I was just making sure that my response didn't come across as petty. So, now that we're done being nice, back to business :wink:

I don't use bash very often so I've never used that read option before. Usually, in similar situations I make do with xargs. However, looking at it, I spot two bugs (ironically, one of them is a field splitting bug) in that code. Check it out:

#Create a directory whose name begins with leading spaces
$ mkdir '        8spaceslater'

#Place two files in that directory, with the first one ending in a backslash
$ touch \ \ \ \ \ \ \ \ 8spaceslater/1\\
$ touch \ \ \ \ \ \ \ \ 8spaceslater/2

#Now let's try that code
$ find '        8spaceslater' -type f -print0 | while read -d $'\0' file; do echo "$file"; done         
8spaceslater/1

What happened to the leading spaces? The leading spaces are lost during the field splitting step. Let's disable field splitting by setting IFS to an empty string:

$ find '        8spaceslater' -type f -print0 | while IFS='' read -d $'\0' file; do echo "$file"; done
        8spaceslater/1

Now that we got our leading white space through intact, what about the trailing backslash of "1\" and the second file, "2"? The backslash is consumed as part of an escape sequence which results in a null byte in $file, which is why the second file does not appear in the output (a nullbyte marks the end of a string in C). This backslash escape sequence processing can be disabled by passing read the -r option, which enables raw mode.

For maximum "robustness":

$ find '        8spaceslater' -type f -print0 | while IFS='' read -rd $'\0' file; do echo "$file"; done
        8spaceslater/1\
        8spaceslater/2

In this particular example, the only advantage to using null byte delimiters is proper handling of filenames containing linefeeds. If that's not an issue, nothing is gained.
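
To make the linefeed point concrete, here is a small sketch that counts how many records each kind of loop sees. The scratch directory and the file names are made up, and -d '' is used as an equivalent way of asking bash's read for a NUL delimiter.

```shell
#!/bin/bash
# Sketch: compare newline-delimited and NUL-delimited reads on a file
# name that contains a linefeed. Directory and names are made up.
tmpdir=$(mktemp -d)
touch "$tmpdir/"$'one\ntwo' "$tmpdir/plain"

# Newline-delimited: the embedded linefeed splits one name into two records.
by_line=$(find "$tmpdir" -type f | {
  n=0
  while IFS= read -r file; do n=$((n + 1)); done
  echo "$n"
})

# NUL-delimited: each record is exactly one file name.
by_nul=$(find "$tmpdir" -type f -print0 | {
  n=0
  while IFS= read -r -d '' file; do n=$((n + 1)); done
  echo "$n"
})

echo "by_line=$by_line by_nul=$by_nul"
rm -rf "$tmpdir"
```

With two files on disk, the line-based loop counts three records and the NUL-based loop counts two.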

Regards,
Alister

Wow! That was immensely educational and valuable to me.

There are inconsistencies in the BASH handling of leading spaces when using a "while read -d" as you pointed out. I had not been using the "while IFS='' read -rd" form, so leading spaces in file names would not have been handled. I had always assumed that it was the same as using xargs.

It is not just leading spaces that are at issue. Consider:

#create three challenging directories:
$ mkdir {'   3spacesthere','dir?name','spaces in middle'}

#create challenging file names in the directories:
$ touch {'   3spacesthere','dir?name','spaces in middle'}/{1,1\\,2}

#find the '1's -- my earlier example:
$ find . -type f -maxdepth 2 -name "1*" -print0 | while read -d $'\0' file; do printf "%s\n" "$file"; done
./   3spacesthere/1
./   3spacesthere/1
./dir?name/1

#WHOOPS! No backslash printed and no 'dir?name' or 'space in middle' files! 83% failure...

#try in 'raw' mode:
$ find . -type f -maxdepth 2 -name "1*" -print0 | while read -rd $'\0' file; do printf "%s\n" "$file"; done

./   3spacesthere/1
./   3spacesthere/1\
./dir?name/1
./dir?name/1\
./spaces in middle/1
./spaces in middle/1\

#mostly there, except:

$ find '   3spacesthere' 'dir?name' 'spaces in middle'  -type f -maxdepth 2 -name "1*" -print0 | while read -rd $'\0' file; do printf "%s\n" "$file"; done
3spacesthere/1
3spacesthere/1\
dir?name/1
dir?name/1\
spaces in middle/1
spaces in middle/1\
#No Leading spaces on '   3spacesthere' rendering the string unusable for a file name...

# Try with IFS set to ''
$ find  '   3spacesthere' 'dir?name' 'spaces in middle' -type f -maxdepth 2 -name "1*" -print0 | while IFS='' read -rd $'\0' file; do printf "%s\n" "$file"; done
   3spacesthere/1
   3spacesthere/1\
dir?name/1
dir?name/1\
spaces in middle/1
spaces in middle/1\
#OK....

#xargs works fine:
$ find '   3spacesthere' 'dir?name' 'spaces in middle' -type f -maxdepth 2 -name "1*" -print0 | xargs -0 printf "%s\n"
   3spacesthere/1
   3spacesthere/1\
dir?name/1
dir?name/1\
spaces in middle/1
spaces in middle/1\
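
If more than a single command is needed per file, one option is to let xargs hand the names to a small inline script. This is only a sketch: the scratch directory and names are made up, and the bracket-printing loop stands in for whatever per-file work you would really do.

```shell
#!/bin/bash
# Sketch: hand NUL-delimited names from find to a small inline script.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/   3spacesthere" "$tmpdir/spaces in middle"
touch "$tmpdir/   3spacesthere/1" "$tmpdir/spaces in middle/1\\"

# xargs -0 passes each name as one argument; "sh -c" receives them as "$@".
# The trailing word "sh" fills $0, so the file names start at $1.
listing=$(cd "$tmpdir" && find . -type f -print0 |
  xargs -0 sh -c 'for f in "$@"; do printf "[%s]\n" "$f"; done' sh |
  LC_ALL=C sort)

printf '%s\n' "$listing"
rm -rf "$tmpdir"
```

The brackets in the output make it easy to check that the leading spaces and the trailing backslash arrive intact.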

The form of "for" loop with globbing is more robust than I have given it credit for. Consider:

$ for f in {'   3spacesthere','dir?name','spaces in middle'}/1*; do printf "%s\n" "$f"; done
   3spacesthere/1
   3spacesthere/1\
dir?name/1
dir?name/1\
spaces in middle/1
spaces in middle/1\
#Just what it is supposed to be the first time...

The PROBLEM one is:
$ for f in `find '   3spacesthere' 'dir?name' 'spaces in middle' -type f -maxdepth 2 -name "1*" -print0`; do printf "%s\n" "$f"; done
3spacesthere/1
3spacesthere/1\
dir?name/1
dir?name/1
spaces
in
middle/1
spaces
in
middle/1\
#disaster...

I have been using the 'while read -d' form of loop for a long time, thinking it was the best around for this situation. I also have a personal preference for "while" instead of "for" if the looping quantity seems more ethereal. I have also had a mild suspicion that 'for i in *' was less robust than you have demonstrated it to be, since I knew that backtick command substitution produced disaster.

Thanks...