Renaming files

Dolph · October 21, 2010, 10:41am

Hello

Please can someone help.
We are being sent a directory full of images.
The names of these images can vary in length and have spaces in them.
example:
nn 999999 nnnnn nnnn nnnn nn nnn nnn nn nnnn.pdf
What we want to do is rename all the images: take the first two fields, nn 999999, and replace the space with an underscore.
nn_999999

DGPickett · October 21, 2010, 12:29pm

This is an overkill version that handles every normally invisible byte in the low 128 except NUL, just for the educational value. A file can be named almost anything, but it can always be referenced as ./'name', even if it starts with -, unless the name itself contains a ', and that can be escaped as "'".

Single quoting is most literal, has no metacharacters but itself, is faster for the shell to process, and should always be the first choice.

Unfortunately, concatenated single and double quotes are hard to read in this font. Inside a single-quoted string, such as my inline single-quoted sed scripts, a literal single quote is written single-double-single-double-single: a single quote to get out of single quoting, a single quote inside double quotes, and a single quote to get back into single quoting.
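
To make the quoting idiom concrete, here is a minimal sketch. The file name it's.pdf is made up for illustration and is not one of the poster's files.

# A literal single quote between two single-quoted pieces:
# close the quoting, add the quote inside double quotes, reopen.
name='it'"'"'s.pdf'
printf '%s\n' "$name"

# The same idiom inside a single-quoted sed script:
# replace every single quote in the input with an underscore.
cleaned=$(printf '%s\n' "$name" | sed 's/'"'"'/_/g')
printf '%s\n' "$cleaned"

The first printf should show the name with its embedded quote, the second the name with the quote turned into an underscore.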

This code makes a script for you to review before running in the target dir. The script makes a new dir of files to review before moving them in place of the originals. Making a new dir means there is no risk until you replace the files manually. If duplicate names are generated, you may want to tinker with the list of acceptable file name characters, or just hack the output script and rerun.

(
echo '#!/usr/bin/ksh
rm -rf /tmp/rename_out.$LOGNAME
mkdir /tmp/rename_out.$LOGNAME
'

ls *.pdf| sed '
  s/'"'"'/&"&"&/g
 ' | sed '
  :loop
  /\.pdf$/!{
    N
    s/\n/'"'"'?'"'"'/
    b loop
   }
  p
  s/"'"'"'"/_/g
  s/'"'"'?'"'"'/_/g
  s/[^A-Za-z0-9_.]/_/g
  s/___*/_/g
 ' | sed '
  N
  s/\(.*\)\n/cp -p .\/'"'"'\1'"'"' \/tmp\/rename_out.'$LOGNAME'\//
 ' 
 chmod u+x /tmp/rename_out_$LOGNAME.sh
) >/tmp/rename_out_$LOGNAME.sh

Narrative: the parentheses collect all the output below into a script in /tmp with your id in its name.
The echo writes the start of a ksh script that will first:
  destroy any old tmp output dir,
  make an empty new tmp output dir.
The pipeline then generates cp commands that make files with the old content and new names:
  ls lists all pdf files in the current dir and passes them (one file per line, unless a file name contains a linefeed) to sed #1, which
  wraps each ' in double quotes, and the pipe carries that to sed #2, which,
  if the line has no .pdf suffix, loops to pick up all of a file name with linefeeds in it, putting the one-character glob ? in place of each embedded linefeed, outside the anticipated single quotes,
  prints this as the odd (source name) line for sed #3,
  then reworks the line to build the target name:
    replaces the embedded single quotes with an underscore,
    replaces the embedded linefeeds with an underscore,
    replaces every character that is not a letter, number, underscore or dot with one underscore,
    replaces multiple underscores with one,
  and passes it on to sed #3 as the even line.
  sed #3 gets each odd-even pair of lines in its buffer and
  changes the two lines to cp -p ./'first_line' /tmp/outdir/second_line
Finally, chmod makes the generated script user executable.

methyl · October 21, 2010, 1:34pm

Another approach.
Assuming that the final filenames will be "nn_999999.pdf" rather than something else.

Check thoroughly before running on live data. Remove "echo" on the "mv" line when sure.
Note that a script of this nature is a "one off" and cannot be run a second time on the same directory.

ls -1 *\.pdf | while read old_filename
do
        part1=`echo "${old_filename}"|awk '{print $1}'`
        part2=`echo "${old_filename}"|awk '{print $2}'`
        new_filename="${part1}_${part2}.pdf"
        # Remove echo when tested
        if [ ! -f "${new_filename}" ]
        then
                if [ -f "${old_filename}" ]
                then
                       echo mv "${old_filename}" "${new_filename}"
                fi
        else
                echo "Error: Duplicate new filename: ${old_filename}"
        fi
done

DGPickett · October 21, 2010, 1:55pm

Well, mv is always a bit riskier than cp, as you have no undo, but you could easily make that a cp script. This is a simple, shell-ish, spaces-to-underscores solution. With cp into a new directory, you do not have to worry about names that contain no spaces:

ls *.pdf|while [ 1 ]
do
 # line fails at end of input; without the break this loop never ends
 zf=$(line) || break
 zfn=$( echo "$zf" |tr ' ' '_' )
 cp -p "$zf" "$newdir/$zfn"
done

ctsgnb · October 21, 2010, 1:57pm

ls -1 *.pdf | while read a
do
b=$(echo "$a" | sed 's|  *| |g;s|^\([^ ][^ ]*\) \([^ ][^ ]*\) .*|\1_\2\.pdf|')
# printf keeps the double quotes in rename.sh; eval echo would drop them
printf 'mv "%s" "%s"\n' "$a" "$b" >>rename.sh
done

Check rename.sh and, if the content is OK, run it:

sh rename.sh

As stated by DGnitPick, you can replace mv with cp in my code for more safety.

DGPickett · October 21, 2010, 2:07pm

Does awk treat every space character as a field separator, or does this implicitly crush runs of spaces into one _, and handle only one set of spaces per line?
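
For what it is worth, awk's default field splitting treats any run of blanks (spaces or tabs) as a single separator and ignores leading blanks, so $1 and $2 come out the same however many spaces sit between them. A minimal sketch, with made-up sample names:

# Single spaces between the fields.
a=$(printf 'nn 999999 nnnnn nnnn.pdf\n' | awk '{print $1 "_" $2}')
# Runs of spaces and a tab between the fields.
b=$(printf 'nn    999999\t  nnnnn nnnn.pdf\n' | awk '{print $1 "_" $2}')
printf '%s\n%s\n' "$a" "$b"

Both lines should print nn_999999. Only the first two fields are used, so everything after them is dropped rather than joined with underscores.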

methyl:

Another approach.
Assuming that the final filenames will be "nn_999999.pdf" rather than something else.

Check thoroughly before running on live data. Remove "echo" on the "mv" line when sure.
Note that a script of this nature is a "one off" and cannot be run a second time on the same directory.
ls -1 *\.pdf | while read old_filename
do
   part1=`echo "${old_filename}"|awk '{print $1}'`
   part2=`echo "${old_filename}"|awk '{print $2}'`
   new_filename="${part1}_${part2}.pdf"
   # Remove echo when tested
   if [ ! -f "${new_filename}" ]
   then
   if [ -f "${old_filename}" ]
   then
   echo mv "${old_filename}" "${new_filename}"
   fi
   else
   echo "Error: Duplicate new filename: ${old_filename}"
   fi
done

---------- Post updated at 01:59 PM ---------- Previous update was at 01:57 PM ----------

I hesitated to use read, and used line, as read can remove whitespace.

ctsgnb:

ls * | while read a
do
b=$(echo "$a" | sed 's|^\([^ ][^ ]*\) \([^ ][^ ]*\) .*|\1_\2\.pdf|')
eval echo 'mv "$a" "$b"' >rename.sh
done
check rename.sh and if the content is OK , run it

sh rename.sh

---------- Post updated at 02:07 PM ---------- Previous update was at 01:59 PM ----------

Eyeballing a mv file might miss some. Nobody suggested ln and an output dir on the same mount in place of cp. After all, mv within a device is ln (link()) + rm (unlink()). If you like newdir, you can rename the dirs and keep the original in case there are concerns.

rm -rf ../newdir
mkdir ../newdir
ls *.pdf|while [ 1 ]
do
 # line fails at end of input; without the break this loop never ends
 zf=$(line) || break
 zfn=$( echo "$zf" |tr ' ' '_' )
 ln "$zf" ../newdir/"$zfn"
done
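
A small sketch of why the ln route is safe, assuming mktemp -d is available and using a made-up file name: the new name is a second link to the same inode, so removing the old name afterwards loses nothing.

work=$(mktemp -d)
mkdir "$work/orig" "$work/newdir"
printf 'data\n' > "$work/orig/nn 999999 x.pdf"

# Hard link under the new name; no data is copied.
ln "$work/orig/nn 999999 x.pdf" "$work/newdir/nn_999999_x.pdf"

# ls -i prints the inode number first; both names should report the same one.
ino_old=$(ls -i "$work/orig/nn 999999 x.pdf" | awk '{print $1}')
ino_new=$(ls -i "$work/newdir/nn_999999_x.pdf" | awk '{print $1}')
printf '%s %s\n' "$ino_old" "$ino_new"

# Removing the old name leaves the content reachable through the new one.
rm "$work/orig/nn 999999 x.pdf"
content=$(cat "$work/newdir/nn_999999_x.pdf")
rm -rf "$work"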

This is becoming a reference on the subject of renaming files.

ctsgnb · October 21, 2010, 2:15pm

I used double quotes in my code:

# read a
titi toto tutu          tata    tutut
# echo $a
titi toto tutu tata tutut
# echo "$a"
titi toto tutu          tata    tutut
#

---------- Post updated at 08:13 PM ---------- Previous update was at 08:11 PM ----------

Note that I meanwhile changed the code you embedded:

ls -1 *.pdf | while read a
do
b=$(echo "$a" | sed 's|  *| |g;s|^\([^ ][^ ]*\) \([^ ][^ ]*\) .*|\1_\2\.pdf|')
# printf keeps the double quotes in rename.sh; eval echo would drop them
printf 'mv "%s" "%s"\n' "$a" "$b" >>rename.sh
done

---------- Post updated at 08:15 PM ---------- Previous update was at 08:13 PM ----------

I didn't know about that $(line) stuff: could you tell me more?

while [ 1 ]   <--- does it read only line 1, or does it go on until it cannot read anymore?
do
read $(line)

done

Scrutinizer · October 21, 2010, 3:06pm

# Assuming there are no other kind of pdf files in that directory.

for i in *.pdf; do
  t=${i% * * * * * * * *.pdf}
  mv "$i" "${t% *}_${t#* }.pdf"
done

DGPickett · October 21, 2010, 3:17pm

Interesting, gotta try that:

$ t='1 2 3 4 5 6 7 8 9 0'
$ echo ${t% * * *}
1 2 3 4 5 6 7
$

looks destructive!

Comparing read and $(line): line is a C program that does read(0,buf,1) until it finds a line feed, and then prints the line. It does not take more off the input stream than the one line, unlike read, which uses a FILE* gets()/fgets() that reads ahead. Also, read trims white space when assigning the fields to one or more variables. However, $(line) costs a fork and exec, while read is a ksh builtin and buffered. Apparently, the trimming only affects edge whitespace:

$ echo ' a  b  c '|read z;echo "X${z}X"
Xa  b  cX
$
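
If the edge whitespace matters, one common workaround (a sketch, not something from the posts above) is to empty IFS for the read and add -r. The braces keep read and printf in the same process, so this does not depend on ksh running the last pipeline stage in the current shell.

# Default read: leading and trailing whitespace is stripped, inner runs are kept.
trimmed=$(printf ' a  b  c \n' | { read z; printf 'X%sX' "$z"; })

# IFS emptied just for this read: the line is kept intact.
# -r also stops backslashes from being treated as escapes.
kept=$(printf ' a  b  c \n' | { IFS= read -r z; printf 'X%sX' "$z"; })

printf '%s\n%s\n' "$trimmed" "$kept"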

Scrutinizer · October 21, 2010, 3:27pm

It's just parameter expansion with lazy pattern matching...
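
A short sketch of the four forms on a sample string, for anyone following along. The single % and # remove the shortest match (the lazy ones), the doubled %% and ## remove the longest.

t='1 2 3 4 5'
a=${t% *}     # shortest suffix matching " *" removed: 1 2 3 4
b=${t%% *}    # longest suffix matching " *" removed: 1
c=${t#* }     # shortest prefix matching "* " removed: 2 3 4 5
d=${t##* }    # longest prefix matching "* " removed: 5
printf '%s\n' "$a" "$b" "$c" "$d"

None of these modify t itself; each expansion only yields a trimmed copy.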

ctsgnb · October 21, 2010, 3:27pm

@Scru1Linizer

Doesn't your code assume there is a fixed number of single spaces in the name of the initial file to move?

Scrutinizer · October 21, 2010, 3:30pm

Ow, you're right, I thought it was a fixed pattern.

DGPickett · October 21, 2010, 3:32pm

Yes, but once you lose parts of i as you create t, how do you get them back?

t="$i"
while [ "$t" != "${t#* }" ]
do
 t=${t% *}_${t##* }
done

tr is certainly easier.

Scrutinizer · October 21, 2010, 3:34pm

Retry, let's take it from the other side:

Still assuming there are no other kind of pdf files in that directory.

for i in *.pdf; do
  t=${i%%${i#* * }}; t=${t% }
  mv "$i" "${t% *}_${t#* }.pdf"
done

DGPickett · October 21, 2010, 3:40pm

scrutinizer:

Retry, let's take it from the other side:

Still assuming there are no other kind of pdf files in that directory.
for i in *.pdf; do
  t=${i%%${i#* * }}; t=${t% }
  mv "$i" "${t% *}_${t#* }.pdf"
done

The characteristics of symmetry are?

$ t='1 2 3 4 5 6 7 8 9 0'              
$ echo ${t#* * * }                     
4 5 6 7 8 9 0
$

You lose either way!

Now, nesting, that is interesting, too! Gotta try!

$ t='1 2 3 4 5 6 7 8 9 0'
$ echo ${t%%${t#* * }}   
1 2
$

Also very destructive! I'd swear off more than 1 * per, when trying to preserve data, if I were you!

Scrutinizer · October 21, 2010, 3:40pm

You don't lose $i, it is there all the time. Anyway it is always a good idea to first do a dry run with echo "mv...

DGPickett · October 21, 2010, 3:46pm

You cannot mv twice; the original is mv'd away, so no dry runs. Use your UNIX undo.

Scrutinizer · October 21, 2010, 4:03pm

It is just a rename. You can see if it looks ok with an echo / printf statement. Maybe copy a couple of files to a tmp directory first and do a test run.
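
A sketch of such a test run, assuming mktemp -d is available. The two sample names are invented, and the loop only echoes the mv commands, so nothing is renamed.

scratch=$(mktemp -d)
: > "$scratch/nn 999999 aaa bbb.pdf"
: > "$scratch/mm 123456 ccc.pdf"

# Collect the planned commands instead of running them.
plan=$(
  cd "$scratch" || exit 1
  for i in *.pdf; do
    t=${i%%${i#* * }}; t=${t% }
    echo mv "$i" "${t% *}_${t#* }.pdf"
  done
)
printf '%s\n' "$plan"
rm -rf "$scratch"

The echoed lines show the old and new names without quotes, so they are for reading only, not for pasting back into a shell.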

If you use cp, better use cp -p. Even then there might be sparse file problems and there may be matters of disk space; you have to create an extra directory, otherwise all the files are mixed up in the same directory; you then have to be careful about deleting the originals; and cp is slow where mv is fast.

In the end it is just a matter of preference and confidence in the method used.

ctsgnb · October 21, 2010, 4:06pm

You assume it happens on the same file system.

Scrutinizer · October 21, 2010, 4:08pm

I would most certainly not mv to another file system.