Question about reading and parsing a text file

polomora · July 4, 2010, 2:49pm

Hello,

I'm just getting started with BASH programming. I would like to write a script to solve a file renaming problem I have. I received a directory containing a collection (>2000) of files whose names are in DOS 8.3 format, and would like to rename them to longer, more descriptive names. The directory also contains a text file that serves as a translation "table of contents" for these files. The text file consists of a list of entries, one per line, listing each file by its filename in 8.3 format followed by the longer descriptive text:
<DOS 8.3 filename 1> <File descriptive name 1>
<DOS 8.3 filename 2> <File descriptive name 2>
<DOS 8.3 filename 3> <File descriptive name 3>
....

My question is: Does anyone have a script that can read the contents of this text file, parse it, and use the parsed fields to rename the other filenames in the directory from the DOS 8.3 name to the descriptive text name?

Many thanks,
Paul

bartus11 · July 4, 2010, 2:59pm

Try:

awk '{system("mv "$1" "$2)}' filename_toc

pseudocoder · July 4, 2010, 3:29pm

If the description contains whitespace, you'll want to try this:

while read line; do
mv -v "$(echo $line | cut -d' ' -f1)" "$(echo $line | cut -d' ' -f2-)"
done <file_desc

alister · July 4, 2010, 4:08pm

A much simpler and more efficient way of accomplishing the same thing:

while read f1 f2; do
    mv "$f1" "$f2"
done < file_desc

Both versions assume that filenames do not contain whitespace or backslashes; if either assumption is invalid, the code will break.
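
The backslash caveat can be checked with a small sketch. The sample line below is hypothetical, not taken from the original file list. Without `-r`, `read` treats a backslash as an escape character and removes it; with `-r`, the line is taken literally.

```shell
# Hypothetical entry whose description contains a backslash.
line='FILE0002.DAT notes\2010'

# Plain read: the backslash is consumed as an escape character.
read f1 cooked <<EOF
$line
EOF

# read -r: backslashes are kept as-is.
read -r f1 raw <<EOF
$line
EOF

printf 'without -r: [%s]\n' "$cooked"   # without -r: [notes2010]
printf 'with -r:    [%s]\n' "$raw"      # with -r:    [notes\2010]
```

So adding `-r` to the loop should remove the backslash restriction, assuming the table of contents really is meant to be read literally.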

Regards,
Alister

pseudocoder · July 4, 2010, 4:17pm

Interesting. I did not know that f2 picks up the rest of the line...

Scrutinizer · July 4, 2010, 4:18pm

Actually Alister, since the 8.3 filenames do not contain spaces, f2 - being the last variable in the read statement - always picks up the remainder of the line including whitespace, so in fact your code should work for whitespace in the second part IMO.
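
A minimal sketch of that behaviour, using a made-up entry (the filename and description are assumptions for illustration only): the first field goes to `f1`, and everything after the separator goes to `f2`, inner spaces included.

```shell
# Hypothetical table-of-contents entry: 8.3 name, then a description with spaces.
line='REPORT01.TXT Quarterly sales report 2010'

# The last variable named in read receives the remainder of the line.
read -r f1 f2 <<EOF
$line
EOF

printf 'f1=[%s]\n' "$f1"   # f1=[REPORT01.TXT]
printf 'f2=[%s]\n' "$f2"   # f2=[Quarterly sales report 2010]
```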

S.

alister · July 4, 2010, 4:37pm

Hi, Scrutinizer:

You are absolutely correct, with the exception of trailing whitespace which would be lost. In that case, pseudocoder's `cut -d' ' -f2-` would behave correctly (assuming that the whitespace is part of the name). So the solutions aren't exactly interchangeable.
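
The trailing-whitespace loss can be seen in a short sketch (hypothetical entry; the three trailing spaces are deliberate). With the default `IFS`, `read` trims leading and trailing whitespace from what it assigns; with `IFS=` set for the command, the line arrives untouched.

```shell
# Hypothetical entry ending in three spaces.
line='FILE0001.DAT name with trailing spaces   '

# Default IFS: trailing whitespace is stripped from the last variable.
read -r f1 f2 <<EOF
$line
EOF

# Empty IFS: the whole line is preserved exactly.
IFS= read -r whole <<EOF
$line
EOF

printf 'f2=[%s]\n' "$f2"         # f2=[name with trailing spaces]
printf 'whole=[%s]\n' "$whole"   # whole=[FILE0001.DAT name with trailing spaces   ]
```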

It goes without saying, though, that filenames with trailing spaces are extremely uncommon (usually the sign of a script or input error), but I considered it for the sake of thoroughness. Personally, I believe that anyone who uses leading/trailing whitespace (or newlines anywhere) in a filename deserves whatever administrative misery befalls them.

Cheers,
Alister

Scrutinizer · July 4, 2010, 5:08pm

Indeed for trailing whitespace we would have to do something like this:

while IFS= read -r line
do
  mv "${line%%[[:space:]]*}" "${line#*[[:space:]]}"
done < file_desc

(But I think it is best to leave that out, to protect an unsuspecting admin from mystery whitespace. It is far more likely, anyway, that a hand-edited input file contains unintended tabs or spaces at the end of a line.)
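
For reference, a sketch of what the two parameter expansions in that loop produce, run on a hypothetical entry rather than on real files: `%%[[:space:]]*` deletes the longest suffix starting at the first whitespace character, and `#*[[:space:]]` deletes the shortest prefix ending at it.

```shell
# Hypothetical entry with doubled inner spaces and two trailing spaces.
line='FILE0003.DAT name with  odd spacing  '

old=${line%%[[:space:]]*}   # everything before the first whitespace character
new=${line#*[[:space:]]}    # everything after the first whitespace character

printf 'old=[%s]\n' "$old"   # old=[FILE0003.DAT]
printf 'new=[%s]\n' "$new"   # new=[name with  odd spacing  ]
```

Only the first separator is removed, so any extra whitespace after it, at either end of the description, ends up in the new name.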

---------- Post updated at 23:08 ---------- Previous update was at 22:58 ----------

BTW, the cut pipeline would compress spaces and leave out trailing space: `read line` trims the ends of the line, and the unquoted `$line` passed to `echo` is word-split and rejoined with single spaces before `cut` ever sees it. This behaviour is probably not as intended, so I think your solution really is best suited for this application.
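
A small check of where the space compression in the `echo $line | cut` pipeline comes from, using a hypothetical entry with two spaces inside the description. As far as I can tell, it is the word splitting of the unquoted `$line` that collapses the spaces; with the variable quoted, `cut -d' ' -f2-` passes them through.

```shell
# Hypothetical entry with a double space inside the description.
line='FILE0004.DAT two  spaces'

# Unquoted: the shell splits $line into words and echo rejoins them with one space.
unquoted=$(echo $line | cut -d' ' -f2-)

# Quoted: cut receives the line unchanged and keeps the delimiters after field 1.
quoted=$(printf '%s\n' "$line" | cut -d' ' -f2-)

printf 'unquoted=[%s]\n' "$unquoted"   # unquoted=[two spaces]
printf 'quoted=[%s]\n' "$quoted"       # quoted=[two  spaces]
```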

polomora · July 6, 2010, 4:08am

Many thanks all.
Problem solved.