Manipulating Filenames

Hi Folks,
I'm looking for some ideas on how to change some file names. I'm pretty sure I need to use sed or awk but they still escape me. The files I have are like:

VOD0615 NEW Blades R77307.pdf or
VOD0615_NEW_Blades_R77307.pdf

and what I want after processing is:

R77307 NEW Blades.pdf

Essentially, I want to take the first code off and move the last code to the front. The description in the middle can be any number of characters. The codes are usually separated by a space or an underscore. The first code is always "VOD" followed by 4 digits. The last code is a single letter followed by 5 digits.

Any help appreciated.

Assuming you pipe the output of ls to this awk, it will generate the move commands to rename the files. It's bare bones and doesn't do any error checking, so validate the commands it generates before running them.

ls VOD[0-9]*.pdf | awk '
    NF > 1 {        # assume spaces
        of = $0;
        split( $NF, a, "." );
        n = split( $0, b, " " );
        b[n] = a[1];
        sep = " ";
    }

    NF == 1 {       # assume underbars
        of = $1;
        split( $1, a, "." );
        n = split( a[1], b, "_" );
        sep = "_";
    }

    {
        printf( "mv \"%s\"  \"%s", of, b[n] );
        for( i = 2; i < n; i++ )
            printf( "%s%s", sep, b[i] );
        printf( ".%s\"\n", a[2] );
    }

' #| ksh 

Remove the # in front of "| ksh" to pipe the generated commands directly into ksh and perform the moves, but validate the output first.

for i in VOD*.*pdf; do mv "$i" "$(echo "$i"|sed 's/[^_ ]*[_ ]*\([^_ ]*\)[_ ]*\([^_ ]*\)[_ ]*\([^_ ]*\)\(\..*\)/\3 \1 \2\4/')"; done

Keep in mind, if you have both "VOD0615 NEW Blades R77307.pdf" and "VOD0615_NEW_Blades_R77307.pdf" in the same directory, both get the same new name, so the second mv will overwrite the file created by the first.

regards
ygemici
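
One way to guard against the overwrite mentioned above is to refuse the move when the target already exists. This is only a minimal sketch: safe_mv is a hypothetical helper name, not something from the posts in this thread, and it is not safe against another process creating the target between the check and the mv.

```shell
#!/bin/sh
# safe_mv OLD NEW  (hypothetical helper)
# Moves OLD to NEW only if NEW does not exist yet; otherwise it prints a
# warning to stderr, leaves both files alone and returns non-zero.
safe_mv() {
    if [ -e "$2" ]; then
        printf 'skipping "%s": "%s" already exists\n' "$1" "$2" >&2
        return 1
    fi
    mv -- "$1" "$2"
}
```

In the loop above, replacing mv with safe_mv should leave the second of two colliding files under its old name instead of clobbering the first.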

Thanks heaps, Agama, that has worked a treat. I can barely understand it, but I'll work on it a bit more. Quick question: why does it need to be piped to the kshell?

Hi ygemici,
Thanks for your help. I tried this, but it only works on files that have exactly two words between the codes; running it on files with more or fewer words messes things up. Examples below:
VOD0001 Test M00000.pdf
VOD0001 Test three words M00003.pdf
VOD0001 Test two M00001.pdf
VOD0615 NEW Blades R77307.pdf

ends up with:
Test M00000.pdf
words M00003 three.pdf
M00001 Test two.pdf
R77307 NEW Blades.pdf

Cheers
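
For descriptions with any number of words, a sed pattern anchored on the two codes rather than on a fixed number of words might behave better. The following is a minimal sketch under the assumptions stated in the first post (first code is "VOD" plus 4 digits, last code is one letter plus 5 digits, separators are spaces or underscores). new_name is a hypothetical helper name. Separators inside the description are kept as they are, so an underscore name keeps its underscores in the middle. The loop only prints the mv commands; nothing is renamed.

```shell
#!/bin/sh
# new_name FILE  (hypothetical helper)
# Prints the rearranged name, or FILE unchanged if it does not look like
# VODnnnn<sep>description<sep>Xnnnnn.ext
new_name() {
    printf '%s\n' "$1" |
    sed 's/^VOD[0-9]\{4\}[ _]\{1,\}\(.*\)[ _]\{1,\}\([A-Za-z][0-9]\{5\}\)\(\.[^.]*\)$/\2 \1\3/'
}

# Dry run: show what would be done for each matching file.
for f in VOD[0-9]*.pdf; do
    [ -e "$f" ] || continue          # the glob matched nothing
    new=$(new_name "$f")
    [ "$new" = "$f" ] && continue    # name did not fit the pattern, leave it
    printf 'mv -- "%s" "%s"\n' "$f" "$new"
done
```

Names containing double quotes, dollar signs or backquotes would break the printed commands, so check the output before feeding it to a shell.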

Not sure if your question was literally why it must be piped to a shell, or whether it must be kshell in particular (would bash work?). So, here are both answers:

The awk only generates the move commands as text; it needs a shell to execute them. It could just as well be piped to bash. I just prefer kshell, so that's the way I tested it.
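
To make the difference between generating and executing concrete, here is a small sketch that stands in for the awk output. gen_cmd is a hypothetical helper that prints one mv command, and the scratch directory from mktemp -d is only there so no real files are touched (it assumes the temporary path contains no quotes or dollar signs).

```shell
#!/bin/sh
# gen_cmd OLD NEW  (hypothetical helper): print a mv command, run nothing
gen_cmd() {
    printf 'mv "%s" "%s"\n' "$1" "$2"
}

dir=$(mktemp -d)                              # scratch directory
old="$dir/VOD0615 NEW Blades R77307.pdf"
new="$dir/R77307 NEW Blades.pdf"
: > "$old"                                    # create an empty test file

gen_cmd "$old" "$new"        # only prints the command; the file keeps its old name
ls "$dir"

gen_cmd "$old" "$new" | sh   # the shell reads the printed line and runs it
ls "$dir"
```

The first ls should still list the VOD name and the second the renamed file; any Bourne-style shell (sh, ksh, bash) can take the place of sh here.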

I'll add some comments to the code and maybe that will help you understand it a bit better.


Some additional info:

ls VOD[0-9]*.pdf | awk '
    NF > 1 {                        # input line will have more than one field if the filename has spaces
        of = $0;                    # save the original filename
        split( $NF, a, "." );       # split the last part of the filename into array a using dot as separator
        n = split( $0, b, " " );    # easy way to get all of the fields into an array
                                    # we put the fields into an array so we can treat both cases identically later
        b[n] = a[1];                # replace last field xxxx.yyyy with just xxxx
        sep = " ";                  # separator to use when building the move-to filename
    }

    NF == 1 {                       # if just one field, assume a filename without spaces
        of = $1;                    # save the original name
        split( $1, a, "." );        # split the name (xxx_yyy_zzzz.eee) on the dot: xxx_yyy_zzzz goes into a[1], eee into a[2]
        n = split( a[1], b, "_" );  # split the leading part into array b using _ as separator
        sep = "_";                  # separator to use when building the move-to filename
    }

    {                               # this block is executed for all files; assumes array b has the filename components and n is the size of b
        printf( "mv \"%s\"  \"%s", of, b[n] );      # print the command (mv), the original name and the last component of the name
        for( i = 2; i < n; i++ )    # starting with the second component, print up to, but not including, the last component
            printf( "%s%s", sep, b[i] );
        printf( ".%s\"\n", a[2] );  # add the extension (.xxx) and a newline
    }
' #| ksh

Hi again Agama,
Wow, thanks heaps for all the comments. I've been advised that there are some extra variables for the file names, so I'll use those comments to try to modify it myself. I'll get back to you if I need a hand, if that's OK.

Thanks heaps