Pattern Matching Syntax

Hi,

I am trying to write a script to rename a batch of files.

The file names can take any of the following forms.

Title_Title2_Title3_ABCD0123_Title4.doc
Title title2 DEFG5678 Title3 Title4.doc
XYZA1234-Title.doc

The one constant I am interested in is the code made up of 4 letters followed by 4 numbers (ABCD0123, DEFG5678 and XYZA1234 above). I want to capture that code, then rename the file so that the code moves to the end of the file's title.

So the output I want would end up being:

title_title2_title3_title4[ABCD0123].doc
title_title2_title3[DEFG5678].doc
title-[XYZA1234].doc

Thank you for any suggestions you can provide.

Try...

ls *.doc | awk 'BEGIN{OFS=FS=".";q="\047"}
     match($1,/[A-Z][A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][-_ ]?/) {
          cmd = "mv " q $0 q " " q \
                substr($1,1,RSTART-1) \
                substr($1,RSTART+RLENGTH) \
                "[" substr($1,RSTART,8) "]" OFS $2 q
          print cmd
          #system (cmd)
     }'

Uncomment the system command if it does what you want.

Result from samples gives...

mv 'Title_Title2_Title3_ABCD0123_Title4.doc' 'Title_Title2_Title3_Title4[ABCD0123].doc'
mv 'Title title2 DEFG5678 Title3 Title4.doc' 'Title title2 Title3 Title4[DEFG5678].doc'
mv 'XYZA1234-Title.doc' 'Title[XYZA1234].doc'
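
To see what match() is doing in the awk above, here is a minimal sketch that only prints RSTART (where the match begins), RLENGTH (how long it is) and the matched text. The show_match name is made up for illustration; the pattern is the same one as above without the optional trailing separator.

```shell
# Hypothetical helper: report where the 4-letter/4-digit code sits in a name.
show_match() {
    printf '%s\n' "$1" | awk '{
        # match() returns 0 when nothing matches; otherwise it sets
        # RSTART (1-based start) and RLENGTH (length of the match).
        if (match($0, /[A-Z][A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9]/))
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
        else
            print "no match"
    }'
}

show_match 'Title_Title2_Title3_ABCD0123_Title4.doc'   # 21 8 ABCD0123
show_match 'XYZA1234-Title.doc'                        # 1 8 XYZA1234
```

Everything before RSTART and everything after RSTART+RLENGTH is what the awk script glues back together to build the new name.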

tr '_' ' ' | egrep -w -o '[A-Z]{4}[0-9]{4}'

Almost a single command, but I had to use tr first to turn the underscores into spaces, because -w treats an underscore as a word character, so in the first line the code would not stand as a word on its own. Darn. The egrep then searches each line for a word made of 4 capital letters followed by 4 digits; -o prints only the match and not the whole line.

test:

# echo 'Title_Title2_Title3_ABCD0123_Title4.doc
Title title2 DEFG5678 Title3 Title4.doc
XYZA1234-Title.doc
extra testing lines:
1234
abcd1234
abcd123
abc1234
1234abcd
1234ABCD' | tr '_' ' ' | egrep -w -o '[A-Z]{4}[0-9]{4}'
ABCD0123
DEFG5678
XYZA1234
# 

It didn't match "abcd1234" because those are not capital letters. If you need a case-insensitive match, change [A-Z] to [A-Za-z], or add -i to the egrep options.
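
A minimal sketch of the case-insensitive variant, using the -i option instead of editing the bracket expression. The extract_codes name is made up for illustration, and -o is a GNU/BSD grep extension rather than POSIX, same as in the command above.

```shell
# Hypothetical helper: print every 4-letter/4-digit code found on stdin,
# ignoring case.
extract_codes() {
    # tr turns underscores into spaces so that -w sees the code as a word.
    tr '_' ' ' | grep -E -i -w -o '[a-z]{4}[0-9]{4}'
}

printf '%s\n' 'Title_abcd1234_Title.doc' 'XYZA1234-Title.doc' 'abc1234' | extract_codes
# abcd1234
# XYZA1234
```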

#! /usr/bin/sh

ls *.doc | while read -r file
do
  # The separator class is [ _-]; written as [ -_] it would be a range from space to underscore.
  new=$(echo "$file" | sed 's/\(.*\)\([A-Z]\{4\}[0-9]\{4\}\)\([ _-]\)\(.*\)\(\..*\)/\1\4\3[\2]\5/')
  echo "mv \"$file\" \"$new\""
done

Output:

mv "Title title2 DEFG5678 Title3 Title4.doc" "Title title2 Title3 Title4 [DEFG5678].doc"
mv "Title_Title2_Title3_ABCD0123_Title4.doc" "Title_Title2_Title3_Title4_[ABCD0123].doc"
mv "XYZA1234-Title.doc" "Title-[XYZA1234].doc"


Two posts in one day that call for the match function. Great.

Another is

Thanks guys.
I tried Ygor's code first and that worked a treat.

It only broke on files that had multiple periods in the name, but that was something I didn't specify in my example, and those were easy to fix manually.
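
For names with more than one period, one possible approach is to treat only the text after the last period as the extension. This is a rough sketch, not the code from the earlier posts: new_name is a made-up helper, it assumes every name has an extension, and it only prints the new name instead of calling mv.

```shell
# Hypothetical helper: print the new name for a file, splitting the
# extension off at the LAST period only.
new_name() {
    ext=${1##*.}     # text after the last period
    base=${1%.*}     # everything before the last period
    # Pull out the code (the last one, if the name has several).
    code=$(printf '%s\n' "$base" | sed -n 's/.*\([A-Z]\{4\}[0-9]\{4\}\).*/\1/p')
    # No code in the name: leave it unchanged.
    [ -n "$code" ] || { printf '%s\n' "$1"; return; }
    # Drop the code and at most one separator after it, then append [code].
    rest=$(printf '%s\n' "$base" | sed "s/$code[-_ ]\{0,1\}//")
    printf '%s[%s].%s\n' "$rest" "$code" "$ext"
}

new_name 'Title.v2_ABCD0123_Title4.doc'   # Title.v2_Title4[ABCD0123].doc
```

Once the printed names look right, the result can be fed to mv "$file" "$(new_name "$file")" inside a loop over *.doc.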

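#!/bin/bash
# bash 3.2+ (the regex must be unquoted on the right of =~, so keep it in a variable)
shopt -s nocasematch              # let [a-z] in the regex match upper case too
re='([a-z]{4}[0-9]{4})[-_ ]*'     # the code, then any separators after it
for i in *.doc
do
  [[ $i =~ $re ]] || continue     # skip names without a code
  match=${BASH_REMATCH[0]}        # code plus trailing separators
  code=${BASH_REMATCH[1]}         # code only
  newfile=${i/"$match"/}          # remove the code from its old position
  newfile=${newfile%.doc}[$code].doc
  mv -- "$i" "$newfile"
done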