Pattern Matching Syntax

domsmith · July 13, 2010, 9:14pm

Hi,

I am trying to write a script to rename a batch of computer files.

The format of the files can appear in the following ways.

Title_Title2_Title3_ABCD0123_Title4.doc
Title title2 DEFG5678 Title3 Title4.doc
XYZA1234-Title.doc

The one constant I am interested in is the code in each file name (ABCD0123, DEFG5678 and XYZA1234 above). It is always in the same format: 4 letters followed by 4 numbers. I want to capture that code, then rename the file so that the code moves to the end of the file's title.

so the output I want would end up being

title_title2_title3_title4[ABCD0123].doc
title_title2_title3[DEFG5678].doc
title-[XYZA1234].doc

Thank you for any suggestions you can provide.

Ygor · July 13, 2010, 10:52pm

Try...

ls *.doc | awk 'BEGIN{OFS=FS=".";q="\047"}
     match($1,/[A-Z][A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][-_ ]?/) {
          cmd = "mv " q $0 q " " q \
                substr($1,1,RSTART-1) \
                substr($1,RSTART+RLENGTH) \
                "[" substr($1,RSTART,8) "]" OFS $2 q
          print cmd
          #system (cmd)
     }'

Uncomment the system command if it does what you want.

Result from samples gives...

mv 'Title_Title2_Title3_ABCD0123_Title4.doc' 'Title_Title2_Title3_Title4[ABCD0123].doc'
mv 'Title title2 DEFG5678 Title3 Title4.doc' 'Title title2 Title3 Title4[DEFG5678].doc'
mv 'XYZA1234-Title.doc' 'Title[XYZA1234].doc'
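
For anyone unfamiliar with match(): it sets RSTART to the position where the match begins and RLENGTH to its length, and the substr() calls above are built on those two values. Here is a minimal sketch for inspecting them on a single name. show_match is only a hypothetical helper name for illustration, not part of the script above.

```sh
# Hypothetical helper: print RSTART, RLENGTH and the 8-character code
# for one name, using the same regex as the script above.
show_match() {
  printf '%s\n' "$1" | awk '
    match($0, /[A-Z][A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][-_ ]?/) {
      print RSTART, RLENGTH, substr($0, RSTART, 8)
    }'
}

show_match 'XYZA1234-Title'                       # prints: 1 9 XYZA1234
show_match 'Title_Title2_Title3_ABCD0123_Title4'  # prints: 21 9 ABCD0123
```

RLENGTH is 9 rather than 8 in both cases because the optional trailing separator is part of the match, which is why the separator disappears from the new name.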

fubaya · July 13, 2010, 11:53pm

tr '_' ' ' | egrep -w -o '[A-Z]{4}[0-9]{4}'

Almost one command, but I had to use tr first to turn the underscores into spaces, because -w treats an underscore as part of the word in the first line. Darn. What it does is grep each line for a word that consists of 4 capital letters followed by 4 numbers. -o displays only the match and not the whole line.

test:

# echo 'Title_Title2_Title3_ABCD0123_Title4.doc
Title title2 DEFG5678 Title3 Title4.doc
XYZA1234-Title.doc
extra testing lines:
1234
abcd1234
abcd123
abc1234
1234abcd
1234ABCD' | tr '_' ' ' | egrep -w -o '[A-Z]{4}[0-9]{4}'
ABCD0123
DEFG5678
XYZA1234
#

It didn't match "abcd1234" because those are not capital letters. If you need it to be case insensitive, change the [A-Z] to [A-Za-z], or add -i to the egrep options.
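
As a rough sketch of a case-insensitive variant, assuming a grep that supports -o (GNU and BSD grep do): -i lets [A-Z] match lower case as well. find_codes is a made-up helper name. -w is left out here, so no tr is needed, but a code embedded in a longer run of letters and digits would also match.

```sh
# Hypothetical helper: read names on stdin, print every code found,
# ignoring case. Without -w the underscores need no special handling.
find_codes() {
  grep -E -i -o '[A-Z]{4}[0-9]{4}'
}

printf '%s\n' \
  'Title_Title2_Title3_ABCD0123_Title4.doc' \
  'abcd1234' \
  'abc1234' \
  '1234ABCD' | find_codes
# prints:
# ABCD0123
# abcd1234
```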

rdcwayx · July 14, 2010, 12:14am

#! /usr/bin/sh

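ls *.doc | while read -r file
do
  # \1 = text before the code, \2 = the code, \3 = one separator,
  # \4 = rest of the title, \5 = the extension
  new=$(echo "$file" | sed 's/\(.*\)\([A-Z]\{4\}[0-9]\{4\}\)\([ _-]\)\(.*\)\(\..*\)/\1\4\3[\2]\5/')
  echo "mv \"$file\" \"$new\""
done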

mv "Title title2 DEFG5678 Title3 Title4.doc" "Title title2 Title3 Title4 [DEFG5678].doc"
mv "Title_Title2_Title3_ABCD0123_Title4.doc" "Title_Title2_Title3_Title4_[ABCD0123].doc"
mv "XYZA1234-Title.doc" "Title-[XYZA1234].doc"
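
One thing to watch when listing separators in a bracket expression like the one in the sed command: a hyphen between two characters denotes a range, so it is only taken literally when it comes first or last. The sketch below compares the two spellings in the C locale. mark_separators is a hypothetical helper that replaces every matching character with #.

```sh
# Hypothetical helper: $1 = bracket expression, $2 = text.
# LC_ALL=C makes the range follow plain byte order.
mark_separators() {
  printf '%s\n' "$2" | LC_ALL=C sed "s/$1/#/g"
}

mark_separators '[ -_]' 'A+b c_d-e'   # prints: ##b#c#d#e  (range from space to underscore)
mark_separators '[ _-]' 'A+b c_d-e'   # prints: A+b#c#d#e  (only space, underscore, hyphen)
```

With the range form, capital letters, digits and most punctuation also count as a separator, which is harmless on the three sample names but can bite on others.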

---------- Post updated at 02:14 PM ---------- Previous update was at 01:59 PM ----------

ygor:

In one day, two posts to use the match function. Gr8

Another is

domsmith · July 14, 2010, 1:02am

Thanks guys
I tried Ygor's code first and that worked a treat.

It only broke when it encountered a file that had multiple periods, but that was something I didn't specify in my example, and those files were easy to fix manually.
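
For names with multiple periods, one possible approach (a sketch, not what the awk script does, since it only uses the first two period-separated fields) is to split the name at the last period with shell parameter expansion. new_name is a hypothetical helper. It assumes every name has an extension and that grep supports -o.

```sh
# Hypothetical helper: print the new name for one file name.
# The body runs in a subshell, so its variables stay local.
new_name() (
  name=$1
  base=${name%.*}     # everything before the LAST period
  ext=${name##*.}     # everything after the LAST period
  # First run of 4 capital letters followed by 4 digits, if any.
  code=$(printf '%s\n' "$base" | grep -oE '[A-Z]{4}[0-9]{4}' | head -n 1)
  if [ -n "$code" ]; then
    before=${base%%"$code"*}   # text before the code
    after=${base#*"$code"}     # text after the code
    after=${after#[-_ ]}       # drop one separator that followed the code
    printf '%s%s[%s].%s\n' "$before" "$after" "$code" "$ext"
  else
    printf '%s\n' "$name"      # no code found: leave the name alone
  fi
)

new_name 'Title_v1.2_ABCD0123_Title4.doc'   # prints: Title_v1.2_Title4[ABCD0123].doc
new_name 'XYZA1234-Title.doc'               # prints: Title[XYZA1234].doc
```

Like the awk version, this only prints names. Wiring it into an actual mv is left out on purpose so the result can be checked first.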

kurumi · July 14, 2010, 2:19am

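#!/bin/bash
# bash 3+
shopt -s nocasematch              # make [[ =~ ]] ignore case
re='([a-z]{4}[0-9]{4})[-_ ]*'     # the code, then any trailing separators
for i in *.doc
do
  [[ $i =~ $re ]] || continue     # skip files that contain no code
  match=${BASH_REMATCH[0]}        # the code plus trailing separators
  code=${BASH_REMATCH[1]}         # the code only
  newfile=${i/"$match"/}          # remove the code from the name
  newfile=${newfile%.doc}[$code].doc   # put it back before the extension
  mv "$i" "$newfile"
done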