file name transformation

vrms · May 22, 2008, 6:55am

I've got a multitude of text data files that all carry exactly the same kind of data. Unfortunately, some of them use a different filename format.

some are: 'category'_'month'-'year'_act.txt

an example being: daf_Apr-1961_act.txt

and some are: 'category'_'year'-'month'_act.txt

an example being: daf_1961-04_act.txt

any suggestions for transforming the former to the latter?

nua7 · May 22, 2008, 7:10am

A shell script is one option here. Try using case; something like the following could be tried:

case $name in
    01) month=Jan ;;
    02) month=Feb ;;
    # ... and so on for the remaining months
esac
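
For the direction asked about in the first post (month name to two-digit number), the case idea could be sketched roughly as below. The function name month_to_num is made up for illustration, and the sketch assumes three-letter English month abbreviations such as Apr.

```shell
# Hypothetical helper: map a three-letter month name to a two-digit number.
month_to_num() {
  case "$1" in
    Jan) echo 01 ;;
    Feb) echo 02 ;;
    Mar) echo 03 ;;
    Apr) echo 04 ;;
    May) echo 05 ;;
    Jun) echo 06 ;;
    Jul) echo 07 ;;
    Aug) echo 08 ;;
    Sep) echo 09 ;;
    Oct) echo 10 ;;
    Nov) echo 11 ;;
    Dec) echo 12 ;;
    *)   return 1 ;;   # not a recognised month name
  esac
}

month_to_num Apr   # prints 04
```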

vrms · May 22, 2008, 7:20am

Thanks Nua7

What about rearranging the file name, i.e. swapping the month and the year around? I was more concerned about whether that is possible.

nua7 · May 22, 2008, 7:22am

Yes, that should be possible. Split the file name into variables such as $category, $date and $month, and rearrange them as you wish.
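
One way to do that split is with shell parameter expansion. This is only a sketch: the variable names rest, suffix and year are chosen here for illustration, and it assumes the category itself contains no underscore.

```shell
name="daf_Apr-1961_act.txt"

category=${name%%_*}   # text before the first underscore: daf
rest=${name#*_}        # everything after it: Apr-1961_act.txt
date=${rest%%_*}       # the date part: Apr-1961
suffix=${rest#*_}      # the tail: act.txt
month=${date%%-*}      # Apr
year=${date#*-}        # 1961

# Reassemble with year and month swapped.
echo "${category}_${year}-${month}_${suffix}"   # daf_1961-Apr_act.txt
```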

Thanks!
nua7

Franklin52 · May 22, 2008, 7:36am

Something like this?

$ echo "daf_1961-04_act.txt"|sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/\1\3-\2\4/'
daf_04-1961_act.txt

$ echo "daf_Apr-1961_act.txt"|sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/\1\3-\2\4/'
daf_1961-Apr_act.txt

Regards

nua7 · May 22, 2008, 7:39am

This is Fantastic Franklin52..!!

vrms , that should work..

vrms · May 22, 2008, 8:07am

Thanks guys this is great stuff!!

However, and this might be me being very dumb, is there a way of 'saving' this new file name to the original file? And could this then be automated to handle more than one file at a time? Sorry, I'm a bit new to this.

Franklin52 · May 22, 2008, 1:42pm

Use a loop, for instance:

for file in *.txt
do
  sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/\1\3-\2\4/' "$file" > "$file"_new
  mv "$file"_new "$file"
done

Regards

vrms · May 23, 2008, 6:25am

Hi guys the above script looks perfect (even to my untrained eye)

and the previous one

$ echo "daf_1961-04_act.txt"|sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/\1\3-\2\4/'
daf_04-1961_act.txt

has shown me that the sed script works

however when I run it in a loop

I see the new files being created (they are quite big, so they take a while).

However, the file names produced by the loop aren't changed (they come out exactly as they went in). It's really making me scratch my head. I think the problem's 99% sorted, but something's wrong.

Thanks

rana_d · May 23, 2008, 6:36am

remove the move (mv) command and you will get the desired result.

vrms · May 23, 2008, 7:05am

I'm afraid that hasn't had the desired effect rana_d

But thanks for the suggestion

vrms

Franklin52 · May 23, 2008, 8:04am

I'm sorry for the misunderstanding, this should give the command to move the files:

echo "daf_1961-04_act.txt"|sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/mv & \1\3-\2\4/'

Try it in a loop to see if you get the desired output to mv the selected files:

for file in *.txt
do
  sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/mv & \1\3-\2\4/' "$file"
done

If the output is correct, you can pipe the output of the sed command to sh to mv the files in the loop as follows:

for file in *.txt
do
  sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/mv & \1\3-\2\4/' "$file" | sh
done

Regards

vrms · May 23, 2008, 10:22am

When I run the loop, it sends the whole content of the files in the directory to the screen.

When I pipe the sed output to sh as suggested, this comes up and the filenames are unchanged:

sh: llllll: not found
sh: l: not found
sh: lllllllllllllllllll: not found
sh: rrrrrrrrrrrrrrrr: not found
sh: rrrrrrrrrrrrrrrrr: not found
sh: rrrrrrrrrrrrrrrrr: not found
sh: rrrrrrrrrrrrrr: not found

Where llllllllllllllll and rrrrrrrrrrrrrrrrrr etc are the contents of some dummy files I'm using to verify the scripts
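
These messages are what one would expect if sed is editing the contents of each file rather than its name: given a file argument, sed reads that file, and sh then tries to run each unchanged line as a command. A small sketch of the difference, using a scratch directory (the dummy file and the variable names are made up for illustration, and mktemp -d is assumed to be available):

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
file="$tmp/daf_1961-04_act.txt"
printf 'llllll\n' > "$file"

swap='s/\(.*_\)\(.*\)-\(.*\)\(_.*\)/\1\3-\2\4/'

# File given as an argument: sed processes the file's contents.
from_contents=$(sed "$swap" "$file")

# Name fed in on standard input: sed processes the name itself.
from_name=$(basename "$file" | sed "$swap")

echo "$from_contents"   # llllll
echo "$from_name"       # daf_04-1961_act.txt

rm -r "$tmp"
```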

So I'm afraid something is not quite right again

Thanks

vrms

Franklin52 · May 23, 2008, 10:52am

I was assuming that you have only .txt files with names in the same format.
To select filenames with a format like "daf_1961-04_act.txt" you can do something like:

for file in `ls | grep ".*_....-.._.*"`
do
  sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/mv & \1\3-\2\4/' "$file"
done

Regards

vrms · May 23, 2008, 11:06am

When I do the above on a directory containing:

daf_2001-03_act.txt
daf_2001-04_act.txt

The content of the files are put on the screen as before.
When I do an ls I get the following files

daf_2001-03_act.txt daf_2001-03_act.txt~ daf_2001-04_act.txt daf_2001-04_act.txt~

As you can see, there are two new files with a ~ following them, but the filenames are otherwise the same.

Hmmm. This is a tough cookie isn't it!

Franklin52 · May 23, 2008, 3:49pm

Those files look like temporary files created by a program.
If you want to exclude those files you can extend the grep command e.g.:

for file in `ls | grep ".*_....-.._.*[^~]$"`
do
  sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/mv & \1\3-\2\4/' "$file"
done
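
A shell glob can make roughly the same selection without ls and grep. This is a sketch only: the function name list_dated is invented here, and the pattern assumes a four-character year and a two-character month between the underscores.

```shell
# Print names shaped like daf_1961-04_act.txt, skipping editor backups (name~).
list_dated() {
  for file in *_????-??_*; do
    [ -e "$file" ] || continue      # the glob matched nothing
    case "$file" in
      *~) continue ;;               # ignore backup copies
    esac
    printf '%s\n' "$file"
  done
}
```

Running list_dated in the directory in question prints one matching name per line, which can be checked by eye before anything is renamed.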

Regards

vrms · May 27, 2008, 8:49am

Firstly thanks to everyone who posted on this thread.
I've finally got it working correctly!!

The code I used is as follows:

for file in $(ls -1)
do
  newfile="$(echo $file | sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/\1\3-\2\4/')"
  mv $file $newfile
done
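
A more defensive variant of the loop above, as a sketch rather than a drop-in replacement: it lets the shell glob pick the files instead of parsing ls, quotes every expansion, and only touches names whose date part looks like Apr-1961, so files already in year-month form are left alone (the swap is symmetric and would otherwise flip them back). The function names swap_name and rename_all, and the assumption that every name ends in _act.txt, are illustrative. Like the loop above, it produces names such as daf_1961-Apr_act.txt; turning Apr into 04 would be a separate step.

```shell
# Swap the two dash-separated fields between the underscores.
swap_name() {
  printf '%s\n' "$1" | sed 's/\(.*_\)\(.*\)-\(.*\)\(_.*\)/\1\3-\2\4/'
}

# Rename every month-year file in the current directory.
rename_all() {
  for file in *_[A-Z][a-z][a-z]-[0-9][0-9][0-9][0-9]_act.txt; do
    [ -e "$file" ] || continue          # the glob matched nothing
    newfile=$(swap_name "$file")
    if [ "$file" != "$newfile" ]; then
      mv -- "$file" "$newfile"          # prefix with echo for a dry run
    fi
  done
}
```

Calling rename_all from inside the data directory performs the renames; putting echo in front of mv first shows what would happen without changing anything.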