Writing a shell script that groups files by name pattern, then tars and zips each group

Hi Gurus,

I have to write a shell script that groups file names by a matching string pattern, creates a tar file for each group of files, and then zips the tar file created for that group.

For example, files with these different patterns are present in the given directory:

Pattern 1 (file names containing numbers only)
---------
9319813.xls, 36713.xls, 5467.xls, 978.xls, 99813.xls, ... so on

Pattern 2
---------
1790006PosAc.doc, 34556PosAc.doc, 279226PosAc.doc, ... so on

Pattern 3
---------
NotFound_076957.xls, NotFound_2367957.xls, NotFound_7957.xls, ... so on

Pattern 4
---------
Total_3457947.rtf, Total_1347956.rtf, Total_0767957.rtf, ... so on

So, out of all the files present in the directory, the script should group the files by pattern (i.e. pattern 1, pattern 2, pattern 3, pattern 4), create one tar file per pattern, and then zip each tar file created.

Hence the output of the script should be these 4 zipped tar files:
1) <some name>.xls.tar.gz
2) <some name>PosAc.doc.tar.gz
3) NotFound_<some name>.xls.tar.gz
4) Total_<some name>.rtf.tar.gz

My development environment is SunOS 5.9.

Please help me out, as this is very important for me.

Thanks.

Something like this:

find . -name "[0-9]*.xls" | xargs tar cvf pattern1.tar; gzip pattern1.tar

for the first pattern, assuming that your files are in the current directory.
Then all you have to do is work out the matching patterns for the remaining groups, which is not that difficult if you study this example closely.

tyler_durden

Hi Tyler,

Thanks for your reply.

The problem with this approach
(find . -name "[0-9]*.xls" | xargs tar cvf pattern1.tar; gzip pattern1.tar)

is that it will also select file names that contain letters after the numbers, e.g. 32323TotalBalance.xls, 5656NegativeBalance.xls, etc.

I need the script to select only those files whose names consist of numbers only, such as 43434.xls, 6767.xls, etc.

Hoping for a reply.
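
The over-matching described above can be reproduced without touching any files. The sketch below is only an illustration: the helper name glob_matches is hypothetical, and it relies on the shell's case statement using the same pattern syntax as find's -name test. In that syntax [0-9] matches one digit and * then matches any run of characters, letters included.

```shell
#!/bin/sh
# Illustration only: glob_matches is a hypothetical helper, not part of any tool.
# It succeeds when a name matches the glob used in the find command above.
glob_matches() {
    case $1 in
        [0-9]*.xls) return 0 ;;   # one digit, then anything, then .xls
        *)          return 1 ;;
    esac
}

# Sample names taken from this thread.
for name in 43434.xls 6767.xls 32323TotalBalance.xls 5656NegativeBalance.xls
do
    if glob_matches "$name"; then
        echo "$name: selected"
    else
        echo "$name: skipped"
    fi
done
```

All four names are reported as selected, which is why the glob alone cannot separate the numbers-only files from the rest.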

Thanks.

With shell globbing it might be difficult to get what you need; use find and a finer filtering tool, egrep for example:

find . -name '[0-9]*[0-9].xls' | egrep '/[0-9]+\.xls$' | xargs ...
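
Putting the pieces together, one possible way to handle all four groups is sketched below. It is a rough outline under several assumptions, not a tested solution for SunOS 5.9: file names are assumed to contain no whitespace, the function name archive_group and the archive base names are made up for illustration, and grep -E is used where an older system may need egrep instead. The tar command is called once per group with the whole file list, rather than through xargs, so a long list cannot be split across several tar runs that would each overwrite the archive.

```shell
#!/bin/sh
# Sketch only. Assumes file names contain no whitespace; archive_group and
# the archive base names used below are hypothetical.

# archive_group DIR BASE REGEX
# Tars every file in DIR whose whole name matches REGEX, then gzips the tar.
archive_group() {
    dir=$1
    base=$2
    regex=$3
    (
        cd "$dir" || exit 1
        # Anchored extended regex, so 32323TotalBalance.xls is not picked up.
        files=`ls | grep -E "$regex"`
        [ -n "$files" ] || exit 0      # nothing to archive for this group
        tar cf "$base.tar" $files && gzip -f "$base.tar"
    )
}

# Demo in a scratch directory, using file names from the question.
demo=${TMPDIR:-/tmp}/group_demo.$$
mkdir -p "$demo"
for f in 9319813.xls 978.xls 32323TotalBalance.xls \
         34556PosAc.doc NotFound_7957.xls Total_0767957.rtf
do
    : > "$demo/$f"                     # create an empty sample file
done

archive_group "$demo" numeric.xls   '^[0-9]+\.xls$'
archive_group "$demo" PosAc.doc     '^[0-9]+PosAc\.doc$'
archive_group "$demo" NotFound_.xls '^NotFound_[0-9]+\.xls$'
archive_group "$demo" Total_.rtf    '^Total_[0-9]+\.rtf$'

ls "$demo"
```

To use it on real data, drop the demo section and pass the actual directory as the first argument. The original files are left in place; only the intermediate .tar file is replaced by its .tar.gz.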