shell script preserving last 2 versions of files

I need some help with the logic and syntax for a shell script (ksh) that will search a directory, look for similar files, and keep only the last two versions of each. The version number is in the file name. The file names are of varying lengths, and there may be one or many versions of a given file, with no limit on the count. I am not sure that using the find command on the date/timestamp is a good idea, because these are ad hoc files that get created.

For example:
Directory may have files like below

apps_V01.xml
betarelease_V01.xml
betarelease_V02.xml
betarelease_V03.xml
test_V01.xml
test_V02.xml
test_V03.xml
test_V04.xml
testing_V01.xml
testing_V(..).xml (representing all versions 02 through 99)
testing_V100.xml

The result should be:
apps_V01.xml
betarelease_V02.xml
betarelease_V03.xml
test_V03.xml
test_V04.xml
testing_V99.xml
testing_V100.xml

I thought about putting the listing into a text file and then substringing the names with awk, but I don't know how I would handle the varying number of similar files. My thought is to output the listing to a file, read it until a new base name appears (building an array of files along the way), save the last two entries in the array, and then move on to the next set. But again, I'm not sure how to do that. A problem also occurs when I have only one version of a file. I welcome any sed, awk, or ksh suggestions; I don't know enough Perl or any other language to do this there. Some help would be greatly appreciated. I have searched more than 300 postings and haven't come up with anything close to what I need to accomplish.
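For what it's worth, the grouping described above can be sketched in one sort-plus-awk pass, assuming every name follows the prefix_Vnn.xml convention. The listing below is just a stand-in for `ls`; the pipeline prints the files that would be deleted:

```shell
# Toy listing standing in for `ls` (assumes the *_Vnn.xml naming holds).
listing='apps_V01.xml
betarelease_V01.xml
betarelease_V02.xml
betarelease_V03.xml
test_V01.xml
test_V02.xml
test_V03.xml
test_V04.xml'

# Sort by prefix, then numerically on the version (field 2 after "_",
# skipping the "V"), highest first, so the first two lines per prefix
# are the keepers; awk then prints everything PAST the first two per
# prefix, which is the delete list.
deletes=$(printf '%s\n' "$listing" |
  sort -t_ -k1,1 -k2.2,2rn |
  awk -F_ 'seen[$1]++ >= 2')
printf '%s\n' "$deletes"
```

The trick is sorting numerically on the characters after the V, so V100 lands above V99 where a plain lexical sort would not.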

Do you want to then delete the extra versions?
For example, you want to execute something like
>rm betarelease_V01.xml
for each file that falls outside the condition?

That is correct. I want to keep only the last two versions of each file and delete all the others. My sample output shows the result I would get if the script works correctly. That part would not be hard: if I could get all the other files into a text file, I could run a for loop that deletes every file in the directory that is listed in the text file. That I can do.

for i in `cat removefiles.txt`
do
rm temp/$i
done
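As an aside, a while read loop is a little safer than `for i in \`cat ...\`` if a file name ever contains spaces. A sketch of the same delete step (the directory and file names here are made up for the demo):

```shell
# Demo setup with hypothetical names: a temp dir and a list to remove.
mkdir -p temp
touch temp/old_V01.xml temp/old_V02.xml
printf 'old_V01.xml\n' > removefiles.txt

# Read line by line instead of word-splitting `cat` output, so odd
# filenames survive intact; quoting and -f keep the rm safe and quiet.
while IFS= read -r i; do
  rm -f "temp/$i"
done < removefiles.txt
```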

Hi, I thought about it, and even though I may do it the hard way, I think it gets the job done. To be on the safe side, make working copies first; this one moves the wanted files into their own directory.

I tried it and it works on my machine :wink:

#!/bin/bash
#Assuming your current working directory is OK to work in and the files
#are located in the directory verfiles (create working copies there)

#First sort; if the naming is consistent, that should be a good start
ls -1 verfiles | sort > verfiles.srt

#Split the filenames on each "firstname" so they get their own file
prefix=""
while read filename; do
    #Determine the "prefix" (everything up to and including the underscore)
    curprefix=$(expr match "$filename" '\(^[a-zA-Z]*_\)')
    #Has it been seen before?
    if [ "x$curprefix" != "x$prefix" ] ; then
        #We have a new "firstname"
        prefix=$curprefix
    fi
    echo "$filename" >> "$prefix.lst"
done < verfiles.srt
#cleanup
rm verfiles.srt

#Where to keep the "last two" of each...
mkdir saved-verfiles

#Now for each .lst file, manipulate it a little for easier numeric sorting
for x in *.lst; do
    #And this next bit is VERY lazy of me, but as I said,
    #I ASSUME that the naming is consistent ;)
    myarr=($(tr V " " < "$x" | tr . " " | sort -n -k2 | tail -2))

    mv "verfiles/${myarr[0]}V${myarr[1]}.${myarr[2]}" saved-verfiles
    #If there was only one version, the array holds just three elements,
    #so only move a second file when a second line was actually there
    if [ ${#myarr[@]} -ge 6 ]; then
        mv "verfiles/${myarr[3]}V${myarr[4]}.${myarr[5]}" saved-verfiles
    fi
done

Hope it helps...

/Lakris

The following are the files I worked with:

> cat file100
apps_V01.xml
betarelease_V01.xml
betarelease_V02.xml
betarelease_V03.xml
test_V01.xml
test_V02.xml
test_V03.xml
test_V04.xml
testing_V01.xml
testing_V96.xml
testing_V97.xml
testing_V98.xml
testing_V99.xml
testing_V100.xml

And here is the script:
(could probably be better, but it is getting late!)

> cat keep_two
#! /usr/bin/bash

# to force/create necessary test files
while read filename; do touch $filename; done<file100

cut -d"_" -f1 file100 | sort | uniq -c >file101
cut -d"." -f1 file100 | awk -F"_" '{print $2" "$1}' | cut -c2- >file102
rm -f file103

while read filecnt filename
   do
   if [ $filecnt -gt 2 ]
     then
     filekill=$((filecnt-2))
     # anchor the match so "test" does not also catch "testing"
     grep " ${filename}\$" file102 | head -"$filekill" >>file103
   fi
done <file101

while read filenum filename
   do
   rm ${filename}_V${filenum}.*
done <file103

The script creates the files in that early step based on the contents of file100. That way, I could test the delete capability.
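For anyone following along, here is a quick peek (a sketch run against a trimmed-down file100) at what the two intermediate files hold: file101 is "count prefix" and file102 is "version prefix".

```shell
# Rebuild a trimmed file100 and derive the two intermediate formats the
# script uses: file101 = "count prefix", file102 = "version prefix".
printf '%s\n' apps_V01.xml test_V01.xml test_V02.xml test_V03.xml > file100

counts=$(cut -d"_" -f1 file100 | sort | uniq -c)                            # file101
versions=$(cut -d"." -f1 file100 | awk -F"_" '{print $2" "$1}' | cut -c2-)  # file102
printf '%s\n' "$counts" "$versions"
```

Here file101 ends up with a line like "3 test", and file102 starts with "01 apps"; the delete loop then matches prefixes against file101's counts.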

Hi,

The code below will print the file names that should not be deleted, in other words the last two versions in each category.

I'll stop here rather than hunt for an easier way; you can use your smarts to polish the [echo $file] part to address your issue.

for file in `ls *.xml | sort -t"_" -k2.2rn | nawk -F"[V.]" '{if(_[$1]<2){print $0;_[$1]++}}' `
do
	echo $file
done
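To see the filter in action, here is a self-contained sketch of the same idea, with awk standing in for nawk and a version-aware sort in place of a plain reverse sort, so that V100 sorts above V99:

```shell
# Keep-the-last-two filter: sort by prefix, then numerically on the
# digits after "V" (highest first); awk passes the first two names it
# sees for each prefix ($1 = everything before the "V").
keep=$(printf '%s\n' \
    betarelease_V01.xml betarelease_V02.xml betarelease_V03.xml \
    testing_V99.xml testing_V100.xml |
  sort -t_ -k1,1 -k2.2,2rn |
  awk -F'[V.]' '{ if (_[$1] < 2) { print $0; _[$1]++ } }')
printf '%s\n' "$keep"
```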

:smiley: Thanks for the responses. All were great ideas. Summer_cherry had the most compact code and the easiest for an inexperienced shell programmer like me to adapt. However, I did like all your responses and will try to learn from each of your approaches. Thank you all very much.