Keep only the last 10 backup files for multiple devices (UNIX)

I have a directory of router backup files from different dates, with multiple date/IP-address files. How do I keep only the last 10 for each router IP address, or for example only 3 backups per IP?

CPE-444_10.10.30.30_08-12-24.txt
CPE-333_10.10.20.20_08-12-24.txt
CPE-222_10.10.10.10_08-12-24.txt
CPE-444_10.10.30.30_08-11-24.txt
CPE-333_10.10.20.20_08-11-24.txt
CPE-222_10.10.10.10_08-11-24.txt
CPE-444_10.10.30.30_08-10-24.txt
CPE-333_10.10.20.20_08-10-24.txt
CPE-222_10.10.10.10_08-10-24.txt

I need only the 2 newest from each IP address:

CPE-222_10.10.10.10_08-11-24.txt
CPE-222_10.10.10.10_08-12-24.txt
CPE-333_10.10.20.20_08-11-24.txt
CPE-333_10.10.20.20_08-12-24.txt
CPE-444_10.10.30.30_08-11-24.txt
CPE-444_10.10.30.30_08-12-24.txt

I try to keep only the good configurations (files that have more than 3000 bytes):

find . -type f -name "*.txt" -size -3000c -delete

I'm expecting to keep only the last x good configurations. The fields are separated by _ so I can get the IP address:

ls -lt | awk -F_ '!seen[$2]++'

Thanks

Different people will have different solutions for housekeeping jobs like this.

If you are generating backup files on a regular basis (like one per week, for example), my approach would be to have a script that calculates a timestamp for n days ago and then uses 'touch' to set that timestamp on a stamp file kept in that directory. Then use 'find' with a '! -newer' switch to find and delete all files older than the calculated timestamp.

This way it doesn't matter how many router backups you are trying to rotate and saves having to manipulate specific filenames/dates.
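
A minimal sketch of that approach, assuming GNU touch (for the relative date string) and a hypothetical 30-day retention; check the -print output before changing it to -delete:

# stamp file carries the cutoff mtime (30 days is just an example)
touch -d "30 days ago" .retention_stamp
# everything not newer than the stamp is a deletion candidate
find . -type f -name "*.txt" ! -newer .retention_stamp -print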

Other users on this site may well have other ideas.

If there are daily backups, then a daily deletion job could be

find . -type f -name "*.txt" -size -3000c -mtime +10

or

find . -type f -name "*.txt" \( -size -3000c -o -mtime +10 \)

The first is an AND condition, the second an OR condition. Both count the days from today.
(Append -delete to really delete them.)

If you want to delete the files regardless of the current date, then pick the dates from the file names

ls -r *.txt | awk -F_ '++cnt[$2] > 10' | xargs -d'\n' printf "%s\n"

(Replace the printf command by an rm to really delete them.)
But awk cannot get the file sizes. You can run your find command in addition; this is an OR condition.
Another way, for an AND condition, is ls -s and a filtering while loop:

ls -sr *.txt | awk -F_ '++cnt[$2] > 10' | while read -r blks fn; do [ $blks -lt 3 ] && echo rm "$fn"; done

(Remove the echo to really delete them.)


man logrotate might be useful.
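
If each device's backup were written to a fixed filename instead of a dated one, a logrotate entry along these lines could handle the rotation (paths are hypothetical):

/var/backups/routers/*.cfg {
    weekly
    rotate 10
    missingok
    notifempty
}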


Hello,
The backup files are created on a regular basis (roughly one per week). I have 5K devices to back up, and sometimes I cannot reach them; that is the reason I am deleting files smaller than 3000 bytes.

I need to keep the last x files for each IP; that is why I cannot delete files via -mtime +10. Sometimes I cannot access a device for 3 weeks or more, and I still need to keep the last x files (with more than 3000 bytes) for that device.
Thanks

Hello, logrotate will keep the small, no-good files.
Thanks

Hi @sharong

I put together a quick Ruby script for you, though I did not test it completely because I did not want to spend time creating files of various sizes, sorry.

Here's the script, only partially tested:

Ruby Script (First Draft)

require 'fileutils'

# Directory containing backup files
backup_dir = '.'

# Maximum number of files to keep per IP
max_files = 2

# Step 1: Filter files larger than 3000 bytes and group by IP
files_by_ip = Dir.glob(File.join(backup_dir, '*.txt'))
                 .select { |file| File.size(file) > 3000 } # Filter by size
                 .group_by do |file|
                   file.split('_')[1] # Extract the IP (second part of the filename)
                 end

# Step 2: Sort files by modification time for each IP and keep the newest `max_files`
files_to_delete = []

files_by_ip.each do |ip, files|
  # Sort files by modification time (newest first)
  sorted_files = files.sort_by { |file| File.mtime(file) }.reverse

  # Collect files beyond the newest `max_files`
  files_to_delete.concat(sorted_files[max_files..]) if sorted_files.size > max_files
end

# Step 3: Delete extra files
files_to_delete.compact.each do |file|
  puts "Deleting: #{file}"
  FileUtils.rm(file)
end

puts 'Cleanup complete!'

How It Works

  1. Filter Files by Size:

    • File.size(file) > 3000 ensures only files larger than 3000 bytes are considered.
  2. Group Files by IP:

    • group_by { |file| file.split('_')[1] } groups the files based on the IP address extracted from the filename.
  3. Sort and Retain Newest Files:

    • Files for each IP are sorted by their modification time (File.mtime) in descending order.
    • Files beyond the newest max_files are added to the files_to_delete array.
  4. Delete Extra Files:

    • FileUtils.rm(file) deletes the files that are no longer needed.

Example Input Files

Directory Structure Before Running

CPE-444_10.10.30.30_08-12-24.txt
CPE-333_10.10.20.20_08-12-24.txt
CPE-222_10.10.10.10_08-12-24.txt
CPE-444_10.10.30.30_08-11-24.txt
CPE-333_10.10.20.20_08-11-24.txt
CPE-222_10.10.10.10_08-11-24.txt
CPE-444_10.10.30.30_08-10-24.txt
CPE-333_10.10.20.20_08-10-24.txt
CPE-222_10.10.10.10_08-10-24.txt

Files Retained After Running

CPE-222_10.10.10.10_08-12-24.txt
CPE-222_10.10.10.10_08-11-24.txt
CPE-333_10.10.20.20_08-12-24.txt
CPE-333_10.10.20.20_08-11-24.txt
CPE-444_10.10.30.30_08-12-24.txt
CPE-444_10.10.30.30_08-11-24.txt

Files Deleted

CPE-444_10.10.30.30_08-10-24.txt
CPE-333_10.10.20.20_08-10-24.txt
CPE-222_10.10.10.10_08-10-24.txt

Customization

  • Change max_files = 2 to adjust how many files per IP are retained.
  • Replace backup_dir = '.' with the path to your backup directory.

Running the Script

Save the script as cleanup_backups.rb and run it:

ruby cleanup_backups.rb

Sorry, it's only a draft for you @sharong and not a complete solution, so please test it further and enhance it as needed.

Hello Neo,
Thanks for the detailed answer, "WOW".
I am trying to convert it to Bash; I am having some problems with Ruby.
Thanks,
Sharon
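
For reference, a rough Bash translation of the Ruby logic above, untested and assuming GNU stat/xargs and filenames without spaces or newlines (like the Ruby draft, it never touches the small files; remove the echo to really delete):

max_files=2
ls *.txt | cut -d_ -f2 | sort -u | while read -r ip; do
    # newest first for this IP, keeping only files larger than 3000 bytes
    ls -t -- *"_${ip}_"*.txt |
    while read -r f; do
        [ "$(stat -c%s "$f")" -gt 3000 ] && printf '%s\n' "$f"
    done |
    tail -n +"$((max_files + 1))" |
    xargs -r -d'\n' echo rm --
done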

My 2 cents:
Add a purge section at the end of your backup routine so that, after each router backup, it would list the files with that router's name/IP as a prefix, sort by date, select all but the last 10, filter out the too-small ones, then remove.
This would have 2 advantages over a general script which would purge files for all devices:

  1. No need to fiddle with router names/IPs and backup dates at the same time to filter files. The backup routine already has the name/IP, so the purge routine only has to deal with dates.
  2. Purging happens right after a new file gets created, minimizing the presence of excess files.

My bet would be the following, using bash, as requested by OP:

BKPDIR=/where/my/files/are

cd $BKPDIR

ls | cut -d"_" -f1-2 | sort -u | while read PATTERN
do
   FILESTODELETE=$( ls -t ${PATTERN}* | tail +10 )
   echo "Deleting $FILESTODELETE"
   echo "$FILESTODELETE" | xargs rm
done

As far as I can see, the backup file names are formatted like UNIT_IP_DATE.txt, so I list them and filter by the first two underscore-separated fields. Then I go through all the UNIT_IP pairs, list the files in time order, and delete all but the first 10.

Although the logrotate or time-based solutions might be better.

tail +10 would cause it to delete all but the first 9 files - the N-th line is always included in the output of tail +N. For example:
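
A quick demonstration of the off-by-one (GNU tail; tail +N is the obsolete spelling of tail -n +N):

$ seq 5 | tail -n +3
3
4
5

Line 3 is still printed, so ls -t | tail +10 starts at the 10th-newest file, and the loop keeps only 9.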

When working with \n-delimited inputs such as this, it's better to use xargs -L1, ...

which is still not ideal, because if there are ever fewer than N files present in the directory (matching the pattern of the first file found), rm will attempt to remove unnamed files / empty lines (basically returning the error rm: missing operand), since the FILESTODELETE string created by the command substitution will be "" in that case. Instead of such a command substitution (containing | tail +N), I'd go with an array and remove only the files from the N-th index upwards (and only when the array for a specific pattern contains more than N elements), as sketched below.
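
A minimal sketch of that array variant, assuming Bash 4+ (for mapfile) and filenames without newlines:

mapfile -t FILES < <(ls -t ${PATTERN}*)
if (( ${#FILES[@]} > 10 )); then
   rm -- "${FILES[@]:10}"   # everything past the 10 newest
fi

The guard means rm is never called with an empty argument list.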

OK, you are right, I should have written tail +11, but it is quite marginal for the OP's problem.

For the xargs, yes, there should be some error handling if the FILESTODELETE variable is empty. Like this:

if [ "x$FILESTODELETE" == "x" ]; then
   echo "$FILESTODELETE" | xargs rm
fi

Or - as a makeshift solution - you can touch a flagfile and include its name in the deletion list, like this:

touch short_life_file
echo short_life_file $FILESTODELETE | xargs rm

Both can work.

Isn't it `xargs -I{} command {}` that handles embedded space characters correctly?
With GNU xargs, -d'\n' is best.

The unquoted variable is subject to word splitting and filename expansion. Consider

echo "short_life_file
$FILESTODELETE" | xargs -I{} rm {}

or

printf "%s\n" short_life_file "$FILESTODELETE" | xargs -I{} rm {}

GNU xargs has -r to skip an empty input:

printf "%s\n" "$FILESTODELETE" | xargs -rd'\n' rm