Delete all files if another file in the same directory has a matching occurrence of a specific word

The following are the files available in my directory:

RSK_123_20141113_031500.txt
RSK_123_20141113_081500.txt
RSK_126_20141113_041500.txt
RSK_126_20141113_081800.txt
RSK_128_20141113_091600.txt

Here, "RSK" is the file prefix, 123 is a code name, and the rest is the timestamp of when the file was created.

The file prefix will not change; however, the code name changes.

How can I delete only the old files (RSK_123_20141113_031500.txt, RSK_126_20141113_041500.txt) for each code name?

I am looking for a solution to delete these old files using a script.

Any help on this is much appreciated. Many thanks in advance.

I am using Linux.

Not clear. Why wouldn't

CODE=123
rm RSK_${CODE}_*.txt

work?

EDIT: Ah - I think I got you. Do you want the oldest files per code deleted, keeping only the most recent one?

Yes, I want the oldest files per code to be deleted, retaining only the most recent one for that code.

Try:

ls RSK_* | awk -F'_' '
$2 == last2 {
	printf("rm -f \"%s\"\n", last0)
}
{	last0 = $0
	last2 = $2
}'

If that correctly lists the rm commands you want to run, change the last line of the script to:

}' | sh

to execute the commands instead of just printing them.

With your sample list of files, it produces the output:

rm -f "RSK_123_20141113_031500.txt"
rm -f "RSK_126_20141113_041500.txt"

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
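
The same per-code comparison can also be sketched in plain sh, without piping ls into awk. This is only an illustration, not a replacement for the script above: the function name list_old_rsk is made up, and it assumes names of the form RSK_<code>_<YYYYMMDD>_<HHMMSS>.txt with fixed-width timestamps, so that sorted order within one code is also oldest-to-newest order. Like the awk version, it only prints the rm commands.

```shell
#!/bin/sh
# list_old_rsk (hypothetical name): for the current directory, print an
# rm command for every RSK file that is followed by a newer file with
# the same code. Nothing is deleted; the commands are only printed.
list_old_rsk() (
    # Body runs in a subshell so the locale change stays local.
    # Byte-order sorting keeps files that share "RSK_<code>_" adjacent.
    LC_ALL=C
    prev_file=
    prev_code=
    for f in RSK_*_*_*.txt; do
        [ -e "$f" ] || continue        # the glob matched nothing
        rest=${f#RSK_}                 # drop the fixed prefix
        code=${rest%%_*}               # code is the text up to the next "_"
        if [ -n "$prev_file" ] && [ "$code" = "$prev_code" ]; then
            # Same code as the previous file, so the previous file is older
            printf 'rm -f "%s"\n' "$prev_file"
        fi
        prev_file=$f
        prev_code=$code
    done
)
```

Running list_old_rsk in the directory with the sample files prints the same two rm commands shown above; once the list looks right, list_old_rsk | sh would execute them. As with the awk version, file names containing double quotes, dollar signs, or backquotes would not survive being passed to sh this way.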

Thanks a lot Don. It works perfectly.

I am having a minor issue if my directory name contains the "_" symbol:

ls system_inbound/RSK_* | awk -F'_' '
$2 == last2 {
	printf("rm -f \"%s\"\n", last0)
}
{	last0 = $0
	last2 = $2
}'

Here "system_inbound" is my directory name. The above code does not produce the intended result if the directory name contains "_" (the field separator).

This field separator may occur in my directory name more than once, as in system_inb_123 or user_inbound_124.

I tried using NF in awk, but it's not producing any output at all.

ls ${inbound_target_folder}/${inbound_data_file_prefix}_* | awk -F'_' '$NF == last2 { printf("%s\n%s\n", last0, NF-2)} {last0 = $0} {last2 = $3}'

Any help is much appreciated. Many thanks in advance.

I am new to UNIX shell scripting.

The obvious, simple thing to do to fix this is to change the line:

ls system_inbound/RSK_* | awk -F'_' '

to:

cd system_inbound && ls RSK_* | awk -F'_' '

Then the awk script is dealing with filenames in a directory (as it was designed to do) instead of pathnames that contain an arbitrary number of underscores.
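
If changing directory is not convenient, another possibility is to have awk discard the directory part before it looks for underscores. The following is a rough sketch under that assumption; the function name list_old_in_dir and the variable names are invented for illustration. It splits each pathname on "/" first, then splits only the last component on "_", so underscores in the directory name no longer matter. It handles one directory per call and, like the original, only prints the rm commands.

```shell
#!/bin/sh
# list_old_in_dir DIR (hypothetical helper): print rm commands for the
# older RSK files in DIR, keeping the newest file for each code.
list_old_in_dir() {
    ls "$1"/RSK_* 2>/dev/null | awk '
    {
        n = split($0, path, "/")      # path[n] is the name without directories
        split(path[n], part, "_")     # part[2] is the code after RSK_
        code = part[2] ""             # force a string, so 0123 and 123 differ
        if (NR > 1 && code == last_code)
            printf("rm -f \"%s\"\n", last_path)
        last_path = $0
        last_code = code
    }'
}
```

For example, list_old_in_dir system_inb_123 would print rm commands with the system_inb_123/ prefix intact, and appending | sh runs them, as before.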

If you're trying to use this script to handle multiple directories in a single invocation using something like:

ls */RSK_* | awk -F'_' '
...

then you need to restate your requirements so we know what is supposed to happen if the number after RSK_ has matches in multiple directories. This would be a lot more complex than dealing with matches in a single directory.

Thanks a lot Don