Need to delete a large set of files (close to 100K) from a directory based on an input file

Hi all,

I need a script to delete a large set of files from a directory under / based on an input file, and I want to redirect errors into a separate file.

I have already prepared a list of files in the input file.

Kindly help me.

Thanks,
Prash

xargs echo rm < list 2> errlog

Remove the 'echo' once you've tested and are sure it really does what you want.
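
In other words, assuming your list file is named list in the current directory:

# dry run: echo just prints the rm command(s) xargs would build, nothing is deleted
xargs echo rm < list 2> errlog

# real run, once the printed commands look right
xargs rm < list 2> errlog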

I have run the command below, but I need a way to record in a file which files have been deleted.

sudo xargs rm <file

By the way, all the files are in a directory under "/", but the input file looks like the sample below, i.e. it doesn't have any path in front of the file names. So I need a command that deletes these files without storing the input file inside the directory under / where the original files reside.

more file
0012301170
0000013300
0000014100
-
-
-

Thanks for the quick reply.

---------- Post updated at 02:18 PM ---------- Previous update was at 02:13 PM ----------

Corona,

Can you please suggest whether I can use the script below to delete the files?

for i in `cat file`;
do
     rm -f $i
done

If yes, I want an echo statement printed showing which files were deleted and which files were not found in the directory.

P.S.: I need to delete close to 100K files, and the file names are listed in the input file.

Thanks

---------- Post updated at 04:10 PM ---------- Previous update was at 02:18 PM ----------

Could you please help me delete the large set of files without moving the input file to the directory where the original files reside?

Thanks

If your rm has the -v (verbose) option you could do this; the assumption is that your file_list.txt and the logs are in the /tmp directory:

cd /your_dir
xargs rm -v < /tmp/file_list.txt > /tmp/removed.list 2> /tmp/errors.list

for i in `cat file` -- this will be especially bad for what you want: it reads all 100,000 names into the shell's memory at once, breaks on any name containing whitespace, and forks a separate rm for every single file, so it will be painfully slow.
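
If you really do want a per-file report, a plain while-read loop would give you one, although it is much slower than xargs for 100K names because it still runs one rm per file. A rough, untested sketch (/your_dir and /tmp/file.list are placeholders):

cd /your_dir
while read f
do
    # without -f, rm exits non-zero when the file does not exist,
    # so any failure (usually a missing file) falls into the else branch
    if rm "$f" 2> /dev/null
    then
        echo "deleted: $f"
    else
        echo "not found: $f"
    fi
done < /tmp/file.list

Redirect the loop's output to a log file if you want the report saved.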

Thanks for the post. When I run the command below I get an 'illegal option -- v' error, so I removed the -v option and the command then worked fine. But is there a way to capture all the files that were deleted in this operation?

I can see which files were not found from errors.list.

sudo xargs rm -v < /tmp/file > /tmp/removed.list 2> /tmp/errors.list

Thanks

Well, any file in the original file list that isn't in the error list was removed.

Can you post the first few lines of the error file (head /tmp/errors.list) so we can see the format? A simple awk script should be able to produce the removed-files list.

Below is the output from the errors.list file

12345678.jpg: No such file or directory
12348765.jpg: No such file or directory
87654321.jpg: No such file or directory
87651234.jpg: No such file or directory
-
-
-

Thanks

try:

awk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' errors.list filelist.txt > removed.list

I'm getting the error below, and by the way, please also include something in this script to report the files that were not found.

awk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.txt
awk: syntax error near line 1
awk: bailing out near line 1

Thanks

---------- Post updated at 05:02 PM ---------- Previous update was at 05:00 PM ----------

Below is the full command with the output redirected to a file:

awk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.txt > /tmp/removed.list
awk: syntax error near line 1
awk: bailing out near line 1

Thanks

If you're on Solaris you will need to use nawk instead of awk.

Thank you for the update. I have used nawk as suggested, but I don't see anything in the files below as they are empty:

0 Aug 20 22:05 error.txt
0 Aug 20 22:06 removed.list

nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.txt > /tmp/removed.list

So in the command below, the first file is the errors file, the second is the list of files to be deleted, and the third is the deleted-files list.

Can you confirm?

Thanks

---------- Post updated at 05:18 PM ---------- Previous update was at 05:11 PM ----------

By the way, if I run just the command below, it won't delete the files as required.

nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.txt > /tmp/removed.list

Thanks

Yes, that is correct; the awk command just produces a list of what was removed by the xargs command run earlier.

Unfortunately your rm doesn't support -v (verbose), so the best way to get a deleted-files list is to process the error log after the fact. We can deduce that any file in the list of files to remove that is not in the error log was removed.

The complete script would be:

xargs rm < /tmp/file.list 2> /tmp/error.txt
nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.list > /tmp/removed.list

Thanks for the quick response.

---------- Post updated at 05:36 PM ---------- Previous update was at 05:33 PM ----------

By the way, the nawk command worked for me on Solaris; what can we use on Linux?

Is there a way we can combine these two commands into a shell script?

xargs rm < /tmp/file.list 2> /tmp/error.txt
nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.list > /tmp/removed.list

Thanks

On Linux, use the code from posting #4.

Yes, the two commands can be combined into a script, perhaps something like this:

#!/bin/sh
#
# rm_list.sh
#
# Usage:  rm_list.sh list_file error_file  removed_file
#
if [ $# -ne 3 ]
then
   echo "Usage:"
   echo "   rm_list.sh list_file error_file removed_file"
   exit 1
fi
 
LIST=$1
ERR=$2
REM=$3
 
if ! [ -f $LIST ]
then
    echo "List file $LIST not found"
    exit 2
fi
 
xargs rm < $LIST 2> $ERR
nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' $ERR $LIST > $REM
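
It would be run something like this (assuming, as an example, that the script was saved as /tmp/rm_list.sh and the files to delete live under /your_dir; both paths are placeholders):

# make the script executable, cd to the directory holding the files
# (the list contains bare names with no paths), then run it
chmod +x /tmp/rm_list.sh
cd /your_dir
/tmp/rm_list.sh /tmp/file.list /tmp/error.txt /tmp/removed.list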

Thank you for the script; everything is working as expected, but I am getting the error below when running it.

./rm_list.sh list_file error_file  removed_file
./rm_list.sh: !: not found

Please advise.

Thanks

Check for typos/misalignments somewhere around this code:

if ! [ -f $LIST ]
then
    echo "List file $LIST not found"
    exit 2
fi

I suspect you missed a space or two somewhere.

Not sure. I have tested it in many ways and I am still getting the same error.

Thanks

---------- Post updated at 02:32 PM ---------- Previous update was at 10:50 AM ----------

I am seeing this error on Solaris but not on Linux.

Odd. I think that's supposed to be valid syntax, but Solaris does have a really old and noncompliant shell...

Try putting the ! inside the [ ]

if [ ! -f $LIST ]

Thank you for the quick response. It worked like a gem!

---------- Post updated 08-22-12 at 10:41 AM ---------- Previous update was 08-21-12 at 03:02 PM ----------

I have a similar set of files that need to be removed from the directory and sub-directory below:

cd /photos

find . -name "*12345678*" -ls
156040   58 -rw-r--r--   1 nobody   nobody      58389 Jul 27  2011 ./12345678.jpg
703506    2 -rw-r--r--   1 nobody   nobody       1309 Jul 27  2011 ./thumbnail/12345678-tn.jpg

So, by putting the part of the name that is common to both files (i.e. 12345678) into an input file, how can we add a line to this script to take care of this operation? There are more than 100K records in the photos directory as well as in the thumbnail directory, which is the sub-directory under photos.

Bottom line: I only want to provide the 8-digit number, and that should delete the matching files under the photos directory and its sub-directory (thumbnail).
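
Would something like the sketch below be a reasonable way to do it? (Untested; it assumes the naming convention shown above, i.e. NNNNNNNN.jpg under /photos and NNNNNNNN-tn.jpg under /photos/thumbnail, with the 8-digit numbers listed one per line in /tmp/file.list.)

# untested sketch: for each 8-digit number in the list, remove the photo and its thumbnail
while read id
do
    rm /photos/"$id".jpg /photos/thumbnail/"$id"-tn.jpg 2>> /tmp/error.txt
done < /tmp/file.list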

Thanks