Count of files to delete doesn't match

Good evening, I need your help please.

I need to delete certain files dated before October 1, 2016, so I need to know how many files I'm going to delete. For instance:

ls -lrt file_20160*.lis!wc -l

but using grep -c on another file called bplist, which contains the list of all files backed up, doesn't match the count:

grep -c file_20160 bplist.txt

The first query gives me 568 files and the second query returns 1120 records.

Before deleting, I have to make sure I've got the right number of files to delete, so what am I doing wrong?

After matching the number of files to delete, I need to create a new file,

using grep on bplist.txt to classify the files to delete, this way:

for file in $(ls -lrt file_20160*.lis!awk '{print $9}')
do
grep $file bplist.txt >>filetodelete.txt
done 

Is there any better and faster way to do that?

I'd appreciate your help in advance.

Moderator comments were removed during original forum migration.

First: Note that elements of a pipeline are separated by pipe symbols ( | ); not exclamation points ( ! ). So the code you showed us in post #1 in this thread can't possibly produce the output you described.

Second: We have absolutely no idea what the format is for the data in bplist (or bplist.txt , depending on which part of your post we are to believe). We have absolutely no idea what the format is for the filenames (or pathnames) being processed.

Third: You have not explained why you need to count files to be removed instead of just identifying files to be removed and removing them.

Fourth: You have not given us any indication whether there are duplicates in one or both of your lists, whether files in one list are different than files in the other list, nor if there is any indication that there is a problem with the contents of either list (other than that the line counts are different).

Fifth: Why use the complicated:

for file in $(ls -lrt file_20160*.lis!awk '{print $9}')

which involves creating a subshell and invoking two utilities and can fail miserably if there are any whitespace characters in any of your filenames, when:

for file in file_20160*.lis

would be MUCH faster and, if you properly quoted the expansion (i.e., "$file" ) in your for loop, suffers none of the problems possible in your current loop.
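
To get the count itself without parsing ls output, one possible sketch is to load the glob into the positional parameters and read $#. The scratch directory and the file names below are made up for illustration, and mktemp -d is assumed to be available:

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file_20160101.lis file_20160215.lis 'file_20160301 copy.lis' file_20161001.lis

# Load the glob results into the positional parameters.
set -- file_20160*.lis
# If nothing matched, the unexpanded pattern is left in $1; treat that as zero.
[ -e "$1" ] || set --
count=$#
echo "$count"
```

The name containing a space is counted once, which is not the case when the output of ls is split on whitespace.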

Further to Don's remarks, if you are using file name expansion with the for loop, the first attempt might look like this:

for file in file_20160*.lis
do
  grep "$file" bplist.txt >>filetodelete.txt
done

It is important to test for the case when zero files fit the pattern; otherwise you end up with a variable that contains the literal string file_20160*.lis , which grep would then treat as a regular expression, since that is what grep expects, like so:

grep file_20160*.lis bplist.txt >>filetodelete.txt

which would then mark for deletion any file name in the list that contains "file_2016" followed by zero or more zeroes, any single character, and "lis".
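
A small sketch of that pitfall, using a made-up three-line list (the names are hypothetical and chosen only to show the difference between regex and fixed-string matching):

```shell
# Hypothetical list contents, for illustration only.
list=$(mktemp)
printf '%s\n' 'file_20160101.lis' 'file_2016Xlis' 'file_201600.lis' > "$list"

# The pattern is quoted so it reaches grep untouched, which is what happens
# when the glob matches nothing in the current directory.
regex_hits=$(grep -c 'file_20160*.lis' "$list" || true)
# With -F the same text is a plain string, and no line contains it.
fixed_hits=$(grep -c -F 'file_20160*.lis' "$list" || true)
echo "regex: $regex_hits, fixed string: $fixed_hits"
rm -f "$list"
```

Note that the regular expression does not even match an ordinary name such as file_20160101.lis; it only picks up the two odd-looking entries.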

Those files probably do not exist in your case, but it is best to close the loophole altogether by testing whether each file exists and by using string matching instead of regex matching, with grep's -F option. To avoid partial file name matches (where the string that grep is looking for is only part of a longer line), another important option is -x , which forces whole-line matches. A third thing is to avoid the possibility that a file name starting with a - sign is interpreted as an option to grep. One way to stop this is to use the -- marker, which ends option processing. Because your file pattern starts with file , that will not be an issue here, but it is good practice anyway, so that if you ever change the pattern so that it starts with an * , this will not break things.

So then it becomes:

for file in file_20160*.lis
do
  if [ -f "$file" ]; then 
    grep -Fx -- "$file" bplist.txt >>filetodelete.txt
  fi
done

Now the last thing here is that you are appending to the file, probably out of necessity, since otherwise the file would be overwritten on every loop iteration. An alternative is to redirect the output of the loop itself, so the file is opened only once and you do not have to delete it before running the loop:

for file in file_20160*.lis
do
  if [ -f "$file" ]; then
    grep -Fx -- "$file" bplist.txt
  fi
done > filetodelete.txt
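
To see what -x adds on top of -F, here is a sketch that runs the same redirected loop twice in a scratch directory. The list contents are hypothetical, and a grep that accepts -F and -x is assumed:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file_20160101.lis
# Hypothetical list: one exact entry and one longer name that contains it.
printf '%s\n' 'file_20160101.lis' 'file_20160101.lis.bak' > bplist.txt

# Fixed-string match only: any line containing the name is selected.
for file in file_20160*.lis
do
  if [ -f "$file" ]; then
    grep -F -- "$file" bplist.txt
  fi
done > substring.txt

# Fixed-string and whole-line match: only the exact entry is selected.
for file in file_20160*.lis
do
  if [ -f "$file" ]; then
    grep -Fx -- "$file" bplist.txt
  fi
done > wholeline.txt

substring_count=$(wc -l < substring.txt | tr -d ' ')
wholeline_count=$(wc -l < wholeline.txt | tr -d ' ')
echo "substring: $substring_count, whole line: $wholeline_count"
```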

One last thing. This is still an expensive way to do it because an external program in a subshell is used to perform the operations for every iteration in the for loop, which is resource intensive.

An alternative would be to use a pipe ( | ) and grep's - operand for stdin, which most greps (but not all) will honor, together with the pattern file option -f :

ls file_20160*.lis | grep -Fxf -  -- bplist.txt > filetodelete.txt

if there are not too many files in the directory.

Or use the more robust:

for file in file_20160*.lis
do
  if [ -f "$file" ]; then
    printf "%s\n" "$file"
  fi
done |
grep -Fxf -  -- bplist.txt > filetodelete.txt

Since the - operand for stdin is not universally supported in grep, another way would be to use process substitution ( <( ... ) ), which is available in modern bash, ksh93 or zsh:

grep -Fxf <(ls file_20160*.lis) -- bplist.txt > filetodelete.txt

if there are not too many files in the directory.

Or again the more robust variety:

grep -Fxf <(
  for file in file_20160*.lis
  do
    if [ -f "$file" ]; then
      printf "%s\n" "$file"
    fi
  done
) -- bplist.txt > filetodelete.txt
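
Once the selection works, the original question (do the counts agree?) can be checked directly. The following is only a sketch with made-up data; it writes the names to a regular file (ondisk.txt and notbackedup.txt are hypothetical names), which sidesteps both the stdin operand and process substitution:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file_20160101.lis file_20160102.lis file_20160103.lis
# Hypothetical backup list: one file is missing and one entry is duplicated.
printf '%s\n' file_20160101.lis file_20160102.lis file_20160102.lis > bplist.txt

# One name per line for the files that exist on disk.
for file in file_20160*.lis
do
  if [ -f "$file" ]; then
    printf '%s\n' "$file"
  fi
done > ondisk.txt

# Names that are both on disk and in the list, with duplicates collapsed.
grep -Fxf ondisk.txt -- bplist.txt | sort -u > filetodelete.txt

# Names on disk that the backup list does not mention at all.
grep -vFxf filetodelete.txt -- ondisk.txt > notbackedup.txt

ondisk_count=$(wc -l < ondisk.txt | tr -d ' ')
listed_count=$(wc -l < filetodelete.txt | tr -d ' ')
echo "on disk: $ondisk_count, backed up: $listed_count"
cat notbackedup.txt
```

If the two counts differ, notbackedup.txt shows which names account for the difference.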

OK, I will take your recommendations into account. Thank you very much, every one of you, for the options you gave me to get what I want.

First, you were right: the bplist file had duplicated lines, and that's why the number of records didn't match. To remove the duplicates I typed:

sort List-prosclbt00c-xpbatch-01112016_Q607965.txt|uniq > List-prosclbt00c-xpbatch-01112016_Q607965_nd.txt
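
As a side note, sort -u gives the same result as sort piped into uniq, and uniq -d reports which lines were repeated. A minimal sketch on a made-up list:

```shell
# Hypothetical list with two repeated entries.
list=$(mktemp)
printf '%s\n' 'b.lis' 'a.lis' 'b.lis' 'c.lis' 'a.lis' > "$list"

total=$(wc -l < "$list" | tr -d ' ')
# Distinct lines only; same result as: sort "$list" | uniq
unique=$(sort -u "$list" | wc -l | tr -d ' ')
# Number of distinct lines that occur more than once.
repeated=$(sort "$list" | uniq -d | wc -l | tr -d ' ')
echo "total=$total unique=$unique repeated=$repeated"
rm -f "$list"
```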

The deduplicated list has this format:

-r   1452 Nov 01 10:10 /produccion/explotacion/xpbatch/SHELL_PLAN_FAMILIA-SEA_MOVIMIENTO/logs/LogMonitoreoSeaMovimiento20150601_220000.log
-r--r--r-- xpbatch   explotaci          40 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/LICENSE
-r--r--r-- xpbatch   explotaci          40 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/LICENSE
-r--r--r-- xpbatch   explotaci          46 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/README
-r--r--r-- xpbatch   explotaci         113 Nov 01 10:10 /produccion/explotacion/xpbatch/local.cshrc
-r--r--r-- xpbatch   explotaci         159 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/README.html
-r--r--r-- xpbatch   explotaci         580 Nov 01 10:10 /produccion/explotacion/xpbatch/local.profile
-r--r--r-- xpbatch   explotaci         607 Nov 01 10:10 /produccion/explotacion/xpbatch/local.login
-r--r--r-- xpbatch   explotaci         632 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/lib/cmm/GRAY.pf
-r--r--r-- xpbatch   explotaci         955 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/Welcome.html
-r--r--r-- xpbatch   explotaci        1044 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/lib/cmm/LINEAR_RGB.pf
-r--r--r-- xpbatch   explotaci        2856 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/lib/management/jmxremote.password.template
-r--r--r-- xpbatch   explotaci        3144 Nov 01 10:10 /produccion/explotacion/xpbatch/CreacionEnvioContratos/jdk1.8.0_45/jre/lib/cmm/sRGB.pf

Now I wanted to search for some files in bplist, but it seems these grep options are not supported (Sun operating system); it yields errors, e.g.:

for Archivo in log_HistoricoRecargas??062016*.log*
do
  if [ -f "$Archivo" ]; then
    grep  -Fx -- "$Archivo" List-prosclbt00c-xpbatch-01112016_Q607965_nd.txt
  fi
done > Archivotodelete.txt

grep: illegal option -- F
grep: illegal option -- x
Usage: grep -hblcnsviw pattern file . . .
grep: illegal option -- F

So I removed the grep options and ran the script, but it didn't work because it didn't find any records:

for Archivo in log_HistoricoRecargas??062016*.log*
do
  if [ -f "$Archivo" ]; then
    grep  "$Archivo" List-prosclbt00c-xpbatch-01112016_Q607965_nd.txt
  fi
done > Archivotodelete.txt
SCEL /SCEL/logs1/xpbatch #ls -lrt Archivotodelete.txt
-rw-r--r--   1 xpbatch  explotacion       0 Nov 25 21:47 Archivotodelete.txt

I think that's because grep is looking for the ?? and * characters literally.

But those files really do exist, as listing them shows:

ls log_HistoricoRecargas??062016*.log*|wc -l
24220

I appreciate your help in advance.

On Solaris/SunOS systems use /usr/xpg4/bin/grep -Fx or fgrep -x instead of grep -Fx .
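
A script that has to run on both Solaris and other systems could pick the grep at run time. This is only a sketch: the GREP variable name and the two-line test list are made up, and it assumes that whichever grep is found in PATH on non-Solaris systems already accepts -F and -x:

```shell
# Prefer the XPG4 grep where it exists (Solaris); otherwise assume the
# grep found in PATH already understands -F and -x.
if [ -x /usr/xpg4/bin/grep ]; then
  GREP=/usr/xpg4/bin/grep
else
  GREP=grep
fi

# Hypothetical list: one exact entry and one longer name that contains it.
list=$(mktemp)
printf '%s\n' 'a.log' 'a.log.gz' > "$list"
hits=$("$GREP" -c -Fx -- 'a.log' "$list")
echo "$GREP found $hits exact line(s)"
rm -f "$list"
```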

Thanks, but grep and fgrep don't interpret special characters like * and ??.
For example, this search doesn't list anything:

fgrep -x log_HistoricoRecargas??062016*.log*  List-prosclbt00c-logs1-01112016_Q607965_sindupli.txt 

I appreciate your help in advance.

---------- Post updated 11-27-16 at 01:39 AM ---------- Previous update was 11-26-16 at 09:35 PM ----------

Or should I use egrep, which supports wildcard patterns, instead?

Do Solaris/SunOS systems support egrep?

Thanks for your support in advance.

I sincerely apologize for trying to tell you how to access standard grep -Fx behavior on Solaris systems. Since that was what you were using in your post, I assumed that that was what you wanted. Your script used the command:

    grep  -Fx -- "$Archivo" List-prosclbt00c-xpbatch-01112016_Q607965_nd.txt

and failed with diagnostics saying that the grep utility you had invoked didn't recognize the -F and -x options. The command grep -F can be replaced by fgrep , but if the grep in your $PATH doesn't recognize -x , fgrep isn't likely to either. But, /usr/xpg4/bin/grep -Fx should work IF AND ONLY IF the other operands you supply make sense. So, if $Archivo expands to the name of a regular file that appears in the file List-prosclbt00c-xpbatch-01112016_Q607965_nd.txt as an entire line of text with no other characters on that line, then your command makes sense and should print the expansion of $Archivo if that filename appears as an entire line in that file. But, the contents of the file you showed us in post #5 seem to be a perverted list of absolute pathnames of files sorted with keys being the file mode, the file owner, the file group, the file size, the file last modification time's abbreviated month, the file last modification time's day-of-month, the file last modification time's (hour and minute) or (year), and the file's absolute pathname. But, the pattern you are looking for in this file is only the last component of a pathname, so it can never match an entire line, which is what -x requires.

If you are looking for files that might match what is in a file like you showed us in post #5, you might want to try something more like:

printf '.*/%s\n' log_HistoricoRecargas??062016*.log |
    /usr/xpg4/bin/grep  -xf - List-prosclbt00c-xpbatch-01112016_Q607965_nd.txt > Archivotodelete.txt

Note that we are not looking for fixed strings; we are looking for a filename at the end of a line containing other text (so we don't use fgrep and we don't use grep -F ). Since the filenames you're looking for contain <period>s, there is a chance that there will be false matches, but I will make the wild assumption that if you're looking for a file with a name like log_HistoricoRecargas12062016sometext.log it would be unlikely that you would also have a file named log_HistoricoRecargas12062016othertextXlog where sometext followed by a <period> and othertextX are arbitrary but different strings.
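
If that assumption about periods is a concern, the names can be escaped before they are used as patterns. The sketch below is one possible way, not part of the command above: it uses sed to put a backslash in front of basic regular expression metacharacters, and it writes the patterns to a file (patterns.txt, a hypothetical name) instead of reading them from stdin. The listing lines and the directory in them are made up:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch log_HistoricoRecargas12062016a.log
# Hypothetical listing: the second name differs only where the period is.
printf '%s\n' \
  '-rw-r--r-- xpbatch explotaci 10 Nov 01 10:10 /SCEL/logs1/xpbatch/log_HistoricoRecargas12062016a.log' \
  '-rw-r--r-- xpbatch explotaci 10 Nov 01 10:10 /SCEL/logs1/xpbatch/log_HistoricoRecargas12062016aXlog' \
  > listing.txt

# Unescaped: the period matches any character, so both lines are selected.
printf '.*/%s\n' log_HistoricoRecargas??062016*.log > patterns.txt
loose=$(grep -c -xf patterns.txt listing.txt)

# Escaped: protect ] [ \ . * ^ $ with a backslash, then add the .*/ prefix.
printf '%s\n' log_HistoricoRecargas??062016*.log |
  sed -e 's/[][\\.*^$]/\\&/g' -e 's|^|.*/|' > patterns.txt
strict=$(grep -c -xf patterns.txt listing.txt)
echo "loose=$loose strict=$strict"
```

On Solaris the grep calls would be /usr/xpg4/bin/grep, as in the command above.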