Scan and remove if file infected using bash

The bash script below runs clamav on all files in DIR and produces virus-scan.log. The portion in bold is supposed to move the infected files (the lines that are not OK) to /home/cmccabe/quarantine. Does the script look correct? Thank you :).

virus-scan.log

Mon Jan 16 14:39:05 CST 2017
/home/cmccabe/Desktop/NGS/API/R_2017_01_13_14_46_04_user_S5-00580-25-Medexome/IonXpress_008_xx-xxx_R_2017_01_13_14_46_04_user_S5-00580-25-Medexome.bam.bai: OK
/home/cmccabe/Desktop/NGS/API/R_2017_01_13_14_46_04_user_S5-00580-25-Medexome/IonXpress_007_xx-xxx_R_2017_01_13_14_46_04_user_S5-00580-25-Medexome.bam: OK
/home/cmccabe/Desktop/NGS/API/R_2017_01_13_14_46_04_user_S5-00580-25-Medexome/IonXpress_007_xx-xxx_R_2017_01_13_14_46_04_user_S5-00580-25-Medexome.bam.bai: OK
#!/bin/bash

DIR=/home/cmccabe/Desktop/NGS/API
cd $DIR
line_no=$(ls | awk -F . '{print $NF}' | sort | uniq -c | awk '{print $2,$1}') # count folder type and store as variable
echo "The folders detected are:
$line_no"

# Get rid of old log file
rm $HOME/virus-scan.log 2> /dev/null
 
for FILE in $DIR;
do
     # check file length is nonzero otherwise commands may be repeated
     if [ -s $FILE ]; then
          date > $HOME/virus-scan.log
          clamscan -r $FILE >> $HOME/virus-scan.log
if grep -iq "OK" "${file}"; then
        echo "echo nothing detected by scan"
    else
        if [[ -f "$f" ]]; then
               mv -f "$f" /home/cmccabe/Desktop/API/$filename /home/cmccabe/quarantine
            # rm -f "$f"
            echo "The files infected have been moved to the folder at /home/cmccabe/quarantine"
        fi
     fi
done

Hi cmccabe, I think the script will need work.

First the script changes into the directory $DIR and then iterates in a for loop over one single value, the value of the variable DIR, which is the name of the parent directory: /home/cmccabe/Desktop/NGS/API . Because clamscan also accepts a directory as an argument, the scan itself will probably still work, but no thanks to the script.

Likewise, [ -s $FILE ] tests that directory again so that also serves no purpose and the condition will always be true.

Then a grep is performed, apparently meant for that same path as if it were a regular file, and it tests for a case-insensitive ok (which in itself is a very bad test since it will easily give false positives). This will fail, since the argument is not a file but an empty string: the variable file is uninitialized (variable names are case-sensitive, so it is not the same as FILE), and so nothing containing the characters OK is found.

So then it tests with [[ -f "$f" ]] whether the empty string (the uninitialized variable f is empty) is a file, which is not the case, so fortunately the rest of the code will be skipped. Otherwise it would have moved the entire directory /home/cmccabe/Desktop/API to /home/cmccabe/quarantine .
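To make the points above concrete, here is a small sketch you can run safely. It uses a throwaway directory from mktemp as a stand-in for your DIR (that directory and the two file names are just assumptions for the demo), and it shows that the unquoted directory name yields one loop iteration while a glob yields one per entry, and that file is not the same variable as FILE.

```shell
#!/bin/bash
# Throwaway directory with two entries, standing in for the real $DIR.
DIR=$(mktemp -d)
touch "$DIR/a.bam" "$DIR/b.bam"

count_plain=0
for FILE in $DIR; do          # expands to the directory name itself: one pass
    count_plain=$((count_plain + 1))
done

count_glob=0
for FILE in "$DIR"/*; do      # expands to each entry inside the directory
    count_glob=$((count_glob + 1))
done

echo "plain: $count_plain, glob: $count_glob"

# Variable names are case-sensitive: FILE was set by the loop, file never was.
if [[ -z "${file-}" ]]; then
    echo "file is unset or empty, FILE is '$FILE'"
fi

rm -r "$DIR"
```

With the two files created above, the first counter ends at 1 and the second at 2.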


Using some helpful suggestions from @MadeInGermany as well as yourself, I have the version below. I am not sure how to address the grep. Thank you very much :).

#!/bin/bash
DIR=/home/cmccabe/Desktop/NGS/API
log=$HOME/virus-scan.log

{
echo "The extensions are"
ls | awk -F'\.' 'NF>1 {ext[$NF]++} END {for (i in ext) print ext,i}'
} > $log

scanned=0
for FILE in "$DIR"/*
do
     # check file length is nonzero otherwise commands may be repeated
     if [ -s "$FILE" ]; then
          {
          date
          clamscan -r "$FILE"
          } >> $log
          ((scanned++))
     if grep -iq "OK" "${file}"; then
        echo "echo nothing detected by scan"
    else
        if [[ -f "$f" ]]; then
               mv -f "$f" /home/cmccabe/Desktop/API/$filename /home/cmccabe/quarantine
            # rm -f "$f"
            echo "The files infected have been moved to the folder at /home/cmccabe/quarantine"
        fi
     fi
done
[ $scanned -eq 0 ] && echo "nothing detected by scan" >> $log

What would happen with an infected file called This_file_OK_and_not_infected ? I would suggest that your grep will ignore it.
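A quick way to see the problem, as a sketch: the log line below is made up for illustration (the path and the signature name are assumptions, not real clamscan output), but it has the shape "path: signature FOUND". The loose test treats it as clean because the file name contains OK, while a test anchored to the end of the line does not.

```shell
#!/bin/bash
# Hypothetical log line for an infected file whose name contains "OK".
line='/tmp/This_file_OK_and_not_infected: Example.Test.Signature FOUND'

# Loose test from the script: matches "OK" anywhere, in any case.
if grep -iq "OK" <<< "$line"; then loose=clean; else loose=infected; fi

# Stricter test: only a line that ends in ": OK" counts as clean.
if grep -q ": OK$" <<< "$line"; then strict=clean; else strict=infected; fi

echo "loose: $loose, strict: $strict"
```

This prints "loose: clean, strict: infected", which is the false result I am worried about.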

I have this section of code reading the output:-

        while read line
        do
           line="${line% FOUND}"
           virus_name="${line#* }"
           file_name="${line%: *}"
           ((virus_count=$virus_count+1))

           printf "  %s\n" "${file_name}"            # Output to screen
           printf "%s\n" "${file_name}" >&3          # Output to log_file
        done < <(grep " FOUND$" $scan_log) 3>log_file

Obviously scan_log is defined earlier and is the file that clamav writes its output to.

This then gives me output to screen and in the file log_file with a list of infected files, which I then deal with.
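As for dealing with them, one possible sketch is below. The function name quarantine_from_log and its two arguments are my own invention for illustration, and it assumes the log lines look like "path: signature FOUND" as in the loop above. It only moves paths that still exist as regular files.

```shell
#!/bin/bash
# quarantine_from_log LOG QUARANTINE_DIR   (hypothetical helper, not part of clamav)
# Moves every file that LOG reports as "<path>: <signature> FOUND"
# and prints how many files were moved.
quarantine_from_log() {
    local scan_log=$1 quarantine=$2 line file_name moved=0
    mkdir -p "$quarantine" || return 1
    while IFS= read -r line; do
        line=${line% FOUND}        # drop the trailing " FOUND"
        file_name=${line%: *}      # drop ": <signature>", leaving the path
        if [[ -f $file_name ]]; then
            mv -- "$file_name" "$quarantine"/ && moved=$((moved + 1))
        fi
    done < <(grep ' FOUND$' "$scan_log")
    echo "$moved"
}
```

You would call it as something like quarantine_from_log "$HOME/virus-scan.log" /home/cmccabe/quarantine once the scan has finished. Reading with IFS= and -r keeps spaces and backslashes in file names intact.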

Does this help?

Robin


So if I am following correctly, something more like:

#!/bin/bash
DIR=/home/cmccabe/Desktop/NGS/API
log=$HOME/virus-scan.log

{
echo "The extensions are"
ls | awk -F'\.' 'NF>1 {ext[$NF]++} END {for (i in ext) print ext,i}'
} > $log

scanned=0
for FILE in "$DIR"/*
do
     # check file length is nonzero otherwise commands may be repeated
     if [ -s "$FILE" ]; then
          {
          date
          clamscan -r "$FILE"
          } >> $log
          ((scanned++))
          while read line
          do
              line="${line% FOUND}"
              virus_name="${line#* }"
              file_name="${line%: *}"
              ((virus_count=$virus_count+1))

              printf "  %s\n" "${file_name}"            # Output to screen
              printf "%s\n" "${file_name}" >&3          # Output to log
          done < <(grep " FOUND$" $scan_log) 3>log
          echo "The files infected have been moved to the folder at /home/cmccabe/quarantine"
        fi
     fi
done
[ $scanned -eq 0 ] && echo "nothing detected by scan" >> $log

Thank you for your help :).

I'm not sure why you have the loop for FILE in "$DIR"/* when you follow it up with clamscan -r "$FILE".

The -r flag asks clamscan to search recursively, and the loop calls clamscan once for each item in the directory. Can you not just run clamscan -r "$DIR" instead? I find that running clamscan has a several second overhead as it loads up the definitions, so you could be scanning for hours just from calling the process repeatedly. An alternative might be to list the files into another file and use that as input with the -f flag, e.g. clamscan -if /tmp/file_list.txt

I've added the -i flag to only list infected files, which might make reading the output easier.
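Putting that together, a minimal sketch of a single-run version might look like the function below. The name scan_tree and its three arguments are made up for illustration. It leans on clamscan's --move=DIRECTORY option to do the quarantining and on its exit status (0 for nothing found, 1 for a virus found, anything else for an error), so no grep on the log is needed. I have assumed the quarantine directory sits outside the tree being scanned.

```shell
#!/bin/bash
# scan_tree DIR QUARANTINE LOG   (hypothetical helper name)
# Runs clamscan once over the whole tree and lets clamscan itself
# move infected files into QUARANTINE.
scan_tree() {
    local dir=$1 quarantine=$2 log=$3 status
    mkdir -p "$quarantine" || return 2
    date > "$log"
    clamscan -r -i --move="$quarantine" "$dir" >> "$log" 2>&1
    status=$?
    case $status in
        0) echo "nothing detected by scan" >> "$log" ;;
        1) echo "infected files moved to $quarantine" >> "$log" ;;
        *) echo "clamscan reported an error (exit $status)" >> "$log" ;;
    esac
    return "$status"
}
```

A call would be along the lines of scan_tree /home/cmccabe/Desktop/NGS/API /home/cmccabe/quarantine "$HOME/virus-scan.log". Because the definitions are loaded only once, this should be much quicker than one clamscan per file.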

You have the basis of some good code here though, keep going ;)

Do you have a virus signature to test this with?

Robin