Bash to verify each line in input for specific pattern

In the bash script below, the output of a process is written to input. What I am trying to do is read each line in input and check it for specific text (the specific text expected on each line is listed in the Description). There are always 6 lines for each file in input, but the number of files can vary. In this example there are 3 files (each block of 6 lines is one file), but next time there may be only two. If every line of a file matches the description, the file is verified (good); if any line does not, the file is not verified.

I hope the script below is a start; I have commented each line. Thank you :).

input

Start import validation creation: Wed May 17 06:55:34 CDT 2017
/home/cmccabe/Desktop/validate/file1.txt found expected header
/home/cmccabe/Desktop/validate/file1.txt found expected order of fields
/home/cmccabe/Desktop/validate/file1.txt R_Index is a number
/home/cmccabe/Desktop/validate/file1.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/file1.txt Quality is a character
/home/cmccabe/Desktop/validate/file1.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/file2.txt found expected header
/home/cmccabe/Desktop/validate/file2.txt found expected order of fields
/home/cmccabe/Desktop/validate/file2.txt R_Index is a number
/home/cmccabe/Desktop/validate/file2.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/file2.txt Quality is a character
/home/cmccabe/Desktop/validate/file2.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/file3.txt found expected header
/home/cmccabe/Desktop/validate/file3.txt found expected order of fields
/home/cmccabe/Desktop/validate/file3.txt R_Index is a number
/home/cmccabe/Desktop/validate/file3.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/file3.txt Quality is a character
/home/cmccabe/Desktop/validate/file3.txt HGMD and Sanger are valid
End import validation creation: Wed May 17 06:55:34 CDT 2017
#!/bin/bash
file="/home/cmccabe/Desktop/validate/input"   # define path to input
string="LINE IS GOOD"                         # define string to look for in each line
count=$(
    while IFS= read -r line; do               # read each line in input
        # -i because the input says "found", the description says "Found"
        if   echo "$line" | grep -qi "Found expected header"; then echo "$string"           # line 1
        elif echo "$line" | grep -qi "Found expected order of fields"; then echo "$string"  # line 2
        elif echo "$line" | grep -qi "R_Index is a number"; then echo "$string"            # line 3
        elif echo "$line" | grep -qi "PopFreqMax is valid"; then echo "$string"            # line 4
        elif echo "$line" | grep -qi "Quality is a character"; then echo "$string"         # line 5
        elif echo "$line" | grep -qi "HGMD and Sanger are valid"; then echo "$string"      # line 6
        fi
    done < "$file" | grep -c "$string"        # count string occurrences
)
if [[ $count -eq 6 ]]; then                   # if count = 6
    echo "$string has occurred 6 times"       # string is in each file x times
    echo "FILENAME is verified"               # specific file is verified or good
else
    echo "FILENAME not verified"              # specific file not verified
fi

Description

1="Found expected header"
2="Found expected order of fields"
3="R_Index is a number"
4="PopFreqMax is valid"
5="Quality is a character"
6="HGMD and Sanger are valid"

Hello cmccabe,

It is not clear what you need; could you please add more information to your post? Also, please always show us a sample of the expected output too.

Thanks,
R. Singh


File 1 has 6 lines in it:

Line 1 is /home/cmccabe/Desktop/validate/file1.txt found expected header
Line 2 is /home/cmccabe/Desktop/validate/file1.txt found expected order of fields
Line 3 is /home/cmccabe/Desktop/validate/file1.txt R_Index is a number
Line 4 is /home/cmccabe/Desktop/validate/file1.txt PopFreqMax is valid
Line 5 is /home/cmccabe/Desktop/validate/file1.txt Quality is a character
Line 6 is /home/cmccabe/Desktop/validate/file1.txt HGMD and Sanger are valid

Line 1 matches the expected pattern in the description, so "LINE IS GOOD"
Line 2 matches the expected pattern in the description, so "LINE IS GOOD"
Line 3 matches the expected pattern in the description, so "LINE IS GOOD"
Line 4 matches the expected pattern in the description, so "LINE IS GOOD"
Line 5 matches the expected pattern in the description, so "LINE IS GOOD"
Line 6 matches the expected pattern in the description, so "LINE IS GOOD"

Since "LINE IS GOOD" occurs 6 times, File1 is verified (desired output). If any line does not match its expected pattern, "LINE IS GOOD" will occur fewer than 6 times, so the file is not verified.

File 2 has 6 lines in it:

Line 1 is /home/cmccabe/Desktop/validate/file2.txt found expected header
Line 2 is /home/cmccabe/Desktop/validate/file2.txt found expected order of fields
Line 3 is /home/cmccabe/Desktop/validate/file2.txt R_Index is a number
Line 4 is /home/cmccabe/Desktop/validate/file2.txt PopFreqMax is valid
Line 5 is /home/cmccabe/Desktop/validate/file2.txt Quality is a character
Line 6 is /home/cmccabe/Desktop/validate/file2.txt HGMD and Sanger are valid

Line 1 matches the expected pattern in the description, so "LINE IS GOOD"
Line 2 matches the expected pattern in the description, so "LINE IS GOOD"
Line 3 matches the expected pattern in the description, so "LINE IS GOOD"
Line 4 matches the expected pattern in the description, so "LINE IS GOOD"
Line 5 matches the expected pattern in the description, so "LINE IS GOOD"
Line 6 matches the expected pattern in the description, so "LINE IS GOOD"

Since "LINE IS GOOD" occurs 6 times, File2 is verified (desired output). If any line does not match its expected pattern, "LINE IS GOOD" will occur fewer than 6 times, so the file is not verified.

File 3 has 6 lines in it:

Line 1 is /home/cmccabe/Desktop/validate/file3.txt found expected header
Line 2 is /home/cmccabe/Desktop/validate/file3.txt found expected order of fields
Line 3 is /home/cmccabe/Desktop/validate/file3.txt R_Index is a number
Line 4 is /home/cmccabe/Desktop/validate/file3.txt PopFreqMax is valid
Line 5 is /home/cmccabe/Desktop/validate/file3.txt Quality is a character
Line 6 is /home/cmccabe/Desktop/validate/file3.txt HGMD and Sanger are valid

Line 1 matches the expected pattern in the description, so "LINE IS GOOD"
Line 2 matches the expected pattern in the description, so "LINE IS GOOD"
Line 3 matches the expected pattern in the description, so "LINE IS GOOD"
Line 4 matches the expected pattern in the description, so "LINE IS GOOD"
Line 5 matches the expected pattern in the description, so "LINE IS GOOD"
Line 6 matches the expected pattern in the description, so "LINE IS GOOD"

Since "LINE IS GOOD" occurs 6 times, File3 is verified (desired output). If any line does not match its expected pattern, "LINE IS GOOD" will occur fewer than 6 times, so the file is not verified.
desired output

/home/cmccabe/Desktop/validate/file1.txt is verified
/home/cmccabe/Desktop/validate/file2.txt is verified
/home/cmccabe/Desktop/validate/file3.txt is verified

It is also possible that there could be only one or two files, but there will always be 6 lines per file. I use FILENAME as a placeholder for each file name instead of hardcoding it.

Does this help? Thank you :).

Hello cmccabe,

I'm still not 100% sure, but could you please try the following and let me know if it helps?

awk 'FNR==1{VAL=0} FNR==1 && /found expected header/{VAL++} FNR==2 && /found expected order of fields/{VAL++} FNR==3 && /R_Index is a number/{VAL++} FNR==4 && /PopFreqMax is valid/{VAL++} FNR==5 && /Quality is a character/{VAL++} FNR==6 && /HGMD and Sanger are valid/{VAL++} FNR==6 && VAL==6{print FILENAME " is verified."}'  Input_file*

In the above, you could pass file* instead of Input_file* if the only matching files are your numbered ones; otherwise you could change the glob pattern to file[0-9]* or similar, depending on your file names.
EDIT: Adding a non-one-liner form of the solution too.
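Because the glob can expand to several files, it helps to see how awk numbers lines across them: FNR restarts at 1 for each file, while NR keeps counting across all of them. The sketch below prints both for two small throwaway files; show_counters is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: show how awk numbers records when it reads several files.
# FNR restarts at 1 for each file; NR keeps counting across all of them.
show_counters() {       # hypothetical helper; pass one or more file names
    awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' "$@"
}

# Demo on two throwaway two-line files in a temporary directory.
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    printf 'a1\na2\n' > file1.txt
    printf 'b1\nb2\n' > file2.txt
    show_counters file1.txt file2.txt
)
rm -r "$demo_dir"
```

The demo prints file1.txt NR=1 FNR=1, file1.txt NR=2 FNR=2, file2.txt NR=3 FNR=1 and file2.txt NR=4 FNR=2, one per line. So a per-file check has to test FNR, and anything placed in END runs only once, after the last file has been read.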

awk 'FNR==1 {
         VAL=0                                  # reset the counter for each file
     }
     FNR==1 && /found expected header/          { VAL++ }
     FNR==2 && /found expected order of fields/ { VAL++ }
     FNR==3 && /R_Index is a number/            { VAL++ }
     FNR==4 && /PopFreqMax is valid/            { VAL++ }
     FNR==5 && /Quality is a character/         { VAL++ }
     FNR==6 && /HGMD and Sanger are valid/      { VAL++ }
     FNR==6 && VAL==6 {                         # all 6 lines of this file matched
         print FILENAME " is verified."
     }
    ' Input_file
 

Thanks,
R. Singh


I am not sure what you mean by changing the regex, but each file is a block of 6 lines within the input file.

input.txt (file that has the output of the process)

 Start import validation creation: Wed May 17 06:55:34 CDT 2017 -header-
/home/cmccabe/Desktop/validate/00-0000-l,f.txt found expected header
/home/cmccabe/Desktop/validate/00-0000-l,f.txt found expected order of fields
/home/cmccabe/Desktop/validate/00-0000-l,f.txt R_Index is a number
/home/cmccabe/Desktop/validate/00-0000-l,f.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/00-0000-l,f.txt Quality is a character
/home/cmccabe/Desktop/validate/00-0000-l,f.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/00-0001-l,f.txt found expected header
/home/cmccabe/Desktop/validate/00-0001-l,f.txt found expected order of fields
/home/cmccabe/Desktop/validate/00-0001-l,f.txt R_Index is a number
/home/cmccabe/Desktop/validate/00-0001-l,f.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/00-0001-l,f.txt Quality is a character
/home/cmccabe/Desktop/validate/00-0001-l,f.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/00-0002-l,f.txt found expected header
/home/cmccabe/Desktop/validate/00-0002-l,f.txt found expected order of fields
/home/cmccabe/Desktop/validate/00-0002-l,f.txt R_Index is a number
/home/cmccabe/Desktop/validate/00-0002-l,f.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/00-0002-l,f.txt Quality is a character
/home/cmccabe/Desktop/validate/00-0002-l,f.txt HGMD and Sanger are valid
End import validation creation: Wed May 17 06:55:34 CDT 2017 -footer-
 

individual files within input.txt (in this example there are 3, but it is possible to have only 1 or 2)

 /home/cmccabe/Desktop/validate/00-0000-l,f.txt 
 /home/cmccabe/Desktop/validate/00-0001-l,f.txt 
 /home/cmccabe/Desktop/validate/00-0002-l,f.txt 
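The list of individual files does not have to be written out by hand. A minimal sketch, assuming every data line starts with an absolute path (so the "Start import"/"End import" lines are excluded), prints each distinct path once in order of first appearance; list_files is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the distinct file paths that appear in the report, in order
# of first appearance. Assumes data lines start with "/" (an absolute path),
# which excludes the "Start import" / "End import" header and footer lines.
list_files() {          # hypothetical helper; pass the report file name(s)
    awk '$1 ~ /^\// && !seen[$1]++ { print $1 }' "$@"
}
```

For example, list_files input.txt would print the three 00-000N-l,f.txt paths above, one per line, and only one or two paths when the report covers fewer files.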
 
awk 'NR==2 && /found expected header/          { VAL++ }
     NR==3 && /found expected order of fields/ { VAL++ }
     NR==4 && /R_Index is a number/            { VAL++ }
     NR==5 && /PopFreqMax is valid/            { VAL++ }
     NR==6 && /Quality is a character/         { VAL++ }
     NR==7 && /HGMD and Sanger are valid/      { VAL++ }
     END {
         if (VAL==6) {
             print FILENAME " is verified."
         }
     }
' input.txt > verify.txt
 input.txt is verified   --- this is the output for input.txt as a whole, not for the individual files (is this what you meant by changing the regex?)
 

I changed the NR== values to skip the header line (not sure if that is the best approach). Also, would adding print FILENAME " is not verified." capture any negative results, i.e. files whose lines did not match the expected values?

desired result

 /home/cmccabe/Desktop/validate/00-0000-l,f.txt is verified
 /home/cmccabe/Desktop/validate/00-0001-l,f.txt is verified
 /home/cmccabe/Desktop/validate/00-0002-l,f.txt is verified
 

Thank you very much :).

Your description is rather confusing, but I think the following may come closer to doing what you want:

awk '
NR == 1 {
	next
}
NR % 6 == 2 && /found expected header/ ||
NR % 6 == 3 && /found expected order of fields/ ||
NR % 6 == 4 && /R_Index is a number/ ||
NR % 6 == 5 && /PopFreqMax is valid/ ||
NR % 6 == 0 && /Quality is a character/ ||
NR % 6 == 1 && /HGMD and Sanger are valid/ {
	VAL++
}
NR % 6 == 1 {
	print " " $1, (VAL == 6) ? "is verified" : "is not verified"
	VAL = 0
}
' input.txt > verify.txt

If input.txt contains the sample input you provided in post #5 in this thread, the text produced by the above script in verify.txt exactly matches the output you said you wanted.
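A variation on the script above, sketched under the assumption that every data line begins with the file path: group lines by that path instead of by line number. Besides printing "is not verified" for a file with a wrong message, this also flags a file that has fewer or more than 6 lines, which the NR % 6 arithmetic cannot detect. check_by_path is a hypothetical function name, and the expected messages are copied from the sample input.

```shell
#!/bin/sh
# Sketch: verify each file in the report by grouping lines on their first
# field (the file path) instead of on line numbers.
# Assumptions: data lines start with the path, and each file's six messages
# appear in the order and spelling of the sample input.
check_by_path() {       # hypothetical helper; pass the report file name(s)
    awk '
    BEGIN {
        want[1] = "found expected header"
        want[2] = "found expected order of fields"
        want[3] = "R_Index is a number"
        want[4] = "PopFreqMax is valid"
        want[5] = "Quality is a character"
        want[6] = "HGMD and Sanger are valid"
    }
    /^[ \t]*(Start|End) import/ || NF == 0 { next }   # header, footer, blanks
    {
        path = $1
        msg = $0
        sub(/^[ \t]*[^ \t]+[ \t]+/, "", msg)          # strip the path field
        sub(/[ \t]+$/, "", msg)                       # strip trailing blanks
        if (!(path in seen)) order[++n] = path        # remember first-seen order
        seen[path]++                                  # position within the block
        if (msg == want[seen[path]]) good[path]++
    }
    END {
        for (i = 1; i <= n; i++) {
            p = order[i]
            ok = (seen[p] == 6 && good[p] == 6)
            print p, (ok ? "is verified" : "is not verified")
        }
    }' "$@"
}
```

Usage: check_by_path input.txt > verify.txt. The trade-off is that all paths are held in memory until the end, which is negligible for a handful of files.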