How to find a list of missing files based on the file format?

Hi All,

In the file names we have dates.
Based on the file format given by the user,
if no file exists for a particular date within the given interval, that file should be considered missing.

I have the below files in the directory /bin/daily/voda_files.

asr_spir_2017-05-10-150325_2017-05-10-112227_2017-05-13-112227.txt
adb_voda_2017-05-11-150325_2017-05-10-112227_2017-05-13-112227.txt
adb_voda_2017-05-14-150325_2017-05-11-112227_2017-05-10-112227.txt
adb_voda_2017-05-12-150325_2017-05-12-112227_2017-05-11-112227
adb_voda_2017-05-16_2017-04-30_2017-05-01.txt
adb_voda_20170510.txt
adb_voda_2017-05-10.txt
2017-05-11
2017-05-10.txt
2017-05-12

If the user enters

file_format=xxx_xxxx_YYYY-MM-DD-HHIISS_?????????????????_?????????????????.txt
prog_name="adb_voda_"
interval=10 (from the current date - 1 back to 10 days ago; that is, from 2017-05-08 to 2017-05-17)

In this case it should consider the first date in the file name.
The missing files output should be

adb_voda_2017-05-08
adb_voda_2017-05-09
adb_voda_2017-05-10
adb_voda_2017-05-12
adb_voda_2017-05-13
adb_voda_2017-05-15
adb_voda_2017-05-16
adb_voda_2017-05-17

If the user enters

file_format=xxx_xxxx_?????????????????_?????????????????_YYYY-MM-DD-HHIISS.txt
prog_name="adb_voda_"
interval=10 (from the current date - 1 back to 10 days ago; that is, from 2017-05-08 to 2017-05-17)

In this case it should consider the last date in the file name.
The missing files output should be

adb_voda_2017-05-08
adb_voda_2017-05-09
adb_voda_2017-05-11
adb_voda_2017-05-12
adb_voda_2017-05-14
adb_voda_2017-05-15
adb_voda_2017-05-16
adb_voda_2017-05-17

If the user enters

file_format=YYYY-MM-DD
prog_name=""
interval=10 (from the current date - 1 back to 10 days ago; that is, from 2017-05-08 to 2017-05-17)

It should consider only the files whose names consist of just YYYY-MM-DD.
The missing files output should be

2017-05-08
2017-05-09
2017-05-10
2017-05-13
2017-05-14
2017-05-15
2017-05-16
2017-05-17

Please help me with the script.

Thanks in advance.

Given your detailed description... what have you tried, and where exactly are you stuck?

Hi,

I have tried this, but it's not working as expected.

file_format=xxx_xxxx_?????????????????_?????????????????_YYYY-MM-DD-HHIISS.txt
prog_name="adb_voda_"
interval=10

 #checking before . extension
YearFormat=$(echo $file_format | sed 's/X//g;s/x//g;s/_*$//g;s/^_*//g' | awk 'BEGIN{FS=".";}{for (i = 1; i <= NF; i++){if ( $i ~ /YY/ ){print $i;}}}' | sed 's/X//g;s/x//g;s/_*$//g;s/^_*//g' | head -1)
YearFormat_count=$(echo $YearFormat | wc -l)
echo "YearFormat-$YearFormat"
if [[ $YearFormat_count -lt 1 ]]; then
echo "enter proper year format"
exit 0
fi
 #exit is missing here
month_count=0;y_count=0;d_count=0;hour_count=0;min_count=0;sec_count=0
for (( i=0; i<${#YearFormat}; i++ )); do
    year=$(echo "${YearFormat:$i:1}")
    if [[ $year == 'Y' ]];then
        y_count=`expr $y_count + 1`
    fi
    if [[ $year == 'M' ]];then
        month_count=`expr $month_count + 1`
    fi
    if [[ $year == 'D' ]];then
        d_count=`expr $d_count + 1`
    fi
    if [[ $year == 'H' ]];then
        hour_count=`expr $hour_count + 1`
    fi
    if [[ $year == 'I' ]];then
        min_count=`expr $min_count + 1`
    fi
    if [[ $year == 'S' ]];then
        sec_count=`expr $sec_count + 1`
    fi
done

 #Remove the duplicates
cleandateFormat=$(echo $YearFormat | tr -s 'A-Z') #YmD_HMS
echo $cleandateFormat
if [[ $y_count -eq 2 ]]; then
    cleandateFormat=$(echo $cleandateFormat | sed 's/Y/y/g;s/M/m/g;s/D/d/g;s/I/M/g')
elif [[ $y_count -eq 4 ]]; then
    cleandateFormat=$(echo $cleandateFormat | sed 's/M/m/g;s/D/d/g;s/I/M/g')
else
    echo "enter correct year format"
    exit 0
fi
 #maintain format including special character as separator
finalFormat=""
for (( i=0; i<${#cleandateFormat}; i++ )); do
    f_year=$(echo "${cleandateFormat:$i:1}")
    echo "char : $f_year"
    if [[ $f_year == [a-zA-Z] ]];then
        temp=$(echo $f_year | sed 's/^/%/g')
        finalFormat="$finalFormat$temp"
        echo "IF : $finalFormat"
    else
        finalFormat="$finalFormat$f_year"
    fi
done

if [[ $check_mode -eq 1 ]]
then
    missing_count=0
    day_count=0
    while [[ $interval -ne 0 ]]; do
        finalFormat=$(echo "$finalFormat" | sed -r 's/[HMS]+//g;s/%*$//g;s/-*$//g;s/_*$//g')
        start=`date +${finalFormat} -d "$interval day ago"`
        IFS=$','
        for path in ${file_path}; do
            count=$(ls -l /bin/daily/voda_files/${prog_name}*${start}* 2>/dev/null | wc -l)
            if [[ $count -gt 0 ]]
            then
                break
            fi
        done
        unset IFS
        if [[ $count -eq 0 ]]
        then
            missing_count=`expr $missing_count + 1`
            file_name="${start}"
            printf "$file_name\n"
        fi
        interval=`expr $interval - 1`
    done >missingfiles.txt
fi

Please help me.
Thanks in advance.

"its not working as expected" doesn't really answer vgersh99's Q: "where exactly are you stuck".
And in recent threads you received hints and examples of data and date manipulation that I don't see reflected in the above. Were all those posts in vain?

Your requirements here (as in your previous three threads on this topic) are confusing and incomplete.

I have no idea how you expect to match a filename to the second. The format strings you are using specify not only year, month, and day but also hour, minute, and second, so what you end up comparing against is the year, month, and day (for each of the previous 10 days) plus the hour, minute, and second at which you happen to run your script. How will you guarantee that you are running your script at exactly 15:03:25 when you are looking for matches for the first dates in your filenames and at exactly 11:22:27 when you are looking for matches for the last dates in your filenames?
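
To make the timing problem concrete, here is a minimal sketch (assuming bash and GNU date; run_at is a made-up stand-in for the moment the script happens to run). Filling in the hour, minute, and second fields from the clock produces a name that cannot match the sample file, while leaving those fields as ? wildcards does match:

```shell
#!/bin/bash
# Assumes GNU date (-d). "run_at" is a hypothetical stand-in for the time
# at which the script is started.
run_at='2017-05-11 09:30:00'
file='adb_voda_2017-05-11-150325_2017-05-10-112227_2017-05-13-112227.txt'

# Expanding the time-of-day fields yields the run time, not the file's time.
exact=$(date -d "$run_at" '+adb_voda_%Y-%m-%d-%H%M%S')
# Leaving the time-of-day as ?????? lets the shell match any time that day.
loose=$(date -d "$run_at" '+adb_voda_%Y-%m-%d-??????')

# Unquoted expansions in a case pattern are treated as glob patterns.
case $file in ${exact}_*) exact_hit=yes ;; *) exact_hit=no ;; esac
case $file in ${loose}_*) loose_hit=yes ;; *) loose_hit=no ;; esac

printf 'exact: %s\nloose: %s\n' "$exact_hit" "$loose_hit"
```

Here the exact form expands to adb_voda_2017-05-11-093000, which is not the start of the file's name, so only the loose form reports a hit.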

If your input filename samples:

asr_spir_2017-05-10-150325_2017-05-10-112227_2017-05-13-112227.txt
adb_voda_2017-05-11-150325_2017-05-10-112227_2017-05-13-112227.txt
adb_voda_2017-05-14-150325_2017-05-11-112227_2017-05-10-112227.txt
adb_voda_2017-05-12-150325_2017-05-12-112227_2017-05-11-112227
adb_voda_2017-05-16_2017-04-30_2017-05-01.txt
adb_voda_20170510.txt
adb_voda_2017-05-10.txt
2017-05-11
2017-05-10.txt
2017-05-12

are correct, and you want to match filenames starting with adb_voda_ , keyed on the first date in each name, and ending with .txt , it would seem that the format you feed into your script should be:

adb_voda_YYYY-MM-DD-??????_????-??-??-??????_????-??-??-??????.txt

which your code would then convert to the date format string:

adb_voda_%Y-%m-%d-??????_????-??-??-??????_????-??-??-??????.txt

and date would then create from that a pathname matching pattern that matches the file(s) you want to select for a given date, without a prefix pattern and without the asterisks that have caused you problems in all of your previous threads (as well as in this thread).

All of the code you have that is stripping off _ s, and X s and ? s seems to be fighting against matching only the filenames you want to match.

Similarly, if you wanted to match on the last date in those filenames, it would seem that you want the input format string to be:

adb_voda_????-??-??-??????_????-??-??-??????_YYYY-MM-DD-??????.txt

which your code would then convert to the date format string:

adb_voda_????-??-??-??????_????-??-??-??????_%Y-%m-%d-??????.txt
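
A rough sketch of that conversion, assuming bash and GNU date and using the adb_voda_ prefix from the sample file names. The helper name to_date_format is made up for illustration, and it only handles the YYYY, YY, MM, and DD placeholders:

```shell
#!/bin/bash
# Hypothetical helper: turn YYYY/YY/MM/DD placeholders into date(1)
# conversion specifications, leaving every other character untouched.
to_date_format() {
    local fmt=$1
    fmt=${fmt//YYYY/%Y}   # four-digit year (done first so YY cannot split it)
    fmt=${fmt//YY/%y}     # two-digit year
    fmt=${fmt//MM/%m}     # month
    fmt=${fmt//DD/%d}     # day of month
    printf '%s\n' "$fmt"
}

first='adb_voda_YYYY-MM-DD-??????_????-??-??-??????_????-??-??-??????.txt'
last='adb_voda_????-??-??-??????_????-??-??-??????_YYYY-MM-DD-??????.txt'

to_date_format "$first"
to_date_format "$last"

# Expanding the first format for one fixed day (GNU date assumed) gives a
# pattern the shell can match against file names:
date -d 2017-05-11 "+$(to_date_format "$first")"
```

The last command prints adb_voda_2017-05-11-??????_????-??-??-??????_????-??-??-??????.txt, which contains no asterisks and no separate prefix. Note that a literal prefix containing MM or DD would also be rewritten by this simple substitution.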

Hi don,

Thanks a lot for your response.

Could you please help me convert this

adb_voda_YYYY-MM-DD-??????_????-??-??-??????_????-??-??-??????.txt

to the date format string you mentioned:

adb_voda_%Y-%m-%d-??????_????-??-??-??????_????-??-??-??????.txt

Thanks in advance.

---------- Post updated at 05:55 AM ---------- Previous update was at 04:30 AM ----------

Hi don,

I have written the code to convert to the date format string, as below.

adb_voda_%Y-%m-%d-??????_????-??-??-??????_????-??-??-??????.txt

Now, how can I search for the missing dates?
Could you please help me?

What changes do I have to make in the code below?

missing_count=0
day_count=0
while [[ $interval -ne 0 ]]; do
    finalFormat=$(echo "$finalFormat" | sed -r 's/[HMS]+//g;s/%*$//g;s/-*$//g;s/_*$//g')
    start=`date +${finalFormat} -d "$interval day ago"`
    IFS=$','
    for path in ${file_path}; do
        count=$(ls -l /bin/daily/voda_files/${prog_name}*${start}* 2>/dev/null | wc -l)
        if [[ $count -gt 0 ]]
        then
            break
        fi
    done
    unset IFS
    if [[ $count -eq 0 ]]
    then
        missing_count=`expr $missing_count + 1`
        file_name="${start}"
        printf "$file_name\n"
    fi
    interval=`expr $interval - 1`
done >missingfiles.txt

Please help me.
Thanks in advance.

What operating system (including release number) are you using?

What shell (including version number) are you using?

Do you have a ksh (version 93u+ or later) that I can use instead of whatever shell you're using to provide an example?

Hi Don,

Please find the details below.

What operating system (including release number) are you using?

Linux dev.voda.mp.com 2.6.18-400.1.1.el5 #1 SMP Sun Dec 14 06:01:17 EST 2014 x86_64 x86_64 x86_64 GNU/Linux

What shell (including version number) are you using?

ksh

Do you have a ksh (version 93u+ or later) that I can use instead of whatever shell you're using to provide an example?

version         sh (AT&T Research) 93u+ 2010-06-21

Please help me convert the user's file format to a date format string like the one below.

adb_voda_%Y-%m-%d-??????_????-??-??-??????_????-??-??-??????.txt

Thanks in advance

The following script is a quick and dirty demonstration producing results similar to what your script seemed to be trying to do. It takes the number of days to examine and the file pattern as command line arguments and prints the date in %Y-%m-%d format for dates that had no file matching the given pattern. It only uses ksh built-ins (without invoking awk , date , expr , head , sed , or wc ). The missing dates are printed to standard output from this script since the redirection to missingfiles.txt is commented out.

#!/bin/ksh
IAm=${0##*/}

check_mode=1
count=0

if [[ $# -ne 2 ]]
then	printf 'Usage: %s interval filename_format\n' "$IAm" >&2
	exit 1
fi

interval=$1
file_format=$2

date_pattern=${file_format//YYYY/%Y}
date_pattern=${date_pattern//YY/%y}
if [[ "$date_pattern" == "$file_format" ]]
then	printf '%s: Invalid date format: No "YYYY" or "YY" found:\n\t"%s"\n' \
	    "$IAm" "$file_format" >&2
	exit 2
fi
date_pattern=${date_pattern//MM/%m}
date_pattern=${date_pattern//DD/%d}
date_pattern=${date_pattern//HH/%H}
date_pattern=${date_pattern//II/%M}
date_pattern=${date_pattern//SS/%S}

printf '%s: Processing interval:%d & date_pattern:\n\t"%s"\nfrom file_format:\n\t"%s"\n' \
    "$IAm" "$interval" "$date_pattern" "$file_format"

if [[ $check_mode -eq 1 ]]
then	while [[ $interval -ne 0 ]]
	do	file_pattern=$(printf "%($date_pattern)T\n" "$interval day ago")
		for path in $file_pattern
		do	if [[ -f $path ]]
			then	count=1
				break
			fi
		done
		if [[ $count -eq 0 ]]
		then	printf '%(%Y-%m-%d)T\n' "$interval day ago"
		else	count=0
		fi
		interval=$((interval - 1))
	done # >missingfiles.txt
fi

In a directory containing the files:

total 32
-rw-r--r--  1 dwc  staff     0 May 18 10:36 2017-05-10.txt
-rw-r--r--  1 dwc  staff     0 May 18 10:36 2017-05-11
-rw-r--r--  1 dwc  staff     0 May 18 10:36 2017-05-12
-rw-r--r--  1 dwc  staff     0 May 18 10:36 adb_voda_2017-05-10.txt
-rw-r--r--  1 dwc  staff     0 May 18 10:36 adb_voda_2017-05-11-150325_2017-05-10-112227_2017-05-13-112227.txt
-rw-r--r--  1 dwc  staff     0 May 18 10:36 adb_voda_2017-05-12-150325_2017-05-12-112227_2017-05-11-112227
-rw-r--r--  1 dwc  staff     0 May 18 10:36 adb_voda_2017-05-14-150325_2017-05-11-112227_2017-05-10-112227.txt
-rw-r--r--  1 dwc  staff     0 May 18 10:36 adb_voda_2017-05-16_2017-04-30_2017-05-01.txt
-rw-r--r--  1 dwc  staff     0 May 18 10:36 adb_voda_20170510.txt
-rw-r--r--  1 dwc  staff     0 May 18 10:36 asr_spir_2017-05-10-150325_2017-05-10-112227_2017-05-13-112227.txt
-rwxr-xr-x  1 dwc  staff   704 May 19 04:04 driver
-rw-r--r--  1 dwc  staff  6471 May 18 10:40 problem
-rwxr-xr-x  1 dwc  staff  1108 May 19 03:46 tester

where the above script is named tester , the command:

./tester 10 "adb_voda_YYYY-MM-DD-??????_????-??-??-??????_????-??-??-??????.txt"

when run on May 19, 2017 produces the output:

tester: Processing interval:10 & date_pattern:
	"adb_voda_%Y-%m-%d-??????_????-??-??-??????_????-??-??-??????.txt"
from file_format:
	"adb_voda_YYYY-MM-DD-??????_????-??-??-??????_????-??-??-??????.txt"
2017-05-09
2017-05-10
2017-05-12
2017-05-13
2017-05-15
2017-05-16
2017-05-17
2017-05-18

and the command:

./tester 15 "YYYY-MM-DD"

produces the output:

tester: Processing interval:15 & date_pattern:
	"%Y-%m-%d"
from file_format:
	"YYYY-MM-DD"
2017-05-04
2017-05-05
2017-05-06
2017-05-07
2017-05-08
2017-05-09
2017-05-10
2017-05-13
2017-05-14
2017-05-15
2017-05-16
2017-05-17
2017-05-18

and the command:

./tester 10 "adb_voda_????-??-??-??????_????-??-??-??????_YYYY-MM-DD-??????.txt"

produces the output:

tester: Processing interval:10 & date_pattern:
	"adb_voda_????-??-??-??????_????-??-??-??????_%Y-%m-%d-??????.txt"
from file_format:
	"adb_voda_????-??-??-??????_????-??-??-??????_YYYY-MM-DD-??????.txt"
2017-05-09
2017-05-11
2017-05-12
2017-05-14
2017-05-15
2017-05-16
2017-05-17
2017-05-18

Hopefully, you can modify this to get something that works for you.
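
For anyone without ksh93's printf %(...)T, the same loop might be sketched for bash with GNU date roughly like this. The function name missing_dates, its argument order, and the temporary demo directory are all made up for illustration, and it takes an already converted date format (with %Y, %m, %d) rather than the YYYY-MM-DD form:

```shell
#!/bin/bash
# Sketch for bash with GNU date; names and arguments are hypothetical.
# Prints each of the "interval" days before "today" for which no regular
# file in "dir" matches the date-expanded pattern.
missing_dates() {
    local today=$1 interval=$2 date_format=$3 dir=$4
    local pattern path found
    while (( interval > 0 )); do
        pattern=$(date -d "$today $interval days ago" "+$date_format") || return 1
        found=0
        # $pattern is unquoted on purpose so the shell expands the ? wildcards.
        for path in "$dir"/$pattern; do
            if [[ -f $path ]]; then
                found=1
                break
            fi
        done
        if (( ! found )); then
            date -d "$today $interval days ago" +%Y-%m-%d
        fi
        interval=$((interval - 1))
    done
}

# Demo in a throwaway directory with two of the sample file names.
demo=$(mktemp -d)
touch "$demo/adb_voda_2017-05-11-150325_2017-05-10-112227_2017-05-13-112227.txt" \
      "$demo/adb_voda_2017-05-14-150325_2017-05-11-112227_2017-05-10-112227.txt"
missing_dates 2017-05-19 10 \
    'adb_voda_%Y-%m-%d-??????_????-??-??-??????_????-??-??-??????.txt' "$demo"
rm -rf "$demo"
```

With 2017-05-19 passed as the reference day, the demo should list the eight dates from 2017-05-09 through 2017-05-18, skipping the 11th and 14th. Passing the reference day as an argument instead of using the current date is only there to make the result reproducible.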

Hi Don,

Thanks a lot.