Identifying missing file dates

Hi Experts,
I have written the script below to check for missing files, based on the date in the file name, over a given number of days back from the current date.
The file names contain a date along with some name, e.g. jera_sit_2017-04-25-150325.txt.
The script works fine if there is only one date in the file name.
It does not work if there is more than one date in the file name, for example: jera_sit_2017-04-28-150325_2017-04-29-112227.txt
My script treats this as two files, one for 2017-04-28 and one for 2017-04-29,
because the logic is ls -l ${path}/*${start}*
It should consider that a file exists only for the last date in the file name, i.e. 2017-04-29.
Could you please help me with the logic?

 
 #!/bin/ksh
file_path=/bin/daily/dtefiles
file_format=XXXX_XXX_YYYY-MM-DD-HHIISS.TXT
check_mode=1
back_days_hr_min_hr_min=5
 #checking before . extension
YearFormat=$(echo $file_format | sed 's/X//g;s/x//g;s/_*$//g;s/^_*//g' | awk 'BEGIN{FS=".";}{for (i = 1; i <= NF; i++){if ( $i ~ /YY/ ){print $i;}}}' | sed 's/X//g;s/x//g;s/_*$//g;s/^_*//g' | head -1)
YearFormat_count=$(echo $YearFormat | wc -l)
echo "YearFormat-$YearFormat"
if [[ $YearFormat_count -lt 1 ]]; then
echo "enter proper year format"
exit 0
fi
 #exit is missing here
month_count=0;y_count=0;d_count=0;hour_count=0;min_count=0;sec_count=0
for (( i=0; i<${#YearFormat}; i++ )); do
year=$(echo "${YearFormat:$i:1}")
if [[ $year == 'Y' ]];then
y_count=`expr $y_count + 1`
fi
if [[ $year == 'M' ]];then
month_count=`expr $month_count + 1`
fi
if [[ $year == 'D' ]];then
d_count=`expr $d_count + 1`
fi
if [[ $year == 'H' ]];then
hour_count=`expr $hour_count + 1`
fi
if [[ $year == 'I' ]];then
min_count=`expr $min_count + 1`
fi
if [[ $year == 'S' ]];then
sec_count=`expr $sec_count + 1`
fi
done
if [[ $month_count -ne 0 && $month_count -ne 2 ]] ; then
echo "enter correct month format"
exit 0
fi
if [[ $d_count -ne 0 && $d_count -ne 2 ]]; then
echo "enter correct date format"
exit 0
fi
if [[ $hour_count -ne 0 && $hour_count -ne 2 ]]; then
echo "enter correct hour format"
exit 0
fi
if [[ $min_count -ne 0 && $min_count -ne 2 ]]; then
echo "enter correct min format"
exit 0
fi
if [[ $sec_count -ne 0 && $sec_count -ne 2 ]]; then
echo "enter correct sec format"
exit 0
fi
 #Remove the duplicates
cleandateFormat=$(echo $YearFormat | tr -s 'A-Z') #YmD_HMS
echo $cleandateFormat
if [[ $y_count -eq 2 ]]; then
cleandateFormat=$(echo $cleandateFormat | sed 's/Y/y/g;s/M/m/g;s/D/d/g;s/I/M/g')
elif [[ $y_count -eq 4 ]]; then
cleandateFormat=$(echo $cleandateFormat | sed 's/M/m/g;s/D/d/g;s/I/M/g')
else
echo "enter correct year format"
exit 0
fi
 #maintain format including special character as seperator
finalFormat=""
for (( i=0; i<${#cleandateFormat}; i++ )); do
f_year=$(echo "${cleandateFormat:$i:1}")
echo "char : $f_year"
if [[ $f_year == [a-zA-Z] ]];then
temp=$(echo $f_year | sed 's/^/%/g')
finalFormat="$finalFormat$temp"
echo "IF : $finalFormat"
else
finalFormat="$finalFormat$f_year"
fi
done
 #checking missing files
if [[ $check_mode -eq 1 ]]
then
missing_count=0
day_count=0
while [[ $back_days_hr_min_hr_min -ne 0 ]]; do
finalFormat=$(echo "$finalFormat" | sed -r 's/[HMS]+//g;s/%*$//g;s/-*$//g;s/_*$//g')
start=`date +${finalFormat} -d "$back_days_hr_min_hr_min day ago"`
IFS=$','
for path in ${file_path}; do
count=$(ls -l ${path}/*${start}* 2>/dev/null | wc -l)
if [[ $count -gt 0 ]]
then
break
fi
done
unset IFS
if [[ $count -eq 0 ]]
then
missing_count=`expr $missing_count + 1`
file_name="${start}"
printf "$file_name\n"
fi
back_days_hr_min_hr_min=`expr $back_days_hr_min_hr_min - 1`
done >missingfiles.txt
fi
 

Thanks in advance.

---------- Post updated at 05:26 AM ---------- Previous update was at 02:18 AM ----------

Hi All,

Could you please help me.

Thanks in advance.

---------- Post updated 05-04-17 at 12:20 AM ---------- Previous update was 05-03-17 at 05:26 AM ----------

Hi All,

Could anybody please help me?

Thanks

If after more than a day nobody can answer your question, you should consider rephrasing it to provide more clarity and understanding and help people help you.

Please get accustomed to providing decent context info for your problem.
It is always helpful to support a request with system info like OS and shell, related environment (variables, options), preferred tools, and adequate (representative) sample input and desired output data and the logic connecting the two, to avoid ambiguities and keep people from guessing.

Hi RudiC,
I have explained the requirement below, with the expected result.
My requirement:
I want to find missing files, based on the date in the file name, over a given number of days back from the current date.

Case 1: Find missing files over the last 10 days when there is one date in the file name.
For example:
I have the files below in the directory /bin/daily/dtefiles
jera_sit_2017-04-24-150325.txt
jera_sit_2017-04-25-150325.txt
jera_sit_2017-04-26-141232.txt
jera_sit_2017-04-29-122344.txt
jera_sit_2017-05-02-122344.txt
jera_sit_2017-05-03-122344.txt
In this case the missing files are for the dates 2017-04-27, 2017-04-28, 2017-04-30, 2017-05-01.

The script I posted works fine for case 1.

Case 2: Find missing files over the last 10 days when there is more than one date in the file name.
For example:
I have the files below in the directory /bin/daily/msn_files
jera_msn_2017-04-28-150325_2017-04-29-112227_2017-04-29-112227.txt
jera_msn_2017-04-24-150325_2017-04-24-112227_2017-04-25-112227.txt
jera_msn_2017-04-24-150325_2017-04-26-112227_2017-04-26-112227.txt
jera_msn_2017-04-25-150325_2017-04-26-112227_2017-04-27-112227.txt
jera_msn_2017-04-30-150325_2017-04-30-112227_2017-05-01-112227.txt
 

In the file names above there is more than one date. Only the last date in the file name should be considered;
the other dates should be ignored.
In this case the missing files are for the dates 2017-04-24, 2017-04-28, 2017-04-30, 2017-05-02.
My script does not work for case 2.

For case 1 the parameter values are:
file_path=/bin/daily/dtefiles
file_format=XXXX_XXX_YYYY-MM-DD-HHIISS.TXT
check_mode=1
back_days_hr_min_hr_min=10
For case 2 the parameter values are:
file_path=/bin/daily/dtefiles
file_format=XXXX_XXX_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX_YYYY-MM-DD-HHIISS.TXT
check_mode=1
back_days_hr_min_hr_min=10
 

Please help me.
Thanks in advance.

I'm afraid I don't understand your script. Does this

for FN in *.txt; do echo ${FN#${FN%_*.txt}_}; done
2017-04-25-112227.txt
2017-04-26-112227.txt
2017-04-27-112227.txt
2017-04-29-112227.txt
2017-05-01-112227.txt
2017-04-24-150325.txt
2017-04-25-150325.txt
2017-04-26-141232.txt
2017-04-29-122344.txt
2017-05-02-122344.txt
2017-05-03-122344.txt

give you the desired dates from ALL file names?
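
To unpack the nested expansion above: the inner ${FN%_*.txt} strips the shortest trailing match of _*.txt (the final stamp), leaving the prefix, and the outer ${FN#...} then removes that prefix and the underscore, so only the last date-time stamp remains. A minimal sketch that goes one step further and reduces the stamp to its date; the function name last_date is made up for illustration, and it assumes every name ends in _YYYY-MM-DD-HHMMSS.txt:

```shell
# last_date NAME: print the date (YYYY-MM-DD) of the last stamp in NAME.
# Hypothetical helper; assumes NAME ends in _YYYY-MM-DD-HHMMSS.txt.
last_date() {
    fn=$1
    prefix=${fn%_*.txt}           # everything before the final "_stamp.txt"
    stamp=${fn#"$prefix"_}        # e.g. 2017-04-29-112227.txt
    stamp=${stamp%.txt}           # e.g. 2017-04-29-112227
    printf '%s\n' "${stamp%-*}"   # drop the trailing -HHMMSS
}

last_date jera_msn_2017-04-28-150325_2017-04-29-112227_2017-04-29-112227.txt
last_date jera_sit_2017-04-25-150325.txt
```

The two calls print 2017-04-29 and 2017-04-25, i.e. only the last date of each name, regardless of how many stamps precede it.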

Hi RudiC,
Thanks for your script. However, it does not serve my purpose.
I want to find missing files over the last 10 days when there is more than one date in the file name.
I have the files below in the directory /bin/daily/msn_files

jera_msn_2017-04-28-150325_2017-04-29-112227_2017-04-29-112227.txt
jera_msn_2017-04-24-150325_2017-04-24-112227_2017-04-25-112227.txt
jera_msn_2017-04-24-150325_2017-04-26-112227_2017-04-26-112227.txt
jera_msn_2017-04-25-150325_2017-04-26-112227_2017-04-27-112227.txt
jera_msn_2017-04-30-150325_2017-04-30-112227_2017-05-01-112227.txt

In the file names above there is more than one date. Only the last date in the file name should be considered;
the other dates should be ignored.
In this case the missing files are for the dates

2017-04-24
2017-04-28
2017-04-30
2017-05-02

Thanks in Advance.

Did you grasp what was done in post #4? The "only last date in the file name" for ALL files was presented to you for further processing, e.g. finding missing files as you do in your script.
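
For instance, the further processing could look roughly like the sketch below. It assumes every file name ends in _YYYY-MM-DD-HHMMSS.txt and that the candidate dates arrive on standard input, one per line; the function name missing_dates is hypothetical and not part of the posted script.

```shell
# missing_dates DIR: read candidate dates (YYYY-MM-DD, one per line) from
# stdin and print those for which no file in DIR has that date as the
# LAST date in its name. Assumes names end in _YYYY-MM-DD-HHMMSS.txt.
missing_dates() {
    dir=$1
    seen=" "                          # space-separated list of last dates
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue       # the glob matched nothing
        stamp=${f##*_}                # text after the last underscore
        stamp=${stamp%.txt}           # e.g. 2017-04-29-112227
        seen="$seen${stamp%-*} "      # keep only YYYY-MM-DD
    done
    while IFS= read -r d; do
        case $seen in
            *" $d "*) ;;                      # some file ends with this date
            *) printf '%s\n' "$d" ;;          # no file for this date
        esac
    done
}
```

Fed the dates 2017-04-24 through 2017-05-02 and a directory holding the five jera_msn files listed earlier, this prints 2017-04-24, 2017-04-28, 2017-04-30 and 2017-05-02.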

Hi RudiC,
In my script, start holds a date derived from the value of the parameter back_days_hr_min_hr_min.
For example, if back_days_hr_min_hr_min=10,
then start begins 10 days back, i.e. at 2017-04-28, and the loop then advances it one day at a time.

start will be 2017-04-28, 2017-04-29, 2017-04-30 and so on, up to 2017-05-07.

In my script I check whether a file exists for each date given by start:

ls -l ${path}/*${start}*

Since there are three dates in the file name jera_msn_2017-04-28-150325_2017-04-29-112227_2017-04-29-112227.txt,
the script will check

ls -l *2017-04-28*
ls -l *2017-04-29*

Hence it reports that a file exists for both 2017-04-28 and 2017-04-29.
But it should consider only the last date and report that a file exists for 2017-04-29.
Please help me.
Thanks in advance.

Why not

ls -l ${path}/*${start}.txt

?

Hi Rudic,

This one won't work because there is a timestamp after the date, i.e. 2017-04-24-150325.txt. Also, the file format is not constant.

Please help me.

Thanks

Why not

ls -l ${path}/*${start}-??????.txt

?
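
The trailing -??????.txt is what ties the match to the last stamp: each ? matches exactly one character, so the date has to be followed by a hyphen, six characters and then .txt, which an earlier date in the name (followed by further stamps) cannot satisfy. A small sketch under the assumption that every name ends in -HHMMSS.txt; the function name has_file_for is made up for illustration:

```shell
# has_file_for DIR DATE: succeed if some file in DIR has DATE as the date
# of its final stamp, i.e. the name ends in DATE-HHMMSS.txt.
has_file_for() {
    for f in "$1"/*"$2"-??????.txt; do
        [ -e "$f" ] && return 0   # at least one real match
    done
    return 1                      # the pattern matched no file
}
```

With only jera_msn_2017-04-28-150325_2017-04-29-112227_2017-04-29-112227.txt in the directory, has_file_for succeeds for 2017-04-29 and fails for 2017-04-28.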

Hi nalu,
You need to learn to indent your code so you (and people trying to help you) can have a better chance of seeing the structure of your code and have a better chance of understanding what it is doing.

Despite what you have said about your code working correctly to find missing dates in the last 10 days in the single date format filenames (and despite your refusing to tell us what operating system and shell you're using), one might guess that you're using an operating system that provides the GNU date utility, a sed utility that accepts a -r option, and a 1993 or later version of ksh, and that your code only works to find missing dates and times (to the second) for the last 5 (not 10) days. One might also guess that, since the likelihood of finding a file with a date and time stamp that matches to the second is not high, your code doesn't really work as well as you have indicated, or there is something else going on here that you have not explained to us.

And you have not explained why there are variables like check_mode in your code (when checking for missing files is, as you have said, the entire purpose of your code). And why do you need all of the complicated reformatting of your date format string when the string defining your format is included in your script (not read from an external source)? Why not just define the format string (or strings) in your source to begin with?
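
As a rough illustration of that last point: if the date part of the names is always YYYY-MM-DD, the format can be written directly in the form date expects instead of being derived from a template. The sketch below assumes GNU date (for the -d option) and uses the hypothetical names date_fmt and dates_back; it is not a drop-in replacement for the posted script.

```shell
date_fmt='%Y-%m-%d'        # written directly, no template parsing needed

# dates_back N [REF]: print the N dates before REF, oldest first.
# REF is optional; when it is empty the dates are relative to now.
# Requires GNU date for the -d option.
dates_back() {
    n=$1
    ref=${2:-}
    while [ "$n" -gt 0 ]; do
        date -d "$ref $n days ago" +"$date_fmt"
        n=$((n - 1))
    done
}
```

Calling dates_back 10 2017-05-08 would list 2017-04-28 through 2017-05-07, the same range described earlier in the thread.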

You say you want 10 days of dates checked for filenames with 1 date in the filename and for filenames with 2 dates in the filename, but you have not been at all clear about whether you want two separate lists or if you just want one list of dates not included in either filename format.

PLEASE (as RudiC requested in post #2 in this thread):

  1. tell us what operating system you're using (including the release number),
  2. tell us what shell you're using (including the version number), and
  3. tell us clearly what output you hope to produce:
       i. one list or two lists,
      ii. date match or date and timestamp match,
     iii. do all of the files you want to process have filenames that end with characters in the format YYYY-mm-dd-HHMMSS.txt (i.e. all files end with a date and timestamp for the date you're interested in followed by the string .txt ) or are there other filename formats that need to be processed,
      iv. is the number of days to be processed always 10 or is it a parameter to your script, and
       v. is there anything else that we need to know about what you're trying to do?