Check file availability and place flag file

I have to check a directory on Linux (via a shell script which I am trying to build) for about 20 different source files, identified by file patterns. If the files are available in the directory, I should place flag files that my other ETL jobs are waiting on before they kick off. If the source files are not available in the desired directory, the script should sleep for some time and then look for the files again. However, since the file names change every week with the week number or a timestamp (e.g. filename04202014.CSV for this week and filename04272014.CSV for next week), I am having a bit of a challenge creating this "kicker script". Any good suggestions on how this can be achieved?

You are much more likely to get a response to your question if you fill in a lot more of the details:

  1. Where are these source files located?
  2. How is this script supposed to determine which source files it is trying to find?
  3. What does "place flag files" mean?
     a. What is the name of a flag file?
     b. Where does the flag file go?
     c. What needs to be in the flag file?
  4. How long is "some time"?
  5. Why create a flag file instead of just kicking off the ETL job?
  6. How do you kick off an ETL job?

Our ETL tool is installed on Linux, and currently many of the tasks, such as (a) formatting the files we receive from different sources/users to meet our requirements and (b) placing the indicator/flag file, are done manually. We are now trying to automate this process. We do not know when the files will be made available, so we keep checking for them, and when they arrive we manually place the indicator/flag file saying that the files from the given sources are available. That in turn initiates the ETL jobs, depending on which flag file was placed.

One source/user can send multiple source files, hence the script should look for all the source files before running its corresponding job.

See the answers for your questions below:

Where are these source files located? These are dropped in a directory on Linux by different sources/users. This directory was created specifically for the source files and serves as the location from which the ETL process picks them up.

How is this script supposed to determine which source files it is trying to find? I was planning to use the file pattern of all these files.
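A minimal sketch of such a pattern check, assuming the pattern is an ordinary shell glob such as filename*.CSV. The function name count_matches is hypothetical and only used for illustration:

```shell
# count_matches DIR PATTERN
# Print how many regular files in DIR match the shell glob PATTERN.
count_matches() {
    dir=$1
    pattern=$2
    count=0
    # $pattern is deliberately left unquoted so that the shell expands the glob.
    for f in "$dir"/$pattern
    do
        # If nothing matches, the glob stays literal and this test fails.
        [ -f "$f" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Called as count_matches "$source_dir" 'filename*.CSV', this would count both filename04202014.CSV and filename04272014.CSV. A stricter pattern such as 'filename[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].CSV' would accept only names with an eight-digit date.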

What does "place flag files" mean? A specific indicator file for different sets of source files which indicates that the source files are available and the ETL job can be started to load these files into our database.

What is the name of a flag file? This will depend on the sources from which we have received the file. Based on this file ETL will start the corresponding job.

Where does the flag file go? I have created a directory where all the flag files go. ETL job will be looking for this file before kicking off the corresponding processes.

What needs to be in the flag file? An empty flag file will do. These are currently created manually by our team. We just use a touch command to create these flag files.

How long is "some time"? It is the sleep time for the script. The script should wait for this long and then start looking for the source files again. I have not decided on the sleep time yet, but I think 5 minutes will work for now.
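The wait-and-retry behaviour could be sketched roughly as below. The function name poll_until and the idea of capping the number of attempts are my own assumptions, not something we have settled on:

```shell
# poll_until INTERVAL MAX_TRIES COMMAND [ARGS...]
# Run COMMAND until it succeeds, sleeping INTERVAL seconds between attempts.
# Return 0 on success, or 1 once MAX_TRIES attempts have failed.
poll_until() {
    interval=$1
    max_tries=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$max_tries" ]
    do
        if "$@"
        then
            return 0
        fi
        tries=$((tries + 1))
        # Skip the sleep after the final failed attempt.
        [ "$tries" -lt "$max_tries" ] && sleep "$interval"
    done
    return 1
}
```

For example, poll_until 300 12 test -f "$some_file" would check every 5 minutes and give up after 12 attempts, i.e. after roughly an hour.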

Why create a flag file instead of just kicking off the ETL job? The files that we receive are not formatted properly. This part is currently being done manually, which I am trying to take care of in an automated shell script.

How do you kick off an ETL job? The ETL tool runs on a Linux machine, and we use utilities provided by the tool (e.g. PMCMD) to kick off the process.

Please let me know if you need further clarification on the above questions.

I wanted specifics. You provided generalities. With what you have given us, the script needs to be something like:

cd somewhere
while [ 1 ]
do      for file in somepattern
        do      if [ ! FlagFileExistsCorrespondingTo$file ]
                then    CreateFlagFileCorrespondingTo$file
                fi
        done
        if [ AllExpectedFilesFound ]
        then    break
        else    sleep 300
        fi
done
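
One way the outline above might be made runnable is sketched here. Everything specific in it is an assumption: the function names scan_once and watch_files, the ".flag" naming scheme for the flag files, and the idea that "all expected files found" means a known number of files matching one pattern.

```shell
# scan_once SRC_DIR PATTERN FLAG_DIR EXPECTED
# Create one flag file per matching source file (if not already present).
# Succeed once at least EXPECTED matching files have been seen.
scan_once() {
    src_dir=$1 pattern=$2 flag_dir=$3 expected=$4
    found=0
    # $pattern is unquoted on purpose so the glob is expanded.
    for file in "$src_dir"/$pattern
    do
        [ -f "$file" ] || continue
        found=$((found + 1))
        flag="$flag_dir/$(basename "$file").flag"   # hypothetical naming scheme
        [ -e "$flag" ] || touch "$flag"
    done
    [ "$found" -ge "$expected" ]
}

# watch_files SRC_DIR PATTERN FLAG_DIR EXPECTED
# Repeat the scan every 300 seconds until all expected files have arrived.
watch_files() {
    until scan_once "$@"
    do
        sleep 300
    done
}
```

Keeping the single pass in its own function makes it possible to try the logic by hand on a test directory before leaving the loop running.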

@Don, I tried the script in the following way, but I am stuck in the middle part, where I need to check the conditions for each request flag separately. Could you please help me out?

while [ 2 -ge 1 ]
do

        if [ -f /dir1/dir2/dir3/break.timx ]
        then
                echo "The Script is ending as a request has been made to break the script"
                exit $SUCCESS
        else
        req_file_list=`ls /dir1/dir2/dir3/ReqFiles`
                                echo $req_file_list

                        atleast_one_req_file_processed=0

        for req_file in $req_file_list
        do
                if [ $atleast_one_req_file_processed -eq 0 ]
                then
                        grep $req_file /dir1/dir2/dir3/flags_info.txt > /dir1/dir2/dir3/req_file_current_session.txt
		# /dir1/dir2/dir3/flags_info.txt is a fixed file and has all the information on the request flags and the conditions the script expects and should look on before placing the indicator/flag file. 
                        atleast_one_req_file_processed=`expr $atleast_one_req_file_processed + 1`
                        echo $atleast_one_req_file_processed
                else
                        grep $req_file /dir1/dir2/dir3/flags_info.txt >> /dir1/dir2/dir3/req_file_current_session.txt
                fi

        done
		# Stuck at this part, as I am not sure how to check the conditions for each request file separately.
		sleep 300
        fi
done

Example of the /dir1/dir2/dir3/flags_info.txt is given below:

cat /dir1/dir2/dir3/flags_info.txt

request_flags|source_name|job_name|file_pattern|file_count|indicator_flag_file
req1.req|Sourcename1|jobname1|file_pattern_1|3|ind1.ind
req2.req|Sourcename2|jobname2|file_pattern_2|6|ind2.ind
req2.req|Sourcename2|jobname2|file_pattern_A|3|ind2.ind
req3.req|Sourcename3|jobname3|file_pattern_3|1|ind3.ind
req3.req|Sourcename3|jobname3|file_pattern_Z|1|ind3.ind
req4.req|Sourcename4|jobname4|file_pattern_4|28|ind4.ind
req5.req|Sourcename5|jobname5|file_pattern_5|4|ind5.ind
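
A possible way to check the conditions per request file is sketched below. It rests on several assumptions that the table itself does not confirm: that file_pattern is a shell glob, that file_count is the minimum number of matching files required, that a request is ready only when every one of its rows is satisfied, and that request files end in .req as in the example. The function names check_request and process_requests are hypothetical.

```shell
# check_request REQ_NAME INFO_FILE SRC_DIR FLAG_DIR
# Place the indicator file for REQ_NAME if every row listed for it in
# INFO_FILE has at least file_count matching files in SRC_DIR.
check_request() {
    req=$1 info=$2 src_dir=$3 flag_dir=$4
    ready=1          # stays 1 unless some row is not satisfied
    indicator=""
    # Fields: request_flags|source_name|job_name|file_pattern|file_count|indicator_flag_file
    while IFS='|' read -r r_req r_source r_job r_pattern r_count r_ind || [ -n "$r_req" ]
    do
        # Skips the header row and rows that belong to other requests.
        [ "$r_req" = "$req" ] || continue
        indicator=$r_ind
        n=0
        for f in "$src_dir"/$r_pattern      # unquoted so the glob expands
        do
            [ -f "$f" ] && n=$((n + 1))
        done
        [ "$n" -ge "$r_count" ] || ready=0
    done < "$info"
    if [ -n "$indicator" ] && [ "$ready" -eq 1 ]
    then
        touch "$flag_dir/$indicator"
        return 0
    fi
    return 1
}

# process_requests REQ_DIR INFO_FILE SRC_DIR FLAG_DIR
# Run check_request once for every request file found in REQ_DIR.
process_requests() {
    for req_path in "$1"/*.req
    do
        [ -f "$req_path" ] || continue
        name=$(basename "$req_path")
        if check_request "$name" "$2" "$3" "$4"
        then
            echo "indicator placed for $name"
        fi
    done
}
```

With the sample table, req2.req would get ind2.ind only once both file_pattern_2 (6 files) and file_pattern_A (3 files) are satisfied. process_requests could then be called from inside the existing loop, just before the sleep.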

I'm afraid I cannot help you, as the logic required escapes me. I don't see the directory structures, the names of the request files and flags, what they trigger, or when. The flags_info.txt file is fixed, I presume. Do you need/use its contents to schedule the ETL jobs?

Anyhow, there are quite a few opportunities for improvement in your script above:

  • change your while and if to until [ -f ...break.timx ]
  • you don't need the req_file_list variable, nor the for loop. Depending on your shell, you might grep for the ls result immediately, and set the at_least_one variable according to its exit code.
    Try
until [ -f /dir1/dir2/dir3/break.timx ]
do
    ls /dir1/dir2/dir3/ReqFiles | grep -f - /dir1/dir2/dir3/flags_info.txt >xx
    atleast_one_req_file_processed=$?
    sleep 300    # as in your script, so the loop does not spin continuously
done
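
One caveat with grep -f: each request file name is treated as a regular expression, so the dot in req1.req matches any character. A small variant using -F (fixed strings) might be safer. The function name matching_rows is made up for this sketch, and the match is still a substring match, so a name like xreq1.req would also hit the req1.req rows.

```shell
# matching_rows REQ_DIR INFO_FILE
# Print the rows of INFO_FILE that mention any file name present in REQ_DIR.
# The exit status is that of grep: 0 if at least one row matched.
matching_rows() {
    # -F treats every name as a fixed string instead of a regular expression.
    ls "$1" | grep -F -f - "$2"
}
```

The exit status can be used directly, as in: if matching_rows /dir1/dir2/dir3/ReqFiles /dir1/dir2/dir3/flags_info.txt >xx; then ...; fi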

If you describe your needs in a bit more detail, we might be able to help further.


@Don, @RudiC - Thank you for the help. I will keep these points in mind while developing the script.