Pick the latest file from the past 3 days with a UNIX script and perform the steps in the message below

sunnykamal59 · December 4, 2017, 4:26pm

Hi,
Can anyone help me perform the requirement below in Unix?

Step 1: We will receive multiple files weekly with the same name (as below) in a folder. The folder will also contain other files, such as def.dat and ghf.dat.

Filenames:

1) abc_20171204_052389.dat
2) abc_20171204_052428.dat

Step 2: We should pick the files whose received date is within the past 3 days.

Step 3: We should read those files and find the ones that contain the text 'S A M P L E' starting at position 31 on line 19.

Step 4: Once we identify the file that has the text 'S A M P L E', we should pick that file, look for 'GENERIC', and copy all the lines that contain 'GENERIC' to another file.

Corona688 · December 4, 2017, 4:27pm

Step 1 is a problem since you can't have more than one file with the same name.

RudiC · December 4, 2017, 4:50pm

Welcome to the forum.

Please get accustomed to providing decent context info about your problem.

It is always helpful to phrase a request carefully and in detail, and to support it with system info like OS and shell, the related environment (variables, options), preferred tools, adequate (representative) sample input and desired output data, the logic connecting the two including your own attempts at a solution, and, if any, system (error) messages verbatim, to avoid ambiguities and keep people from guessing.

sunnykamal59 · December 4, 2017, 5:01pm

Sorry, the filenames received will have a timestamp, as below.

Step 1: We will receive multiple files weekly in a folder, all with the same name apart from different timestamps (as below).

Filenames:

1) abc_20171204_052389.dat
2) abc_20171204_052428.dat

Chubler_XL · December 4, 2017, 10:03pm

How about this:

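MIDNIGHT=$(date -d 0:0 +'%s')              # today's midnight in seconds since the epoch (GNU date)
((THREEDAYSAGO=MIDNIGHT - 3*3600*24))      # 3 days * 24 hours * 3600 seconds earlier

for file in *_*_*.dat                      # names like abc_20171204_052389.dat
do
   [ -e "$file" ] || continue              # skip the literal pattern if nothing matched
   FILESTAMP=${file#*_}                    # drop everything up to the first _: 20171204_052389.dat
   FILESTAMP=${FILESTAMP%_*}               # drop everything from the last _ on: 20171204
   if [[ $(date -d "$FILESTAMP" +'%s') -ge $THREEDAYSAGO ]]
   then
       # awk exits with status 1 only if line 19, columns 31-41, is "S A M P L E"
       awk '
           NR==19{
              if (substr($0,31,11) == "S A M P L E")
                 EC=1
              exit
           }
           END { exit EC }' "$file"

       if [ $? -eq 1 ]
       then
           # copy all data that has GENERIC to "other file"
           grep "GENERIC" "$file" >> ./"other file"
       fi
   fi
done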

sunnykamal59 · December 4, 2017, 10:35pm

Thanks for your reply. Can you please explain it step by step in detail? I'm new to scripting.

* When I try to execute MIDNIGHT=$(date -d 0:0 +'%s'), the result is 1512363600. What is this number? Is it the date?

* ((THREEDAYSAGO=MIDNIGHT - 3*3600*24)): why are we doing 3*3600*24?

* If we have 3 files selected in 3 days, each file might have a different string on line 19 at position 31, and we should only select the file that has the text 'S A M P L E'. How are we doing this?

* And can you also explain this logic:

for file in *_*_*.dat
do
   FILESTAMP=${file#*_}
   FILESTAMP=${FILESTAMP%_*}
   if [[ $(date -d $FILESTAMP +'%s') -ge $THREEDAYSAGO ]]
   then
       awk '
           NR==19{
              if (substr($0,31,11) == "S A M P L E")
                 EC=1
              exit
           }
           END { exit EC }' $file

RudiC · December 5, 2017, 4:59am

While the community in here is happy to help people, be it with simple or complex questions, the main objective is to help them help themselves. Among other things, man pages are invaluable sources of info, e.g. man date:

This would answer your first question: 1512363600 is that day's (local) midnight expressed as the number of seconds since "the epoch" (1970-01-01 00:00:00 UTC).
2nd question: How many seconds are there in an hour? How many hours in a day?
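To see both answers in action, here is a small sketch. It assumes GNU date (the -d option in the script already requires it), prints the midnight value, turns it back into a readable date with date -d "@SECONDS", and spells out the 3-day arithmetic. The variable SECONDS_PER_DAY is a hypothetical name added only for readability.

```shell
#!/bin/bash
# Assumes GNU date (the -d option); BSD/macOS date needs different options.
MIDNIGHT=$(date -d 0:0 +'%s')              # today's local midnight as seconds since the epoch
echo "Midnight in epoch seconds: $MIDNIGHT"
date -d "@$MIDNIGHT"                        # turn the number back into a readable date

SECONDS_PER_DAY=$((3600 * 24))              # 3600 seconds per hour * 24 hours = 86400
((THREEDAYSAGO = MIDNIGHT - 3 * SECONDS_PER_DAY))   # 3 * 86400 = 259200 seconds earlier
echo "Three days before midnight: $THREEDAYSAGO"
date -d "@$THREEDAYSAGO"
```

One caveat: 3*3600*24 assumes every day has 24 hours, so if a daylight-saving change falls inside the window the cutoff can be off by an hour.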
3rd: man awk:

By default, lines are the records for awk, so NR==19 selects the 19th line (as requested), and substr($0,31,11) extracts exactly the 11 characters starting at position 31, which is the part of the line that needs to be compared to your sample text ("S A M P L E" is 11 characters long).
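A quick way to convince yourself of this is to build a throwaway test file. This sketch writes 18 made-up filler lines and then a line 19 with 30 padding characters followed by the marker, shows what substr() extracts, and runs the same awk check as the script (the file contents are invented for the demo):

```shell
#!/bin/bash
# Hypothetical test file: 18 filler lines, then line 19 with 30 zeros + the marker at position 31.
tmpfile=$(mktemp)
for i in {1..18}; do echo "filler line $i"; done > "$tmpfile"
printf '%030d%s\n' 0 'S A M P L E rest of record' >> "$tmpfile"

# Show exactly what substr() sees on line 19 (brackets make the boundaries visible).
extracted=$(awk 'NR==19 { print "[" substr($0, 31, 11) "]" }' "$tmpfile")
echo "$extracted"

# Same test as the script: exit status 1 means the marker was found.
status=0
awk 'NR==19 { if (substr($0,31,11) == "S A M P L E") EC=1; exit }
     END { exit EC }' "$tmpfile" || status=$?
echo "awk exit status: $status"
rm -f "$tmpfile"
```

Changing the marker text in the printf line (or moving it by one column) makes the status drop to 0, which is how the script skips files without the marker.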

I think the logic should be clear by now.

rovf · December 5, 2017, 10:27am

True, but if you look closely, you can see that the first file has a leading space in its name.

sunnykamal59 · December 5, 2017, 10:59am

Hi RudiC, thank you for the explanation. I understood your logic except for the following.

As mentioned, we will have multiple files with different names in a folder, and we should pick only the files that match abc_*_*.dat and came in within the last 3 days. Can you please tell me how we are doing this?

Also, the filetimestamp and $THREEDAYSAGO values have different formats. How are we comparing the two in the if condition? Can you explain with an example?

Filenames in folder:

1)abc_20171204.052389.dat
2)abc_20171204.052428.dat
3)def_20171204.052440.dat
4)ghf_20171204.054340.dat

C:

for file in *_*_*.dat
do
   FILESTAMP=${file#*_}
   FILESTAMP=${FILESTAMP%_*}

RudiC · December 5, 2017, 1:22pm

My first and most important advice is that you apply utmost care when creating a post - watch your spelling, upper / lower case, punctuation, and data accuracy. Why should anybody care more for your posts than you do? Why should anybody want to guess what your problems are?
In your post#9:

What is the C: for?
You write "files which start with abc_*_*.dat" but post sample files like ghf_20171204.054340.dat, in which neither the abc nor the second _ appears. So which one is valid? The code doesn't work with a wrong pattern.
By "filetimestamp" do you mean FILESTAMP? The latter is not compared directly to $THREEDAYSAGO; it is converted to seconds before the comparison.
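For example, with GNU date, the cutoff and the converted dates of three sample stamps compare as shown below. TZ=UTC and the pretend "today" of 2017-12-06 are assumptions chosen only to make the demo reproducible; the real script uses the local time zone and the actual current date.

```shell
#!/bin/bash
# Assumes GNU date; TZ=UTC and a fixed "today" keep the numbers reproducible.
MIDNIGHT=$(TZ=UTC date -d 20171206 +'%s')          # pretend today is 2017-12-06
((THREEDAYSAGO = MIDNIGHT - 3*3600*24))            # cutoff: 2017-12-03 00:00 UTC

for FILESTAMP in 20171204 20171203 20171202
do
    FILESECS=$(TZ=UTC date -d "$FILESTAMP" +'%s')  # YYYYMMDD -> seconds since the epoch
    # Both sides are now plain integers, so -ge can compare them.
    if [[ $FILESECS -ge $THREEDAYSAGO ]]
    then
        echo "$FILESTAMP ($FILESECS) is within the past 3 days"
    else
        echo "$FILESTAMP ($FILESECS) is older than 3 days"
    fi
done
```

Here 20171203 converts to exactly the cutoff value, so -ge counts it as within the window, while 20171202 falls one day short.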

sunnykamal59 · December 5, 2017, 1:45pm

I agree. Below are the filenames we receive in the folder.

Filename in a folder:

abc_20171204.052389.dat
abc_20171204.052428.dat
def_20171204.052440.dat
ghf_20171204.054340.dat

Can you please explain how we get the FILESTAMP from the filenames above and convert it into seconds?

And what will the result of the code below be?

for file in *_*_*.dat
do
   FILESTAMP=${file#*_}
   FILESTAMP=${FILESTAMP%_*}

RudiC · December 5, 2017, 2:40pm

The result for the cited code is unpredictable as it depends on the number of files in that directory containing at least one _ character (disregarding the .dat ending). As said in post#10, the pattern does NOT match your target files.

Look into your shell's man page (you still haven't mentioned which shell or version, btw) under "Parameter Expansion: Remove matching prefix / suffix pattern" to see how ${file#*_} and ${FILESTAMP%_*} are used to extract the time stamp from the file names. Also, use echo in the loop to print out the intermediate states of the extraction.
In post#6 you see date -d $FILESTAMP +'%s' used to convert the time stamp into epoch seconds for the comparison.
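Following that advice, this sketch runs the two expansions from the loop on one name in the original abc_YYYYMMDD_hhmmss.dat format and one in the newer abc_YYYYMMDD.hhmmss.dat format, echoing each intermediate state. The literal names stand in for the glob, so no files are needed:

```shell
#!/bin/bash
# Literal sample names stand in for the glob, so no files are needed.
for file in abc_20171204_052389.dat abc_20171204.052389.dat
do
   echo "file:       $file"
   FILESTAMP=${file#*_}          # remove the shortest prefix matching *_ (up to the first _)
   echo "after #*_:  $FILESTAMP"
   FILESTAMP=${FILESTAMP%_*}     # remove the shortest suffix matching _* (from the last _ on)
   echo "after %_*:  $FILESTAMP"
done
```

For the first name the result is 20171204. For the second name the %_* step has no _ left to remove, so FILESTAMP stays 20171204.052389.dat rather than a plain YYYYMMDD date, which is why both the pattern and the expansion have to change for the dotted names.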

sunnykamal59 · December 5, 2017, 11:11pm

Hopefully this will be my last question.

1) If we have multiple sets of files in a folder like below, how are we looping over each file to do the comparison?

abc_20171204.052389.dat
abc_20171203.052428.dat
abc_20171202.052628.dat

2) For the looping, are we using the logic below in the script?

for file in *_*_*.dat
do
   FILESTAMP=${file#*_}
   FILESTAMP=${FILESTAMP%_*}

3) If so, can you please explain the above code with an example?

Don_Cragun · December 6, 2017, 2:53am

sunnykamal59:

Hopefully this will be my last question .

1)If we have multiple set of files in a folder like below, how we are looping each file for doing comparison?
abc_20171204.052389.dat
abc_20171203.052428.dat
abc_20171202.052628.dat
2)for looping are we doing below logic in script?
for file in *_*_*.dat
do
   FILESTAMP=${file#*_}
   FILESTAMP=${FILESTAMP%_*}
3)if so can you please explain the above code with an example?

In all of your early posts you said that files were named using the pattern abc_YYYYMMDD_hhmmss.dat, and the code that Chubler_XL suggested (and RudiC explained) processes the files that match the pattern you supplied.

When you then run that code in a directory that doesn't have any files that match that pattern, the code won't find any of the files you told us all that you wanted to process. This is a classic example of what happens when you supply sample data that doesn't match your real data.

Feel free to modify the suggested code to match the format of the real filenames you are trying to process. Do not blame RudiC or Chubler_XL for supplying code that looked for what you said should be looked for when no files in that format are present.

To make the code match the latest sample names you have provided, you'll need to change the filename matching pattern in the for statement and change the parameter expansion that is used to extract the date from the expansion of the shell FILESTAMP variable.

RudiC · December 6, 2017, 11:06am

To put it simply:

for file in *_*_*.dat won't match any of your files. Try *_*.*.dat instead.

FILESTAMP=${file#*_} will result in 20171204.052389.dat if $file contains abc_20171204.052389.dat

FILESTAMP=${FILESTAMP%%.*} will yield 20171204 from above. Watch the double %% sign, and the . in lieu of _ .
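Putting those changes together, the script might look like the sketch below. It assumes bash and GNU date, narrows the glob to abc_*.*.dat because only the abc files are wanted, strips any directory part before extracting the stamp, and wraps everything in a function whose name, process_files, is hypothetical.

```shell
#!/bin/bash
# Sketch only: assumes bash, GNU date (-d) and names like abc_20171204.052389.dat.
# process_files is a hypothetical helper name. Usage: process_files DIR OUTFILE
process_files() {
    local dir=$1 out=$2 file FILESTAMP MIDNIGHT THREEDAYSAGO status
    MIDNIGHT=$(date -d 0:0 +'%s')                 # today's midnight, epoch seconds
    ((THREEDAYSAGO = MIDNIGHT - 3*3600*24))       # cutoff: 3 days earlier

    for file in "$dir"/abc_*.*.dat                # only abc files, dotted timestamp
    do
        [ -e "$file" ] || continue                # glob matched nothing
        FILESTAMP=${file##*/}                     # strip the directory part
        FILESTAMP=${FILESTAMP#*_}                 # 20171204.052389.dat
        FILESTAMP=${FILESTAMP%%.*}                # 20171204
        [[ $(date -d "$FILESTAMP" +'%s') -ge $THREEDAYSAGO ]] || continue

        # awk exits with 1 only if line 19, columns 31-41, is "S A M P L E"
        status=0
        awk 'NR==19 { if (substr($0,31,11) == "S A M P L E") EC=1; exit }
             END { exit EC }' "$file" || status=$?
        if [ "$status" -eq 1 ]
        then
            # append the GENERIC lines; finding none is not treated as an error
            grep "GENERIC" "$file" >> "$out" || true
        fi
    done
}
```

Run it as process_files . "other file" from the directory holding the .dat files.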

From your repeated identical questions, I don't get the feeling you tried to analyse or understand the proposal you were given.

sunnykamal59 · December 6, 2017, 11:42pm

Thank you. I ran the whole script.

The output in the other file is:
"Binary file abc_20171204.052389.dat matches"

The script is not copying the lines that contain 'GENERIC' to the other file.

Chubler_XL · December 7, 2017, 12:24am

It looks like your file contains some non-text characters, so grep is assuming the file is binary.

You can force grep to treat your file as text with the -a option.

For example, replace the grep command in your script as follows:

       if [ $? -eq 1 ]
       then
           # copy all data that has GENERIC to "other file", treating the file as text
           grep -a "GENERIC" "$file" >> ./"other file"
       fi
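To see the difference, this sketch builds a small made-up file with a NUL byte on its first line, which is enough for GNU grep to classify it as binary, and compares plain grep with grep -a. Note that -a is an extension found in GNU and BSD grep rather than a POSIX option, and the exact wording of the binary-match notice varies between grep versions.

```shell
#!/bin/bash
# Hypothetical sample file: a NUL byte on line 1 makes grep treat it as binary.
tmpfile=$(mktemp)
printf 'header\0with a NUL byte\nGENERIC record one\nother record\n' > "$tmpfile"

grep "GENERIC" "$tmpfile"                # reports a binary match instead of printing the line
matched=$(grep -a "GENERIC" "$tmpfile")  # -a: process the binary file as if it were text
echo "$matched"
rm -f "$tmpfile"
```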

sunnykamal59 · December 7, 2017, 1:27pm

Thank you, it worked and successfully created the file.

If we have multiple files in the directory that have the text 'S A M P L E' starting at position 31 on line 19, what will happen? Will the script pick the latest file, process it, and then exit? Please guide.

Don_Cragun · December 7, 2017, 3:52pm

Note that if your *.dat files are not true text files (i.e., files in which no line is longer than 2048 bytes including the terminating <newline> character, every line is terminated by a <newline> character, and there are no NUL characters), the awk script you're using may give you false negatives. If, as an example, each of your "lines" starts with a single 4-byte binary integer value, each of those four bytes could be misinterpreted by awk as a <newline> character. So, in this example, it is theoretically possible that the string S A M P L E might appear on what appears to awk to be any line between line 19 and line 95 (each of the first 19 records can contribute up to four extra line breaks: 19 + 19*4 = 95), in position 27, 28, 29, 30, or 31.

If we knew what operating system you're using, what shell you're using, and the actual format of the .dat files that are being processed by your script, there might be a way to write a shell script, an awk script, or a C program to reliably search those files for the strings you want to find.

Chubler_XL · December 7, 2017, 5:44pm

sunnykamal59:

Thank you it worked and successfully created a file.

If we have multiple files in the directory which has text
'S A M P L E' 
in the files starting at
position 31 on line 19
.What will happen?Will script picks the latest file and process it and than exits?Please guide.

The current form of the script takes all files that match your specified requirements (timestamp within the last 3 days and containing the S A M P L E text) and extracts the GENERIC lines from all of them, appending everything to 'other file'.
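If only the newest qualifying file should be processed, one option is to remember the last match while looping. Because the timestamp is embedded in the names as YYYYMMDD.hhmmss and bash expands globs in sorted order, the last qualifying name is the newest one, assuming every file shares the abc_ prefix and the same timestamp layout. A sketch, with a hypothetical helper name latest_sample_file and the same GNU date assumption:

```shell
#!/bin/bash
# Sketch only: assumes bash, GNU date, and names like abc_YYYYMMDD.hhmmss.dat,
# so sorted glob order is also chronological order.
# latest_sample_file is a hypothetical helper name. Usage: latest_sample_file DIR
latest_sample_file() {
    local dir=$1 file FILESTAMP latest="" THREEDAYSAGO
    ((THREEDAYSAGO = $(date -d 0:0 +'%s') - 3*3600*24))

    for file in "$dir"/abc_*.*.dat                # bash expands globs in sorted order
    do
        [ -e "$file" ] || continue
        FILESTAMP=${file##*/}                     # strip the directory part
        FILESTAMP=${FILESTAMP#*_}                 # 20171204.052389.dat
        FILESTAMP=${FILESTAMP%%.*}                # 20171204
        [[ $(date -d "$FILESTAMP" +'%s') -ge $THREEDAYSAGO ]] || continue

        # This awk variant exits 0 when the marker is found, so it can drive "if".
        if awk 'NR==19 { found = (substr($0,31,11) == "S A M P L E"); exit }
                END { exit (found ? 0 : 1) }' "$file"
        then
            latest=$file                          # later (newer) matches overwrite earlier ones
        fi
    done

    [ -n "$latest" ] && printf '%s\n' "$latest"   # status 1 if nothing qualified
}
```

Usage would then be along the lines of latest=$(latest_sample_file .) && grep "GENERIC" "$latest" >> ./"other file", so nothing is appended when no file qualifies.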