Step 2: We should pick the files whose received date is within the past 3 days.
Step 3: We should read those files and find the ones that contain the text "S A M P L E" starting at position 31 on line 19.
Step 4: Once we identify a file that has the text "S A M P L E", we should pick that file, look for "GENERIC", and copy all lines containing "GENERIC" to another file.
Please get into the habit of providing decent context for your problem.
It is always helpful to phrase a request carefully and in detail, and to support it with system info such as OS and shell, the related environment (variables, options), preferred tools, adequate (representative) sample input and desired output data, the logic connecting the two including your own attempts at a solution, and, if any, system (error) messages verbatim. This avoids ambiguities and keeps people from guessing.
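# Today's local midnight, in seconds since the epoch (1970-01-01 00:00:00 UTC)
MIDNIGHT=$(date -d 0:0 +'%s')
# Cutoff 3 days earlier: 3 days * 3600 seconds/hour * 24 hours/day
((THREEDAYSAGO=MIDNIGHT - 3*3600*24))
for file in *_*_*.dat
do
  # abc_YYYYMMDD_hhmmss.dat -> YYYYMMDD_hhmmss.dat -> YYYYMMDD
  FILESTAMP=${file#*_}
  FILESTAMP=${FILESTAMP%_*}
  # convert the date to epoch seconds and keep only files from the last 3 days
  if [[ $(date -d "$FILESTAMP" +'%s') -ge $THREEDAYSAGO ]]
  then
    # awk exits with status 1 if line 19 holds "S A M P L E" starting at position 31
    awk '
    NR==19{
      if (substr($0,31,11) == "S A M P L E")
        EC=1
      exit
    }
    END { exit EC }' "$file"
    if [ $? -eq 1 ]
    then
      # copy all data that has GENERIC to "other file"
      grep "GENERIC" "$file" >> ./"other file"
    fi
  fi
done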
Thanks for your reply. Can you please explain it step by step in detail? I'm new to scripting.
* When I execute MIDNIGHT=$(date -d 0:0 +'%s'), the result is 1512363600. What is this number? Is it the date?
* ((THREEDAYSAGO=MIDNIGHT - 3*3600*24)): why are we doing 3*3600*24?
* If we have 3 files selected within 3 days, each file might have a different string at line 19, position 31. We should only select the files that have the text "S A M P L E" there. How are we doing this?
* And can you also explain this logic:
for file in *_*_*.dat
do
FILESTAMP=${file#*_}
FILESTAMP=${FILESTAMP%_*}
if [[ $(date -d $FILESTAMP +'%s') -ge $THREEDAYSAGO ]]
then
awk '
NR==19{
if (substr($0,31,11) == "S A M P L E")
EC=1
exit
}
END { exit EC }' $file
While the community here is happy to help people with simple and complex questions alike, the main objective is to help them help themselves. Among other things, man pages are invaluable sources of information, e.g. man date, which explains that the %s format prints the seconds since 1970-01-01 00:00:00 UTC.
This answers your first question: 1512363600 is that day's midnight, expressed as the number of seconds since "the epoch" (1970-01-01 00:00:00 UTC).
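A quick way to check this yourself, assuming GNU date (which the script already relies on for -d): the @SECONDS form converts epoch seconds back into a readable date. TZ=UTC is set in the second part only so that its output does not depend on your local time zone.

```shell
#!/bin/bash
# Today's local midnight as epoch seconds (same command as in the script)
MIDNIGHT=$(date -d 0:0 +'%s')
echo "MIDNIGHT=$MIDNIGHT"
# GNU date: -d @SECONDS turns epoch seconds back into a date (local time zone)
date -d "@$MIDNIGHT" +'%Y-%m-%d %H:%M:%S'

# The number from the question, shown in UTC so the output is the same everywhere
SAMPLE_TS=1512363600
UTC_DATE=$(TZ=UTC date -d "@$SAMPLE_TS" +'%Y-%m-%d %H:%M:%S')
echo "$SAMPLE_TS = $UTC_DATE UTC"
```

1512363600 comes out as 2017-12-04 05:00:00 UTC, which would be midnight in a zone five hours behind UTC, so it fits the reading "local midnight of 2017-12-04".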
2. Question: How many seconds are in an hour? How many hours are in a day?
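Spelled out with named variables (the names are only illustrative), the arithmetic in the script is 3 days times 3600 seconds per hour times 24 hours per day. A minimal bash sketch, assuming GNU date for printing the cutoff back as a date:

```shell
#!/bin/bash
# Illustrative names for the three factors in 3*3600*24
DAYS=3
SECONDS_PER_HOUR=3600
HOURS_PER_DAY=24
((WINDOW=DAYS * SECONDS_PER_HOUR * HOURS_PER_DAY))
echo "$DAYS days = $WINDOW seconds"   # 259200

# Same cutoff as the script, printed back as a date (GNU date -d @SECONDS)
MIDNIGHT=$(date -d 0:0 +'%s')
((THREEDAYSAGO=MIDNIGHT - WINDOW))
date -d "@$THREEDAYSAGO" +'cutoff: %Y-%m-%d %H:%M:%S'
```

Because the cutoff is counted back from today's midnight, a file stamped with the date of three days ago converts to exactly the cutoff (barring a daylight-saving change in between) and is still accepted by -ge.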
3. man awk:
By default, lines are the records for awk, so NR==19 selects the 19th line (as requested),
and substr($0,31,11) extracts exactly the 11 characters starting at position 31, i.e. the part of the line that needs to be compared to your sample text.
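To see exactly what awk compares, one can build a throw-away test file (hypothetical content: 18 filler lines, then a line 19 whose "S A M P L E" starts at column 31) and run both the bare extraction and the script's test on it. A sketch in bash:

```shell
#!/bin/bash
# Hypothetical test file: 18 filler lines, then line 19 with "S A M P L E" at column 31
TMPFILE=$(mktemp)
for i in $(seq 1 18); do echo "filler line $i"; done > "$TMPFILE"
# %-30s pads the prefix to 30 characters, so the next text starts at position 31
printf '%-30sS A M P L E and more text\n' 'line 19 prefix' >> "$TMPFILE"
echo "filler line 20" >> "$TMPFILE"

# Show exactly what substr($0,31,11) extracts from line 19
EXTRACTED=$(awk 'NR==19 { print substr($0,31,11); exit }' "$TMPFILE")
echo "extracted: [$EXTRACTED]"

# Same test as in the script: exit status 1 means "S A M P L E" was found
awk '
NR==19{
  if (substr($0,31,11) == "S A M P L E")
    EC=1
  exit
}
END { exit EC }' "$TMPFILE"
STATUS=$?
echo "awk exit status: $STATUS"
rm -f "$TMPFILE"
```

Note that "found" is signalled by exit status 1 here, the opposite of the usual convention where 0 means success; that is why the script tests [ $? -eq 1 ].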
Hi RudiC, thank you for the explanation. I understood your logic except for the following.
As mentioned, we will have multiple files with different names in a folder, and we should pick only the files matching abc_*_*.dat that came within the last 3 days. Can you please tell me how we are doing this?
Also, the filetimestamp and the $THREEDAYSAGO values have different formats. How are we comparing the two in the if condition? Can you explain with an example?
My first and most important advice is to take the utmost care when creating a post: watch your spelling, upper/lower case, punctuation, and data accuracy. Why should anybody care more about your posts than you do? Why should anybody want to guess what your problems are?
In your post #9:
* What is the C: for?
* You write "files which start with abc_*_*.dat" but post sample files like ghf_20171204.054340.dat, where neither the abc nor the second _ appears. So which one is valid? The code doesn't work with the wrong pattern.
* By "filetimestamp" do you mean FILESTAMP? The latter is not compared to $THREEDAYSAGO directly; it is converted to seconds before the comparison.
The result of the cited code is unpredictable, as it depends on how many files in that directory contain at least one _ character (disregarding the .dat ending). As said in post #10, the pattern does NOT match your target files.
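A small bash sketch shows which names the loop's pattern would pick up, using one name in the abc_YYYYMMDD_hhmmss.dat layout (constructed for illustration) and the ghf_20171204.054340.dat sample from post #9. The case statement applies the same glob rules as the for loop:

```shell
#!/bin/bash
# Which names does the loop's pattern *_*_*.dat pick up?
MATCHED=""
for name in abc_20171204_054340.dat ghf_20171204.054340.dat; do
  case $name in
    *_*_*.dat)
      echo "matches:        $name"
      MATCHED="$MATCHED $name" ;;
    *)
      echo "does not match: $name" ;;
  esac
done
```

Also note that when no file matches at all, bash leaves the pattern unexpanded, so the loop runs once with file set to the literal text *_*_*.dat; shopt -s nullglob before the loop makes it run zero times instead.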
Look in your shell's man page (you still haven't mentioned which shell or version, by the way) under "Parameter Expansion", "Remove matching prefix/suffix pattern", to see how ${file#*_} and ${FILESTAMP%_*} are used to extract the timestamp from the file names. Also, use echo in the loop to print the intermediate states of the extraction.
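For example, with a made-up name in the abc_YYYYMMDD_hhmmss.dat layout, the intermediate states look like this; in the real loop the same echo lines can go right after each assignment:

```shell
#!/bin/bash
file=abc_20171204_054340.dat   # made-up name in the abc_YYYYMMDD_hhmmss.dat layout
echo "file:                   $file"
FILESTAMP=${file#*_}           # drop the shortest prefix matching *_
echo "after \${file#*_}:       $FILESTAMP"   # 20171204_054340.dat
FILESTAMP=${FILESTAMP%_*}      # drop the shortest suffix matching _*
echo "after \${FILESTAMP%_*}:  $FILESTAMP"   # 20171204
```

With # and % the shortest match is removed; ## and %% remove the longest one, which starts to matter as soon as a name contains more than two underscores.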
In post #6 you can see date -d $FILESTAMP +'%s' used to convert the timestamp into epoch seconds for the comparison.
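The point is that both sides of -ge end up as plain integers. A sketch with a hypothetical helper is_recent, where "today" is passed in (instead of being computed with date -d 0:0) so the result is reproducible; it assumes GNU date, as the script does:

```shell
#!/bin/bash
# is_recent STAMP TODAY (hypothetical helper): succeeds if the YYYYMMDD date
# STAMP is no earlier than 3 days before TODAY's midnight, as in the script
is_recent() {
  local stamp_secs midnight threedaysago
  stamp_secs=$(date -d "$1" +'%s')    # 20171204 -> epoch seconds of its local midnight
  midnight=$(date -d "$2" +'%s')      # the script uses $(date -d 0:0 +'%s') here
  ((threedaysago = midnight - 3*3600*24))
  # both operands are now integers, so -ge compares them numerically
  [[ $stamp_secs -ge $threedaysago ]]
}

for stamp in 20171204 20171201; do
  if is_recent "$stamp" 20171206; then
    echo "$stamp: within the 3-day window before 2017-12-06"
  else
    echo "$stamp: older than the 3-day window"
  fi
done
```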
In all of your early posts you said that the files were named using the pattern abc_YYYYMMDD_hhmmss.dat, and the code RudiC suggested processes files that match the pattern you supplied.
When you then run the code he suggested in a directory that doesn't contain any files matching that pattern, the code won't find any of the files you told us you wanted to process. This is a classic example of what happens when you supply sample data that doesn't match your real data.
Feel free to modify the code RudiC suggested to match the format of the real file names you are trying to process. Do not blame RudiC for supplying code that looks for exactly what you said should be looked for when no files in that format are present.
To make the code match the latest sample names you have provided, you'll need to change the file name pattern in the for statement and change the parameter expansions used to extract the date from the file name into the shell variable FILESTAMP.
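A sketch of those two changes, assuming (hypothetically) that every real file follows the layout of the post #9 sample ghf_20171204.054340.dat: a prefix, one _, YYYYMMDD, a dot, hhmmss, then .dat. The helper name extract_stamp is made up for illustration:

```shell
#!/bin/bash
# Assumed layout: PREFIX_YYYYMMDD.hhmmss.dat (e.g. ghf_20171204.054340.dat).
# In the real loop the for line would become   for file in *_*.*.dat
# and the two FILESTAMP lines would use the expansions below.
extract_stamp() {
  local stamp=${1#*_}    # ghf_20171204.054340.dat -> 20171204.054340.dat
  stamp=${stamp%%.*}     # remove the LONGEST suffix matching .*  -> 20171204
  echo "$stamp"
}

name=ghf_20171204.054340.dat
case $name in
  *_*.*.dat) echo "pattern *_*.*.dat matches $name" ;;
  *)         echo "pattern *_*.*.dat does not match $name" ;;
esac
FILESTAMP=$(extract_stamp "$name")
echo "FILESTAMP=$FILESTAMP"
date -d "$FILESTAMP" +'%Y-%m-%d'
```

If all real files share the same prefix, a tighter pattern such as ghf_*.dat avoids picking up unrelated .dat files.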
Note that if your *.dat files are not true text files (i.e., files with no lines longer than 2048 bytes including the terminating <newline> character, with every line terminated by a <newline> character, and with no NUL characters), the awk script you're using may give you false negatives. If, as an example, each of your "lines" starts with a single 4-byte binary integer value, each of those four bytes could be misinterpreted by awk as a <newline> character. So, in this example, it is theoretically possible that the string S A M P L E might appear, as far as awk is concerned, on any line between line 19 and line 95 (each of the 19 lines up to and including line 19 could contribute up to 4 extra <newline>s, so 19 + 4*19 = 95) in position 27, 28, 29, 30, or 31.
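If you want to check whether your .dat files meet those conditions before trusting the awk test, a rough bash sketch could look like this. The helper looks_like_text is hypothetical, assumes GNU tr, tail and awk, and is a heuristic rather than a complete text-file validator:

```shell
#!/bin/bash
# looks_like_text FILE (hypothetical helper): heuristic check of the
# "true text file" conditions described above
looks_like_text() {
  local file=$1
  # 1. No NUL bytes: deleting NULs must leave the byte count unchanged
  [ "$(tr -d '\000' < "$file" | wc -c)" -eq "$(wc -c < "$file")" ] || return 1
  # 2. A non-empty file must end in <newline>; $(...) strips a trailing
  #    newline, so the last byte comes back as "" exactly when it is <newline>
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    return 1
  fi
  # 3. No line longer than 2047 bytes plus its <newline>
  #    (LC_ALL=C makes awk's length() count bytes, not characters)
  LC_ALL=C awk 'length($0) > 2047 { bad = 1; exit } END { exit bad }' "$file" || return 1
  return 0
}

for f in "$@"; do
  if looks_like_text "$f"; then
    echo "$f: looks like a text file"
  else
    echo "$f: NOT a plain text file; awk line/position checks may be unreliable"
  fi
done
```

Saved under any name, it would be run as bash check_text.sh *.dat (script name hypothetical). For a binary layout like the 4-byte integer example, looking at the first lines with od -c is a sensible next step.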
If we knew what operating system you're using, what shell you're using, and the actual format of the .dat files that are being processed by your script, there might be a way to write a shell script, an awk script, or a C program to reliably search those files for the strings you want to find.
In its current form, the script takes all files that meet your requirements (timestamp within the last 3 days and "S A M P L E" at line 19, position 31), extracts the lines containing GENERIC from all of them, and appends those lines to 'other file'.
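To watch that behavior end to end, here is a sketch that creates a scratch directory with three made-up files (names, contents and dates are all hypothetical) and runs the same logic as the script on them, with the awk program written on one line. Only the recent file with "S A M P L E" at line 19, position 31 should contribute its GENERIC line:

```shell
#!/bin/bash
# Throw-away directory with hypothetical sample files
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

make_file() {   # make_file NAME LINE19_TEXT
  {
    for i in $(seq 1 18); do echo "header $i"; done
    printf '%-30s%s\n' 'record' "$2"   # LINE19_TEXT starts at position 31
    echo "GENERIC data from $1"
    echo "other data"
  } > "$1"
}
TODAY=$(date +%Y%m%d)
OLD=$(date -d '10 days ago' +%Y%m%d)
make_file "abc_${TODAY}_010000.dat" "S A M P L E"   # recent, has SAMPLE -> copied
make_file "abc_${TODAY}_020000.dat" "X X X X X X"   # recent, no SAMPLE -> skipped
make_file "abc_${OLD}_030000.dat"   "S A M P L E"   # too old -> skipped

# Same logic as the script
MIDNIGHT=$(date -d 0:0 +'%s')
((THREEDAYSAGO=MIDNIGHT - 3*3600*24))
for file in *_*_*.dat
do
  FILESTAMP=${file#*_}
  FILESTAMP=${FILESTAMP%_*}
  if [[ $(date -d "$FILESTAMP" +'%s') -ge $THREEDAYSAGO ]]
  then
    awk 'NR==19{ if (substr($0,31,11) == "S A M P L E") EC=1; exit } END { exit EC }' "$file"
    if [ $? -eq 1 ]
    then
      grep "GENERIC" "$file" >> ./"other file"
    fi
  fi
done
cat "./other file"
```

Because >> never truncates, running the loop a second time appends the same lines again; emptying 'other file' first (e.g. : > "other file") avoids duplicates if that is not wanted. The scratch directory can be removed afterwards with rm -r "$WORKDIR".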