Concatenation of files with same naming patterns dynamically

Jesshelle_David · August 22, 2016, 6:23pm

Since my last threads were closed on account of spamming, keeping just this one opened!

Hi,

I have reports that get generated every hour, and this is my requirement:

5 reports get generated every hour with the names

"Report.Dddmmyy.Thhmiss.CTLR"
"Report.Dddmmyy.Thhmiss.ACCD"
"Report.Dddmmyy.Thhmiss.BCCD"
"Report.Dddmmyy.Thhmiss.CCCD"
"Report.Dddmmyy.Thhmiss.DDDD"

At the end of the day I need to concatenate all reports with similar names into a single file.
For example: 24 files for pattern "Report.Dddmmyy.Thhmiss.CTLR" into a single file called "Test.dat" and so on.

Please note that the last four letters of the file name are constant for each report type, and the reports have to be concatenated based on them.

I do not want to parameterize these last four letters or list them explicitly, as in ls -l *CTLR*, because that would be hard-coding. Is there a better way of handling this?

Kindly advise.

Thanks,
Jess

vgersh99 · August 22, 2016, 6:34pm

something along these lines assuming you're in the directory where the files are located - not tested...

#!/bin/ksh

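# Today's date as ddmmyy, matching the Dddmmyy part of the report names.
today=$(date +%d%m%y)

# List today's reports, collect the distinct final dot-separated fields
# (CTLR, ACCD, ...), then write one concatenated file per field.
ls -1 Report.D${today}* | awk -F. '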
{ a[$NF]}
END {for (i in a) print i}
' | while read ext junk
do
   cat Report.D${today}*.${ext} > DailyReport.D${today}.${ext}
done
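
A note on ordering in the script above: awk's `for (i in a)` visits array keys in an unspecified order, so the extensions may come out in a different order from one awk implementation to another. That does not matter while each extension goes to its own output file, but it would matter if everything went to a single file. The following is a minimal sketch, not part of the original reply, of a variant that prints each extension once in the order it is first seen. The function name `unique_extensions` is made up for illustration.

```shell
#!/bin/sh
# unique_extensions: hypothetical helper, not from the thread.
# Reads file names on standard input, one per line, and prints each
# distinct final dot-separated field once, in first-seen order.
unique_extensions() {
    # seen[$NF]++ is 0 (false) the first time a field appears, so the
    # negated test is true only for the first occurrence.
    awk -F. '!seen[$NF]++ { print $NF }'
}
```

It could replace the awk step in the pipeline, for example `ls -1 Report.D${today}* | unique_extensions | while read ext junk`.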

rovf · August 23, 2016, 1:21am

Here is a simple solution for the Z shell (since you didn't indicate a preferred shell, I assume you are open as to which one to use):

Assume that you have parameterized the file extension using an array, for instance

extensions=(ACCD BCCD CCCD DDDD)

You can now produce the concatenation of the individual files with

cat Report.Dddmmyy.Thhmiss.${^extensions}

This assumes that all of these files exist; if one of them doesn't, you get an error message. It is also possible to write a solution in which non-existing files are silently skipped, but that is outside the scope of your question.
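
For readers who do want missing files skipped, here is a minimal sketch in portable sh rather than zsh. It is an illustration only, not rovf's solution, and the function name `cat_existing` is made up.

```shell
#!/bin/sh
# cat_existing: hypothetical helper, not from the thread.
# Writes every argument that is a readable regular file to standard
# output, in argument order; anything else is skipped without a message.
cat_existing() {
    for f in "$@"; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            cat -- "$f"
        fi
    done
}
```

It would be called with the expanded file names, for example `cat_existing Report.D230816.T010000.ACCD Report.D230816.T010000.BCCD > combined`, where the names and the output file are placeholders.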

Don_Cragun · August 23, 2016, 3:47am

I don't understand what you're trying to do. I get that at some time shortly before midnight on August 23, 2016 you will have 24 files matching the pattern Report.D230816.T??????.CTLR, 24 files matching the pattern Report.D230816.T??????.ACCD, 24 files matching the pattern Report.D230816.T??????.BCCD, 24 files matching the pattern Report.D230816.T??????.CCCD, and 24 files matching the pattern Report.D230816.T??????.DDDD.

And I get that you want to copy the contents of the 24 files matching the pattern Report.D230816.T??????.CTLR into a file named Test.dat. I don't know what "and so on" means. Are you saying that you want to overwrite that file four times, so that at midnight on the morning of August 24, 2016, the file Test.dat will contain the contents of the 24 files matching the pattern Report.D230816.T??????.DDDD? This doesn't make any sense to me, but that seems to be what you are requesting.

And you say that the script you are writing can't have the strings CTLR, ACCD, BCCD, CCCD, and DDDD built in. Is this because some days may have additional filename extensions? Is it because the set of filename extensions changes from day to day? Are all of the extensions you want to process four characters long? If not, how do we determine which files are supposed to be processed? And, if the *.CTLR files are supposed to be processed before the *.ACCD files (i.e., not in alphabetic order) and the extensions aren't built into the script, how is the script supposed to know which extension should be processed first?

Jesshelle_David · August 24, 2016, 4:33pm

Hi Don,

Let me elaborate! These reports are generated on a mainframe and sent to a UNIX box every hour! There are 5 reports, as I mentioned earlier!

What I need to do is concatenate the reports of the same type into one file and then give it an extension so that it is readable and I can split the reports as and when I want to!

vgersh99's script kinda worked, but I couldn't give the output files an extension, so they are unreadable.

Don_Cragun · August 24, 2016, 5:12pm

Hi Jesshelle,
It is great to know that vgersh99's suggestion "kinda worked", but it gives us absolutely no idea what part of it worked and what part of it did not work.

I have tried to get details from you about what you want done, but you did not answer those questions. Let me try once more:

Are you trying to create one file containing all 120 daily reports or are you trying to create five files containing 24 daily reports in each file?
What are the names of the files or what is the name of the file you want your script to create?
Will the filename extensions of the input files always be CTLR, ACCD, BCCD, CCCD, and DDDD, or do the extensions vary from day to day?
If there is one output file (instead of five), are the reports for each filename extension supposed to be grouped together or are the reports for each hour supposed to be grouped together? And, if they are grouped together by extension, does the order in which they appear in the output file matter? And, if so, what is the desired output order?
Will you be running this script on the day the reports are being generated? And, if not, how will your script know what day's reports are to be processed?
What are you using to read the daily report file(s)? Most tools on UNIX and UNIX-like systems don't care about the filename extension when reading a text file.
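
Until those questions are answered, any script is a guess. As a rough sketch only, here is what one set of answers could look like. It assumes one output file per extension (five files, not one), that the extensions are discovered from the file names rather than built in, that the day to process is passed as an optional ddmmyy argument defaulting to today, and that the output files get a .dat suffix. The function name `concat_daily` and the output name Daily.D<ddmmyy>.<EXT>.dat are invented for illustration; none of this has been confirmed in the thread.

```shell
#!/bin/sh
# concat_daily: hypothetical helper built on unconfirmed assumptions.
# Usage: concat_daily DIR [DDMMYY]
# For every Report.D<DDMMYY>.T<time>.<EXT> file in DIR, appends its
# contents to DIR/Daily.D<DDMMYY>.<EXT>.dat (assumed output name).
concat_daily() {
    dir=$1
    day=${2:-$(date +%d%m%y)}       # default: today's date as ddmmyy

    # Start from scratch so that a rerun does not duplicate data.
    rm -f "$dir"/Daily.D"$day".*.dat

    # The shell expands the pattern in sorted order, so within each
    # extension the hourly files are appended in time order.
    for f in "$dir"/Report.D"$day".T*.*; do
        [ -f "$f" ] || continue     # pattern matched nothing
        ext=${f##*.}                # text after the last dot, e.g. CTLR
        cat -- "$f" >> "$dir/Daily.D$day.$ext.dat"
    done
}
```

For example, `concat_daily /path/to/reports 230816` would process the reports for August 23, 2016; the directory is a placeholder.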