Combine multiple files into one big file

Hi all,

I have a requirement where I receive huge files daily. If a file is very large, it is split into many parts before being sent. The first part has the header record followed by detail records, the subsequent parts have detail records, and the last part has the trailer record. I need to merge all these parts into one single file in the order: header record, then detail records, then trailer record.

Example,
The filename patterns are Daily_report_1111_20111115 and Daily_report_2222_20111115.
If a file is huge, it will be split and sent as

Daily_report_1111_20111115_1
Daily_report_1111_20111115_2
Daily_report_1111_20111115_3
Daily_report_1111_20111115_4

Daily_report_2222_20111115_1
Daily_report_2222_20111115_2
Daily_report_2222_20111115_3
Daily_report_2222_20111115_4

I need to concatenate the files Daily_report_1111_20111115_1, 2, 3 and 4 into one big file, Daily_report_1111_20111115, with the contents of Daily_report_1111_20111115_1 at the top followed by the contents of Daily_report_1111_20111115_2, 3 and 4. So the final output file should be Daily_report_1111_20111115 with the contents of all four parts in the order received (the order is carried in the filename suffix: _1, _2, _3, _4).

Similarly for Daily_report_2222_20111115 files.

The catch is that the date 20111115 in the filename keeps changing, and we might receive several days' files on the same day. So I need to write and automate the script so that it looks for files sharing the same date, checks the filename pattern (either Daily_report_1111_YYYYMMDD or Daily_report_2222_YYYYMMDD), and concatenates them into one single file based on the _1, _2, _3 suffixes.

Please help me out!

ls |
        awk -F_ '{ STR=$1; for(N=2; N<NF; N++) STR=STR"_"$N;  T[STR]=1 } END { for(K in T) print K; }' |
        while read -r LINE
        do
                cat "${LINE}"_* > "$LINE"
        done

The awk groups the filenames by their common prefix, feeding each unique prefix into a while loop, which globs the parts and sticks them together into the output filename. The shell expands the glob in lexicographic order, so _1 comes before _2 and the parts land in the order they were sent.
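
For example (a sketch, assuming only the eight split parts listed above are present in the directory), running just the first two stages shows the grouping; awk prints the unique prefixes in no particular order:

ls | awk -F_ '{ STR=$1; for(N=2; N<NF; N++) STR=STR"_"$N; T[STR]=1 } END { for(K in T) print K; }'
# prints (order may vary):
# Daily_report_1111_20111115
# Daily_report_2222_20111115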


Hi Corona,

I am sorry, but I do not understand the code. Can you explain each part of it to me, as I am new to Unix? Thanks!

# List files in the current directory
ls |
# Feed it into the AWK programming language.
# splits each line apart $1,$2,...$NF on the _ character.
# Then puts it back together into STR, minus the _1,_2, etc on the end.
# Then stores it in T.
# Once it's read all lines, prints all unique combinations it found.
        awk -F_ '{ STR=$1; for(N=2; N<NF; N++) STR=STR"_"$N;  T[STR]=1 } END { for(K in T) print K; }' |
# Reads unique combinations from awk, like Daily_report_2222_20111115
# and puts them in the variable LINE.
# feed all files matching ${LINE}_* into cat, writing to the file $LINE.
        while read -r LINE
        do
                cat "${LINE}"_* > "$LINE"
        done
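
If you want to verify the grouping before letting the loop write anything, a dry run like this (same pipeline, just a sketch with echo in place of cat) prints each planned merge instead of performing it:

ls |
        awk -F_ '{ STR=$1; for(N=2; N<NF; N++) STR=STR"_"$N;  T[STR]=1 } END { for(K in T) print K; }' |
        while read -r LINE
        do
                echo "would merge ${LINE}_* into $LINE"
        done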

Hi Corona,

There might be multiple files with other filename patterns, and I do not want to touch those. I need to combine only the files whose names match the pattern Daily_report_NNNN_YYYYMMDD in that particular folder. Also, not every day is the big file split into multiple parts before being sent. So I have to write and automate the script such that if a large file arrives split into multiple parts, it is concatenated into one, and if it arrives as one big file without being split, that file is used as-is for processing. Thanks for your understanding.

Okay.

# List relevant files in the current directory
ls Daily_report_[0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9] |
# Feed it into the AWK programming language.
# splits each line apart $1,$2,...$NF on the _ character.
# Then puts it back together into STR, minus the _1,_2, etc on the end.
# Then stores it in T.
# Once it's read all lines, prints all unique combinations it found.
        awk -F_ '{ STR=$1; for(N=2; N<NF; N++) STR=STR"_"$N;  T[STR]=1 } END { for(K in T) print K; }' |
# Reads unique combinations from awk, like Daily_report_2222_20111115
# and puts them in the variable LINE.
# feed all files matching ${LINE}_* into cat, writing to the file $LINE.
        while read -r LINE
        do
                cat "${LINE}"_* > "$LINE"
        done
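
If you want to convince yourself that unsplit files stay untouched, here is a quick sandbox test (a sketch; the directory name and sample contents are made up). The unsplit file has no _N suffix, so the ls pattern never matches it:

mkdir /tmp/merge_test && cd /tmp/merge_test
# three parts of one report, plus one report that arrived unsplit
for N in 1 2 3; do echo "detail part $N" > Daily_report_1111_20111115_$N; done
echo "whole file, never split" > Daily_report_2222_20111115
# ...run the script above...
# Result: Daily_report_1111_20111115 is created from the three parts;
# Daily_report_2222_20111115 is left exactly as it arrived.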

Hi Corona,

Thanks a lot for your time and patience in explaining things to me.
I'll try out the script and come back to you if I have any concerns.
Thanks once again, Corona!


Hi Corona,

Your code worked like a charm, thanks for that.
One more thing: I need to remove the old part files after merging them into one single file. That is, files that arrived as one big file without being split should be kept untouched, while for the files that arrived as multiple parts, I need to delete the parts after merging them, keeping only the final big file for further processing.

# List relevant files in the current directory
ls Daily_report_[0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9] |
# Feed it into the AWK programming language.
# splits each line apart $1,$2,...$NF on the _ character.
# Then puts it back together into STR, minus the _1,_2, etc on the end.
# Then stores it in T.
# Once it's read all lines, prints all unique combinations it found.
        awk -F_ '{ STR=$1; for(N=2; N<NF; N++) STR=STR"_"$N;  T[STR]=1 } END { for(K in T) print K; }' |
# Reads unique combinations from awk, like Daily_report_2222_20111115
# and puts them in the variable LINE.
# feed all files matching ${LINE}_* into cat, writing to the file $LINE.
        while read -r LINE
        do
                # delete the parts only if the merge succeeded
                cat "${LINE}"_* > "$LINE" && rm -f "${LINE}"_*
        done

Thanks, Corona! You made my day!

Just realized this may get things out of order when there are more than 9 segments per file, since the glob expands lexicographically and _10 sorts before _2. Is that a problem?

You might want to consider:

Daily_report_[0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9]*

if there are more than 9 files per day/block.
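
Even with a wider pattern, the glob in cat ${LINE}_* still expands in lexicographic order, so _10 would be concatenated before _2. If more than 9 parts ever do show up, one way to handle it (a sketch, assuming a sort that supports -t and -k, which POSIX requires) is to sort the part names numerically on their final _N field before feeding them to cat:

# the trailing _[0-9]* also matches two-digit part numbers
ls Daily_report_[0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9]* |
        awk -F_ '{ STR=$1; for(N=2; N<NF; N++) STR=STR"_"$N; T[STR]=1 } END { for(K in T) print K; }' |
        while read -r LINE
        do
                # field 5 is the part number; sort on it numerically
                ls "${LINE}"_* | sort -t_ -k5,5n | xargs cat > "$LINE"
        done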

Hi Corona/Vgresh,

As of now we are getting fewer than 9 files per day... I did think of that option too. Since we are not getting more than 9 at the moment, I am not checking for the trailing [0-9]* option.

Thanks to you both for bringing this up, closely following users' requests, and solving them. Great to be on this forum!