Checking how many files are present in a directory and, based on that, merging all the files

srikanth_sagi · December 27, 2012, 2:02am

Hi,

My requirement is this: there is a directory location like:

:camp/current/

In this location, several flat files can be generated in a single day. They all have the same header but different data, and are differentiated by timestamp. I need to check how many files were generated in a single day and merge them all into a single file.

Note: the header should appear only once in the merged file.

Thank you!

RudiC · December 27, 2012, 2:14am

Pls be more specific: post an example of your setup, where to find the header, and the rules for merging the files (e.g. in a specific order, by contents, or in no particular order).

srikanth_sagi · December 27, 2012, 2:46am

Hi,

Thank you!

There is a location like:

scripts/new/projectdir/

in which two or three flat files may be generated per day,
with names like flatfile1_v_timestamp, flatfile2_v_timestamp.

I need to check this location for files matching this pattern. As my job runs once a day, I need to pick up all these flat files and combine them into a single file. The header will be the same for all of these files.

Note: records should come in order and the values should go into the correct columns even after the files are merged into a single file. The field separator is '|'.

Example:

File1

header :col1_name1|col2_name2|col3_name3
              1              2                3
              1                                1

File2

header: col1_name1|col2_name2|col3_name3
               4              5               6
               1

mergefile:

Header:

col1_name1|col2_name2|col3_name3
                1                 2           3
                1                              1
                4                 5            6
                1

The header will be the same, but values may or may not be present for some of the columns.

Please let me know if anything else is needed.

RudiC · December 27, 2012, 3:12am

Pls use code tags as advised.
So - the headers are identical across all files, and all except first should be removed. Data should be ordered according to their files' creation time sequence. Try this:

awk 'NR=1 FNR>1' $(ls -rt flatfile1_v_*)

Remove the -r option to ls if you want newest files first.

Pls note: the field separator | that you mention I can see only in the headers given, not in the data lines.

srikanth_sagi · December 27, 2012, 3:20am

Sorry, I forgot to mention: there is a separator in the data as well.

---------- Post updated at 03:20 AM ---------- Previous update was at 03:14 AM ----------

Can you explain what this code does?

awk 'NR=1 FNR>1' $(ls -rt flatfile1_v_*)

Where is it checking the directory to see whether more than one file is present?

srikanth_sagi · January 2, 2013, 12:55am

awk 'NR=1 FNR>1' $(ls -rt flatfile1_v_*)

When I use this command to merge the flat files into a single file, the header appears twice in the merged file, but I want it only once. What do I need to do here?

Example:
Flat file1:

column1|column2|column3
1|2|3
4||4

Flatfile2:

column1|column2|column3
4|5|6
1|1|

Mergedfile:

column1|column2|column3
1|2|3
4||4
4|5|6
1|1|

In the merged file as well, each value should go into its corresponding column.

Please reply at the earliest.

pamu · January 2, 2013, 1:33am

awk 'NR==1 || FNR>1' flatfile1_v_*

rangarasan · January 2, 2013, 1:35am

Hey,

Try this one,

awk 'NR==1 || FNR>1{print;}'  file_list

Cheers,
Ranga:)

srikanth_sagi · January 2, 2013, 3:04am

Hi,

Thank you for the reply.

Can you pls tell me what the difference is between these two?

awk 'NR==1 || FNR>1' flatfile1_v_*

awk 'NR==1 || FNR>1{print;}'  file_list

rangarasan · January 2, 2013, 3:16am

Please use code tags for data samples and code.

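Coming to the point, both awk commands do the same thing: a pattern without an action prints the matching line by default, so the explicit {print;} is redundant.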
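
A quick way to check this for yourself, as a minimal sketch. It writes the two sample files from the example earlier in the thread to a scratch directory (the names flatfile1 and flatfile2 and the variable names are just placeholders) and compares the output of both forms:

```shell
# Create the two sample files from the thread in a scratch directory.
tmp=$(mktemp -d)
printf 'column1|column2|column3\n1|2|3\n4||4\n' > "$tmp/flatfile1"
printf 'column1|column2|column3\n4|5|6\n1|1|\n' > "$tmp/flatfile2"

# Pattern only: awk prints every line for which the pattern is true.
out_implicit=$(awk 'NR==1 || FNR>1' "$tmp/flatfile1" "$tmp/flatfile2")

# Same pattern with an explicit print action.
out_explicit=$(awk 'NR==1 || FNR>1 {print}' "$tmp/flatfile1" "$tmp/flatfile2")

# Compare the two results, then show the merged lines.
[ "$out_implicit" = "$out_explicit" ] && echo "identical output"
printf '%s\n' "$out_implicit"

rm -rf "$tmp"
```

Both forms should yield the header once, followed by the data lines of each file.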

RudiC · January 2, 2013, 3:39am

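Sorry for my negligence - it should have been a double equals (a single = assigns to NR instead of comparing it, so the pattern was true for every line and each header got printed):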

$ awk 'NR==1; FNR>1' $(ls -t flatfile{1,2})
column1|column2|column3
4|5|6
1|1|
1|2|3
4||4

If you want the files oldest first, add the -r option to the ls inside the $(ls ...) command substitution. If you don't care about the order at all, just use something like flatfile*.
Explanation:
NR==1 : print very first line of entire stream (all files consecutively)
FNR>1 : print all lines except first of respective file (i.e. lines >=2 in every single file)
ls -tr : supply files oldest first (ls -t sorts by modification time, which for files written once is effectively their creation order)
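
To also cover the "how many files are present" part of the question, here is a rough sketch that puts the pieces together. The function name merge_flatfiles and its three arguments are made up for illustration, and it assumes file names without whitespace (as in this thread), since it relies on the output of ls:

```shell
# merge_flatfiles DIR PATTERN OUTFILE   (hypothetical helper, not a standard tool)
# Counts the files in DIR that match PATTERN, then merges them oldest first
# into OUTFILE, keeping the header line of the first file only.
merge_flatfiles() {
    dir=$1 pattern=$2 out=$3

    # Let the shell expand the pattern; $pattern is deliberately unquoted.
    # If nothing matches, the pattern stays unexpanded and the -e test fails.
    set -- "$dir"/$pattern
    [ -e "$1" ] || { echo "no files match $dir/$pattern" >&2; return 1; }
    echo "found $# file(s)"

    # ls -tr: oldest first by modification time.
    # Assumption: no whitespace in file names, otherwise $(ls ...) breaks.
    awk 'NR==1 || FNR>1' $(ls -tr "$@") > "$out"
}
```

It could be called as merge_flatfiles scripts/new/projectdir 'flatfile*_v_*' merged.txt. To restrict it to one day's files, the pattern would have to include the date part of the timestamp, whose exact format is not given here. Make sure the output file name does not itself match the pattern if it is written to the same directory.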