Merge all the files in a folder in FIFO order

Hi All,

I have to merge the data in all the files in a folder such that the data of the earliest file comes first, then the second file's data, and so on. Please help.

Thanks.

Under ksh or bash (sort by date)

for f in $(ls -tr); do cat "$f" >> /tmp/my_accumulated_data_file; done

If you want to sort by file name instead, just drop -tr from ls.

You can use ls with suitable options to generate a list of the directory's contents in the proper order. That list can be piped into a while-loop for reading. Use test/[ to verify that each item is indeed a regular file before cat'ing it. Redirect the output of the while-loop to a file of your choice.

Regards,
Alister

Usually it's a very bad thing to use ls instead of find for tasks like this, but... being really careful, try this:

cd mydirectory
ls -1rt | while read item; do [[ -f $item ]] && cat $item >> /somewhereelse/outputfile ; done

That's an unsafe and unreliable way to work with a list of filenames.

$ mkdir test && cd test
$ touch -- '?' '[a]' a '-n'
$ ls -1
?
a
[a]
-n
$ ls | while IFS= read -r f; do printf '%s\n' "$f"; done
?
a
[a]
-n
$ for f in $(ls); do echo "$f"; done
?
a
a
a

That for-loop is badly broken. Instead of the four files, only two names are printed, one of them three times. Pathname expansion (globbing) after the command substitution replaces ? with ? a and [a] with a, leading to the three instances of a. The -n is silently consumed by echo as an option.

Beware pathname expansion after command substitution. Beware feeding echo arbitrary arguments. printf is safer and portable (and, like echo, usually a shell builtin).
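A minimal illustration of the echo hazard (the value is hypothetical; any argument beginning with - behaves this way):

$ f='-n'
$ echo "$f"
$ printf '%s\n' "$f"
-n

echo consumed -n as an option and printed nothing; printf printed it literally.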

Regards,
Alister

---------- Post updated at 06:14 PM ---------- Previous update was at 06:10 PM ----------

If done correctly, there's absolutely nothing wrong or unsafe about piping ls into a while-read loop. The only text it cannot handle is a newline in a pathname because ls and read use it as their delimiter.
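A quick illustration of that one limitation, using bash's $'...' quoting to create a file whose name contains a newline:

$ touch $'one\ntwo'
$ ls | while IFS= read -r f; do printf '<%s>\n' "$f"; done
<one>
<two>

The single file comes back as two names.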

Your example, however, isn't well-constructed. You must use IFS= to prevent trimming of leading/trailing whitespace and -r to treat backslashes literally. To read arbitrary lines of text verbatim: IFS= read -r line
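Putting those corrections together, a safer version of the earlier loop might look like this (same hypothetical directory and output file as before; keeping the output file outside the directory so the loop never reads its own output):

cd mydirectory
ls -1rt | while IFS= read -r item; do [[ -f $item ]] && cat "$item"; done >> /somewhereelse/outputfile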

Regards,
Alister


Thanks all for the reply. I need to append the timestamp of the file at the end of each record as well. I am new to UNIX, so please provide the code. Thanks for the help.

I knew it. When I wrote "being really careful", I meant: I know that a lot of things can go wrong... but at the moment I couldn't say precisely what they were.

I'm just a shell newbie, and bash is only my latest hobby. So please take it into account when I write some nonsense. I'm sorry in advance. :slight_smile:

Ok, it's easy to remember. With "IFS= ", read will catch the whole line 100% of the time. Thanks. :slight_smile:

Thanks a lot guys, but frankly speaking I still did not get what the right code to use would be :-(. I am new to UNIX and this is part of my first assignment in UNIX. Please let me know how to append the timestamp of the file at the end of each record while merging all the files. Thanks again.

---------- Post updated at 09:45 AM ---------- Previous update was at 09:34 AM ----------

I am using the code:

ls -1rt | while IFS= read -r f; do [[ -f $item ]] && cat $item >> /work/scripts/acu/outputfile.txt ; done

but it is not working

---------- Post updated at 10:33 AM ---------- Previous update was at 09:45 AM ----------

I have used:

for f in $(ls -tr); do cat "$f" >> /work/scripts/acu/my_accumulated_data_file; done

This is merging all the files in order of file timestamp, but sometimes the first line of the next file starts on a new line and sometimes it starts at the end of the last line of the previous file.

I have tried this as well:

ls | while IFS= read -r f; do cat "$f" >> /work/scripts/acu/outputfile.txt ; done

The newline issue is here also, and it does not seem to follow any order when merging the files.
Please advise as it is urgent. And how do I append the timestamp (the timestamp when the file was placed in the folder) at the end of each record of that particular file?

Thanks

ls -1rt | while IFS= read -r item; do [[ -f $item ]] && { cat "$item"; stat -c %y "$item"; } >> /work/scripts/acu/outputfile.txt; done

Hi Lem,

I am getting the error "stat: not found." for all the files in the folder.
The first line of each file is still not starting on a new line either.

It probably means what it says. It checks in the current folder for a file named "a.txt" or whatever, and it's not there.

I'm guessing you're not listing the current directory, which is why it can't find those files -- it's looking in the wrong place.

Prepend the path that ls removed.

ls -1rt folder | while IFS= read -r item; do [[ -f folder/$item ]] && { cat "folder/$item"; stat -c %y "folder/$item"; } >> /work/scripts/acu/outputfile.txt; done

Ohh... is it for .txt only? Can't we do it for any kind of extension (.dat, .trg, etc.)? And what about the new line? :frowning:

Why on earth would you want the file contents dumped into the logfile if they weren't text? :confused: You won't be able to read them...

What about the newline? I don't understand the problem you're having. Show an example of the bad output.

Ok, I should have explained the scenario at the very beginning itself.
Different users can put some information in a pre-defined format (180 bytes) in a file which may end in .dat, .trg, or .txt. We have to merge all this data into one file and then load it into Teradata. (To know which record came when, we want to append the file's timestamp at the end of each record, which will then be inserted into Teradata.)
If this code works for .txt only, we can ask users to put the files in .txt format only, but it would be better if there were no restriction on the extension.
The newline issue is: after the merge, the first record of the next file starts at the end of the last record of the earlier file. Can't we have all the records starting on a new line?

In place of cat file you may want to try awk 1 file , if the problem is that the last "line" of some of your files is missing the terminating newline.
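To see why records run together: if a file's last line lacks a terminating newline, cat preserves the omission, while awk 1 repairs it. A quick illustration (file names are just placeholders):

$ printf 'last record of file1' > file1
$ printf 'first record of file2\n' > file2
$ cat file1 file2
last record of file1first record of file2
$ { awk 1 file1; awk 1 file2; }
last record of file1
first record of file2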

Keep in mind that most UNIX tools are designed to handle text files. Text files do not contain null bytes or other control characters. If some of your files do contain such characters, then utilities like AWK may not work.

Regards,
Alister

Hi,
Even for .txt files I am getting the error that stat is not found :frowning: (it's not missing the folder name).

---------- Post updated at 01:28 PM ---------- Previous update was at 01:21 PM ----------

Is it possible that stat is not installed on my system? (I read somewhere how to check whether stat is installed.) If it is not installed, can we still get the file's timestamp and append that to each record?

Well, almost everything is possible... even when weird.

Tried

which stat

?

However, I guess it would be of some help to know just a little bit about your system. Ehm... What's your OS? :wink:

There is no stat in /bin /usr/bin /etc /sbin /usr/sbin /usr/ucb /usr/local/bin /usr/local/sbin /opt/universal/bin ..
:frowning:
is there any other way???
Thanks

---------- Post updated at 03:22 PM ---------- Previous update was at 02:37 PM ----------

If not in the shell, can we do the same thing in Perl? I am sure that we have stat in Perl :-)... Thanks
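Indeed, Perl has a built-in stat. A minimal sketch of a one-liner that prints a file's modification time (the file name is just a placeholder):

perl -e 'print scalar localtime((stat($ARGV[0]))[9]), "\n"' somefile.txt

Element 9 of the list stat returns is the mtime in seconds since the epoch; localtime converts it to a readable timestamp.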

ls --full-time | awk '{print $6, $7, $8}'

gives time of last modification on my system (Ubuntu, bash)
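If you need that inside the merge loop, you could capture it per file into a variable (assuming GNU ls; $item is the file name from the loop):

ts=$(ls --full-time "$item" | awk '{print $6, $7, $8}')

Fields 6, 7, and 8 of the long listing are the date, time, and timezone columns.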

Instead of

stat -c %y "$item";

try with

date -r "$item";
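If your date supports -r file (GNU date does; that is an assumption about your system), you could also satisfy the per-record requirement by letting awk append the file's timestamp to every record, which fixes the missing-newline issue at the same time. A sketch, reusing the same hypothetical folder and output file as above:

ls -1rt folder | while IFS= read -r item; do
    [[ -f folder/$item ]] || continue               # skip anything that isn't a regular file
    ts=$(date -r "folder/$item")                    # the file's last-modification time
    awk -v t="$ts" '{print $0, t}' "folder/$item"   # append it to every record
done >> /work/scripts/acu/outputfile.txt

Because awk terminates every record with a newline, the first record of each file always starts on a new line, and each record carries its own file's timestamp.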