split command

arv600 · January 9, 2010, 1:20am

./myapp | split -b 10m -d -a 1 - "myappLog"

Here split reads the output of myapp on its standard input and writes the text to files of 10 MB each, creating up to 10 files.

I have observed that split flushes its data only every 4096 bytes. If my application suddenly crashes, the latest messages do not make it into the file.

Can I tell split to flush the data immediately, as soon as the input text is read?

Please provide me the solution.

thanks in advance.

jim_mcnamara · January 11, 2010, 9:57am

The problem is not split. Your app needs to block & process signals, SIGSEGV, for example. When the program is going to bomb, it should flush the output stream that feeds split (fflush() for a stdio stream; fsync or fdatasync only apply once the data has been handed to the kernel for a file), then re-raise the signal it originally got and let it proceed unblocked.

The same is true for fatal errors the program itself encounters.
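
For what it's worth, here is a minimal sketch of that flush-then-re-raise idea in C. It assumes the application logs through stdio on stdout; install_crash_flush and flush_and_reraise are made-up names for illustration, not part of any library. It calls fflush() because the text that goes missing is still sitting in the stdio buffer inside the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Runs on a fatal signal: flush stdio, then die from the same signal.
 * Caveat: fflush() is not async-signal-safe, so this is a best-effort
 * last step that can itself misbehave if the signal interrupted stdio. */
static void flush_and_reraise(int sig)
{
    fflush(stdout);   /* push buffered log text into the pipe to split */
    raise(sig);       /* default action is already restored (SA_RESETHAND) */
}

/* Hypothetical helper: install the handler for one signal.
 * Returns 0 on success, -1 on error (as sigaction() does). */
int install_crash_flush(int sig)
{
    struct sigaction sa;

    assert(sig > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = flush_and_reraise;
    sa.sa_flags = SA_RESETHAND;   /* reset to SIG_DFL on entry to the handler */
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

Calling install_crash_flush(SIGSEGV) once at startup should be enough for that signal. Because the default action is restored before the handler runs, the process still terminates from the original signal, so the exit status and any core dump are left as they would have been.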

arv600 · January 11, 2010, 12:00pm

Well, I know how to handle signals. Let me restate the problem.
Let's say the application exits cleanly when some condition is met. split still flushes its data only every 4096 bytes, and I want to see the logs when the application exits.

Basically, I use split to restrict the size of the output files. I call split in an infinite loop, so this stays useful even after the maximum number of generated files is reached.

jim_mcnamara · January 11, 2010, 12:38pm

The only way split will flush output is to get an EOF on stdin.

To my knowledge there is no way to force sync on a child process, especially when it is a command line utility. You will have to roll your own version of split. Have it look for something in the input stream that forces it to fflush() the output stream.

fflush(stdin) is undefined behavior in standard C. (C89 && C99)
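
A home-grown split along those lines could look roughly like the sketch below. This is only an illustration under a simplifying assumption: instead of looking for a marker in the input, it hands every chunk to write() as soon as read() returns it, so nothing waits in a user-space buffer. The name split_unbuffered and the single-digit suffix scheme (at most 10 files, like -d -a 1) are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not the real split: copy in_fd into files named
 * <prefix>0, <prefix>1, ... of at most max_bytes each (10 files at most).
 * Returns the number of files created, or -1 on error. */
int split_unbuffered(int in_fd, const char *prefix, size_t max_bytes)
{
    char buf[4096], name[256];
    int out_fd = -1, files = 0;
    size_t room = 0;              /* bytes still allowed in the current file */
    ssize_t n;

    assert(max_bytes > 0);
    assert(strlen(prefix) < sizeof name - 2);

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {
            if (room == 0) {                  /* current file is full */
                if (out_fd >= 0) close(out_fd);
                if (files == 10) return -1;   /* single-digit suffixes used up */
                snprintf(name, sizeof name, "%s%d", prefix, files++);
                out_fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (out_fd < 0) return -1;
                room = max_bytes;
            }
            size_t want = (size_t)n < room ? (size_t)n : room;
            ssize_t w = write(out_fd, p, want);   /* no stdio buffer involved */
            if (w <= 0) { close(out_fd); return -1; }
            p += w;
            n -= w;
            room -= (size_t)w;
        }
    }
    if (out_fd >= 0) close(out_fd);
    return n < 0 ? -1 : files;
}
```

A main() would call something like split_unbuffered(STDIN_FILENO, "myappLog", 10 * 1024 * 1024). The data is then visible in the file as soon as the writer's output reaches the pipe, but it still only protects against the application dying, not against text that never left the application's own buffer.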

arv600 · January 11, 2010, 11:09pm

My objective is to restrict the size of the output file.

For example:
./myapp >> a.txt

If my application runs for a year, the size of a.txt becomes huge, in the billions.

I am looking for a way to restrict the size of the output file. That is why I went with split. I wrote a wrapper script around the split command that runs it in an infinite loop.

thegeek · January 11, 2010, 11:28pm

When I first saw your post, it looked interesting to me. I started a bit of research but could not finish it at the time for other reasons. I now have a good understanding of it, so here it is; I hope it is useful to you.

I hope the code below is clear. If not, here is the explanation:

A Perl application that writes a number every second and also handles SIGINT by flushing and closing stdout, then exiting.

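$| = 1;    # autoflush STDOUT: each print goes to the pipe immediately
$i = 0;

$SIG{'INT'} = 'int_handler';

sub int_handler {
    print STDERR "signal caught...\n";

    close(STDOUT);    # flushes anything still buffered, then closes
    exit(0);
}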

while (1)  {
    print STDOUT ++$i;
    print STDERR $i;
    sleep (1);
}

I ran the program like this:

perl write.pl | split -b 3 -d -a 1 - "myappLog" 
12

As expected, it started writing to both STDERR and STDOUT, so after a few seconds I sent it SIGINT:

$ ps ax | grep write
 5314 pts/0    S+     0:00 perl write.pl

$ kill -INT 5314

This was the outcome:

$ perl write.pl | split -b 3 -d -a 1 - "myappLog" 
12345678signal caught...
$

I then checked whether everything had reached split and whether split had processed it.
Yes.

$ ls myappLog*
myappLog0  myappLog1  myappLog2
$ head myappLog*
==> myappLog0 <==
123
==> myappLog1 <==
456
==> myappLog2 <==
78

The program wrote only 8 characters, and split handled all of them (it did not wait for 4096 bytes, as you described).

So the bottom line: you have to handle the signal and flush the output.
Hope it helps.

arv600 · January 12, 2010, 12:45pm

I have observed this behaviour on RHAS4.0

---------- Post updated at 12:44 PM ---------- Previous update was at 12:36 PM ----------

I have observed this problem on RHES 4.0.


jim_mcnamara · January 12, 2010, 1:33pm

IMO, you are missing the point. sync(), fdatasync(), and fsync() force the kernel to write all pending write operations to the disk/device; fsync() works on a single file descriptor. fflush() is a different layer: it hands the stdio buffer of one stream to the kernel with write(), and it does not call fsync(). If a process abends without flushing stdout, the output is lost, period.

If your process IS ACTUALLY writing everything before it dies, then the problem lies with split, which I doubt. Can you post the last few lines of output from your app (do not use split) when it hits error conditions?

For example: If you call _exit() rather than exit() in your code it will NOT flush output.
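
To make the exit() versus _exit() difference concrete, here is a small sketch that can be run on its own. It forks a child whose stdout is a pipe, lets it print a short message, and counts what the reading side actually receives. The function name bytes_seen_by_reader is invented for this demo, and it assumes the message is shorter than the stdio buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: the child writes msg to a piped stdout and
 * leaves via exit() or _exit(). Returns how many bytes the parent could
 * read from the pipe, or -1 on error. */
long bytes_seen_by_reader(const char *msg, int use_underscore_exit)
{
    int fds[2];
    char buf[64];
    long total = 0;
    ssize_t n;
    pid_t pid;

    assert(msg != NULL);
    fflush(NULL);   /* keep the parent's own buffered output out of the child */
    if (pipe(fds) != 0) return -1;
    pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0) _exit(2);
        close(fds[1]);
        fputs(msg, stdout);      /* no newline: stays in the stdio buffer */
        if (use_underscore_exit)
            _exit(0);            /* skips stdio cleanup, buffer is discarded */
        exit(0);                 /* flushes stdio buffers on the way out */
    }
    close(fds[1]);
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With "hello" as the message, the exit() path should deliver all 5 bytes to the reader and the _exit() path none, which is the same loss split would see at the end of the pipe.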

example 1:

#include <signal.h>
#include <stdio.h>

int main(void)
{
   int i = 0;
   for (i = 0; i < 2048; i++)
   {
      /* die from SIGSEGV with 1024 bytes still in the stdio buffer */
      if (i == 1024) raise(SIGSEGV);
      printf("x");
   }
   return 0;
}

output:

> cc t.c
> ./a.out
Segmentation Fault(coredump)

Note: NO output. Why? Because the process did not flush stdout.
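
One way to avoid losing text like this, sketched under the assumption that the app can route its log messages through a single helper, is to flush after every message. log_now is a made-up name, not a standard function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: print one formatted message to out and push it to
 * the kernel right away. Returns the number of characters written, or -1
 * if formatting or flushing failed. */
int log_now(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(out != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vfprintf(out, fmt, ap);
    va_end(ap);
    if (n < 0 || fflush(out) == EOF)
        return -1;
    return n;
}
```

Replacing printf("x") with log_now(stdout, "x") in the example above would get each character into the pipe before the signal arrives. The cost is one write() per message, so for a very chatty app a line-buffered stdout via setvbuf(stdout, NULL, _IOLBF, 0), called before the first output, may be the cheaper trade-off.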

example 2: using split (my version of split does not support -d):

> ./a.out | split -b 10m  -a 1 - "myappLog"                               
> ls -lrt | tail                                                          
-rw-r--r--   1 a     b        1799 Jan 11 11:53 WithSB_After_ChargeCalc.sql
-rw-rw-r--   1 a     b         329 Jan 11 11:53 after_estimates.txt        
-rw-r--r--   1 a     b         354 Jan 11 12:15 sql_threads.h              
drwxrwxr-x   2 a     b        2048 Jan 11 13:19 data                       
-rw-r--r--   1 a     b       22375 Jan 12 09:23 threadbud.pc               
-rw-r--r--   1 a     b        1384 Jan 12 09:28 ClearPrdgForLateMRE.sql    
-rw-------   1 a     b        7276 Jan 12 10:02 dead.letter                
-rw-r--r--   1 a     b         172 Jan 12 11:14 t.c                        
-rwxrwxr-x   1 a     b        6648 Jan 12 11:14 a.out                      
-rw-------   1 a     b      113228 Jan 12 11:19 core

No file produced.

example calling exit():

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
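/* exit() flushes the stdio buffers, so the buffered output survives.
   Note: exit() is not async-signal-safe; acceptable for this demo only. */
void signal_handler(int sig)
{
   (void)sig;
   exit(1);
}

int main(void)
{
   int i = 0;
   signal(SIGSEGV, signal_handler);
   for (i = 0; i < 2048; i++)
   {
      if (i == 1024) raise(SIGSEGV);   /* handler calls exit(1) */
      printf("x");
   }
   return 0;
}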

output from exit():

appworx> ./a.out
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>

1024 bytes on stdout.

running through split:

> ./a.out | split -b 10m  -a 1 - "myappLog"
> ls -lrt | tail
-rw-rw-r--   1 appworx  banner       329 Jan 11 11:53 after_estimates.txt
-rw-r--r--   1 appworx  banner       354 Jan 11 12:15 sql_threads.h
drwxrwxr-x   2 appworx  banner      2048 Jan 11 13:19 data
-rw-r--r--   1 appworx  banner     22375 Jan 12 09:23 threadbud.pc
-rw-r--r--   1 appworx  banner      1384 Jan 12 09:28 ClearPrdgForLateMRE.sql
-rw-------   1 appworx  banner      7276 Jan 12 10:02 dead.letter
-rw-------   1 appworx  banner    105004 Jan 12 11:26 core
-rw-r--r--   1 appworx  banner       277 Jan 12 11:27 t.c
-rwxrwxr-x   1 appworx  banner      6852 Jan 12 11:27 a.out
-rw-rw-r--   1 appworx  banner      1024 Jan 12 11:31 myappLoga

myapp is the problem, not split. IMO.