Just a quick question: how does gzip behave under Linux when its source is a file that is currently being written to by another process? In the code below I want to make sure there is ultimately no loss of data. That is, the gzip command should run until it "catches up" with the end of the file while the file is still growing; then the cat /dev/null truncates the file immediately, so the next write to the file happens when it is empty and all prior data is safely preserved in the gzip archive. How does my code look?
CAPDIR=/data/capture
KEEPDIR=/data/capture/keep
for FILE in $(find "$CAPDIR" -maxdepth 1 -not -type d | awk -F/ '{print $NF}')
do
    echo "Processing $CAPDIR/$FILE --> $KEEPDIR/$FILE.GZ"
    gzip -c "$CAPDIR/$FILE" > "$KEEPDIR/$FILE.GZ"
    cat /dev/null > "$CAPDIR/$FILE"
done
echo
echo Done.
echo
I know that on some operating systems, when a file handle is locked for reading, you get the file contents up to the EOF at the time of the lock, not up to the EOF at the current time.
I guess another way to put my question would be: is there a way to "atomize" these two commands:
gzip -c "$CAPDIR/$FILE" > "$KEEPDIR/$FILE.GZ"
cat /dev/null > "$CAPDIR/$FILE"
...such that I can be guaranteed that no other process gets a chance to write data to $CAPDIR/$FILE in between the call to gzip and the call to cat /dev/null?
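One alternative I've been wondering about: instead of truncating in place, rename the capture file first (rename(2) is atomic within a filesystem) and gzip the renamed copy afterwards. A sketch of that idea is below; it uses a temp dir so it can run standalone, and the BASE/.tmp naming is just my own invention. I realize this still doesn't help if a writer already holds the file open, since it would keep writing to the renamed inode, but would it at least close the window between gzip and the truncate for writers that reopen by name?

```shell
# Sketch of a rename-first variant (temp dir stands in for /data/capture
# and /data/capture/keep so this runs standalone).
CAPDIR=$(mktemp -d)
KEEPDIR=$CAPDIR/keep
mkdir -p "$KEEPDIR"
printf 'sample data' > "$CAPDIR/test.cap"   # stand-in for a capture file

for FILE in "$CAPDIR"/*; do
    [ -f "$FILE" ] || continue              # skip the keep/ subdirectory
    BASE=${FILE##*/}
    # rename(2) is atomic on the same filesystem: any process that reopens
    # the file by name after this point starts a brand-new, empty file
    mv "$FILE" "$KEEPDIR/$BASE.tmp"
    # compress the renamed copy at leisure, then drop the uncompressed temp
    gzip -c "$KEEPDIR/$BASE.tmp" > "$KEEPDIR/$BASE.gz" && rm "$KEEPDIR/$BASE.tmp"
done
```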