BASH Corrupts Files?

I have a system that uses file access to manage parallel processes. Each process checks whether a file exists whose unique name corresponds to the job it has to do. If it does, the process exits. If not, it creates a file with that name. There is also a background process writing to this file.

Would this cause only blocking, or would it cause actual disk corruption?

Thx!

There's an instant between the 'check' and the 'create file' when something else could create that file without bash knowing (a classic race condition).
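
To make the race concrete, the check-then-create pattern being described looks something like this (the file name here is made up for illustration):

#!/bin/bash
jobfile=/tmp/job.42.flag	# hypothetical per-job file name
if [ ! -e "$jobfile" ]		# check
then
	# another process can create $jobfile at this exact instant
	> "$jobfile"		# create -- but the check may already be stale
	# ... do the job ...
fi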

You could use mkdir. That's atomic: the first time it gets called it succeeds and creates the directory; the second time it fails, and nothing can sneak in between.
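
A minimal sketch of that approach (the lock directory name is an assumption, not from the original post):

#!/bin/bash
lockdir=/tmp/job.42.lock	# hypothetical per-job lock directory
if mkdir "$lockdir" 2>/dev/null	# atomic: exactly one caller can succeed
then
	# ... do the job ...
	rmdir "$lockdir"	# release the lock when finished
else
	echo "job 42 is already claimed" >&2
	exit 1
fi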

I think there's a way for bash to try to create a file and report an error if it already exists, but I am struggling to find the syntax.


I found it. It's a bash-only feature, mind.

set -o noclobber	# make '>' fail if the target file already exists

if ! exec 5>filename	# atomically create filename and open it on FD 5
then
        echo "Someone else already has filename" >&2
        exit 1
fi

# things writing to FD 5
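
One thing worth adding, assuming the file is only meant to live as long as the job (an assumption on my part), is a cleanup trap right after the exec succeeds:

trap 'rm -f filename' EXIT	# remove the lock file when the script exits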

That isn't a bash-ism. The set -C (a.k.a. set -o noclobber) option and its interaction with the:

fd> file

and:

fd>| file

redirection operators are required in all POSIX-conforming shells...

#!/bin/bash
options=$(set +o)	# Save current shell options
set -C			# Set noclobber option (synonym for "set -o noclobber")
if > xyzzy		# Fail if file already exists
then	echo '> xyzzy succeeded'
else	echo '> xyzzy failed'
fi
if > xyzzy		# Fail if file already exists
then	echo '2nd > xyzzy succeeded'
else	echo '2nd > xyzzy failed'
fi
if >| xyzzy		# Succeed even if file already exists (as long as you
			# have write permission)
then	echo '>| xyzzy succeeded'
else	echo '>| xyzzy failed'
fi
eval "$options"		# Restore original shell options
rm -f xyzzy		# Remove test file

produces output similar to:

> xyzzy succeeded
tester: line 8: xyzzy: cannot overwrite existing file
2nd > xyzzy failed
>| xyzzy succeeded

I said "similar to" because the format of the diagnostic message can vary slightly from shell to shell and system to system.


This sounds great! Thanks!

I'm not sure I understand the mkdir suggestion, though, since I'm talking about creating a file, not a folder. I'll have to look into this and see how it pertains to files.


As a matter of fact, let me ask another question. I've set this program up so that it creates files whose unique names specify the jobs their contents describe. To retrieve the information inside those files, I have to grep and then awk or sed to extract it. I've assumed that making a directory with that unique name, containing an individual file for each type of information, would waste storage and be slower to retrieve, but I realize now that this was only an assumption and I was never sure of it.

How would having a thousand or so directories, each holding individual files for the several types of information, compare with my current method of files with those same unique names containing fields to be extracted?

Would reading a file inside a directory be less resource/time intensive than extracting each piece of info from a single file with that unique name?


You have been given two suggestions using set -C or set -o noclobber that have nothing to do with mkdir. Have you looked at them?

I don't see how this has anything to do with the topic of this thread. Trying to discuss two topics in one thread confuses everyone trying to help you.

If you want help with another topic, please start a new thread to discuss it and give a much clearer example/explanation of the two proposals you want to evaluate. Show us the contents of the file(s) in the directories and the contents of the individual file(s) if the data is all in one directory (using CODE tags) and show us what data you are trying to extract from that file or those files.

The topic of discussion for this thread is performing an atomic check for the existence of a regular file and creating it if it does not already exist.


If different processes write to one file at the same time, there is a risk of file corruption (but not filesystem corruption). The usual corruption is that one of the concurrent writes is simply lost. In that case it's better to have separate files, e.g. in separate directories.
Regarding disk load, it would be best to use a memory filesystem: a RAM disk, or /tmp on Solaris (which is tmpfs).
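
A minimal sketch of the separate-files idea (the paths and field names are made up for illustration):

#!/bin/bash
dir=/tmp/jobs/$$			# hypothetical per-process directory, ideally on tmpfs
mkdir -p "$dir"
echo "status=running" > "$dir/status"	# each datum in its own small file
echo "result=42" > "$dir/result"	# no two processes ever share a file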
