Filenames with hyphens - UNIX style?

Hello everyone!
Using hyphens in filenames wherever a space would otherwise go: is that specifically a UNIX naming style or a general practice? My impression is that DOS guys use underscores in place of spaces and UNIX guys use dashes. Is that so?

It is not true that hyphens are preferred over underscores in filenames on Unix. Hyphens and space characters in filenames are not recommended on Unix. Recent Microsoft operating systems seem to allow any character in a filename, regardless of whether this causes difficulty for command-line commands or for interaction with non-Microsoft systems ... or even earlier Microsoft systems.

I normally use underscores.

@methyl: While spaces are obviously not, hyphens are recommended by the Unix standard except as first character.

@guest115:

Unix kernels impose no restrictions on filenames apart from two forbidden characters, / and the null byte (\0), and two reserved filenames: . and .. .

File systems, especially non Unix native ones, might be stricter, even when used on Unix.

For portability, POSIX recommends restricting filenames to the portable filename character set, i.e. uppercase and lowercase a-z, digits, dot, hyphen and underscore.
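
As a rough illustration of that recommendation, a shell function can test whether a name stays inside the portable set. This is only a sketch: the name is_portable_name is made up here, and rejecting a leading hyphen reflects the "except as first character" advice mentioned earlier in the thread rather than the character set itself.

```shell
# is_portable_name NAME
# Succeeds if NAME is non-empty, does not start with a hyphen, and uses
# only A-Z a-z 0-9 . _ - (hypothetical helper, not a standard utility).
is_portable_name() (
    LC_ALL=C    # make the A-Z and a-z ranges mean plain ASCII letters
    case $1 in
        '' | -* | *[!A-Za-z0-9._-]*) false ;;
        *) true ;;
    esac
)

# Example: report a name that passes the check.
is_portable_name 'march-report_2012.txt' && echo 'portable'
```

The function body runs in a subshell (parentheses instead of braces) so that the locale change does not leak into the calling script.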

There are no specific recommendations in Unix about whether to use hyphens or underscores as a separator. Filenames with embedded spaces, on the other hand, are not recommended but are supported anyway. They just need proper quoting/escaping when used on the command line. Hyphens pose no problem except when they are the first character of a filename, in which case some tricks are usually required depending on the command used.
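
For what it's worth, the usual tricks look something like the sketch below. The filenames are invented for the example, and everything happens in a scratch directory created with mktemp so no real files are touched.

```shell
# Work in a throwaway directory.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# A name starting with a hyphen would be parsed as options.
touch -- '-draft.txt'    # "--" marks the end of options
rm ./-draft.txt          # a "./" prefix also works, even without "--" support

# A name with embedded spaces only needs quoting.
touch 'March report.txt'

# Globbing with "./*" keeps every name from starting with a hyphen,
# and quoting "$f" keeps a name with spaces as a single word.
for f in ./*; do
    printf 'found: %s\n' "$f"
done
```

Most utilities that follow the POSIX utility syntax guidelines accept "--", but the "./" prefix is the safer habit because it does not depend on the command.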

I just checked on a Solaris 11 fresh install and got these statistics analyzing 168522 filenames:

  • 27.57% contain at least one hyphen
  • 23.13% contain at least one underscore
  • 4.50% contain both a hyphen and an underscore
  • 0.00% contain a space character (not a single file)

Other statistics that might be interesting:

  • 2.34% are plain numbers
  • 11.89% are only composed of lowercase characters & optionally digits
  • 0.71% are only composed of uppercase characters & optionally digits
  • 22.87% have no extension
  • 94.53% comply with the POSIX portable filename character set: A-Z a-z 0-9 . - _

On this system at least, there is a slight preference for hyphens over underscores (I expected the opposite), but both are very common anyway: more than half of the files use at least one of these separators.
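
The post does not say how these figures were produced, so the following is only one plausible way to get similar numbers, not the method actually used. The helper name name_stats is hypothetical. It counts a name under "hyphen" whenever it contains a hyphen at all, which may differ from how the percentages above were counted, and it miscounts names that contain newlines.

```shell
# name_stats: read one filename (without its directory part) per line on
# stdin and print the share of names matching each pattern.
name_stats() {
    LC_ALL=C awk '
        /-/                 { hyphen++ }
        /_/                 { under++ }
        /-/ && /_/          { both++ }
        / /                 { space++ }
        /^[A-Za-z0-9._-]+$/ { portable++ }
        END {
            if (NR == 0) exit
            printf "hyphen %.2f%%\n",     100 * hyphen / NR
            printf "underscore %.2f%%\n", 100 * under / NR
            printf "both %.2f%%\n",       100 * both / NR
            printf "space %.2f%%\n",      100 * space / NR
            printf "portable %.2f%%\n",   100 * portable / NR
        }'
}

# Possible use (the /usr starting point is an arbitrary choice):
# find /usr -type f | sed 's|.*/||' | name_stats
```

Extra patterns, for example for colons or hash characters, can be added as further awk rules in the same way.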


@jlliagre
The recommendation to avoid hyphens and space characters in filenames did not come from the POSIX "standard". It did, however, come from cross-platform portability guides.
I never thought that it would happen, but I am seeing more problems with filenames created on Unix systems where the user thinks that they are on a Microsoft system:

"C:\March report - with summary figures"

The C:\ upsets my multi-platform backup software (you can never restore the file), the reverse solidus upsets scripts which process filenames (because it is a special character in the shell), and the space-delimited hyphen will break any script which does not take special precautions.

I'd be interested in a further analysis of the Solaris kit filenames for colon characters and hash characters.


Unix is very permissive in what it accepts but quite restrictive in what it recommends. Your first reply seemed to imply Windows was too permissive while Unix wasn't. The reality is the opposite. Windows has more restrictions and peculiarities, like refusing to let a file be named nul.h or com0.c, have a colon in its name, or have a space as its last character, and so on, not to mention the way it preserves case but doesn't allow files whose names differ only in case to coexist in the same directory.

In any case, your example obviously violates the POSIX recommendations but is still a valid Unix filename. It is obviously unacceptable to Windows and possibly other OSes, and it will defeat scripts that are not rock-solid.
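
To make the Windows side concrete, here is a rough check for names that Windows would refuse. It is a sketch under assumptions: is_windows_safe is a made-up name, and the rules encoded (reserved characters, a trailing space or dot, and the device names CON, PRN, AUX, NUL, COM0-COM9 and LPT0-LPT9) are the commonly documented ones, not a complete statement of what every Windows version or file system enforces.

```shell
# is_windows_safe NAME
# Succeeds if NAME avoids the Windows restrictions sketched above
# (hypothetical helper; deliberately incomplete).
is_windows_safe() (
    name=$1
    # Reject empty names, reserved characters, and a trailing space or dot.
    case $name in
        '' | *\\* | */* | *:* | *\** | *\?* | *\"* | *\<* | *\>* | *\|* | *' ' | *.)
            exit 1 ;;
    esac
    # Device names are reserved whatever the case or extension, so compare
    # the part before the first dot, uppercased.
    stem=$(printf '%s' "${name%%.*}" | tr '[:lower:]' '[:upper:]')
    case $stem in
        CON | PRN | AUX | NUL | COM[0-9] | LPT[0-9]) exit 1 ;;
    esac
    exit 0
)

# Example: the filename from this thread fails the check.
is_windows_safe 'C:\March report - with summary figures' || echo 'not usable on Windows'
```

A name can pass this check and still fall outside the POSIX portable set (a name with an embedded space, for instance), so the two checks answer different questions.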

If your backup software has issues processing this filename, that's a bug or a limitation of the storage format it uses, the OS or the file system. The venerable tar utility has no issues handling it:

$ touch 'C:\March report - with summary figures'
$ tar cvf foo.tar *es
a C:\March report - with summary figures 0K
a files 8388K
$ tar tvf foo.tar
-rw-r--r-- 60004/60004      0 May 11 17:14 2012 C:\March report - with summary figures
-rw-r--r--   0/0   8589076 Nov  9 16:07 2011 files
$ mkdir extract
$ cd extract
$ tar xvf ../foo.tar
x C:\March report - with summary figures, 0 bytes, 0 tape blocks
x files, 8589076 bytes, 16776 tape blocks
$ ls -l
total 2
-rw-r--r--   1 jlliagre jlliagre       0 May 11 17:14 C:\March report - with summary figures
-rw-r--r--   1 jlliagre jlliagre 8589076 Nov  9  2011 files

About your last request, here are the numbers I got:

  • 0.74% have colons (1242)
  • 0.00% have hashes (2 files out of 168522)