Based on your results, HP-UX tr deviates from POSIX in behavior that has been part of the standard for at least 15 years now (perhaps 20). I do not say this pejoratively; it's merely an observation.
I could not find IEEE Std 1003.2-1992 online, which I believe is the first standard to include the utilities (IEEE Std 1003.1-1988 only covered core system services).
The Single UNIX ® Specification, Version 2
Copyright © 1997 The Open Group
tr manual page
http://pubs.opengroup.org/onlinepubs/009695399/utilities/tr.html
As I said before, I have no experience with HP-UX nor do I know what it aspires to be.
Perhaps backwards compatibility is most important to HP and its userbase. If that's the case, then it was a mistake to add support for the POSIX/BSD range syntax. a-c in historical SysV tr means three characters, a, -, and c; it's equivalent to ac-.
If, however, HP endeavours to be POSIX-compliant, then your results are unexpected and erroneous; scripts that are compliant and work as expected on compliant systems can fail on HP-UX.
That's the expected result for historical SysV behavior, but it's not POSIX-compliant. In a POSIX tr range expression, the brackets are not special at all; [a-c] is equivalent to ][a-c.
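A minimal sketch of that equivalence, assuming a POSIX-conforming tr (GNU or BSD, for instance); the function names and the sample string are made up for illustration. With -d, the brackets in the set are deleted along with the letters, whichever order they are written in, while the unbracketed range leaves them alone:

```shell
#!/bin/sh
# Assumes a POSIX-conforming tr; a historical SysV tr would treat [a-c] differently.
sample='x[abc]y'

# Brackets are ordinary set members, so both spellings name the same five characters.
del_bracketed() { printf '%s\n' "$1" | tr -d '[a-c]'; }
del_reordered() { printf '%s\n' "$1" | tr -d '][a-c'; }

# Without brackets, only a, b, and c are in the set.
del_plain() { printf '%s\n' "$1" | tr -d 'a-c'; }

del_bracketed "$sample"   # prints: xy
del_reordered "$sample"   # prints: xy
del_plain "$sample"       # prints: x[]y
```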
The POSIX-compliant result is .....
The \[ and \] escape sequences are undefined in POSIX. Their use is not portable.
That gives me nothing (except for the untranslated newline emitted by echo), which is the POSIX-compliant result.
Linux or BSD is irrelevant for the purposes of this discussion. I'm simply playing POSIX lawyer at the moment ;).
It appears that HP-UX tr added support for the BSD range expression syntax that POSIX long ago adopted, a-c, but it continues to accept historical SysV syntax, [a-c], treating them identically even though according to POSIX they mean different things (the latter includes two brackets which the former does not).
It's understandable that you've been using this syntax for a very long time without any obvious problems. With a SysV tr, the range expression behaves as you intend. With a POSIX or BSD tr, in most instances, where both strings consist of a range expression, the brackets are silently translated into identical characters. While the brackets were not intended to be members of the translation set, they are translated into themselves, so the result is still correct (which is why the POSIX standard chose to go with the BSD syntax: less collateral damage). However, in other cases, for example when only the first string contains a range expression and the second is a repetition expression, as in tr '[a-z]' '[.*]', there is potential for a silently erroneous result. And if the tr implementation pads the second string, then the repetition expression isn't even required for a silent error to occur: tr '[a-z]' '.' is enough.
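To make the silent error concrete, here is a small sketch, again assuming a POSIX-conforming tr; the function names and the sample input are invented for illustration. Because the brackets in the first string are ordinary set members, they get translated to dots along with the letters:

```shell
#!/bin/sh
# Assumes a POSIX-conforming tr. The input contains literal brackets
# that the script author does not mean to translate.
sample='a[b]'

# First string: [, a through z, ] (28 characters).
# Second string: the repetition expression [.*], which expands to enough dots to match.
dots_bracketed() { printf '%s\n' "$1" | tr '[a-z]' '[.*]'; }

# Same translation with the POSIX range syntax: only a through z are in the set.
dots_plain() { printf '%s\n' "$1" | tr 'a-z' '[.*]'; }

# Short second string: POSIX leaves this case unspecified. An implementation
# that pads with the last character behaves like dots_bracketed.
dots_padded() { printf '%s\n' "$1" | tr '[a-z]' '.'; }

dots_bracketed "$sample"   # prints: ....  (brackets clobbered)
dots_plain "$sample"       # prints: .[.]  (brackets preserved)
dots_padded "$sample"      # output depends on the implementation
```

A historical SysV tr would be expected to print .[.] for the first call as well, which is exactly why a script written against it can go wrong without any warning elsewhere.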
methyl, I greatly appreciate your responses to my questions. I realize that these are rarely encountered corner cases, but they pique my curiosity. I often learn more than I intend as I dig into them.
Regards,
Alister