Unexpected behaviour when piping tr command output to sed

I have a file in which each line is separated by a newline. For example:

alpha
beta
gamma

I am trying to replace the newlines with "," but I don't want a "," at the end of the line, so I am trying the one-liner below. I'm not sure what's wrong, but it isn't working:
cat myfile | tr -s '\n' ',' | sed 's/,$//'

But when I try it this way, it works:
echo "alpha,beta,gamma," | sed 's/,$//'

Can someone please explain this behaviour?

Based exactly on your input file, it works for me. What is not working about it for you?

$ tr -s '\n' ',' < file | sed 's/,$//'    
alpha,beta,gamma

(no need for cat, though)

You can also use paste:

$ paste -sd, file                     
alpha,beta,gamma

At a wild guess, I would say your file has ^M characters in it.

Show the output of:

cat -v infile

I would suggest your input file has some non-printing characters in it; most likely it's a DOS-formatted file. Try running dos2unix on the file first, or use:

tr -s '\r\n' ',' < myfile | sed 's/,$//'
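To check the DOS-format theory without a real DOS file, here is a minimal sketch that fakes CR/LF input with printf, shows it with cat -v, and then deletes the carriage returns before joining the lines. The helper name join_dos_lines is made up for this example, and it assumes paste accepts "-" for standard input (it does on the systems I know of):

```shell
#!/bin/sh
# Sketch: join lines with commas after removing DOS carriage returns.
# join_dos_lines is a hypothetical helper name, not a standard command.
join_dos_lines() {
    # tr -d '\r' deletes every carriage return; paste -s joins all lines
    # into one line using "," as the delimiter and ends it with a newline.
    tr -d '\r' | paste -sd, -
}

# Simulate a DOS-formatted file (CR/LF line endings).
printf 'alpha\r\nbeta\r\ngamma\r\n' | cat -v            # shows alpha^M, beta^M, gamma^M
printf 'alpha\r\nbeta\r\ngamma\r\n' | join_dos_lines    # prints alpha,beta,gamma
```

With a real file it would be used as: join_dos_lines < myfile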

In addition to the possibilities already listed, note that the standards only define the behavior of sed when the input is a text file. The output from the tr command is not a text file (by definition, the last character of a non-empty text file is a <newline> character). The tr command you gave strips off the trailing <newline>; the echo command you gave supplies one. If this is the problem, the tr|sed pipeline may work as you expect on some systems and produce no output at all (or do something completely different) on other systems.
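If you want to keep the tr|sed idea, one option is to put the missing <newline> back before sed ever reads the data. This is only a sketch (join_with_tr is a made-up name), but with the extra echo the input sed sees is a text file again:

```shell
#!/bin/sh
# Sketch: keep the original tr | sed pipeline, but hand sed a text file.
# tr turns every newline into ",", so its output has no trailing newline;
# the echo inside the braces writes one after tr finishes.
join_with_tr() {
    { tr -s '\n' ','; echo; } | sed 's/,$//'
}

printf 'alpha\nbeta\ngamma\n' | join_with_tr    # prints alpha,beta,gamma
```

Usage on the original file: join_with_tr < myfile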

Here are a couple of portable ways to do this:

awk 'NR==1{o = $0; next}
        {o = o "," $0}
END     {print o}' myfile

and:

#!/bin/ksh
(IFS="" read -r o
while IFS="" read -r x
do      o="$o,$x"
done
printf "%s\n" "$o"
) < myfile

I use the Korn shell, but you can replace /bin/ksh in #!/bin/ksh with a path to any POSIX conforming shell on your system.


The file is on a Unix system only, and I don't see any extra characters. I am using Solaris 10.

# cat myfile
alpha
beta
gamma

# od -c myfile
0000000   a   l   p   h   a  \n   b   e   t   a  \n   g   a   m   m   a
0000020  \n
0000021
#
# tr -s '\n' ',' < myfile | sed 's/,$//'
#

I have tried this with both the Bourne shell and bash, with no luck in either.

sed path: /bin/sed

I tried the xpg4 sed, and the output looks promising, but with a warning message:

# tr -s '\n' ',' < myfile | /usr/xpg4/bin/sed 's/,$//'
sed: Missing newline at end of file standard input.
alpha,beta,gamma


@Don Cragun, your code worked. I still don't understand why this behaviour occurs. Why doesn't the output of tr, with its trailing newline stripped, work when piped to sed/awk?

The sed and awk utilities are only defined to work on text files. If the last line fed to sed or awk does not end in a <newline> character, the results are unspecified. Basically, sed and awk try to read in a line, and until they find the terminating <newline> character they don't have a line. So the partial line at the end of the file may be ignored, as Solaris 10's /usr/bin/sed did. Or, as Solaris 10's /usr/xpg4/bin/sed did, the utility can add the newline for you and warn you that it did so. Some versions of sed will silently add the trailing <newline> without a warning. Which behavior is better depends on the source of your data. (If your data source dies in the middle of transmitting or producing your data, the warning lets you know that data from the end of your input stream may have been lost.)
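To find out whether a given file would run into this, it is enough to look at its last byte. A minimal sketch, assuming a tail that supports the POSIX -c option; the function name ends_with_newline is made up for this example, and it only checks the trailing <newline> part of the text-file definition (not LINE_MAX or NUL bytes):

```shell
#!/bin/sh
# Sketch: succeed if the file is empty or its last byte is a newline.
# ends_with_newline is a hypothetical helper name, not a standard command.
ends_with_newline() {
    [ -f "$1" ] || return 2      # missing or not a regular file
    [ -s "$1" ] || return 0      # an empty file has no unterminated line
    # tail -c 1 prints the last byte; command substitution strips a trailing
    # newline, so the result is empty exactly when that byte is "\n".
    [ -z "$(tail -c 1 "$1")" ]
}

# Demo on a scratch file (the path is arbitrary).
tmp=${TMPDIR:-/tmp}/ewn_demo.$$
printf 'alpha,beta,gamma,' > "$tmp"     # what the tr command produces
if ends_with_newline "$tmp"; then
    echo "ends in a newline"
else
    echo "missing trailing newline"     # this branch runs for the tr output
fi
rm -f "$tmp"
```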

OK, so sed/awk did not work here because the tr output (alpha,beta,gamma,) did not end with a newline, and this behaviour also depends on the sed/awk version.
With awk we can change the record separator, right? So shouldn't something like this work, logically?
Since my tr output does not have a newline, I am using RS="":

# tr -s '\n' ',' < myfile | awk 'BEGIN{RS=""}{sub(/,$/,"");print}'

You can use RS to change the record separator, but that does not alter the fact that the behavior of sed and awk is only defined when the input file is a text file. If there are any characters at the end of a file without a trailing <newline> character, the input is not a text file.

If the input is not a text file, the behavior is unspecified.

Will the unspecified behavior change if the value of RS changes? The answer to that is unspecified. It might work; it might not; it might vary depending on which awk or sed you use.

@Don,

I agree with what you say, and I know for a fact that some seds do process that last line and others don't, so as you say the results are unspecified. However, I have never come across a version of awk that does not process that last unterminated line, and I am wondering why that might be. Even oawk on Solaris will process it:

$ printf "hello" | awk '{print}'
hello
$ printf "hello,hello" | awk '{print}' RS=,
hello
hello

Could it perhaps be that awk is almost 'obliged' to process that last line anyway because of the concept of RS, the input record separator, which can be set to almost any character? A consequence of this might be that any characters following the last RS should be interpreted as part of the last record. Otherwise, shouldn't it have been called RT (Record Terminator)?

Another example: if that were not the case, then what this printf produces would be a valid UNIX text file from a UNIX point of view, but from an awk point of view it would end in an "unterminated" last record (compare that with the situation where RS is a regular newline):

$ printf "hello,hello\n" | awk '{print}' RS=,
hello
hello

$

After all, that last newline is not a file terminator but a line terminator.
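The blank-looking line in the output above comes from that trailing newline: with RS=",", the newline is ordinary data, so it ends up inside the last record, and print then appends ORS. A small sketch that brackets each record makes this visible:

```shell
#!/bin/sh
# Sketch: show exactly what each record contains by wrapping $0 in brackets.
# With RS=",", the final "\n" is not a separator, so it stays in the last
# record; print then adds ORS ("\n") after the closing bracket.
printf 'hello,hello\n' | awk '{ print "[" $0 "]" }' RS=,
# prints:
# [hello]
# [hello
# ]
```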

Yes, the last newline is a line terminator. So the input file consists of lines. And, each line is no more than LINE_MAX bytes (including the terminating newline) and there are no null bytes in any line. Therefore, by definition, the input is a text file.

The behavior of awk and sed is defined when the input is a text file no matter what the record separator is.

If the input file had been created by printf "hello,hello" (note that there is no trailing newline), the results would be unspecified no matter how RS is set, because the input file would not be a text file.