sort truncates lines when they contain nulls

When I try to sort a file where some records contain nulls (i.e. hex 00), sort truncates the record when it reaches the null and writes the message:

"sort: warning: missing NEWLINE added at end of input file myfile"

I'm assuming from this that the sort sees the null as a special character and acts accordingly. I could hack the file to replace the nulls with spaces but it would be great if I could tell the sort to accept the null as just another character in the record and not truncate.

Anybody got any ideas on options?

Hello Arthur. I have the same problem. Were you able to fix it? Thanks

Hi

Most Unix utilities will have this problem...

If x'00' is to be considered a valid character in the body of your file, how would sort identify a 'true' end-of-line?

Do your records have an end-of-line marker other than x'00'?

Just my 2 cents...

JG

If the files are pure 7-bit ASCII, you can replace the NUL with an extended (8-bit) character. Just make sure you don't pick one which already exists in the file. And make sure you use a single byte, not the character's UTF-8 representation, which is by definition multiple bytes.

Or if you can find a 7-bit printable character which doesn't occur in the file, try that. (Tab? Tilde? Underscore? @?)

tr '\000' @ <file | sort | tr @ '\000' >output

... assuming your tr understands backslashed octal.

Grepping for special characters can be tricky, too; presumably, your grep will also treat NUL as end of string. Try replacing all occurrences of your character and comparing the result against the original; if they are binary identical, you have found a character which doesn't occur in the file.

 tr -d @ <file | cmp - file

... assuming your cmp accepts - to mean standard input.
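
If you want to try several candidates in one go, here is a rough sketch of the same check in a loop. The sample data and the variable names (file, placeholder) are made up for illustration, and it assumes your tr understands backslashed octal and your cmp accepts both - and -s.

```shell
# Hypothetical sample data: two records with embedded NULs, one containing @ and one containing ~.
file=$(mktemp)
printf 'b\000y@\na\000x~\n' > "$file"

# Candidate placeholder bytes in tr's octal notation: \100 is @, \176 is ~, \001 is SOH.
placeholder=
for c in '\100' '\176' '\001'; do
    # If deleting the byte leaves the file binary identical, the byte does not occur in it.
    if tr -d "$c" < "$file" | cmp -s - "$file"; then
        placeholder=$c
        break
    fi
done

# printf rather than echo, because some echo implementations interpret the backslash.
printf 'unused byte: %s\n' "$placeholder"
```

If placeholder comes out empty, none of the candidates was safe and you need to extend the list.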

Era,
I cannot change the byte because it is part of my data.
On Linux it works fine, but on AIX the data is truncated.
Thanks

jgrogan,
My file has x'0A' at the end of each record.
thanks

The idea is to change it temporarily so sort can work, then change it back. You just need to take care to use a byte which doesn't occur in your data.

For example, octal \200 or \001 might work if they don't occur in the data file already. So you'd change the NULs to (something unique), sort, and change (something unique) back to NUL. Now the data should be sorted, with the NULs preserved.

(\200 might be problematic too, because it's NUL with the eighth bit set, and some procedure might still live in 7-bit land and strip the 8th bit internally; try some other high-value byte between \201 and \377 if it doesn't work.)
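
Put together, the whole round trip might look something like the sketch below. The file names and sample records are invented for illustration, \001 is only an example placeholder, and LC_ALL=C is my own addition to ask sort for plain byte order; none of this has been tried on AIX.

```shell
# Hypothetical input: two records, each with an embedded NUL and a newline (x'0A') at the end.
in=$(mktemp)
out=$(mktemp)
printf 'b\000y\na\000x\n' > "$in"

# Safety check first: the placeholder byte must not already occur in the data.
if ! tr -d '\001' < "$in" | cmp -s - "$in"; then
    printf 'placeholder byte already occurs in the input\n' >&2
    exit 1
fi

# Swap NUL for the placeholder, sort, then swap the placeholder back to NUL.
tr '\000' '\001' < "$in" | LC_ALL=C sort | tr '\001' '\000' > "$out"

# Nothing should have been truncated, so both byte counts should match.
wc -c < "$in"
wc -c < "$out"
```

One thing in favour of \001: in plain byte order it sorts immediately after NUL, so as long as it doesn't occur in the data the records should come out in the same order as if sort had handled the NULs itself. A high byte like \201 would sort after all the ASCII characters instead.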