Remove or truncate trailing nulls from file

I want to remove the trailing null characters (0x00) from a file, stopping at the first non-null byte found (which must not be deleted), and write the result either to the same file or to a different one.

Any ideas?

Try:

tr -d '\0' < file_with_nulls > file_without_nulls

Note that the filename you use for the output file must NOT be the name of your input file: the shell truncates the output file before tr gets a chance to read it.

That command removes all nulls. I need to remove only the ones that are at the end of the file just after the last non-null character, not in the middle of the file.

I misunderstood your requirements. Even though you said you wanted to remove trailing nulls, the part about stopping when a different byte was found sounded like you wanted to stop removing null bytes at the first non-null byte that followed a string of one or more null bytes. Just removing trailing NUL bytes could be done in a shell script using a combination of something like od and grep to find the address of the last non-NUL byte in a file and dd to truncate the file to that length, but a relatively simple C program is probably easier. If you save the following in dtn.c:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

char	buf[8192];	// I/O buffer
int	ec;		// exit code

// pe(format_string, format_string_argument, exit_code_modifier);
void
pe(	const char	*fmt,
	const char	*arg,
	int		ecm) {
	int	serrno;	// hold area for errno

	serrno = errno;
	snprintf(buf, sizeof(buf), fmt, arg);
	errno = serrno;
	perror(buf);
	ec |= ecm;
}

// NAME		dtn -- Delete trailing null bytes.
//
// SYNOPSIS	dtn file...
//
// DESCRIPTION	Delete trailing NUL bytes from each file named as an operand.
//		Files are updated in place.
//
// OPERANDS
//	file	A pathname of a file to be truncated to have a length that does
//		not include any trailing NUL bytes.
//
// INPUT FILES	The input files must be regular files.
//
// STDERR	The standard error shall be used only for diagnostic messages.
//
// EXIT STATUS
//	0	All input files were successfully processed.
//	>0	An error occurred.
//
// CONSEQUENCES OF ERRORS
//		Default.

int
main(	int	argc,
	char	*argv[]) {

	ssize_t	buflen;	// number of bytes in buf[]
	int	fd,	// file descriptor
		i,	// loop control
		j;	// loop control
	off_t	nsize,	// new file size
		size;	// current file size

	for(i = 1; i < argc; i++) {
		if((fd = open(argv[i], O_RDWR)) == -1) {
			pe("Can't open \"%s\":", argv[i], 1);
			continue;
		}
		nsize = size = 0;
		while((buflen = read(fd, buf, sizeof(buf))) > 0) {
			for(j = 0; j < buflen; j++) {
				size++;
				if(buf[j])
					nsize = size;
			}
		}
		if(buflen) {
			pe("Read error on \"%s\": file will not be truncated:",
			    argv[i], 2);
		} else if(ftruncate(fd, nsize)) {
			pe("Truncation failed on \"%s\":", argv[i], 4);
		}
		close(fd);
	}
	return ec;
}

and then run make dtn to build it, you should have a utility (named dtn) that you can invoke with any number of file operands, and it will remove trailing NUL bytes from each of those files. (You will need read and write access to each file to do this.) I don't claim that it is highly efficient (it reads from the start of the file, noting the offset of the last non-NUL byte found, instead of reading from the end and searching backward for a non-NUL byte), but its performance should be reasonable for most regular files.
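
For very large files, the alternative mentioned above (reading from the end and searching backward for a non-NUL byte) could look roughly like the following. This is only a sketch and is not part of dtn.c: the names trimmed_size and trim_file are made up for illustration, and it assumes a system that provides pread().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: return the length the file would have without its
// trailing NUL bytes, or -1 on a read error.  "size" is the current file
// size (for example st_size from fstat()).
off_t
trimmed_size(int fd, off_t size) {
	static char	chunk[8192];	// scratch buffer for one block

	assert(size >= 0);
	while(size > 0) {
		// Read the block that ends at offset "size".
		size_t	want = size < (off_t)sizeof(chunk) ?
			    (size_t)size : sizeof(chunk);
		off_t	start = size - (off_t)want;
		size_t	got = 0;

		while(got < want) {
			ssize_t	n = pread(fd, chunk + got, want - got,
				    start + (off_t)got);
			if(n <= 0)
				return -1;	// error, or file shrank
			got += (size_t)n;
		}
		// Scan the block backward for a non-NUL byte.
		while(want > 0) {
			if(chunk[want - 1])
				return start + (off_t)want;
			want--;
		}
		size = start;	// whole block was NUL; keep going
	}
	return 0;	// empty file, or nothing but NUL bytes
}

// Hypothetical wrapper: truncate the regular file "path" in place.
// Returns 0 on success and -1 on any failure.
int
trim_file(const char *path) {
	struct stat	st;
	off_t		nsize;
	int		fd, rc = -1;

	if((fd = open(path, O_RDWR)) == -1)
		return -1;
	if(fstat(fd, &st) == 0 && S_ISREG(st.st_mode) &&
	    (nsize = trimmed_size(fd, st.st_size)) != -1 &&
	    ftruncate(fd, nsize) == 0)
		rc = 0;
	close(fd);
	return rc;
}
```

Calling trim_file(argv[i]) for each operand would give roughly the same result as dtn, but it only reads the trailing run of NUL bytes plus at most one more block, so a multi-gigabyte file with a short NUL tail is handled without reading the whole file.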

Thank you very much, it works perfectly.

It looks like GNU sed could do it (even though we are not dealing with *nix text files, since the <NL> character at the end is missing), not with octal constants but with hex constants:

sed ':L;s/\000$//;tL' XX | hd
00000000  73 64 66 65 66 65 65 72  76 30 30 09 00 00 00 00  |sdfefeerv00.....|
00000010  00 00 00 00 00 00 00 00  00 00 00                 |...........|
sed ':L;s/\x00$//;tL' XX | hd
00000000  73 64 66 65 66 65 65 72  76 30 30 09              |sdfefeerv00.|

FreeBSD's sed doesn't behave the same way.

Not only are non-empty POSIX text files required to end with a <newline> character, they are not allowed to contain any NUL bytes either.

If GNU sed works with:

sed ':L;s/\x00$//;tL' file

does it also work with:

sed 's/\x00*$//' file

???

I have no idea what the cause is, but with large files it doesn't remove all the trailing null characters.

A file of 3239571456 bytes came out at 2902450247 bytes with dtn.c.
The same file came out at 3239570892 bytes using sed on 64-bit Ubuntu.
The same file came out at 3239457516 bytes using sed on 32-bit Cygwin.

I've compared the original file with the one created by dtn.c using HexCmp, and that result is what I wanted; the result from sed is not.

This doesn't work. It completely damages the file.

Yes, it does:

hd XX
00000000  64 6c 6b 6a 65 72 67 00  00 00 00 00 00 00 00 00  |dlkjerg.........|
sed 's/\x00*$//' XX | hd
00000000  64 6c 6b 6a 65 72 67                              |dlkjerg|
sed --version
sed (GNU sed) 4.2.2
Copyright (C) 2012 Free Software Foundation, Inc.

Does "trailing" mean "at the end-of-file" or "at the end-of-each-line"?

At the end of a binary file
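
The difference between the two meanings may be what is hurting the sed attempts: sed is line-oriented, so $ matches at the end of every line, and in a binary file any NUL bytes that happen to sit just before a 0x0a byte are stripped as well. A small in-memory sketch of the two behaviors follows; the function names trim_eof and trim_eol are made up for illustration, and trim_eol only approximates what a line-oriented tool does.

```c
#include <assert.h>
#include <stddef.h>

// End-of-file sense: length of buf once the NUL bytes at the very end
// are dropped.  Nothing in the middle of the data is touched.
size_t
trim_eof(const unsigned char *buf, size_t len) {
	while(len > 0 && buf[len - 1] == 0)
		len--;
	return len;
}

// End-of-each-line sense: copy buf to out, dropping the NUL bytes that
// immediately precede every '\n' (and the end of the data).  out must
// have room for len bytes.  Returns the number of bytes written.
size_t
trim_eol(const unsigned char *buf, size_t len, unsigned char *out) {
	size_t	i = 0,	// start of the current line
		o = 0;	// bytes written to out

	assert(buf != NULL && out != NULL);
	while(i < len) {
		size_t	e = i,	// index of '\n', or len if none
			keep,	// end of the part of the line to keep
			k;

		while(e < len && buf[e] != '\n')
			e++;
		keep = e;
		while(keep > i && buf[keep - 1] == 0)
			keep--;
		for(k = i; k < keep; k++)
			out[o++] = buf[k];
		if(e < len)
			out[o++] = '\n';
		i = e + 1;
	}
	return o;
}
```

For the seven bytes 61 00 00 0a 62 00 00, trim_eof keeps the first five bytes (61 00 00 0a 62), while trim_eol produces 61 0a 62 and so changes data in the middle of the file.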