I want to remove the trailing null characters (0x00) from a file, stopping as soon as a different byte is found (that byte should not be deleted), and write the result either to the same file or to a different one.
Any ideas?
Try:
tr -d '\0' < file_with_nulls > file_without_nulls
Note that the filename you use for the output file must NOT be the name of your input file.
That command removes all nulls. I need to remove only the ones that are at the end of the file just after the last non-null character, not in the middle of the file.
I misunderstood your requirements. Even though you said you wanted to remove trailing nulls, the way you said that you wanted to stop removing nulls when a different byte was found sounded like you wanted to stop removing null bytes when a non-null byte was found after a string of one or more null bytes. Just removing trailing NUL bytes could be done in a shell script using a combination of something like od and grep to find the address of the last non-NUL byte in a file and dd to truncate the file to that length, but a relatively simple C program is probably easier. If you save the following in dtn.c:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

char    buf[8192];  // I/O buffer
int     ec;         // exit code

// pe(format_string, format_string_argument, exit_code_modifier);
void
pe(const char *fmt, const char *arg, int ecm) {
    int serrno;     // hold area for errno

    serrno = errno;
    snprintf(buf, sizeof(buf), fmt, arg);
    errno = serrno;
    perror(buf);
    ec |= ecm;
}

// NAME         dtn -- Delete trailing null bytes.
//
// SYNOPSIS     dtn file...
//
// DESCRIPTION  Delete trailing NUL bytes from each file named as an operand.
//              Files are updated in place.
//
// OPERANDS
//      file    A pathname of a file to be truncated to have a length that does
//              not include any trailing NUL bytes.
//
// INPUT FILES  The input files must be regular files.
//
// STDERR       The standard error shall be used only for diagnostic messages.
//
// EXIT STATUS
//      0       All input files were successfully processed.
//      >0      An error occurred.
//
// CONSEQUENCES OF ERRORS
//              Default.
int
main(int argc, char *argv[]) {
    ssize_t buflen; // number of bytes in buf[]
    int     fd,     // file descriptor
            i,      // loop control
            j;      // loop control
    off_t   nsize,  // new file size
            size;   // current file size

    for(i = 1; i < argc; i++) {
        if((fd = open(argv[i], O_RDWR)) == -1) {
            pe("Can't open \"%s\":", argv[i], 1);
            continue;
        }
        nsize = size = 0;
        // Scan the whole file, remembering the offset just past the
        // last non-NUL byte seen so far.
        while((buflen = read(fd, buf, sizeof(buf))) > 0) {
            for(j = 0; j < buflen; j++) {
                size++;
                if(buf[j])
                    nsize = size;
            }
        }
        if(buflen) {
            pe("Read error on \"%s\": file will not be truncated:",
                argv[i], 2);
        } else if(ftruncate(fd, nsize)) {
            pe("Truncation failed on \"%s\":", argv[i], 4);
        }
        close(fd);
    }
    return ec;
}
and then run make dtn to build it, you should have a utility (named dtn) that you can invoke with any number of file operands you want and it will remove trailing NUL bytes from each of those files. (You will need read and write access to each file to do this.) I don't claim that it is highly efficient (it reads from the start of the file noting the offset of the last found non-NUL byte instead of reading from the end and searching for a non-NUL byte), but its performance should be reasonable for most regular files.
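The post above mentions that reading from the end and searching backwards for a non-NUL byte would be more efficient than scanning from the start. A minimal sketch of that idea follows. The helper name trimmed_size and its 8192-byte block size are assumptions made for illustration and are not part of dtn.c; it uses pread() to fetch blocks starting at the end of the file and stops at the first block that holds a non-NUL byte.

```c
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of dtn.c): return the length the file
// should have once trailing NUL bytes are dropped, or -1 if a read fails.
// "size" is the current file size, e.g. st_size from fstat().
off_t
trimmed_size(int fd, off_t size) {
    static char blk[8192];  // scratch block (makes this non-reentrant)
    off_t end = size;       // every byte at offset >= end is known to be NUL

    while (end > 0) {
        size_t want = end < (off_t)sizeof(blk) ? (size_t)end : sizeof(blk);
        off_t start = end - (off_t)want;
        size_t got = 0;

        // pread() may return fewer bytes than requested, so loop until
        // the block is complete.
        while (got < want) {
            ssize_t n = pread(fd, blk + got, want - got, start + (off_t)got);
            if (n <= 0)
                return -1;  // read error, or the file shrank underneath us
            got += (size_t)n;
        }
        // Scan this block from its last byte towards its first.
        while (want > 0) {
            if (blk[want - 1] != 0)
                return start + (off_t)want;
            want--;
        }
        end = start;        // whole block was NUL; step back one block
    }
    return 0;               // empty file, or nothing but NUL bytes
}
```

In dtn.c this could stand in for the read loop: get the current size with fstat(), call trimmed_size(fd, st.st_size), and pass the result to ftruncate() when it is not -1. It still has to read the whole run of trailing NULs, but it skips everything before the last non-NUL byte.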
Thank you very much, it works perfectly.
It looks like GNU sed could do it (even though we are not dealing with *nix text files, since the <NL> char at the end is missing), not with octal constants, but with hex constants:
sed ':L;s/\000$//;tL' XX | hd
00000000 73 64 66 65 66 65 65 72 76 30 30 09 00 00 00 00 |sdfefeerv00.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 |...........|
sed ':L;s/\x00$//;tL' XX | hd
00000000 73 64 66 65 66 65 65 72 76 30 30 09 |sdfefeerv00.|
FreeBSD's sed doesn't behave like this.
Not only are non-empty POSIX text files required to end with a <newline> character, they are not allowed to contain any NUL bytes either.
If GNU sed works with:
sed ':L;s/\x00$//;tL' file
does it also work with the following?
sed 's/\x00*$//' file
No idea what the cause could be, but with large files it doesn't remove all the trailing null characters.
A file of 3239571456 bytes came out at 2902450247 bytes with dtn.c.
The same file came out at 3239570892 bytes using sed on 64-bit Ubuntu.
The same file came out at 3239457516 bytes using sed on 32-bit Cygwin.
I've compared the original file with the one created by dtn.c using HexCmp, and that result is what I wanted, while the sed result is not.
This doesn't work. It completely damages the file.
Yes, it does:
hd XX
00000000 64 6c 6b 6a 65 72 67 00 00 00 00 00 00 00 00 00 |dlkjerg.........|
sed 's/\x00*$//' XX | hd
00000000 64 6c 6b 6a 65 72 67 |dlkjerg|
sed --version
sed (GNU sed) 4.2.2
Copyright (C) 2012 Free Software Foundation, Inc.
Does "trailing" mean "at the end-of-file" or "at the end-of-each-line"?
At the end of a binary file
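The difference between those two readings matters for a binary file, because a line-oriented tool such as sed applies its expression to every line, so `$` matches before each newline byte and not only at end-of-file. The sketch below contrasts the two on an in-memory buffer. Both function names are hypothetical and only illustrate the distinction; neither reproduces sed's exact behaviour.

```c
#include <stddef.h>

// Hypothetical helpers for illustration; neither appears in the thread.

// "Trailing" at end-of-file: length of buf once the final run of NUL
// bytes is dropped. Nothing else in the data is touched.
size_t
strip_eof_nuls(const unsigned char *buf, size_t len) {
    while (len > 0 && buf[len - 1] == 0)
        len--;
    return len;
}

// "Trailing" at end-of-each-line: compacts buf in place, dropping every
// run of NUL bytes that sits directly before a '\n' or before the end of
// the data. Returns the new length.
size_t
strip_eol_nuls(unsigned char *buf, size_t len) {
    size_t out = 0;     // number of bytes kept so far
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            // Back up over NULs already kept for the current line.
            while (out > 0 && buf[out - 1] == 0)
                out--;
        }
        buf[out++] = buf[i];
    }
    // The last line has no '\n'; strip its trailing NULs as well.
    while (out > 0 && buf[out - 1] == 0)
        out--;
    return out;
}
```

For the nine bytes 'a' 00 00 0a 'b' 00 'c' 00 00, the end-of-file reading keeps the first seven bytes unchanged, while the end-of-line reading also deletes the two NULs in front of the newline and leaves only five bytes. In a binary file those interior bytes are data, which is why only the end-of-file reading is safe here.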