I have a set of files without extensions. How can I programmatically tell if a file is in gzip format? The gzip file format spec,
RFC 1952 GZIP File Format Specification version 4.3,
states that a gzip file begins with two fixed identification bytes:
1st byte = 0x1f in hex, \037 in octal
2nd byte = 0x8b in hex, \213 in octal
How can I check for these values to determine whether a file is truly in gzip format?
Thanks.
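Checking for those values amounts to comparing two bytes. Note that 0x1f and \037 are the same value (31), as are 0x8b and \213 (139), so it does not matter which notation you use. A minimal sketch, assuming the first bytes of the file have already been read into a buffer; `has_gzip_magic` is a hypothetical name, not a library function:

```c
#include <stdbool.h>
#include <stddef.h>

/* Gzip member header, RFC 1952 section 2.3.1:
   ID1 = 31 (0x1f, octal 037), ID2 = 139 (0x8b, octal 0213). */
enum { GZIP_ID1 = 0x1f, GZIP_ID2 = 0x8b };

/* Hypothetical helper: returns true if buf starts with the gzip magic
   number. len is the number of valid bytes in buf; anything shorter
   than 2 bytes can never be a gzip file. */
bool has_gzip_magic(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == GZIP_ID1 && buf[1] == GZIP_ID2;
}
```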
zaxxon
od -c infile.gz | head
0000000 037 213 \b \b � 201 Q L \0 003 i n f i l e
0000020 \0 \v N - � � � q � - � t 001 221 ~ 211
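The dump can be read field by field against the fixed 10-byte header described in RFC 1952 section 2.3: 037 213 are ID1/ID2; the first \b is octal 010 = 8, i.e. CM = 8 (deflate); the second \b is FLG = 8, meaning the FNAME bit is set, which is why the original name "infile" follows the header; the next four bytes are MTIME, then XFL (\0) and OS (003, Unix). The sketch below decodes that fixed part of the header. The struct and function names are hypothetical, chosen for illustration only:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed part of a gzip member header (after ID1/ID2), per RFC 1952 2.3. */
struct gzip_header {
    uint8_t  cm;     /* compression method; 8 = deflate */
    uint8_t  flg;    /* flag bits, e.g. FNAME = 0x08 */
    uint32_t mtime;  /* modification time, stored little-endian */
    uint8_t  xfl;    /* extra flags */
    uint8_t  os;     /* originating OS; 3 = Unix */
};

#define GZIP_FNAME 0x08  /* FLG bit: original file name follows header */

/* Parse the first 10 bytes of buf. Returns false if buf is too short
   or does not start with the magic bytes 0x1f 0x8b. */
bool parse_gzip_header(const unsigned char *buf, size_t len,
                       struct gzip_header *out)
{
    if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b)
        return false;
    out->cm    = buf[2];
    out->flg   = buf[3];
    /* MTIME is four bytes, least significant first. */
    out->mtime = (uint32_t)buf[4]
               | (uint32_t)buf[5] << 8
               | (uint32_t)buf[6] << 16
               | (uint32_t)buf[7] << 24;
    out->xfl   = buf[8];
    out->os    = buf[9];
    return true;
}
```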
Here is one way of doing it in C:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define GZIP_MAGIC "\037\213" /* Magic header for gzip files, 1F 8B */

int
main(int argc, char *argv[])
{
    int fd;
    char magic[2];
    ssize_t ret;

    if (argc < 2) {
        fprintf(stderr, "ERROR: A filename must be specified.\n");
        exit(1);
    }

    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "ERROR: cannot open %s\n", argv[1]);
        exit(1);
    }

    /* Read the first two bytes of the file. */
    if ((ret = read(fd, magic, 2)) == -1) {
        fprintf(stderr, "ERROR: cannot read %s\n", argv[1]);
        close(fd);
        exit(1);
    }

    /* A file shorter than 2 bytes cannot be gzip; only compare what was read. */
    if (ret == 2 && memcmp(magic, GZIP_MAGIC, 2) == 0) {
        fprintf(stdout, "%s is a gzipped file\n", argv[1]);
    } else {
        fprintf(stdout, "%s is not a gzipped file\n", argv[1]);
    }

    close(fd);
    exit(0);
}
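Since the question is about a set of files, here is a minimal sketch of a portable variant that uses stdio (fopen/fread) instead of the POSIX open/read calls and can be called once per path. As an assumption beyond the two-byte check, it also tests the third byte, CM, which RFC 1952 defines as 8 for deflate, the only compression method the spec defines; this rejects some non-gzip files that merely happen to start with 1f 8b. The function names are hypothetical:

```c
#include <stdio.h>

/* Returns 1 if the stream starts with the gzip magic bytes 0x1f 0x8b
   followed by CM = 8 (deflate), 0 otherwise. Reads from the current
   position; a short read (file under 3 bytes) counts as "not gzip". */
int stream_looks_gzipped(FILE *fp)
{
    unsigned char hdr[3];
    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return 0;
    return hdr[0] == 0x1f && hdr[1] == 0x8b && hdr[2] == 8;
}

/* Returns 1 or 0 as above, or -1 if path cannot be opened.
   Binary mode ("rb") avoids newline translation on some platforms. */
int path_looks_gzipped(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    int result = stream_looks_gzipped(fp);
    fclose(fp);
    return result;
}
```

For a set of files, you could loop over argv[1] through argv[argc - 1] and print each path with the result of path_looks_gzipped. Passing this check only means the header looks right; a truncated or corrupt file can still fail to decompress, so running gzip -t on it (or actually decompressing it) is the stronger test.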