C program to read a binary file and search for a string?

newbie_01 · May 17, 2014, 7:27am

Hi,

I am not a C programmer. My only exposure to C is reading and completing the exercises from The C Programming Language (ANSI C) book.

At the moment, I am using the UNIX strings command to extract information from a binary file, then grepping for a particular string and the value after it.

While this is quite quick on a 50M file, it takes a while for a 500M file.

Can someone please advise where I can find examples of a program that reads a binary file, searches for a particular string, and exits as soon as it finds it? Alternatively, one that reads only the first few bytes of the file, which is where I assume the information is.

When running strings | grep, I assume strings runs over the whole 500M file before the grep finishes. I am hoping to use a C program that either reads only the first x bytes, or reads the binary file and exits once it has found the information it is looking for.

Any guidance much appreciated. Thanks in advance.

drl · May 17, 2014, 10:56am

Hi.

Using the GNU utilities, one could avoid resorting to C:

#!/usr/bin/env bash

# @(#) s1	Demonstrate searching binary file.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C strings head grep

FILE=${1-$(which cut)}

pl " Input data file $FILE is type $(file $FILE)"

pl " Results, strings -- $(wc -l < $FILE) strings found." 

pl " Results, strings and first 2 occurrences:"
strings $FILE |
head -2 

pl " Results, strings and grep searching for \"bucket\""
strings $FILE |
grep bucket

pl " Results, grep, (expect only a message):"
grep bucket $FILE

pl " Results, grep, every occurrence:"
grep -a bucket $FILE

pl " Results, grep, only first 2 results:"
grep -a -m 2 bucket $FILE

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
strings GNU strings (GNU Binutils for Debian) 2.18.0.20080103
head (GNU coreutils) 6.10
grep GNU grep 2.5.3

-----
 Input data file /usr/bin/cut is type /usr/bin/cut: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, stripped

-----
 Results, strings -- 130 strings found.

-----
 Results, strings and first 2 occurrences:
/lib64/ld-linux-x86-64.so.2
__gmon_start__

-----
 Results, strings and grep searching for "bucket"
# buckets:         %lu
max bucket length: %lu
# buckets used:    %lu (%.2f%%)

-----
 Results, grep, (expect only a message):
Binary file /usr/bin/cut matches

-----
 Results, grep, every occurrence:
# buckets:         %lu
max bucket length: %lu
# buckets used:    %lu (%.2f%%)

-----
 Results, grep, only first 2 results:
# buckets:         %lu
max bucket length: %lu

By default, this uses the binary of the cut command as the file to be searched; pass a different file as the first argument to search that instead.

See man pages for details.

Best wishes ... cheers, drl

achenle · May 17, 2014, 8:58pm

Better to use something like this:

head -c 32M "$FILE" | strings | grep "$STRING"

That addresses the problem of strings searching the entirety of large files when only the first few MB are to be searched. (The example will search the first 32 MB.)
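
For the C route asked about in the first post, here is a minimal sketch of the same idea: read only the first N bytes of the file and search them. The names find_bytes and search_prefix are made up for illustration, and loading the whole prefix into memory is just one assumption about how to do it, not something taken from the posts above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the offset of the first occurrence of needle (nlen bytes) in
 * hay (hlen bytes), or -1 if absent. This compares raw bytes, so the
 * NUL bytes found in binary data do not end the search the way they
 * would with strstr(). */
static long find_bytes(const unsigned char *hay, size_t hlen,
                       const char *needle, size_t nlen)
{
    assert(nlen > 0);
    if (hlen < nlen)
        return -1;
    for (size_t i = 0; i + nlen <= hlen; i++) {
        if (hay[i] == (unsigned char)needle[0] &&
            memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    }
    return -1;
}

/* Read at most `limit` bytes from the current position of fp and search
 * them. Returns the match offset, -1 if not found, or -2 if the buffer
 * could not be allocated. */
long search_prefix(FILE *fp, const char *needle, size_t limit)
{
    unsigned char *buf = malloc(limit ? limit : 1);
    if (buf == NULL)
        return -2;
    size_t got = fread(buf, 1, limit, fp);  /* short on small files */
    long pos = find_bytes(buf, got, needle, strlen(needle));
    free(buf);
    return pos;
}
```

A caller would open the file with fopen(path, "rb") and pass something like 32UL * 1024 * 1024 as limit to mirror the 32 MB example. The bytes after the returned offset hold the value that follows the string.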

RudiC · May 18, 2014, 5:40pm

Or

dd if=file count=x | grep "string"

Note that x is a count of blocks, and the block size defaults to 512 bytes unless you set it with bs=.
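
The other option raised in the first post, stopping at the first match instead of after a fixed number of blocks, could be sketched in C roughly as follows. The function name search_stream and the 64 KiB CHUNK size are assumptions chosen for illustration. The file is read in chunks, and the last few bytes of each chunk are carried over so that a string split across two reads is still found.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { CHUNK = 65536 };  /* read size; an arbitrary choice */

/* Scan fp from its current position and return the offset of the first
 * occurrence of needle, or -1 if it never appears. Reading stops as soon
 * as a match is found. Uses a static buffer, so it is not reentrant. */
long search_stream(FILE *fp, const char *needle)
{
    static unsigned char buf[2 * CHUNK];
    size_t nlen = strlen(needle);
    assert(nlen > 0 && nlen <= CHUNK);

    size_t keep = 0;  /* bytes carried over from the previous chunk */
    long base = 0;    /* offset of buf[0] relative to the start position */
    size_t got;

    while ((got = fread(buf + keep, 1, CHUNK, fp)) > 0) {
        size_t have = keep + got;
        for (size_t i = 0; i + nlen <= have; i++) {
            if (memcmp(buf + i, needle, nlen) == 0)
                return base + (long)i;  /* first hit: stop reading */
        }
        /* Carry the last nlen-1 bytes forward so a match that straddles
         * two chunks is not missed. */
        keep = have < nlen - 1 ? have : nlen - 1;
        memmove(buf, buf + have - keep, keep);
        base += (long)(have - keep);
    }
    return -1;
}
```

With the file opened via fopen(path, "rb"), a non-negative return value is the byte offset of the string, and a program can exit right there without touching the rest of a 500M file.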