Reading binary content

Dear Gurus

I am stuck on a piece of work and do not know where to start.

I get a machine-generated binary file, and I want to read the binary data as it is, without converting it into any other format.

I want to read it byte by byte.

Please let me know which Unix command will show the content of a binary file in binary format.

When I run cat on the file, the output looks like this:

z� cH$zp b$zp c$z� b$z� c*$zp b*$zp c�$z� b�$z� cF�$zp%+*$zp%*�$z�%*�$z�

I want to see it in binary format and read it byte by byte.

Any help will be greatly appreciated.

Please use code tags as required by forum rules.

Did you consider the hexdump or od commands to show binary data?

No, I don't know those commands, but my requirement is to get the output in binary format only; I do not want hex or octal.

The bc utility can be used to transform hex or octal into a binary representation. Something like (more or less untested):

od -An -w1 -b -v your.binary.file |while read BYTE; do 
   echo "ibase=8;obase=2;$BYTE" |bc
done

The command is not returning any output.

It should give you an idea how to solve it.
I do not have a Solaris box at hand and I do not know what shell you use; that's why I said untested. You'll have to put some effort into it and find the correct syntax for your environment.

Here is a working solution based on cero's suggestion, although it is extremely inefficient. Shell scripting is not really suited to this kind of task.

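od -An -v -t d1 yourbinary |
  while read line ; do
    for byte in $line ; do
      # od -t d1 prints signed values; map -128..-1 to 128..255
      if [ "$byte" -lt 0 ] ; then byte=$((byte+256)); fi
      # bc drops leading zeros, so pad each byte back to 8 digits
      printf "%08d" "$(echo "ibase=10;obase=2;$byte"|bc)"
    done
  done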

Here is how I would do it in C:

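#include <stdio.h>
#include <unistd.h>

int main(void){
        int i,j,cc;
        static unsigned char buf[1024];
        static char bin[256][9];
        /* precompute the 8-character bit string for every byte value */
        for(i=0;i<256;i++) {
                for(j=0;j<8;j++) {
                        bin[i][7-j]=i&(1<<j)?'1':'0';
                }
                bin[i][8]=0;
        }
        /* read stdin in 1 kB chunks and print each byte as 8 bits */
        while((cc=read(0,buf,1024))>0)
                for(i=0;i<cc;i++)
                        printf("%s",bin[buf[i]]);
        printf("\n");
        return 0;
}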

The shell script takes 2 min 10 s for a 10 kB file while the compiled C takes about 10 milliseconds for the same input.
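
For what it's worth, most of that time is probably spent starting one bc process per byte rather than in the shell itself. Below is a minimal sketch that keeps the whole conversion in a single awk process. It assumes a POSIX od (for -t u1) and a POSIX awk (on Solaris that would likely be nawk or /usr/xpg4/bin/awk). The function name bytes_to_bits is made up for illustration, and I have not timed it against the C version.

```shell
# bytes_to_bits FILE: print every byte of FILE as 8 binary digits, one byte per line.
# Hypothetical helper; assumes POSIX od and awk are on the PATH.
bytes_to_bits() {
    od -An -v -t u1 "$1" |
    awk '{
        for (f = 1; f <= NF; f++) {
            n = $f; s = ""
            for (b = 0; b < 8; b++) {   # peel off bits, least significant first
                s = (n % 2) s           # prepend, so the string ends up MSB first
                n = int(n / 2)
            }
            print s
        }
    }'
}
```

Called as bytes_to_bits yourbinary, it prints one 8-digit line per byte; piping the result through tr -d '\012' gives a continuous bit stream like the C program's.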

Demo using OSX 10.7.5, default bash terminal...

#!/bin/bash
# bin.sh
dd if=/dev/urandom of=/tmp/binary count=1
for subscript in {0..511}
do
	num=`hexdump -n1 -s$subscript -v -e '1/1 "%u"' /tmp/binary`
	echo "Decimal number=$num..."
	echo "ibase=10; obase=2; $num" | bc
done

Results:-

Last login: Mon Oct 19 20:49:24 on ttys000
AMIGA:barrywalker~> cd Desktop/Code/Shell
AMIGA:barrywalker~/Desktop/Code/Shell> ./bin.sh
1+0 records in
1+0 records out
512 bytes transferred in 0.000129 secs (3969471 bytes/sec)
Decimal number=245...
11110101
Decimal number=234...
11101010
Decimal number=130...
10000010
Decimal number=107...
1101011
Decimal number=226...
11100010
Decimal number=34...
100010
Decimal number=146...
10010010
Decimal number=229...
11100101
Decimal number=129...
10000001
.
.
.
.
.
Decimal number=96...
1100000
Decimal number=171...
10101011
Decimal number=145...
10010001
Decimal number=39...
100111
Decimal number=30...
11110
Decimal number=27...
11011
Decimal number=71...
1000111
Decimal number=66...
1000010
Decimal number=169...
10101001
Decimal number=235...
11101011
Decimal number=245...
11110101
AMIGA:barrywalker~/Desktop/Code/Shell> _

EDIT:-
This might be of interest too...

Shell doesn't have to be slow, but C will usually be faster.

od -An -t x1 -v yourfilename.bin |
 tr -s 'a-f ' 'A-F\012' |
 (echo 16;echo i;echo 2;echo o;sed -e 1d -e 's/^/F/' -e 'a\p') |
 dc -f - |
 sed 's/^1111//' |
 tr -d '\012'

(Not totally convinced I know what the OP is wanting, though.)

This being the Solaris forum, you should stick to POSIX or Solaris commands, options and syntax.

@wisecracker:
bash: hexdump: command not found

@cjcox: your code is indeed much better but has several issues on Solaris.

sed: command garbled: a\p

With the a\p syntax fixed, the output is still broken:

0111111101000101empty stack01000110000000010000000100000001000000000000000000000
00000000000000000000000000000000000000000000000000011 is unimplemented0000001000
00000000000011000000000000000100000000000000000000000000000000142 is unimplement
ed000000000101000010000011010000000000000000000000000011 is unimplemented1000100
00010001100000000000000000000000000000000000000000000000000110100000000000010000
0000000000000011000000000001010000000000011 is unimplemented00010111000000000001
01010000000000000110000000000000000000000000001101000000000000000000000000000011
010000000000000001010000100011 is unimplemented000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000010100000000000000000000000
011 is unimplemented000000000000000000000000000000000000001100000000000000000000
0000111101000000000000000000000000000000000000000000000000000000000011 is unimpl
emented0000000000000000000000000000000000010001000000000000000000000000000000000
000000000000000000000000000010000000000000000000000000011 is unimplemented000000
00000000000000000000000000000000010000000000000000000000000000000000000000000000
00000000000000000000000000000001010000100011 is unimplemented0000000000000000000
00000000000000010142 is unimplemented000100000000000000000010142 is unimplemente
d000100000000000000000000010100000000000000000000000011 is unimplemented00000000
00000000000000010000000000000001000000000000000000000000000000000010000000000000
000000000000000000100000000001100000100011 is unimplemented000000000000000000000
000hex digit > 16out of stack space145 is unimplementedhex digit > 16out of stack

Hi,

Where is the file name in the C program that reads the binary file?

It reads from stdin (file descriptor 0), so redirect the file into it, e.g. ./a.out < yourbinary if the compiled program is called a.out.

Hi jlliagre...
OK, using shell and od only in OSX 10.7.5 default terminal limited to 32 bytes for this basic DEMO. It can easily be enhanced upon...
(Both hexdump and od are used in AudioScope.sh as CygWin does not have hexdump.)

#!/bin/sh
# bin.sh
dd if=/dev/urandom of=/tmp/binary bs=32 count=1
for subscript in {0..31}
do
	num=`od -An -N1 -j$subscript -tu /tmp/binary`
	echo "Decimal number ="$num"..."
	echo "ibase=10; obase=2; $num" | bc
done

Results:-

Last login: Tue Oct 20 14:00:40 on ttys000
AMIGA:barrywalker~> cd Desktop/Code/Shell
AMIGA:barrywalker~/Desktop/Code/Shell> ./bin.sh
1+0 records in
1+0 records out
32 bytes transferred in 0.000031 secs (1032444 bytes/sec)
Decimal number = 48 ...
110000
Decimal number = 82 ...
1010010
Decimal number = 118 ...
1110110
Decimal number = 152 ...
10011000
Decimal number = 42 ...
101010
Decimal number = 11 ...
1011
Decimal number = 185 ...
10111001
Decimal number = 185 ...
10111001
Decimal number = 101 ...
1100101
Decimal number = 238 ...
11101110
Decimal number = 6 ...
110
Decimal number = 11 ...
1011
Decimal number = 87 ...
1010111
Decimal number = 62 ...
111110
Decimal number = 12 ...
1100
Decimal number = 94 ...
1011110
Decimal number = 1 ...
1
Decimal number = 142 ...
10001110
Decimal number = 233 ...
11101001
Decimal number = 102 ...
1100110
Decimal number = 57 ...
111001
Decimal number = 39 ...
100111
Decimal number = 149 ...
10010101
Decimal number = 134 ...
10000110
Decimal number = 46 ...
101110
Decimal number = 94 ...
1011110
Decimal number = 238 ...
11101110
Decimal number = 200 ...
11001000
Decimal number = 111 ...
1101111
Decimal number = 38 ...
100110
Decimal number = 191 ...
10111111
Decimal number = 248 ...
11111000
AMIGA:barrywalker~/Desktop/Code/Shell> 

Try also this bashism:

od -An -w1 -td1 file | while read VALUE; do for ((i=7; i>=0; i--)); do printf "%d" $(( (VALUE>>i)%2 )); done; printf "\n"; done

@wisecracker

With the correct shebang ("#!/bin/bash", as you use bash-specific features), your script works under Solaris but is missing some code to answer the original question. Moreover, reopening the input file and skipping to a new offset for every single byte in it is a very counterproductive approach.

@RudiC

"od -w" is a non-portable GNUism:

$ od -An -w1 -td1 file
usage: od [-bcCdDfFoOsSvxX] [-] [file] [offset_string]
       od [-bcCdDfFoOsSvxX] [-t type_string]... [-A address_base] [-j skip] [-N count] [-] [file...]

Thanks. Alas, I don't have access to Solaris for tests. Try, then,

od -An -td1 file | tr -s ' ' $'\n'

Not sure if tr -s is portable, though.
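
On the portability question: tr -s is in POSIX, as far as I know. The part more likely to trip up an older shell is the $'\n' quoting, which is a ksh93/bash feature. Below is a minimal sketch that lets tr expand an octal escape itself instead. The function name one_byte_per_line is hypothetical, and using -t u1 instead of -t d1 is my own assumption, made so the values stay in 0..255.

```shell
# one_byte_per_line FILE: print each byte of FILE as an unsigned decimal, one per line.
# Hypothetical helper; tr interprets '\012' (newline) itself, so no $'\n' is needed.
# sed drops the empty first line produced by the blanks od puts before the first value.
one_byte_per_line() {
    od -An -v -t u1 "$1" |
    tr -s ' ' '\012' |
    sed '/^$/d'
}
```

The output can be fed straight into a while read VALUE loop; without the sed stage the loop would see one empty VALUE at the start.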

Sorry about that. I was in the Linux zone of mind.

@RudiC, still requiring some improvement:

$ od -x file
0000000 cea9 b0db 13f8 f4f5 837a a463 9066 3d1c
0000020 b201 61d4 2a5c 7732 bf55 b480 88e7 82a8
0000040
$ od -An -td1 file | tr -s ' ' $'\n' | while read VALUE; do for ((i=7; i>=0; i--)); do printf "%d" $(( (VALUE>>i)%2 )); done; printf "\n"; done
00000000
-10-10-100-1
-1-100-1-1-10
-1-10-1-10-1-1
-10-1-10000
-1-1-1-1-1000
00010011
-1-1-1-10-10-1
-1-1-1-10-100
01111010
-100000-1-1
01100011
-10-100-100
01100110
-100-10000
00011100
00111101
00000000
00000001
-10-1-100-10
-1-10-10-100
01100001
01011100
00101010
00110010
01110111
01010101
-10-1-1-1-1-1-1
-10000000
-10-1-10-100
-1-1-100-1-1-1
-1000-1000
-10-10-1000
-100000-10

Hmmm -

od -x file
0000000 6c73 6461 6a66 0a6b 6672 6676 000a

od -An -v -td1 file | tr -s ' ' $'\n' | while read VALUE; do  printf "%02X\t" $VALUE;   for ((i=7; i>=0; i--)); do printf "%d" $(( (VALUE>>i) %2 )); done; printf "\n"; done
00    00000000
73    01110011
6C    01101100
61    01100001
64    01100100
66    01100110
6A    01101010
6B    01101011
0A    00001010
72    01110010
66    01100110
76    01110110
66    01100110
0A    00001010

... works on FreeBSD as well ...

Not sure where the - signs come from. If you remove them, the output is correct.

You are testing with an ASCII file, so you never see these negative values. Use a truly binary input file.
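
The minus signs seem to come from od -t d1 printing signed values: every byte from 0x80 upwards arrives as a negative number, and as far as I can tell (VALUE>>i)%2 then yields 0 or -1 instead of 0 or 1. Below is a minimal sketch of one way around it, using only POSIX shell arithmetic: mask the value to 0..255 first and test each bit with & 1. The function name byte_to_bits is made up for illustration.

```shell
# byte_to_bits N: print the 8-bit pattern of N, accepting signed (-128..127)
# or unsigned (0..255) decimal input. Hypothetical helper, POSIX sh arithmetic only.
byte_to_bits() {
    v=$(( $1 & 255 ))                    # maps -128..-1 onto 128..255
    bits=
    i=7
    while [ "$i" -ge 0 ]; do
        bits=$bits$(( (v >> i) & 1 ))    # always 0 or 1, never -1
        i=$(( i - 1 ))
    done
    printf '%s\n' "$bits"
}
```

For example, byte_to_bits -87 and byte_to_bits 169 both print 10101001, which is the 0xa9 byte from the od -x listing above. Asking od for -t u1 instead of -t d1 should avoid the negative values in the first place.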
