Reading binary content

RudiC · October 20, 2015, 3:39pm

Ohhh yes - that's it. Be back later.

---------- Post updated at 21:39 ---------- Previous update was at 21:35 ----------

Try the -tu1 option to od instead of -td1.

jlliagre · October 20, 2015, 4:21pm

Closer but still a couple of issues:

$ od -tu1 file
0000000 014 066 202 255 019 245 240 233 158 067 242 233 144 115 182 186
0000020 160 127 044 156 086 172 078 022 251 106 139 122 032 244 210 223
0000040
$ od -tx1 file
0000000 0e 42 ca ff 13 f5 f0 e9 9e 43 f2 e9 90 73 b6 ba
0000020 a0 7f 2c 9c 56 ac 4e 16 fb 6a 8b 7a 20 f4 d2 df
0000040
$ od -An -v -tu1 file | tr -s ' ' $'\n' | while read VALUE; do  printf "%02X\t" $VALUE;   for ((i=7; i>=0; i--)); do printf "%d" $(( (VALUE>>i) %2 )); done; printf "\n"; done
00      00000000
0C      00001100
36      00110110
CA      11001010
FF      11111111
bash: printf: 019: invalid number
00      bash: 019: value too great for base (error token is "019")
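
For what it's worth, both error messages above come from the leading zero: printf and shell arithmetic read 019 as an octal constant, and 9 is not an octal digit. Below is a minimal sketch of one way around that, assuming the values are always non-negative decimal strings. The helper name strip_zeros is made up for illustration.

```shell
# strip_zeros (hypothetical helper): drop leading zeros from a decimal
# string so the shell does not read it as octal. "000" becomes "0".
strip_zeros() {
    v=$1
    # Remove one leading zero at a time, but always keep the last digit.
    while [ "${#v}" -gt 1 ] && [ "${v#0}" != "$v" ]; do
        v=${v#0}
    done
    printf '%s\n' "$v"
}

strip_zeros 019                         # prints 19
printf '%02X\n' "$(strip_zeros 019)"    # prints 13
```

In bash specifically, $((10#$VALUE)) forces base 10 and should have the same effect without the loop.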

Don_Cragun · October 20, 2015, 5:04pm

It is also compiler dependent: with some compilers type char is signed, with others it is unsigned. Where od was built with type char == type signed char, bytes with the high bit set come out as negative numbers when printed with -td1, as in:

printf $'\xc4'$'\x74' |od -td1u1o1x1
0000000   -60 116                                                        
          196 116
          304 164                                                        
           c4  74                                                        
0000002

but on systems where od was built using a compiler with type char == type unsigned char, you'll get the output:

0000000   196 116                                                        
          196 116
          304 164                                                        
           c4  74                                                        
0000002

with the same input.
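
If you are stuck with signed output, the unsigned value can be recovered arithmetically, since a negative signed byte v corresponds to v + 256. This is a small sketch, not something from the posts above, and the function name to_unsigned is only for illustration.

```shell
# to_unsigned (hypothetical helper): map a signed byte value (-128..127)
# to its unsigned equivalent (0..255).
to_unsigned() {
    # Masking with 255 keeps only the low 8 bits of the two's-complement value.
    printf '%d\n' "$(( $1 & 255 ))"
}

to_unsigned -60    # prints 196
to_unsigned 116    # prints 116
```

Using -tu1 in the first place is simpler; this only helps when you have no control over how the values were produced.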

Note also that with od -An you get leading spaces where the address would go if you were printing it. Those spaces become empty lines after tr, and each empty line shows up as an unwanted zero at the start of every line of od output. To get around both of these problems you could try:

od -An -v -tu1 file | tr -s ' ' $'\n' |
    grep -v '^$' |
    while read VALUE
    do
        printf "%02X\t" $VALUE
        for ((i=7; i>=0; i--))
        do
            printf "%d" $(( (VALUE>>i) %2 ))
        done
        printf "\n"
    done

I see RudiC already posted the -tu1 fix. I hope the explanation of why -td1 didn't work helps and that the added grep gives you what you want.
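
The bit loop can also be pulled out into a function, which makes it easy to check against known values. This is only a sketch: byte_to_bits is a made-up name, it assumes its argument is a plain decimal number from 0 to 255 without leading zeros, and it uses a while loop instead of for ((...)) so that it also runs in a plain POSIX sh.

```shell
# byte_to_bits (hypothetical helper): print the 8-bit binary form of a
# decimal byte value, most significant bit first.
byte_to_bits() {
    bits=
    i=7
    while [ "$i" -ge 0 ]; do
        # Shift the wanted bit down to position 0 and isolate it.
        bits=$bits$(( ($1 >> $i) & 1 ))
        i=$((i - 1))
    done
    printf '%s\n' "$bits"
}

byte_to_bits 202    # prints 11001010
byte_to_bits 14     # prints 00001110
```

For non-negative values, & 1 gives the same result as the %2 used in the pipeline.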

I don't have a Solaris system I can use for testing, and on OS X od -tu1 does not print leading zeros. If they are a problem on Solaris systems, you could change the:

tr -s ' ' $'\n' | grep -v '^$'

in the pipeline to:

/usr/xpg4/bin/awk '{for(i=1;i<=NF;i++){sub(/^0+/,"",$i);printf("%u\n",$i)}}'

wisecracker · October 20, 2015, 5:11pm

How about this:-

#!/bin/sh --posix
# bin.sh
dd if=/dev/urandom of=/tmp/binary bs=16 count=1
od -tx1 /tmp/binary
echo "Binary decode of hexdump display above..."
for subscript in {0..15}
do
	printf "%08d\n" $(echo 'ibase=10; obase=2;'"$(od -An -N1 -j$subscript -tu1 /tmp/binary)" | bc)
done

Results for just 16 bytes generated...

Last login: Tue Oct 20 22:03:13 on ttys000
AMIGA:barrywalker~> cd Desktop/Code/Shell
AMIGA:barrywalker~/Desktop/Code/Shell> ./bin_old.sh
1+0 records in
1+0 records out
16 bytes transferred in 0.000036 secs (444430 bytes/sec)
0000000    b9  ed  ed  fe  39  86  fd  59  36  2e  65  dd  3c  ee  ef  ed
0000020
Binary decode of hexdump display above...
10111001
11101101
11101101
11111110
00111001
10000110
11111101
01011001
00110110
00101110
01100101
11011101
00111100
11101110
11101111
11101101
AMIGA:barrywalker~/Desktop/Code/Shell> _

jlliagre · October 22, 2015, 3:57am

@guddu_12 Beware that the initial C source code I posted had a bug; it is fixed now. It takes 30 ms for a 10k file.

@wisecracker {0..15} is not POSIX. In any case, as already stated, your algorithm is too slow to be usable with anything but tiny files. It takes 26 minutes for the same 10k file.
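
For reference, a byte-by-byte loop that sticks to POSIX sh syntax could look like the sketch below. It only addresses the portability point, not the speed: it still starts one od process per byte. The name dump_bytes and its two arguments are hypothetical.

```shell
# dump_bytes FILE COUNT (hypothetical helper): print the first COUNT bytes
# of FILE as unsigned decimal values, one per line.
dump_bytes() {
    n=0
    # A counting while loop replaces the non-POSIX {0..15} brace expansion.
    while [ "$n" -lt "$2" ]; do
        # -An: no address column, -j: skip n bytes, -N1: read one byte,
        # -tu1: unsigned one-byte decimal.
        # The unquoted $(...) lets word splitting drop od's padding spaces.
        printf '%s\n' $(od -An -v -j"$n" -N1 -tu1 "$1")
        n=$((n + 1))
    done
}
```

Calling it as dump_bytes /tmp/binary 16 would cover the 16-byte test file from the script above.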

@Don Cragun, with the extra awk code, your script works fine under Solaris and takes 17 seconds for the same 10k file.

Here is a mostly awk-based solution that takes 230 ms:

od -An -v -tx1 binary_file |
                /usr/xpg4/bin/awk '
                BEGIN {
                        for(i=0;i<256;i++) {
                                s=""
                                p=128
                                for(j=0;j<8;j++) {
                                        if (int((i/p)%2)==0)
                                                s=s "0"
                                        else
                                                s=s "1"
                                        p=p/2
                                }
                                hi=sprintf("%02x",i)
                                bin[""hi]=s
                                # printf(" %s:%s\n",i,bin[""hi])
                        }
                }
                {
                        for(i=1;i<=NF;i++) {
                                printf("%s\n",bin["" $i])
                        }
                }'
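
To try the same lookup-table idea on a system without /usr/xpg4/bin/awk, something like the following should work with any POSIX awk. Treat it as a sketch rather than a drop-in replacement: the function name bin_dump is hypothetical, and it assumes od prints lower-case hex digits for -tx1, as in the od -tx1 output shown earlier in the thread.

```shell
# bin_dump FILE (hypothetical helper): print every byte of FILE as an
# 8-character bit string, one byte per line.
bin_dump() {
    od -An -v -tx1 "$1" |
    awk '
    BEGIN {
        # Precompute the bit string for each of the 256 byte values once,
        # keyed by its two-digit lower-case hex form.
        for (i = 0; i < 256; i++) {
            s = ""
            for (p = 128; p >= 1; p /= 2) {
                bit = int(i / p) % 2
                s = s bit
            }
            bin[sprintf("%02x", i)] = s
        }
    }
    {
        # Each od field is one byte in hex; look it up in the table.
        for (i = 1; i <= NF; i++)
            print bin[$i]
    }'
}
```

The table is built once in BEGIN, so the per-byte cost is a single array lookup, which is where the speed-up over the per-byte shell loops comes from.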