I have been doing binary experiments yet again and came across a superb piece of code...
I extracted a very small piece and rewrote it to suit my needs:-
#!/bin/bash --posix
# bash-hexdump
# Open the file $1 to be read with an fd 3.
exec 3<"$1"
saveIFS="$IFS"
IFS=""
char="00"
val="FF"
position=0
while read -s -u 3 -d '' -r -n 1 char
do
# """If the leading character is a single-quote or double-quote,
# the value shall be the numeric value in the underlying codeset
# of the character following the single-quote or double-quote."""
printf -v val "%02X" "'$char"
if [ ${#val} -gt 2 ]
then
position=$(( ${#val} - 2 ))
val="${val:$position:2}"
fi
echo -n " $val "
done
echo ""
IFS="$saveIFS"
# Finally ensure fd 3 is closed.
exec 3<&-
Results:-
Last login: Mon Oct 14 11:27:22 on ttys000
AMIGA:barrywalker~> ./bin_import.sh BinaryFile.dat
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16
17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D
2E 2F 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42 43 44
45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B
5C 5D 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72
73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F 80 81 82 83 84 85 86 87 88 89
8A 8B 8C 8D 8E 8F 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F A0
A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF B0 B1 B2 B3 B4 B5 B6 B7
B8 B9 BA BB BC BD BE BF C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE
CF D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF E0 E1 E2 E3 E4 E5
E6 E7 E8 E9 EA EB EC ED EE EF F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC
FD FE FF
AMIGA:barrywalker~>
NOTE: Binary 0, (zero), has been accepted...
QUESTION:
I understand how "exec" is creating a file descriptor from "$1" but how does this affect "read"'s ability to accept binary 0, (zero), for manipulation by "printf"...
This is just a wild guess: I suppose read itself doesn't have anything to do with it. Probably the shell itself "cooks away" binary zeroes as it digests input, and read only gets what was put through the shell. As the shell never gets to see the file (at least not as "input", that is, from stdin), read is served the whole story and not only parts of it.
The -n 1 makes read return after one character instead of a whole line (up to a \n character).
Further, -r tells read not to treat backslashes as escape characters; it is the empty IFS that stops leading whitespace from being discarded (which only matters in line mode).
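A quick way to see what each of these options does on its own is to feed read some here-strings. This is only a sketch for an ordinary interactive bash; the variable names are made up for illustration.

```bash
#!/bin/bash
# -n 1: return after a single character instead of a whole line.
IFS= read -r -n 1 first <<< "hello"
printf 'first=[%s]\n' "$first"                       # first=[h]

# -r: keep backslashes literally; without it, read treats \ as an escape.
IFS= read -r raw <<< 'a\tb'
IFS= read cooked <<< 'a\tb'
printf 'raw=[%s] cooked=[%s]\n' "$raw" "$cooked"     # raw=[a\tb] cooked=[atb]

# Empty IFS: leading whitespace survives; the default IFS strips it.
IFS= read -r kept <<< '  x'
read -r trimmed <<< '  x'
printf 'kept=[%s] trimmed=[%s]\n' "$kept" "$trimmed" # kept=[  x] trimmed=[x]
```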
IMHO the script can be simplified to:
#!/bin/bash --posix
# bash-hexdump
char="00"
val="FF"
position=0
while IFS="" read -r -n 1 char
do
# """If the leading character is a single-quote or double-quote,
# the value shall be the numeric value in the underlying codeset
# of the character following the single-quote or double-quote."""
printf -v val "%02X" "'$char"
if [ ${#val} -gt 2 ]
then
position=$(( ${#val} - 2 ))
val="${val:$position:2}"
fi
echo -n " $val "
done <"$1"
echo ""
exec has no effect whatsoever on the handling of the nullbyte.
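One way to check this is to run the same read loop twice, once over fd 3 opened with exec and once over a plain redirection, and compare the output. The sketch below is not anyone's posted script: the function names and the temporary file are made up for the test, and both loops deliberately keep -d '' so that the only difference is how the file is opened.

```bash
#!/bin/bash
# Same loop, file opened with exec on fd 3.
dump_via_fd3() {
    local char val
    exec 3<"$1"
    while IFS= read -r -u 3 -d '' -n 1 char; do
        printf -v val '%02X' "'$char"
        printf '%s ' "${val: -2}"     # keep only the last two hex digits
    done
    exec 3<&-
    echo
}

# Same loop, file redirected onto the loop's stdin.
dump_via_redirect() {
    local char val
    while IFS= read -r -d '' -n 1 char; do
        printf -v val '%02X' "'$char"
        printf '%s ' "${val: -2}"
    done <"$1"
    echo
}

tmp=$(mktemp)
printf 'A\0B\n' > "$tmp"
dump_via_fd3 "$tmp"        # 41 00 42 0A
dump_via_redirect "$tmp"   # 41 00 42 0A
rm -f "$tmp"
```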
No, it hasn't. That code is simply the beneficiary of serendipitous default behavior. The null byte is not in the variable, but printf fills any vacuums with zeroes/nullstrings, which in this case gives the correct result.
In short, it's a lucky break (but I don't think it's an unsafe dependence).
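The lucky break can be reproduced in isolation. The sketch below assumes bash's builtin printf and uses process substitution, so it needs normal (non-posix) bash mode; it reads a single NUL byte and shows that the variable ends up empty while printf still reports 00.

```bash
#!/bin/bash
# Read one NUL byte: read hits its delimiter and leaves char empty.
IFS= read -r -d '' -n 1 char < <(printf '\0')
status=$?
printf 'read status=%s, length=%s\n' "$status" "${#char}"   # read status=0, length=0

# With nothing after the quote, bash's printf treats the value as 0.
printf -v val '%02X' "'$char"
printf '%s\n' "$val"                                        # 00
```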
The code that you posted will not work under all circumstances. read works with characters but in a hexdump, bytes are what matter. If the locale specifies a multibyte encoding, the results will be incorrect.
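A minimal sketch of the usual workaround, assuming that forcing the C locale for the duration of the loop is acceptable: the function runs in a subshell so the LC_ALL assignment does not leak into the caller. The name hexdump_bytes is made up for illustration, and keeping only the last two hex digits is a precaution in case printf reports a wider value for bytes above 7F.

```bash
#!/bin/bash
# Dump stdin as hex, one byte at a time, regardless of the caller's locale.
hexdump_bytes() (
    LC_ALL=C                          # characters are single bytes in the C locale
    while IFS= read -r -d '' -n 1 char; do
        printf -v val '%02X' "'$char"
        printf '%s ' "${val: -2}"     # keep the low byte only
    done
    echo
)

# The two bytes of a UTF-8 encoded e-acute are reported separately.
printf '\303\251' | hexdump_bytes     # C3 A9
```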
I observed such breakage on one of my machines with the following environment:
The preceding result, with its string of incorrect zeroes, occurred on a Windows machine using Cygwin. In a Linux (Ubuntu) virtual machine, bytes 128-255 were also mangled, but instead of double zeroes the output consisted of 16-digit hexadecimal numbers.
Multibyte character issues are also a concern when generating the binary data with a locale-aware awk implementation (in this case, gawk):
OK a few tests, OSX 10.7.5, default bash terminal:-
The first block is MadeInGermany's, the last three characters should be newlines not NULLs.
Second is alister's, and it only allows keyboard input, much akin to INKEY$, on this tool.
Third is mine unchanged from the original post, last three characters correct.
Lastly a real hexdump command showing the last three characters.
Alister, the original code I found did include LANG=C but I removed it. Thanks for the info...
The test binary code is attached as a "txt" file...
Last login: Mon Oct 14 18:32:36 on ttys000
AMIGA:barrywalker~> ./hex_dump3.sh /tmp/bin.dat
20 00 20 09 00 20 00 00 01 02 03 04 05 06 07 08 09 00 0B 0C
0D 0E 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20
21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34
35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48
49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B 5C
5D 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70
71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F 80 81 82 83 84
85 86 87 88 89 8A 8B 8C 8D 8E 8F 90 91 92 93 94 95 96 97 98
99 9A 9B 9C 9D 9E 9F A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC
AD AE AF B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF C0
C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF D0 D1 D2 D3 D4
D5 D6 D7 D8 D9 DA DB DC DD DE DF E0 E1 E2 E3 E4 E5 E6 E7 E8
E9 EA EB EC ED EE EF F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC
FD FE FF 00 00 00
AMIGA:barrywalker~> ./hex_dump4.sh /tmp/bin.dat
71
77
65
72
74
79
75
69
6F
70
AMIGA:barrywalker~> ./hex_dump5.sh /tmp/bin.dat
20 0A 20 09 0A 20 0A 00 01 02 03 04 05 06 07 08 09 0A 0B 0C
0D 0E 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20
21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34
35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48
49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B 5C
5D 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70
71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F 80 81 82 83 84
85 86 87 88 89 8A 8B 8C 8D 8E 8F 90 91 92 93 94 95 96 97 98
99 9A 9B 9C 9D 9E 9F A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC
AD AE AF B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF C0
C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF D0 D1 D2 D3 D4
D5 D6 D7 D8 D9 DA DB DC DD DE DF E0 E1 E2 E3 E4 E5 E6 E7 E8
E9 EA EB EC ED EE EF F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC
FD FE FF 0A 0A 0A
AMIGA:barrywalker~> hexdump -C /tmp/bin.dat
00000000 20 0a 20 09 0a 20 0a 00 01 02 03 04 05 06 07 08 | . .. ..........|
00000010 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 |................|
00000020 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 |....... !"#$%&'(|
00000030 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 |)*+,-./012345678|
00000040 39 3a 3b 3c 3d 3e 3f 40 41 42 43 44 45 46 47 48 |9:;<=>?@ABCDEFGH|
00000050 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 |IJKLMNOPQRSTUVWX|
00000060 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64 65 66 67 68 |YZ[\]^_`abcdefgh|
00000070 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 78 |ijklmnopqrstuvwx|
00000080 79 7a 7b 7c 7d 7e 7f 80 81 82 83 84 85 86 87 88 |yz{|}~..........|
00000090 89 8a 8b 8c 8d 8e 8f 90 91 92 93 94 95 96 97 98 |................|
000000a0 99 9a 9b 9c 9d 9e 9f a0 a1 a2 a3 a4 a5 a6 a7 a8 |................|
000000b0 a9 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7 b8 |................|
000000c0 b9 ba bb bc bd be bf c0 c1 c2 c3 c4 c5 c6 c7 c8 |................|
000000d0 c9 ca cb cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 |................|
000000e0 d9 da db dc dd de df e0 e1 e2 e3 e4 e5 e6 e7 e8 |................|
000000f0 e9 ea eb ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 |................|
00000100 f9 fa fb fc fd fe ff 0a 0a 0a |..........|
0000010a
AMIGA:barrywalker~>
EDIT:
I forgot to add that I want to use the bash builtins only and not rely on transient commands like od,
xxd or hexdump to get my required results purely as an exercise to learn more about the limitations
of bash, (or any shell for that matter), scripting... You guys are a godsend to people like me and
certainly have put me on the straight and narrow...
My version does not limit you to keyboard input; it reads from standard input, which you can redirect to your heart's content. Refer to how I invoked it in my previous post.
In fact, my script is the most flexible since it can accept input from the terminal, from a file, from a pipe, from anywhere that you care to redirect. All of the other options are limited to a filename.
Regards,
Alister
IFS affects how a line is split into fields. -d affects what a line is. Loosely speaking, in AWK parlance, IFS is akin to FS and -d to RS.
Without -d, \n is the default. read consumes the newline so it cannot be assigned to the target variable. When the variable expands to an empty string, printf substitutes the zeros seen.
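The difference is easy to reproduce with a three-byte input. In this sketch (the function names are hypothetical) the only change between the two loops is the -d '' option.

```bash
#!/bin/bash
# Default delimiter: the newline is consumed by read, so c is empty
# and printf reports 00 for it.
dump_newline_delim() {
    local c
    while IFS= read -r -n 1 c; do printf '%02X ' "'$c"; done
    echo
}

# NUL delimiter: the newline is ordinary data and lands in c.
dump_nul_delim() {
    local c
    while IFS= read -r -d '' -n 1 c; do printf '%02X ' "'$c"; done
    echo
}

printf 'a\nb' | dump_newline_delim   # 61 00 62
printf 'a\nb' | dump_nul_delim       # 61 0A 62
```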
You can see this happening back in post #4, in wisecracker's first response to your first post in this thread:
You ain't gonna like me... ;o)
Running your code:-
NOTE: I did discover this error from the very original code I found and cured it
with the condition code I added to make it work on this Macbook Pro...
(I have the pointer to the code from stackoverflow, it is near exactly the same as
transient command 'hexdump -C filename'.)
I saw similar result on a Linux system. Enforcing the C/POSIX locale fixes it.
When you're experimenting with different versions of the code, always post the code that generated the output. Don't make us guess what's going on (although, in this case, I'm confident I know that this code left out the locale fix).