I am now beyond the limits of my POSIX knowledge here.
Below is a piece of code that runs perfectly well on short strings: byte counts up to around 1KB (3KB of octal text).
It generates byte values from 0x00 to 0xFF.
It passes ShellCheck's requirements and runs perfectly well under 'dash'.
The problem is that as the octal string becomes large, say the equivalent of 16KB of pure binary data in octal format, this is disastrously slow.
Because I have such bizarre requirements and intend to use builtins only, I am stuck for a MUCH quicker way.
The 256 byte code below takes about 0.1 seconds to do the 'binary' part...
Any ideas?
#!/usr/local/bin/dash
# o2b.sh
# This passes ShellCheck and runs under dash!
# 256 bytes of octal values from 000 to 377.
octal="000001002003004005006007010011012013014015016017020021022023024025026027030031032033034035036037040041042043044045046047050051052053054055056057060061\
062063064065066067070071072073074075076077100101102103104105106107110111112113114115116117120121122123124125126127130131132133134135136137140141142143144145\
146147150151152153154155156157160161162163164165166167170171172173174175176177200201202203204205206207210211212213214215216217220221222223224225226227230231232\
233234235236237240241242243244245246247250251252253254255256257260261262263264265266267270271272273274275276277300301302303304305306307310311312313314315316317\
320321322323324325326327330331332333334335336337340341342343344345346347350351352353354355356357360361362363364365366367370371372373374375376377"
binary()
{
    # Obtain octal string length.
    length="${#octal}"
    # Subscript position starts at 1 NOT 0.
    position=1
    # 3 character octal value to be read.
    ooo="???"
    while [ "$position" -lt "$length" ]
    do
        # These two lines obtain the octal value.
        subtext1=${octal%"${octal#${ooo}}"}
        subtext2=${subtext1#"${subtext1%???}"}
        # Convert to pure binary.
        printf '%b' \\"$subtext2"
        # Increment values needed.
        position=$(( position + 3 ))
        ooo=$ooo'???'
    done
}
binary "$octal" > /tmp/binary
# The line below is just for checking...
hexdump -C /tmp/binary
exit 0
Result:-
Last login: Sat Aug 27 22:02:46 on ttys000
AMIGA:barrywalker~> cd Desktop/Code/Shell
AMIGA:barrywalker~/Desktop/Code/Shell> ./o2b.sh
00000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................|
00000010 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f |................|
00000020 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f | !"#$%&'()*+,-./|
00000030 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f |0123456789:;<=>?|
00000040 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f |@ABCDEFGHIJKLMNO|
00000050 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f |PQRSTUVWXYZ[\]^_|
00000060 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f |`abcdefghijklmno|
00000070 70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f |pqrstuvwxyz{|}~.|
00000080 80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f |................|
00000090 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f |................|
000000a0 a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af |................|
000000b0 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf |................|
000000c0 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf |................|
000000d0 d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df |................|
000000e0 e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef |................|
000000f0 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff |................|
00000100
AMIGA:barrywalker~/Desktop/Code/Shell> _
OSX 10.7.5, default terminal but calling dash in the script.
Even though string operations on shell variables are relatively fast, stripping three characters off a huge string takes a while. Consider splitting your huge string into shorter strings and feeding them into your existing code in a loop. Note also that the standards say that octal values passed to the printf %b format specifier need to be in the format:
\0ddd
(a backslash, a zero, and then up to three octal digits), and your code is not supplying the leading 0 for octal values larger than 077.
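As a minimal sketch of that format (the helper name octal_to_byte is made up for illustration and is not part of either script), this builds the escape from a three-digit octal value and hands it to printf '%b':

```shell
#!/bin/sh
# octal_to_byte is a hypothetical helper name used only for this illustration.
octal_to_byte() {
    # "\\0$1" expands to a backslash, a zero, and the digits, e.g. \0101.
    printf '%b' "\\0$1"
}

# 101, 102 and 103 are the octal codes for A, B and C.
octal_to_byte 101; octal_to_byte 102; octal_to_byte 103   # prints ABC
printf '\n'
```

Some shells also accept the form without the leading 0 as an extension, but only the \0ddd form is what the standard describes for %b operands.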
I don't have dash installed on my system, but sh, bash, and ksh on OS X El Capitan Version 10.11.6 all produce the output you specified when running the following modified version of your script:
#!/bin/ksh
# alt_o2b.sh
# This has not been tested by ShellCheck and runs under sh, bash, and ksh!
# 256 bytes of octal values from 000 to 377.
split_octal="000001002003004005006007010011012013014015016017
020021022023024025026027030031032033034035036037040041042043044045046047
050051052053054055056057060061062063064065066067070071072073074075076077
100101102103104105106107110111112113114115116117120121122123124125126127
130131132133134135136137140141142143144145146147150151152153154155156157
160161162163164165166167170171172173174175176177200201202203204205206207
210211212213214215216217220221222223224225226227230231232233234235236237
240241242243244245246247250251252253254255256257260261262263264265266267
270271272273274275276277300301302303304305306307310311312313314315316317
320321322323324325326327330331332333334335336337340341342343344345346347
350351352353354355356357360361362363364365366367370371372373374375376377"
binary()
{
    printf '%s\n' $split_octal | while read octal
    do
        # Obtain octal string length.
        length="${#octal}"
        # Subscript position starts at 1 NOT 0.
        position=1
        # 3 character octal value to be read.
        ooo="???"
        while [ "$position" -lt "$length" ]
        do
            # These two lines obtain the octal value.
            subtext1=${octal%"${octal#${ooo}}"}
            subtext2=${subtext1#"${subtext1%???}"}
            # Convert to pure binary.
            printf '%b' "\0$subtext2"
            # Increment values needed.
            position=$(( position + 3 ))
            ooo=$ooo'???'
        done
    done
}
binary > /tmp/binary2
# The line below is just for checking...
hexdump -C /tmp/binary2
exit 0
Note that I changed the output file name from /tmp/binary to /tmp/binary2 so you can directly compare both the run times of the two scripts and the output files they produce.
When testing your script (using ksh instead of dash, and with the 2nd operand to printf '%b' modified as shown in the script above), I get the same output as you got with both scripts. But the script above ran in about 10% of the time needed to run your script. I would expect a considerably greater run-time improvement for considerably longer input data.
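A rough way to see why longer input should benefit more, offered as a back-of-envelope model rather than a measurement: assume each extracted byte costs one pass over the string currently being processed. The function name estimate_scans and the 72-character chunk length (24 octal values per line) are assumptions made for this sketch.

```shell
#!/bin/sh
# estimate_scans BYTES STRING_LEN
# Hypothetical cost model: one pass over STRING_LEN characters per output byte.
estimate_scans() {
    printf '%d\n' "$(( $1 * $2 ))"
}

bytes=16384                      # 16KB of binary data
full_len=$(( bytes * 3 ))        # 49152 octal characters in one string
chunk_len=72                     # assumed chunk size: 24 values per line

unsplit=$(estimate_scans "$bytes" "$full_len")
split=$(estimate_scans "$bytes" "$chunk_len")

printf 'one big string: about %d characters scanned\n' "$unsplit"   # 805306368
printf '72-char chunks: about %d characters scanned\n' "$split"     # 1179648
printf 'estimated ratio: about %d to 1\n' "$(( unsplit / split ))"  # 682
```

Under this model the work for a single string grows with the square of the input size, while the chunked version grows linearly, so the gap widens as the data gets bigger. Real timings will differ because the shell's internals are not this simple.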
Note also that although you were passing an argument to the binary function, the function you have defined does not use any positional parameters. Therefore, I have removed that operand from the function invocation.
This is at least an order of magnitude faster than my test code.
Works perfectly in dash, except for your typo for hexdump near the end of the script.
Off to try some big binary files now. I will keep you informed over the next few days.
ShellCheck gives minor warnings about line 20, suggesting that this...
printf '%s\n' $split_octal | while read octal
...be changed to:-
printf '%s\n' "$split_octal" | while read -r octal
But I don't see why it needs changing, as the file is always going to be multiple 3-digit octal values only, along with newlines of course, so I have left it as is...
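For anyone wondering what the -r part of that warning guards against, here is a minimal sketch (the function names read_plain and read_raw are made up for the illustration). Without -r, read treats a backslash as an escape character and removes it; with -r the line is taken literally. Input made only of octal digits contains no backslashes, so both behave the same there.

```shell
#!/bin/sh
# read_plain and read_raw are hypothetical names used only for this demo.
read_plain() {
    # Deliberately without -r: backslashes in the input act as escapes.
    read line
    printf '%s\n' "$line"
}

read_raw() {
    # With -r: backslashes are kept as ordinary characters.
    read -r line
    printf '%s\n' "$line"
}

printf '%s\n' 'a\b' | read_plain    # prints ab
printf '%s\n' 'a\b' | read_raw      # prints a\b
printf '%s\n' '101102' | read_plain # prints 101102, same as with -r
```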
Thanks a lot I have something to get my teeth into now.
Hi wisecracker,
Thank you for pointing out the iexdump (which has now been fixed in my post). I am not sure how that happened; I have hexdump in the code I copied and tried to paste.
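It is fine that you left the expansion $split_octal as is. For this data the quoted form would behave the same, because the chunks are separated by newlines that printf passes through unchanged; quoting would only matter if the chunks were separated by spaces or tabs.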
Yes. This is better. I hadn't noticed that the function was being called with an operand until after I had posted the script. One might also consider changing it to:
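binary() {
    while read octal
    do
        # Consume the line three characters at a time until it is empty.
        while [ "${#octal}" -gt 0 ]
        do
            # Take the first three characters...
            subtx=${octal%"${octal#???}"}
            # ...and remove them from the front of the line.
            octal=${octal#"${subtx}"}
            # Convert to pure binary.
            printf '%b' "\0$subtx"
        done
    done
}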
and invoke it with:
printf '%s\n' $split_octal | binary > output_file
or with:
binary < input_file > output_file
where input_file contains text similar to the string assigned to split_octal without the quotes (which would be handy if the data being processed is sometimes in a separate file and sometimes in a shell variable).
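The two-expansion idiom in that function can be traced on a short string. This is only a sketch: split_triples, s, rest, and first are names invented for the illustration, and it prints the three-digit groups instead of converting them.

```shell
#!/bin/sh
# split_triples is a hypothetical helper that shows how the string is peeled.
split_triples() {
    s=$1
    # Stop when fewer than three characters remain, so a stray tail cannot
    # keep the loop running.
    while [ "${#s}" -ge 3 ]
    do
        rest=${s#???}           # everything after the first three characters
        first=${s%"$rest"}      # strip that suffix, leaving the first three
        printf '%s\n' "$first"
        s=$rest
    done
}

split_triples 101102103
# prints:
# 101
# 102
# 103
```

Because each pass shortens the string, later iterations work on less data. Note that the function above tests for a length greater than 0 instead, so it assumes every input line is an exact multiple of three characters; a line with one or two leftover characters would never shrink to empty.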