Command as a SED Replacement is not working

Hello.

dd if=binarty_file 2>/dev/null | tr '\n' ' ' |
sed "s;\(.\{432\}\)\(.\).*;$(printf '%s' \\2 | od -An -tx);"

This should output 00000001, but it shows 0000325c. The byte I need to catch can have any hex value; in this sample it is 00000001.

I figured out that od -An -tx is not correctly receiving the printf '%s' \\2 input. I need to solve this by changing only $(printf '%s' \\2 | od -An -tx); all the rest of the code must remain as it is.

In any case, thank you.

Sorry for asking, but three things:

  1. Why would you replace every \n with ' ' in a binary file?
  2. Why are you convinced that the thing you're looking for is exactly at the 433rd character position of sed's textual input/output (after replacing all \n characters with spaces)?
  3. Why do you believe 0000325c is a single byte when it's actually more? (In this notation it's 4 bytes, 00 00 32 5C; grabbing a single character with \(.\) would mostly return 00, if I understood correctly that you're referring to actual byte content and not the address where it occurs.)

BTW, sed is not really suited to replacing binary data, because it is designed to work with plain text. So what's the purpose of all this? I'm trying to understand.


Welcome!
The $( ) is a subshell that is run first and then replaced by its output; only then is the resulting string passed to sed.
The subshell cannot get anything from sed: \\2 is simply a \ and a 2, and od shows them as hex 5c and 32.
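A quick way to see this for yourself in any POSIX shell (the `od -tx` word layout in the last comment assumes a little-endian host such as x86):

```shell
# Inside $( ), the unquoted \\2 is reduced to the two characters '\' and '2'.
# This happens before sed even starts, so sed's capture group is never involved.
arg=$(printf '%s' \\2)
printf 'printf received: %s\n' "$arg"      # printf received: \2

# '\' is 0x5c and '2' is 0x32:
printf '%s' \\2 | od -An -tx1              # 5c 32
# Read as one 4-byte word on a little-endian host, that is the 0000325c from the question:
printf '%s' \\2 | od -An -tx               # 0000325c
```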


Do you want to print byte number 433 in hex?
Then try the following bash code:

# Count bytes, not multibyte characters
LC_ALL=C
# Read the first 433 bytes into xx (-r keeps backslashes literal)
read -r -N 433 xx < binarty_file
# Print the last character of xx as ord() in hex
printf "%x\n" "'${xx:0-1}"

If you insist on dd and od then try

dd if=binarty_file ibs=1 skip=432 count=1 2>/dev/null | od -An -tx

The default for od -t x is to show 4 bytes. If you want a one-byte replacement, use od -An -t x1.
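For example, feeding a single 0x01 byte to both formats (the `-tx` result assumes a little-endian host; on a big-endian one the zero padding ends up on the other side):

```shell
# One byte with value 0x01.
printf '\001' | od -An -tx     # 00000001  (padded out to a 4-byte word)
printf '\001' | od -An -tx1    # 01        (exactly one byte)
```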

In any case, the code would replace a single binary byte by two (or eight) hex characters (i.e. 0-9 and A-F), thereby changing the file length etc.

If you want to inject a single character, omit the od and give printf an ANSI-C string constant using the $'...' syntax.

There is also the issue that sed has its own syntax. If, for example, the replacement character ever needs to be ; (hex 0x3B), sed will break, because you are using that as the separator in s; ; ;.

$ printf '%s' $'\x4A'
J$ 
$ printf '%s' $'\x4A' | od -A n -t x1a
  4a
   J
$

This seems an unusually restrictive case: "and all the rest of code must remain as it is". Is this homework? dd has options that will edit specific bytes in a file in situ. Using sed here is using a corkscrew to paint an aardvark.

$ echo > foo.binary abcdefghijklmnopqrstuvwxyz
$ od -A d -t x1a foo.binary
0000000  61  62  63  64  65  66  67  68  69  6a  6b  6c  6d  6e  6f  70
          a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p
0000016  71  72  73  74  75  76  77  78  79  7a  0a
          q   r   s   t   u   v   w   x   y   z  nl
0000027
$ time printf '%s' $'\x1A' |
>     dd of=foo.binary status=none conv=notrunc oflag=seek_bytes seek=17

real    0m0.005s
user    0m0.002s
sys     0m0.005s
$ od -A d -t x1a foo.binary
0000000  61  62  63  64  65  66  67  68  69  6a  6b  6c  6d  6e  6f  70
          a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p
0000016  71  1a  73  74  75  76  77  78  79  7a  0a
          q sub   s   t   u   v   w   x   y   z  nl
0000027

Some ideas:

(1) The tr keeps the byte positions the same, just converts the whole text into one looong line, so that the 433 is independent of line wraps. So I suspect the "binary" file is ASCII with a few special characters (like 0x01 SOH), and not much over 1KB.

(2) Because that's what the course leader specified.

(3) Irrelevant, because you cannot run a shell fragment from inside a sed substitution. You could make a one-byte substitution for the whole file and run the od through a pipe, but the prohibition on changing anything outside the sed itself seems to scupper this.
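Point (1) is easy to check. The sketch below writes a small made-up sample to a `mktemp` path and confirms that `tr '\n' ' '` leaves both the byte count and the offset of the 0x01 byte unchanged:

```shell
sample=$(mktemp)
# Made-up data: a 0x01 byte at offset 17 (0-based), with newlines around it.
printf 'line one\nline two\001\nend\n' > "$sample"

before=$(wc -c < "$sample")
after=$(tr '\n' ' ' < "$sample" | wc -c)
# tr swaps one byte for one byte, so the sizes match (23 and 23).
echo "before=$before after=$after"

# The byte at offset 17 is still 0x01 after the newlines became spaces.
byte=$(tr '\n' ' ' < "$sample" | od -An -tx1 -j 17 -N 1 | tr -d ' \n')
echo "byte at offset 17: $byte"              # 01
rm -f "$sample"
```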

I'm wondering how UTF-8 works in this case, but not enough to research it.

As the given code already runs dd, no sed is needed.

EDIT:

I stand corrected: according to the version of this question posted on Stack Overflow, GNU sed has an e flag for the s/// command, which executes the result of the substitution as a shell command and replaces it with that command's output; the command can include back-references.

It does not specify which shell is used, and the performance might be dire, and there are far better solutions, but the problem as stated is (sadly) solvable.


Look at the code in the initial post. The tr transforms the entire file into one line, then sed skips the first 432 bytes, captures the following byte, and discards everything after it. The \\2 is an attempt to refer to the match of the single-character group.
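The same regex shape, scaled down so the effect is visible (it skips 3 characters instead of 432; the input strings are made up):

```shell
# \1 = the first 3 characters, \2 = the 4th, .* = everything after it.
printf 'abcdefgh\n' | sed 's;\(.\{3\}\)\(.\).*;\2;'         # d
printf 'abcdefgh\n' | sed 's;\(.\{3\}\)\(.\).*;[\1][\2];'   # [abc][d]
# If the line is too short for the regex, nothing matches and it passes through unchanged.
printf 'ab\n' | sed 's;\(.\{3\}\)\(.\).*;\2;'               # ab
```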

If this SO is a cross-post then the answer is yes, or very close to a yes.

This command should output the hex value of the 18th byte (00000001 in this case)

But still something is wrong here, because a byte is represented by two hex digits.
So what is meant here: two bytes? Four bytes? One byte as a binary number (not hex)?


Difficult stuff. I have never tried this (experimental?) feature.
In contrast, Perl's s///e (which runs the replacement as embedded Perl code) is well documented and quite mature.


I got a result. It does seem to run bash, and the quoting is messy, but basically:

$ echo > foo.binary abcdefghijklmnopqrstuvwxyz
$ sed -n < foo.binary 's+^\(.\{5\}\)\(...\).*$+printf "%s%s" "\2" "\1" | od -A n -t x1a+ep'
  66  67  68  61  62  63  64  65
   f   g   h   a   b   c   d   e
$

I am (almost) ashamed of that. I put in two backrefs for bravado.


Firstly, I would like to thank all of you who spent your precious time trying to help me.

I have known the Unix Community for a long time, but this is the first time I have asked something here. It feels like an old-school community of real, kind, experienced people, not robots (as on Stack Overflow and others).

Well..

I just figured out that (printf '%s' "\\2") can get the match \2. I checked it by moving od -An -tx outside the sed context:

dd if=data_stream 2>/dev/null | tr '\n' ' ' |
sed "s;\(.\{432\}\)\(.\).*;$(printf '%s' "\\2");" | od -An -tx

... and it did output 00000001, as it should. However, as I said before, I need to solve this by changing only $(printf '%s' \2 | od -An -tx) and keeping all the rest of the code as it is.

I feel we are close to solving this issue.

Thank you.

That's identical to

dd if=data_stream 2>/dev/null | tr '\n' ' ' | sed "s;\(.\{432\}\)\(.\).*;\\2;" | od -An -tx

and

dd if=data_stream 2>/dev/null | tr '\n' ' ' | sed 's;.\{432\}\(.\).*;\1;' | od -An -tx

Simpler, faster and safer is my earlier suggestion:

dd if=data_stream ibs=1 skip=432 count=1 2>/dev/null | od -An -tx

Safer because dd tolerates a \0 byte in data_stream, whereas the text-based sed stops if it meets a \0.


I really agree with you, and I would do it that way in other situations. However, the solution has to be contained in the main sed command.

I figured out the issue is in the second part of $(printf '%s' "\\2" | od -An -tx), which is " | od -An -tx".

I tried od <<< \\2 but it didn't work.

Is there any other way to make od process direct input (instead of a file)? The only trick I know is <<<.

Thanks.

As was pointed out, the shell code in $( ) runs before sed, so it cannot be used.
As @Paul_Pedant pointed out, GNU sed has the s///e hack feature (here s;;;e):

dd if=data_stream 2>/dev/null | tr '\n' ' ' | sed "s;\(.\{432\}\)\(.\).*;printf '%s' '\\2' | od -An -tx;e"

Here the shell only turns \\2 into \2 and then runs sed, which does the substitution and then runs the resulting shell code.
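To watch both stages, drop the `e` first to print the command text sed builds, then add it back so a shell runs that text. This is a scaled-down sketch (it skips 3 characters instead of 432, on made-up input) and needs GNU sed, since `e` is a GNU extension:

```shell
# Stage 1, no e flag: the replacement is just text, with \2 already filled in.
printf 'abcdefgh\n' | sed "s;\(.\{3\}\)\(.\).*;printf '%s' '\\2' | od -An -tx1;"
#   printf '%s' 'd' | od -An -tx1

# Stage 2, GNU sed e flag: that text is run by a shell and replaced by its output.
printf 'abcdefgh\n' | sed "s;\(.\{3\}\)\(.\).*;printf '%s' '\\2' | od -An -tx1;e"
#   64
```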
The <<< construct is not part of plain POSIX sh; it works in bash, so the following might work if /bin/sh is a link to bash:

dd if=data_stream 2>/dev/null | tr '\n' ' ' | sed "s;\(.\{432\}\)\(.\).*;od -An -tx <<< '\\2';e"
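One detail worth knowing about that last variant: a bash here-string appends a newline, so od receives two bytes rather than one. A small sketch (it assumes `bash` is installed; `'A'` stands in for the captured byte):

```shell
# Here-string: the data is 'A' followed by a newline (0x41 0x0a).
bash -c "od -An -tx1 <<< 'A'"      # 41 0a
# printf '%s' adds nothing, so only the byte itself arrives.
printf '%s' 'A' | od -An -tx1      # 41
# With the 4-byte -tx format on a little-endian host, the extra newline
# shows up inside the word: 00000a41 instead of 00000041.
bash -c "od -An -tx <<< 'A'"       # 00000a41
```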

sed seems to fall over in a heap if the incoming file is truly binary (and thus looks like UTF-8 in places). I'm not sure why sh throws the error (especially as "not found"), but printf works, which implies full Bash mode?

Fib is a small C binary that outputs the Fibonacci series.

$ time tail --bytes=+433 Fib | head --bytes=1 | od -A n -t x1
 3c
real	0m0.036s
user	0m0.007s
sys     0m0.006s

$ time LC_ALL=C awk -v RS='' '{ printf ("%s", substr ($0, 433, 1)) | "od -A n -t x1"; }' Fib
 3c

real	0m0.020s
user	0m0.004s
sys     0m0.010s
$ dd if=Fib 2>/dev/null | tr '\n' ' ' | sed "s;\(.\{432\}\)\(.\).*;printf '%s' '\\2' | od -An -tx1;e"
sh: 1: ELF: not found

$ export LC_ALL=C

$ time dd if=Fib 2>/dev/null | tr '\n' ' ' | sed "s;\(.\{432\}\)\(.\).*;printf '%s' '\\2' | od -An -tx1;e"
 3c
real	0m0.067s
user	0m0.031s
sys	0m0.016s
$ 

Not often you see awk out-performing tail, head and sed by a factor of 2 or 3 !


And

dd if=Fib ibs=1 skip=432 count=1 2>/dev/null | od -An -tx1

?
Should be very fast.
awk is a text processor; like sed, it is unsafe on binary data.


IIRC, ibs=1 makes dd read single bytes. In GNU/dd, I get better performance using some newer iflag and oflag options which interact better with the bs, ibs, obs, block and count options, and with skip and seek. I suspect the difference is more marked with huge files.

time dd if=~/Fib status=none bs=4096 iflag=skip_bytes,count_bytes skip=432 count=1 | od -A n -t x1
3c
real    0m0.007s
user    0m0.001s
sys     0m0.010s

I do a lot of freaky stuff with GNU/awk (it's OK, I don't think anybody has noticed yet), and as long as I set LC_ALL="C" locally, I never see an issue. It allows strings containing NUL, for instance. It will even split each byte of the whole 8-bit set into an array element, using the empty field separator.

I wrote an Awk emulator for base64 recently, and it encodes and decodes a 160KB ELF binary bit-for-bit identically to the real thing. I tried all the bitwise functions, but the syntax is horrible, so I switched back to *, /, int() and %, with hex constants. Even so, it will read, write, encode and decode all 256 distinct byte values.


And

od -j 432 -N 1 -An -tx1 Fib

?
Found after a tip from @munkeHoller. Stupid me: even POSIX od has the -j skip and -N count options.
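As a sanity check, the byte-offset approaches from this thread should all agree. The sketch below writes a made-up test file (432 filler bytes, then 0x01, then a few more) to a `mktemp` path and reads offset 432 three ways; it assumes `head -c` and `tail -c` are available, as they are in GNU coreutils and the BSDs:

```shell
f=$(mktemp)
# Made-up data: 432 'x' bytes, then the target byte 0x01, then some trailing bytes.
{ head -c 432 /dev/zero | tr '\000' 'x'; printf '\001'; printf 'tail'; } > "$f"

by_od=$(od -j 432 -N 1 -An -tx1 "$f" | tr -d ' \n')
by_dd=$(dd if="$f" ibs=1 skip=432 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
by_tail=$(tail -c +433 "$f" | head -c 1 | od -An -tx1 | tr -d ' \n')

echo "od: $by_od  dd: $by_dd  tail/head: $by_tail"   # od: 01  dd: 01  tail/head: 01
rm -f "$f"
```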


Drive by comment: Tools change and get new features. Don't beat yourself up too hard for not knowing about them.

Learning about new features is one of the reasons I participate in forums like this and periodically re-read the man pages of commands I use all the time.


My Dears,

The issue was solved this way:

LANG=C ;
dd if=data_stream 2>/dev/null | tr '\n' ' ' |
sed "s;\(.\{432\}\)\(.\).*;printf '%s' '\\2' | od -An -t x1;e"
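For anyone who wants to reproduce this, here is a sketch that runs the same pipeline against a made-up file (the `data_stream` variable and its contents are hypothetical). It sets the locale with `export LC_ALL=C`, which takes precedence over LANG and LC_CTYPE and is inherited by sed, and it needs GNU sed for the `e` flag. Two caveats follow from the design: because tr turns every newline into a space, a target byte of 0x0a would be reported as 20; and because the captured byte is pasted between single quotes in a shell command, a target byte of 0x27 (') would break that command.

```shell
# Force byte-wise matching; LC_ALL overrides LANG/LC_CTYPE and export passes it to sed.
export LC_ALL=C

data_stream=$(mktemp)
# Made-up data: 432 'x' bytes, then the target byte 0x01, then some trailing bytes.
{ head -c 432 /dev/zero | tr '\000' 'x'; printf '\001'; printf 'tail'; } > "$data_stream"

# The pipeline from the post above (GNU sed only, because of the e flag).
result=$(dd if="$data_stream" 2>/dev/null | tr '\n' ' ' |
  sed "s;\(.\{432\}\)\(.\).*;printf '%s' '\\2' | od -An -t x1;e")
echo "byte 433: $result"     # byte 433:  01
rm -f "$data_stream"
```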

I hope it helps someone someday.

Thank you.