Regex and backreference to replace in binary file

Hello to all,

I have this sed script that replaces hex strings within a binary file.

As you can see, I want to replace all bytes 4X with 2X (where X could take values 0 to F).

sed -e 's/\x40/\x20/g' -e 's/\x41/\x21/g' -e 's/\x42/\x22/g' -e 's/\x43/\x23/g' -e 's/\x44/\x24/g' -e 's/\x46/\x26/g' -e 's/\x47/\x27/g' file

I would like to know if it is possible to use regex and backreference for this in sed or perl, etc?

I'd like to be able to do something like this, but it is not working:

sed -e 's/\x4\(.\)/\x2\1/g' file

Thanks in advance for any help.

If you show input data and expected output it will be helpful for us.

You can try simple loop in Perl:

perl -pe 'for ($i=0;$i<16;$i++){$x=sprintf "%X",$i; s/\x4$x/\x2$x/g}' file

Hello Akshay,

It could be any binary file, for example an image like the one attached. I only want to replace certain hex strings with others. In this case the string is a single byte, 41 or 42 or any 4X, to be replaced with 21 or 22 or 2X using regex and a backreference.

Hello bartus11,

It is not working. When I check with hexdump -C, it seems that it is actually deleting bytes.

I'd like to replace more than one hex string. In sed that is easy: just chain multiple replacements using -e 's///g' -e 's///g' -e 's///g'.

Thanks for the help.

Although some versions of sed may work on binary files, the standards only require sed to work on text files. A simple way that should work on most current systems is to use tr:

LC_ALL=POSIX tr '\100-\117' '\040-\057' <in_file >out_file

Some implementations of the tr utility may accept hex ranges, but the standards only specify octal as shown above. If your current locale has an underlying code set that only has single-byte eight-bit characters (such as ASCII or EBCDIC; but not UTF-8), you can skip the LC_ALL=POSIX.
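To check the tr approach on a small sample before running it on a real file, something like the following may help. It also shows one way to convert hex range bounds to the octal form tr expects. The sample bytes, the temporary files and the hexof helper are assumptions for illustration only.

```shell
# Convert the hex bounds of the source range to octal for use with tr.
octal_bounds=$(printf '%o %o' 0x40 0x4F)
echo "$octal_bounds"        # 100 117

# hexof is a hypothetical helper: print a file's bytes as one hex string.
hexof() { od -An -tx1 "$1" | tr -d ' \n'; }

tmp=$(mktemp -d)
# Sample input bytes (octal escapes): 0x40 0x41 0x4F 0x50
printf '\100\101\117\120' > "$tmp/in.bin"

# Translate bytes 0x40-0x4F (octal 100-117) to 0x20-0x2F (octal 040-057).
LC_ALL=POSIX tr '\100-\117' '\040-\057' < "$tmp/in.bin" > "$tmp/out.bin"

tr_result=$(hexof "$tmp/out.bin")
echo "$tr_result"           # expected: 20212f50
rm -rf "$tmp"
```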

Note, however, that if your image includes geolocation, timestamp, or camera settings data in addition to the actual image, you might not like the results unless you can extract just the image data you want to be processed.

Hello Don,

Thanks for your help. It seems to work!

Is it possible to do multiple replacements in a single command, for example '\220-\233' to '\300-\313' and '\100-\117' to '\040-\057'?

Is it also possible to replace a hex string of more than one byte, for example ff4567 with 3A0013?

Thanks again.

To do multiple ranges:

LC_ALL=POSIX tr '\100-\117\220-\233' '\040-\057\300-\313' <in_file >out_file

The tr utility translates characters or bytes; not strings or multi-byte sequences, and as I said before, tr isn't required to recognize hex. Look at the man page for tr on your system to determine if your tr utility has additional options or formats that may help you.

Hello Don,

I get it, thanks for the explanation and details. I think I'll need to use sed to replace the multi-byte sequences and then apply tr, with octal ranges, to the output as a workaround for the regex/backreference replacement.

It would be nice to know whether some utility or a single command can accomplish both tasks.

Thanks again