How to parse a binary file?

Hello to all in the forum,

I'd appreciate your help with this.

I have a binary file whose fields are encoded as binary or Binary Coded Decimal (BCD).

Do you know of a parser for this kind of binary file?

I have the structure format, but I don't know how or where to begin.

Your answer would be much appreciated.

Take a look at this info: "text processing - How to use bash script to read binary file content?" (Unix & Linux Stack Exchange)

Also post example input and define exactly what you want to do.

Post the definition of the structure format...

Hello spacebar and shamrock,

I want to convert the binary file to readable text so that I can parse it according to the defined structure.

Below is a sample input after applying hexdump -C.

00000000  31 58 58 58 5f 58 58 58  5f 4c 58 58 36 30 39 30  |1XXX_XXX_XX06090|
00000010  37 32 36 30 30 33 31 07  ff ff ff ff ff ff ff ff  |7260031.........|
00000020  32 00 00 01 24 32 58 01  30 08 29 0f 47 42 05 00  |2...p...0.).PB..|
00000030  01 ff ff ff 00 15 00 0a  48 00 01 33 00 01 36 00  |........H..3..6.|
00000040  01 37 00 01 66 00 01 65  00 01 77 00 01 78 00 01  |.7..f..e..w..x..|
00000050  69 00 01 79 00 04 93 00  01 22 00 00 21 00 02 09  |i..y....."..!...|
00000060  00 01 26 00 01 0f 00 01  11 00 01 13 00 01 08 00  |..&.............|
00000070  01 2b 00 00 2c 00 00 2d  00 00 2e 00 00 55 00 01  |.+..,..-.....U..|
00000080  56 00 07 2a 00 00 2f 00  00 30 00 00 31 00 00 ff  |V..*../..0..1...|
00000090  34 00 80 19 32 c9 06 00  00 08 88 a0 e0 ca 0e 54  |4...2..........T|
000000a0  00 91 0b 47 43 39 90 10  8f ff ff 00 14 80 09 35  |...PC9.........5|
000000b0  c9 06 00 00 08 88 00 00  03 81 0f 01 02 00 00 00  |................|
000000c0  01 47 43 99 01 05 ff ff  ff 00 83 10 01 0c 00 00  |.PC.............|
000000d0  00 01 47 43 99 01 05 ff  ff ff 00 01 86 1d 02 0c  |..PC............|
000000e0  00 00 00 04 47 43 99 01  09 ff ff ff 00 0e 00 00  |....PC..........|
000000f0  00 04 47 43 99 01 09 ff  ff ff 00 87 0f 01 01 00  |..PC............|
00000100  00 00 03 47 43 99 01 08  ff ff ff 00 84 0e 00 00  |...PC...........|
00000110  00 00 00 00 01 01 ff ff  00 00 00 00 85 06 00 00  |................|
00000120  00 00 00 00 ff 32 00 00  02 24 32 58 01 30 08 29  |.....2...p...0.)|
00000130  2f 47 42 05 00 03 ff ff  ff 00 15 00 0a 48 00 01  |/PB..........H..|
00000140  33 00 01 36 00 01 37 00  01 66 00 01 65 00 01 77  |3..6..7..f..e..w|
00000147  00 01 78 00 01 69 00 01  79 00 04 93 00 01 22 00  |..x..i..y.....".|
00000160  00 21 00 02 09 00 01 26  00 01 0f 00 01 11 00 01  |.!.....&........|
00000170  13 00 01 08 00 01 2b 00  00 2c 00 00 2d 00 00 2e  |......+..,..-...|
00000180  00 00 55 00 01 56 00 07  2a 00 00 2f 00 00 30 00  |..U..V..*../..0.|
00000190  00 31 00 00 ff 34 00 80  19 32 c9 06 00 00 08 88  |.1...4...2......|
000001a0  a0 e0 ca 0e 54 00 91 0b  47 43 39 90 10 8f ff ff  |....T...PC9.....|
000001b0  00 14 80 09 35 c9 06 00  00 08 88 00 00 03 81 0f  |....5...........|
000001c0  01 02 00 00 00 01 47 43  99 01 05 ff ff ff 00 83  |......PC........|
000001d0  10 01 0c 00 00 00 01 47  43 99 01 05 ff ff ff 00  |.......PC.......|

Part of the structure definition is below.

Data representation:    Binary, Binary Coded Decimal (BCD), ISO
Numeral presentation:    Left-adjusted with trailing #F (where required) 
Text presentation:    Left-adjusted with trailing spaces (where required)
Filler:    Spaces for ISO characters and #F for BCD numbers and TBCD string

Position  Bytes  Value                         Field description
0         1      Numeral 1                     Record type, administrative data. ISO coded.
1         12     Identifier 1 - 12 characters  Exchange Identity. 12 most significant characters in the exchange identity. ISO coded.
13        2      Digit string 00 - 99          Starting year for recording. ISO coded.
15        2      Digit string 01 - 12          Starting month for recording. ISO coded.
17        2      Digit string 01 - 31          Starting day for recording. ISO coded.
19        2      Digit string 00 - 23          Starting hour for recording. ISO coded.
21        2      Digit string 00 - 59          Starting minute for recording. ISO coded.

Thanks for any advice.

---------- Post updated at 10:50 PM ---------- Previous update was at 12:29 AM ----------

Maybe somebody can advise.

I only want to know whether anyone has used a binary parser that lets me describe, in a template, each parameter in the file: the number of bytes, the type of encoding of each byte, and so on. I mean, put the description of each byte or group of bytes in a template so that the file can be parsed.

Thanks for any help.

Do you have the 'struct' or template?
Here is an example in Perl: "Perl pack and unpack" (joakimbech.com)

Hello spacebar,

Thanks for your help and for sharing that link. I'll check it!

The structure description is very long, but the rest is much like the following description of the first bytes.

Position  Bytes  Value                         Field description
0         1      Numeral 1                     Record type, administrative data. ISO coded.
1         12     Identifier 1 - 12 characters  Site Identity (12 most significant characters in the site identity, ISO coded.)
13        2      Digit string 00 - 99          Starting year for recording. ISO coded.
15        2      Digit string 01 - 12          Starting month for recording. ISO coded.
17        2      Digit string 01 - 31          Starting day for recording. ISO coded.
19        2      Digit string 00 - 23          Starting hour for recording. ISO coded.
21        2      Digit string 00 - 59          Starting minute for recording. ISO coded.
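One possible way to turn the table above into a parsing template is sketched below. It is only a sketch under assumptions: the names TEMPLATE and parse_header are made up for illustration, "ISO coded" is taken to mean plain text bytes that can be printed as they are, and dd is used instead of hexdump because these fields need no conversion.

```shell
#!/bin/sh
# Hypothetical template: one field per line as "offset length name".
# Offsets and lengths are copied from the structure table above;
# the field names are invented for this sketch.
TEMPLATE='0 1 record_type
1 12 exchange_identity
13 2 start_year
15 2 start_month
17 2 start_day
19 2 start_hour
21 2 start_minute'

# parse_header FILE
# Print "name=value" for every field listed in TEMPLATE.
# Assumes every field is ISO coded, i.e. readable text.
parse_header() {
    printf '%s\n' "$TEMPLATE" |
    while read -r offset length name
    do
        # dd copies exactly $length bytes starting at byte $offset.
        value=$(dd if="$1" bs=1 skip="$offset" count="$length" 2>/dev/null)
        printf '%s=%s\n' "$name" "$value"
    done
}
```

Called as parse_header BinSample.dat on a file that starts with the bytes 31 41 42 43 5f 4a 50 51 5f 50 54 30 38 30 33 30 39 32 36 30 30 33 31, this should print record_type=1, exchange_identity=ABC_JPQ_PT08, start_year=03, start_month=09, start_day=26, start_hour=00 and start_minute=31, one per line. Fields with other encodings (BCD, binary) would need a type column in the template and a decoder per type.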

Hi Ophiucus...

00000000  31 58 58 58 5f 58 58 58  5f 4c 58 58 36 30 39 30  |1XXX_XXX_XX06090|

This does NOT make sense...

Have you manually altered this?

Bytes 0 to 8 are correct...
Byte 9 is 0x4C and should read "L"...
Byte 10 is correct...
Byte 11 should read "X"...

However try this to extract a single binary byte from anywhere in the file then you can work on it...

http://www.unix.com/shell-programming-scripting/212715-another-building-block-binary-file-manipulation.html

Hello wisecracker,

Sorry, the sample below is the correct one. I was running some tests on the first sample and I changed some of the hex digits by mistake when I pasted it here.

I read your code in the link you shared. I see that it converts each byte to its decimal value to get the ASCII character. But in
my file, some byte sequences are BCD coded and should be read as literal decimal digits. For example, the values highlighted in red
(bytes 36 to 43 and 44 to 48 of the sample) are the numbers 874401013008290 and 3742050001. Others are hexadecimal byte sequences.

So, regarding the hexdump line below, how can I modify the hexdump command so that those bytes print as literal decimal digits, without any conversion?
I mean, for the sequences that must be treated literally as decimal digits.

number=`hexdump -n1 -s$subscript -v -e '1/1 "%u"' SomeBinaryFile.dat`

If there is a way to put the rules from the file description into a template that says how each byte sequence must be treated, I'll be able
to parse the binary file.

Thanks for any help.

Regards.

Binary file

00000000  31 41 42 43 5f 4a 50 51  5f 50 54 30 38 30 33 30  |1ABC_JPQ_PT08030|
00000010  39 32 36 30 30 33 31 07  ff ff ff ff ff ff ff ff  |9260031.........|
00000020  32 00 00 01 87 44 01 01  30 08 29 0f 37 42 05 00  |2....D..0.).7B..|
00000030  01 ff ff ff 00 15 00 0a  48 00 01 33 00 01 36 00  |........H..3..6.|
00000040  01 37 00 01 66 00 01 65  00 01 77 00 01 78 00 01  |.7..f..e..w..x..|
00000050  69 00 01 79 00 04 93 00  01 22 00 00 21 00 02 09  |i..y....."..!...|
00000060  00 01 26 00 01 0f 00 01  11 00 01 13 00 01 08 00  |..&.............|
00000070  01 2b 00 00 2c 00 00 2d  00 00 2e 00 00 55 00 01  |.+..,..-.....U..|
00000080  56 00 07 2a 00 00 2f 00  00 30 00 00 31 00 00 ff  |V..*../..0..1...|
00000090  34 00 80 19 32 c9 06 00  00 08 88 a0 e0 ca 0e 54  |4...2..........T|
000000a0  00 91 0b 37 42 39 90 10  8f ff ff 00 14 80 09 35  |...7B9.........5|
000000b0  c9 06 00 00 08 88 00 00  03 81 0f 01 02 00 00 00  |................|
000000c0  01 37 42 99 01 05 ff ff  ff 00 83 10 01 0c 00 00  |.7B.............|
000000d0  00 01 37 42 99 01 05 ff  ff ff 00 01 86 1d 02 0c  |..7B............|
000000e0  00 00 00 04 37 42 99 01  09 ff ff ff 00 0e 00 00  |....7B..........|
000000f0  00 04 37 42 99 01 09 ff  ff ff 00 87 0f 01 01 00  |..7B............|
00000100  00 00 03 37 42 99 01 08  ff ff ff 00 84 0e 00 00  |...7B...........|
00000110  00 00 00 00 01 01 ff ff  00 00 00 00 85 06 00 00  |................|
00000120  00 00 00 00 ff 32 00 00  02 87 44 01 01 30 08 29  |.....2....D..0.)|
00000130  2f 37 42 05 00 03 ff ff  ff 00 15 00 0a 48 00 01  |/7B..........H..|
00000140  33 00 01 36 00 01 37 00  01 66 00 01 65 00 01 77  |3..6..7..f..e..w|
00000150  00 01 78 00 01 69 00 01  79 00 04 93 00 01 22 00  |..x..i..y.....".|
00000160  00 21 00 02 09 00 01 26  00 01 0f 00 01 11 00 01  |.!.....&........|
00000170  13 00 01 08 00 01 2b 00  00 2c 00 00 2d 00 00 2e  |......+..,..-...|
00000180  00 00 55 00 01 56 00 07  2a 00 00 2f 00 00 30 00  |..U..V..*../..0.|
00000190  00 31 00 00 ff 34 00 80  19 32 c9 06 00 00 08 88  |.1...4...2......|
000001a0  a0 e0 ca 0e 54 00 91 0b  37 42 39 90 10 8f ff ff  |....T...7B9.....|
000001b0  00 14 80 09 35 c9 06 00  00 08 88 00 00 03 81 0f  |....5...........|
000001c0  01 02 00 00 00 01 37 42  99 01 05 ff ff ff 00 83  |......7B........|
000001d0  10 01 0c 00 00 00 01 37  42 99 01 05 ff ff ff 00  |.......7B.......|
000001e0  01 86 1d 02 0c 00 00 00  04 37 42 99 01 09 ff ff  |.........7B.....|
000001f0  ff 00 0e 00 00 00 04 37  42 99 01 09 ff ff ff 00  |.......7B.......|
00000200  87 0f 01 01 00 00 00 03  37 42 99 01 08 ff ff ff  |........7B......|
00000210  00 84 0e 00 00 00 00 00  00 01 01 ff ff 00 00 00  |................|
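For the BCD fields, one approach is to print the bytes as hex text and keep the digits as they are, dropping the trailing #F filler. The sketch below assumes the digits are stored high nibble first (which matches the two numbers quoted above) and uses od rather than hexdump, since od is specified by POSIX; the function name bcd_field is made up for illustration.

```shell
#!/bin/sh
# bcd_field FILE OFFSET LENGTH
# Print LENGTH bytes starting at byte OFFSET as packed BCD digits,
# with the trailing "f" filler nibbles removed.
# Assumption: digits are stored high nibble first, no nibble swapping.
bcd_field() {
    # od -An -tx1 prints every byte as two hex digits without an
    # offset column; -v keeps repeated lines; -j skips OFFSET bytes;
    # -N limits the read to LENGTH bytes. tr joins the digits.
    hex=$(od -An -v -tx1 -j "$2" -N "$3" "$1" | tr -d ' \n')
    # Remove the filler: every "f" at the end of the string.
    printf '%s\n' "$hex" | sed 's/f*$//'
}
```

On the sample above, bcd_field BinSample.dat 36 8 reads 87 44 01 01 30 08 29 0f and should print 874401013008290, and bcd_field BinSample.dat 44 5 should print 3742050001.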

Think of your problem!

hextext=`hexdump -n1 -s$subscript -v -e '1/1 "%02x"' SomeBinaryFile.dat`

This should give you the _hex_ value of the byte at position "$subscript" as a two-character text string.
It should match the "values" you see in the dump, but as ASCII text.

Just use a loop and concatenate that which you need...

Hello wisecracker,

Thank you for your help.

I've been trying your code, but I'm not sure what is happening when I use the hexdump line from your last post.

I want to interpret sequences of bytes as literal decimal digits. I'm testing with the code below:

start_offset=36
jump=1
last_byte_limit=43

for subscript in $( seq $start_offset $jump $last_byte_limit )
do
    hextext=`hexdump -n1 -s$subscript -v -e '1/1 "%02x"' BinSample.dat`
    char=$char`printf "%d" '0x'$hextext`
done
printf $char

The decimal sequence I would like to extract from the binary sample in my previous post (in red) runs from byte 36 to byte 43. The desired
result is: 87 44 01 01 30 08 29 0f, exactly as it looks in hexdump, but the code gives me a different output (13568114884115)
and I don't know why.

Could you please correct me?

Thanks in advance.
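The unexpected number comes from the printf "%d" step: each hex byte is converted to its decimal value (0x87 becomes 135, 0x44 becomes 68, and so on) and the results are concatenated. A small sketch that reproduces both outputs from the same eight bytes, with the byte values typed in by hand and variable names chosen only for illustration:

```shell
#!/bin/sh
# The eight bytes at offsets 36 to 43 of the sample, as hex text.
bytes="87 44 01 01 30 08 29 0f"

as_decimal=""   # each byte converted with printf '%d', then concatenated
as_digits=""    # the hex digits kept literally

for b in $bytes
do
    as_decimal="$as_decimal$(printf '%d' "0x$b")"
    as_digits="$as_digits$b"
done

echo "$as_decimal"   # 13568114884115
echo "$as_digits"    # 874401013008290f
```

The first line is 135 68 1 1 48 8 41 15 written without separators; the second is the BCD reading, where only the trailing f filler still has to be removed.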

(How is it that I, a mere amateur, am showing a professional the way?)

Once again think of your problem!

Did you ACTUALLY read my other reply correctly?

start_offset=36
jump=1
last_byte_limit=43

subscript=0
hextext=""
char=""

for subscript in $( seq $start_offset $jump $last_byte_limit )
do
    hextext=$(hexdump -n1 -s"$subscript" -v -e '1/1 "%02x"' BinSample.dat)
    # NOTE! A (white)space is added as this is YOUR requirement.
    char="$char $hextext"
done
# Pass the text as an argument, not as the printf format string.
printf '%s' "$char"

I have not tested it but it should work...

EDIT:
Now tested on this Macbook Pro 22:42pm local UK time using the binary file generated in the original code:-

Last login: Tue May 21 18:19:33 on ttys000
Barrys-MacBook-Pro:~ barrywalker$ /Users/barrywalker/hextext.sh
 db da d9 d8 d7 d6 d5 d4Barrys-MacBook-Pro:~ barrywalker$ _

NOTE:- There is NO newline...

Hello wisecracker,

Thank you for the help.

Actually, I'm not a professional in this area of programming or Unix; in fact I'm a newbie at bash scripts. I'm only asking in the forum because experts here can help.

Now my problem is how to decode based on the description, because byte sequences are sometimes ISO coded, sometimes decimal values, and sometimes binary coded. I think this is where a parsing template is needed, but I don't know how to build one around the hexdump command.

With hexdump, is it possible to convert a hex digit like "A" into binary?
A=1010.

Thanks for help so far.

Regards

I really have no idea how to parse NIBBLES directly, but this byte-level conversion should be a starter, and splitting the result into two halves should be very easy...

Search the forums for more info on adding leading zeros, lower to upper case, etc...
It's all on here...

Research the bc command for more info:-

#!/bin/bash
# NOTE! Uppercase required...
for bits in 00 FF 55 AA
do
        BITS=`echo "obase=2; ibase=16; $bits" | bc`
        printf "$BITS\n"
done

Test using this Macbook Pro...

Last login: Wed May 22 07:12:22 on ttys000
Barrys-MacBook-Pro:~ barrywalker$ ./hextobit.sh
0
11111111
1010101
10101010
Barrys-MacBook-Pro:~ barrywalker$ 
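Note that bc drops leading zeros (0 instead of 00000000, 1010101 instead of 01010101), which makes it awkward to split a byte into nibbles. A minimal sketch that always produces eight digits using only shell arithmetic is below; the function names byte_to_bits and split_nibbles are made up for illustration, and upper or lower case hex input should both work.

```shell
#!/bin/sh
# byte_to_bits HEX
# Print a two-digit hex byte as exactly eight binary digits.
byte_to_bits() {
    value=$(printf '%d' "0x$1")   # hex text to decimal integer
    bits=""
    i=0
    while [ "$i" -lt 8 ]
    do
        bits="$((value % 2))$bits"   # prepend the lowest remaining bit
        value=$((value / 2))
        i=$((i + 1))
    done
    printf '%s\n' "$bits"
}

# split_nibbles HEX
# Print the high and low 4-bit halves of a byte, separated by a space.
split_nibbles() {
    bits=$(byte_to_bits "$1")
    high=${bits%????}   # drop the last four characters
    low=${bits#????}    # drop the first four characters
    printf '%s %s\n' "$high" "$low"
}
```

For example, byte_to_bits 55 should print 01010101 and split_nibbles 0f should print 0000 1111.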

Hello again wisecracker,

Thanks for your last sample script. It will help me with this issue, and the other code you shared will be very useful too.

Thanks again, I've learned several things from you in this thread.

Best regards.

Hi Ophiucus...

No problem, glad to be of help...

I might put all this lot together as a demo _parser_ and upload to here as I never
even considered pure binary bit conversion.

I have only ever needed decimal and hex from a binary file for my amateur uses...

I think the moderators can safely close this thread now...

We don't close threads just because a member thinks the question has been answered.

This is not our policy at unix.com, nor our practice.
