How to determine the encoding of this dd image?

I have extracted a raw data file from a magnetic tape using the dd command.

Afterwards, I managed to read the extracted data with Bless Hex Editor, and I found that at offset 0x200000 there is a sequence of values that was originally stored in a table.

I would like to extract this data and import it into an Excel or CSV file, but above all I would like to understand how the data is encoded.

This is an example of the data extracted in hexadecimal format and its representation in unsigned int little endian:

//Hex Not.    // Little-Endian unsigned int 32
10 00 00 00   // 84       (Ref Number)  
55 F1 2E 04   // 20070306 (Date)
A2 3F 32 01   // 15144184 (Time)
F8 14 E7 00   // 20070306 (Date)
A2 3F 32 01   // 15491037 (Time)
DD 5F EC 00   // 1
01 00 00 00   // 1
01 00 00 00   // 4486656
00 76 44 00   // 492
EC 01 00 00   // 1
01 00 00 00   // 814185724
FC 7C 87 30   // 814185728
00 7D 87 30   // 814185732
04 7D 87 30   // 16

New Line

10 00 00 00   // 84
55 F1 2E 04   // 20070306
A2 3F 32 01   // 15491037
DD 5F EC 00   // 20070306
A2 3F 32 01   // 15534889
29 0B ED 00   // 18
12 00 00 00   // 1
01 00 00 00   // 4486656
00 76 44 00   // 492
EC 01 00 00   // 1
01 00 00 00   // 814185724
FC 7C 87 30   // 814185728
00 7D 87 30   // 814185732
04 7D 87 30   // 16

Can you tell from this piece of converted data how the raw file is encoded, and how I can write a bash script to decode it?

Thanks

Try the file command. It might give you some info about the file.

Do you suggest any particular parameter?

It reports that the file is a normal data stream (Very Useful :confused:)
The magic file cannot be found and the data is in a proprietary format.

Before we can help more: What is the source of the tape?

Please give the name of the OS or the brand of machine that wrote it: the computer, not the tape drive.

There are some ways to "resurrect" tape data from ancient systems.

The hardware configuration should be:
CPU Pentium II 400 MHz
Internal memory 128 MB

Operating System: Windows 2000 server

If the data is in a proprietary format, you pretty much have to go back to the proprietor of that format.

You might get something useful by running "strings" against the file.

Thank you for your reply

I have heard many times that this job cannot be done without knowing the source of the data and how it was written to the tape, and I agree with you. Unfortunately, I must do it without any such insight, which is why I started looking at the hexadecimal dump with a hex editor.

The tape is a mix of metadata and voice audio, and I need to figure out how to decode both of these sources within the image.

Thanks to the hex editor I now understand that each audio recording is preceded by a piece of information describing it.

The data I posted above is that information.
My knowledge of reverse engineering is limited but I must do it (Job is Job :()

Apparently I was wrong earlier: I copied and pasted this piece of data into this

http://www.scadacore.com/field-applications/miscellaneous/online-hex-converter.html

website and the data format seems to be big endian 32 :confused:

By the way, I typed this command in the shell to convert the file

cat file | strings --encoding=B > convToString

but something went wrong,

whereas the command

cat file | strings > convToString

converts it, but with many errors.

Any help?

I am not sure why you are using "cat": strings takes a filename directly and extracts all the human-readable text by default:-

 strings /full/path/to/your/filename > /your/wanted/path/to/filename.txt

EDIT:-
Example:-

strings /bin/bash > /tmp/text
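
A possible refinement, as a rough sketch only: ask strings to print the offset of each hit and raise the minimum length, so that text markers in the image can be located and short noise is dropped. The function name list_strings and the default length of 8 are just illustrative; -t x (hex offsets) and -n (minimum length) are standard strings options.

```shell
#!/bin/bash
# list_strings FILE [MINLEN]
# Print every printable run of at least MINLEN characters (default 8),
# each prefixed with its byte offset in hexadecimal.
# The function name and the default length are illustrative choices.
list_strings() {
    strings -t x -n "${2:-8}" "$1"
}

# Example (hypothetical path):
#   list_strings /full/path/to/your/filename | grep 'Header'
```

If the writing program stored text as 16-bit little-endian characters, which is only a guess for Windows software, the same call with -e l added might find strings that the default 8-bit scan misses.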

How do you even know this is a mix of metadata and voice audio? Where did it come from? What were you told about it?

I know exactly what is on the tape because I have the tape and its content on my disk.

However, the problem is that I have 10 other tapes, produced by the same machine, that need to be decoded.

The tape contains voice calls written by a voice recorder system. Each tape holds roughly 80000 calls and their associated metadata.

The software that created the tape was written in Borland C++ and I have disassembled it with IDA. The software is obscure (I couldn't even find a guide on Google). It is too complicated to reverse engineer in a reasonable amount of time, but at least it gives me some hints.
For instance, the tape was written with this function

HANDLE WINAPI CreateFile(   _In_      LPCTSTR lpFileName, 
                            _In_      DWORD dwDesiredAccess,   
                            _In_      DWORD dwShareMode,   
                            _In_opt_  LPSECURITY_ATTRIBUTES lpSecurityAttributes,   
                            _In_      DWORD dwCreationDisposition,  
                            _In_      DWORD dwFlagsAndAttributes,  
                            _In_opt_  HANDLE hTemplateFile 
);

Regarding the string conversion, the procedure mentioned earlier does not really work, except that it finds a name here and there within the file.

My initial question was specifically about converting the hexadecimal notation to Big Endian numbers.

If you copy and paste these bytes:

13 42 53 52 56 20 45 6C 65 6D 65 6E 74 20 48 65 61 64 65 72 00 00 00 00 00 00 00 00 10 00 00 00 10 00 00 00 41 00 00 00 55 F1 2E 04 A2 3F 32 01 32 38 FB 00 A2 3F 32 01 B7 

into this website you will see that the conversion under UINT32 Big Endian is what I am looking for.

How can I do the same conversion in Linux Bash, to convert my hexadecimal file (I know it is only a notation and the file is actually a binary file) to the numbers displayed under UINT32 Big Endian?

Therefore, how can I do the following conversion in Bash:

54 00 00 00  -> 84        
A2 3F 32 01  -> 20070306 

Thanks

It certainly wasn't written with that function. Opened, perhaps; written, no. CreateFile only returns a handle; the writing itself would be done by another call such as WriteFile.

$ echo $((0x042ef155))
70185301
$

Looking at your original post how do you get this???

//Hex Not.    // Little-Endian unsigned int 32
10 00 00 00   // 84       (Ref Number)  
54 00 00 00   // 70185301 (Ref Number)
55 F1 2E 04   // 20070306 (Date)
A2 3F 32 01   // 15144184 (Time)
F8 14 E7 00   // 20070306 (Date)
A2 3F 32 01   // 15491037 (Time)
DD 5F EC 00   // 1
01 00 00 00   // 1

It should be something like this:-

//Hex Not.    // Little-Endian unsigned int 32
10 00 00 00   // 16 ???
54 00 00 00   // 84       (Ref Number)  
55 F1 2E 04   // 70185301 (Ref Number)
A2 3F 32 01   // 20070306 (Date)
F8 14 E7 00   // 15144184 (Time)
A2 3F 32 01   // 20070306 (Date)
DD 5F EC 00   // 15491037 (Time)
01 00 00 00   // 1
01 00 00 00   // 1

This will work within limits but will probably be slow:-

#!/bin/bash
# Assuming longword (4-byte) aligned and data ONLY...
# Endian convert: print each 4-byte group as a little-endian unsigned 32-bit value.
> /tmp/data
ifs_str="$IFS"
IFS=" "
arraytext=(13 42 53 52 56 20 45 6C 65 6D 65 6E 74 20 48 65 61 64 65 72 00 00 00 00 00 00 00 00 55 F1 2E 04 10 00 00 00 10 00 00 00 41 00 00 00 55 F1 2E 04 A2 3F 32 01 32 38 FB 00 A2 3F 32 01 B7)
n=0
while [ $n -lt ${#arraytext[@]} ]
do
	printf "\x${arraytext[$n]}" >> /tmp/data
	n=$((n+1))
done
hexdump -C < /tmp/data
echo "Start of decoding..."
# Assume data starts after the 20-byte 'BSRV Element Header' marker, hard coded for this demo...
n=20
while [ $n -lt ${#arraytext[@]} ]
do
	hex=${arraytext[$((n+3))]}${arraytext[$((n+2))]}${arraytext[$((n+1))]}${arraytext[$n]}
	printf "%u\n" $((0x$hex))
	n=$((n+4))
done
echo "End of decoding..."
IFS="$ifs_str"
exit 0

Results on OSX 10.7.5, default bash terminal...

Last login: Thu Jun 19 21:21:16 on ttys000
AMIGA:barrywalker~> ./endian.sh
00000000  13 42 53 52 56 20 45 6c  65 6d 65 6e 74 20 48 65  |.BSRV Element He|
00000010  61 64 65 72 00 00 00 00  00 00 00 00 55 f1 2e 04  |ader........U...|
00000020  10 00 00 00 10 00 00 00  41 00 00 00 55 f1 2e 04  |........A...U...|
00000030  a2 3f 32 01 32 38 fb 00  a2 3f 32 01 b7           |.?2.28...?2..|
0000003d
Start of decoding...
0
0
70185301
16
16
65
70185301
20070306
16463922
20070306
183
End of decoding...
AMIGA:barrywalker~> _

This does NOT take into account any voice data that may be in the file...
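
For the real image it may be quicker to let od decode the words in place instead of looping over hex pairs in bash. What follows is a minimal sketch under several assumptions: GNU od (the --endian option needs coreutils 8.23 or later) and GNU grep (for -a -b -o) are available, the function names are made up for illustration, and the record layout (19 bytes of marker text, then 8 zero bytes, then the 32-bit fields) is only what the pasted sample suggests, not a known format.

```shell
#!/bin/bash
# Sketch only: function names are illustrative; GNU od and GNU grep are assumed.

# le32_words FILE OFFSET COUNT
# Print COUNT unsigned 32-bit little-endian integers, one per line,
# starting at byte OFFSET (decimal or 0x-prefixed hex) of FILE.
le32_words() {
    od -A n -v -t u4 --endian=little -j "$2" -N "$(( $3 * 4 ))" "$1" |
        tr -s ' ' '\n' | sed '/^$/d'
}

# header_offsets FILE
# Print the byte offset of every 'BSRV Element Header' text marker.
header_offsets() {
    LC_ALL=C grep -a -b -o 'BSRV Element Header' "$1" | cut -d: -f1
}

# records_to_csv FILE NFIELDS
# For each marker, skip the 19 text bytes plus the 8 zero bytes seen in the
# sample (27 bytes in total, an assumption) and print the next NFIELDS
# words as one comma-separated line.
records_to_csv() {
    header_offsets "$1" | while read -r off; do
        le32_words "$1" "$(( off + 27 ))" "$2" | paste -s -d, -
    done
}

# Examples (image.dd is a hypothetical file name):
#   le32_words image.dd 0x200000 14
#   records_to_csv image.dd 7 > calls.csv
```

On a very large image grep may need a lot of memory, because binary data has few newlines and is read as very long lines. The 0x13 byte in front of the marker equals 19, the length of the marker text, so it looks like a length prefix, but that is a guess.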

Thanks, this is what I was looking for. You are THE MAN.

I need just one final tip, but it is off-topic (so feel free not to answer). I know the audio data is encoded in G729.
My question is: how should this data be modified before giving it as input to the G729 decoder?

Is just swapping bits enough?

Do ffmpeg or sox have an argument that swaps bits automatically and converts the result to wav?

Hi...

I trust the Moderators will allow this...

I can't really answer your question definitively but a few pointers.

I was once told your best test gear is ears, eyes, nose, and throat so...

1) Assume the data is raw to start with.
2) Assuming that you have an OSS-compatible (Pulseaudio too?) /dev/dsp.
3) Feed the data into /dev/dsp, something like...

cat /path/to/your/tape/data > /dev/dsp

The standard default /dev/dsp is:-
Sampling speed = 8 kHz.
8 bit unsigned integer depth.
RAW data.
MONO...
4) If your ears say this is dreadful then resort to SOX. This is seriously powerful, so
read the manual on it thoroughly...
5) I have no idea whether the audio on the tape is 8, 16, 24, 32 bit depth,
signed, unsigned, ULAW, ALAW, MONO/STEREO, etc, etc...
6) Assuming no kind of strange algorithm has been applied in hardware, an approximation
of the sampling speed will be roughly your complete file length divided by the number of
seconds duration _IF_ in MONO, 8 bit, unsigned mode, which I suspect it is, being a voice recorder.
(It is just as easy to work out the sampling speed if the recording is in stereo too.)
Trial and error is your only method I am afraid...
7) After RAW start trying much older audio formats and come forwards to the latest
encodings...
Hope this helps...

EDIT:
Just noticed you mentioned G729 encoding.
The above still applies though, although RAW it probably is not... ;o)
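
The arithmetic in point 6 can be sketched as below. The function name estimate_rate is only illustrative, and the result is simply bytes per second: it equals the sample rate only if the chunk is uncompressed mono with one byte per sample, and you need to know the duration of the call from somewhere else.

```shell
#!/bin/bash
# estimate_rate FILE SECONDS
# Print the average number of bytes per second of audio in FILE.
# This equals the sample rate only for mono data with one byte per sample.
# The function name is an illustrative choice.
estimate_rate() {
    bytes=$(wc -c < "$1")
    echo $(( bytes / $2 ))
}

# Example (hypothetical file holding one 10 second call):
#   estimate_rate call_0001.raw 10
```

As a rough guide, G.711 a-law or u-law at 8 kHz gives about 8000 bytes per second, 32 kbit/s ADPCM (G.726) about 4000, and G.729 about 1000, so the number may hint at the codec even before anything plays correctly.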

Sorry you are not the man, you are the KING

The encoded audio data probably doesn't need byte-swapping, that's generally the codec's job. I can't tell much else without seeing it, which doesn't seem likely to happen, confidential and all that :frowning: Good luck.

Hi guys

Here is an example of the audio data.
After trying a lot of different audio codecs, I need to ask for your help.

https://www.dropbox.com/s/1l041xgxb1yucd9/input

It should be encoded in a-law, but I cannot get anything useful out of it. I used both sox and Audacity.

Can you please help me to convert it?

Thanks

a-law what, though?

This audio comes from a voice logger. The logger apparently (I am not totally sure) was set to compress the data in a-law. So there are two possibilities. First, the logger was not properly set (very possible) and it is not a-law. Second, the data has to be manipulated first and then played :(.... any help??

FYI this audio file should be one of the following:

G711 a-law, u-law, mu-law, g726, g729, adpcm32, adpcm16, g723, and similar. The audio is telephony audio data.

In Audacity, I think you can open that with project -> import raw data, telling it the type is a-law and the sample rate is 8000 Hz. It might be music of some sort; the audio quality isn't good enough to tell.

Trying to open it any other way just gets me near-noise, so, I guess it is a-law, or close to.

It refuses to even try as g726/g729/etc, but if the headers were stripped off, that might not mean anything.
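
The same raw import can probably be done from the command line with sox, which would help with batches of calls. This is a sketch, assuming sox is installed and the data really is headerless 8 kHz mono; the function name raw_to_wav is illustrative, and the encoding argument is there so that u-law can be tried just as easily as a-law.

```shell
#!/bin/bash
# raw_to_wav IN OUT.wav [ENCODING]
# Treat IN as headerless 8 kHz mono audio in the given sox encoding
# (default a-law; u-law is another candidate) and write it out as WAV.
# The function name is illustrative; the 8000 Hz mono settings are assumptions.
raw_to_wav() {
    sox -t raw -e "${3:-a-law}" -r 8000 -c 1 "$1" "$2"
}

# Examples (input is the posted sample file):
#   raw_to_wav input input_alaw.wav
#   raw_to_wav input input_ulaw.wav u-law
```

This only covers the G.711 variants. The compressed codecs in the list (g726, g729, g723) need a real decoder, and any per-call header bytes still in front of the audio would have to be cut off first.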