EBCDIC to ASCII conversion

Hi,

We have a mainframe file in EBCDIC format. We don't have direct access to the mainframe; the client has provided us with the file. It contains packed data (COMP1, COMP2, etc.) which is unreadable. Can anyone suggest how to convert this kind of EBCDIC file to a readable ASCII format using UNIX only? Any help in this regard will be highly appreciated. Thanks

Please read the last post of your previous thread again:

We tried to help you in your earlier thread: EBCDIC to ASCII conversion. That thread was closed because you refused to answer any questions about the format of the file you want to process. (Note that EBCDIC is not a format; it is a character set. The format of your file is not a text file; it is a binary file containing some text and some binary data. Without knowing the format of your file there is no way for us to help you at all in any attempts to extract the text or the binary data included in any of the records in that file.)

On top of what Don Cragun said, the "Post Here to Contact Site Administrators and Moderators" is NOT an adequate forum for technical questions. You've been warned before, and now received a (warning) infraction.

hi Don Cragun,

I'm not sure about the file format. It contains alphanumeric characters, special characters, and characters that are not on the keyboard. The copybook contains COMP data. I have attached a sample input file for reference.

Where is the sample?
Was it meant to be an attachment? If so then it has not appeared.

Hi Swapna,
No. There is no copybook (whatever that is) attached to your last post. And there is no sample input file attached to your last post.

If you have the file and do not have the file format and are unable to determine the format by inspecting the data, why do you think that any of us who have never seen the data would have any chance of telling you how to decode it? Our crystal balls do not provide enough magic for us to be able to reliably guess at which bytes in records of unknown size refer to what in the unknown number of data fields in those records containing information about whatever they are intended to contain.

If you are not sure about the file format, contact your customer and get the file format from them. Without knowing the format of the data in the file you are processing, we do not have the information required to be able to help you do anything with your file.

Get the file format from your customer and then add a post to this thread that clearly defines that file format. Then we might be able to help you. Without you and us knowing the file format, there is nothing we can do to help you.

Against my better judgement, I am going to leave this thread open to give you one final chance to tell us the format of the file(s) you will be processing.

I would add to all that has already been said:
I worked with such files for 5 years, at the end of the 80s but mostly in the early 90s. As already said, EBCDIC, EBC and ASCII are not formats. COMP2 is a data format, but IT IS OF NO USE TO KNOW THAT if you have no idea of the data structure. Worse, this looks like a COBOL file, and for a COBOL file you have to know the data structure AND the file type. In those days I used to work with indexed files, but I once ruined one of the most important files of the site because I was given the structure but no one told me it was a random file...
So sorry, but we cannot help you with the information you have given so far. I believe what you are looking for is a still-active COBOL programmer whom you will have to pay for a week, because this is not an easy task, depending on the file structure and organisation.

Copybooks are used in the same way that the include statement is used in php, or the ". filename" is used in bash.

FYI "packed format" is not part of the EBCDIC character set, it is a way of storing numerical data on the host. Basically the host has two forms of storing numerical data: "zoned (decimal)" and "packed (decimal)". Here is a thorough description.

Basically "zoned" stores one decimal digit in the lower nibble (the lower 4 bits of a byte) and an "F" in the higher nibble. In the last byte the higher nibble is used for storing the sign instead, i.e. "123" (or "+123") would look like F1F2C3, the "C" being the code for a positive number. This format can be translated easily into ASCII: replace the high nibbles with "3" instead of "F", after taking the sign from the high nibble of the last byte.
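To make the zoned layout concrete, here is a minimal bash sketch. The function name zoned_to_ascii is made up for this post, and it assumes the field has already been cut out of the record and dumped as a hex string. It also assumes that sign nibbles "B" and "D" mean negative and anything else means positive, which you would have to confirm against the real data.

```bash
#!/bin/bash
# zoned_to_ascii HEX
# Hypothetical helper: decode one zoned-decimal field, given as a hex
# string (for example "F1F2C3"), into a signed ASCII number.
zoned_to_ascii() {
    local hex zone digit digits="" sign="" i len
    hex=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    len=${#hex}
    # A field is a whole number of bytes, so an even number of hex digits.
    (( len >= 2 && len % 2 == 0 )) || return 1
    for (( i = 0; i < len; i += 2 )); do
        zone=${hex:i:1}        # high nibble: zone, or sign in the last byte
        digit=${hex:i+1:1}     # low nibble: the decimal digit
        case $digit in
            [0-9]) digits+=$digit ;;
            *) return 1 ;;     # not a valid zoned digit
        esac
        # Only the zone nibble of the last byte carries the sign.
        if (( i == len - 2 )); then
            case $zone in
                B|D) sign="-" ;;   # assumption: B and D mean negative
            esac
        fi
    done
    printf '%s%s\n' "$sign" "$digits"
}

zoned_to_ascii F1F2C3   # prints 123
zoned_to_ascii F1F2D3   # prints -123
```

Any implied decimal point comes from the copybook, not from the bytes, so this sketch cannot place it.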

The packed format stores a digit in each of the nibbles, so there is no direct translation of one byte EBCDIC to one byte ASCII. The sign is stored in the last nibble but coded in various ways (depending on "format" versus "informat"). Either "C" is positive and "D" is negative (format), or "A", "C", "E" and "F" are positive and "B" and "D" are negative (informat). If you end up with an odd number of nibbles, the leftmost byte is padded with a 0 (zero) in the high nibble. You actually need to calculate the value from the bytewise representation to translate it to ASCII.
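The same idea for packed fields, again only as a sketch under assumptions: packed_to_ascii is a made-up name, the input is one field already isolated and dumped as hex, and the more permissive sign table above ("A", "C", "E", "F" positive; "B", "D" negative) is used.

```bash
#!/bin/bash
# packed_to_ascii HEX
# Hypothetical helper: decode one packed-decimal field, given as a hex
# string (for example "12345C"), into a signed ASCII number.
packed_to_ascii() {
    local hex digits sign_nibble sign="" len
    hex=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    len=${#hex}
    # A field is a whole number of bytes, so an even number of hex digits.
    (( len >= 2 && len % 2 == 0 )) || return 1
    digits=${hex:0:len-1}        # every nibble except the last is a digit
    sign_nibble=${hex:len-1:1}   # the last nibble is the sign
    case $digits in
        *[!0-9]*) return 1 ;;    # a non-decimal nibble: not a packed field
    esac
    case $sign_nibble in
        B|D) sign="-" ;;
        A|C|E|F) ;;
        *) return 1 ;;           # not a valid sign nibble
    esac
    # Drop the padding zero and other leading zeros, keeping one digit.
    while (( ${#digits} > 1 )) && [[ $digits == 0* ]]; do
        digits=${digits#0}
    done
    printf '%s%s\n' "$sign" "$digits"
}

packed_to_ascii 12345C   # prints 12345
packed_to_ascii 012D     # prints -12
```

Note that the function can only be called once you know where a field starts and how many bytes it occupies, which is exactly the information the record layout has to supply.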

There are professional software packages for file transfer between (z/OS) mainframes and the UNIX world: Connect:Direct, IND$FILE and XCOM Data Transfer, just to name a few. The problem with transferring mainframe data is that there are a lot more data formats on the mainframe than there are on UNIX. You have e.g. fixed-length-record data, you have packed and zoned decimals and lots more. It isn't (only) the easy 1:1 translation you have from one charset to another.

I hope this helps.

bakunin

I'm fully familiar with what packed decimal and zoned decimal are. But knowing how to interpret fields encoded in those formats if you don't know record boundaries and field boundaries is a wild guessing game.

I didn't know that a copybook was another language's name for what the C language calls an include file (thank you jgt).

Fortunately, I haven't had to use COBOL since 1975. I'm also fully aware that the UNIX dd utility was a joke showing how versatile UNIX utilities were and could even be made to use operands that were familiar to programmers used to writing IBM System 360 Job Control Language (JCL) card decks instead of all of the supposedly "confusing" single letter utility options used on most UNIX utilities. If you've never written JCL, the JCL DD statements described where various input and output files were to be found and/or placed for whatever job was to be run by that JCL deck and, especially if the file resided on a magnetic tape, the size of the blocks that were to be read from or written to the device used to access that file. Unfortunately (or fortunately, depending on how much you like JCL), the utility proved very useful when transferring files between UNIX systems and mainframes and we're still stuck with that syntax today.

But enough of this digression.

swapna_1990 has a file containing some EBCDIC text and some packed decimal, zoned decimal, or other binary data mixed in and wants it converted to ASCII text and "normal data" that is readable. Whether that data comes from a mainframe or from a C program written on a UNIX, Linux, or BSD system doesn't really matter. If you don't know the format of the data you're reading including field lengths or separators, field types, and record lengths or separators; then you don't know how to process that data. Until we get the data format from swapna_1990's customer, there is nothing we can do to guess at how that data could be extracted nor what tools might need to be used to do so. The fact that it came from a mainframe makes a COBOL program an obvious guess at something that might work. But from what we know so far, if swapna_1990 is extremely lucky, it is possible that a dd command coupled with output piped through some awk code might be able to work wonders. (Not highly likely, but there is a chance.)
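Purely to illustrate what the dd part could look like once the layout is known, here is a sketch. The function names text_field and hex_field, the 8-byte record, and the offsets 0 and 5 are all invented for the demonstration; the real values have to come from the copybook. The only translation step is dd's standard conv=ascii operand, applied to a text field alone, because running it over packed or binary bytes would destroy them.

```bash
#!/bin/bash
# text_field FILE OFFSET LENGTH
# Cut LENGTH bytes starting at 0-based byte OFFSET and translate them
# from EBCDIC to ASCII. Only meaningful for fields that hold text.
text_field() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | dd conv=ascii 2>/dev/null
}

# hex_field FILE OFFSET LENGTH
# Cut the same way, but dump the raw bytes as hex with no translation,
# ready for a packed-decimal or zoned-decimal decoder.
hex_field() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Made-up 8-byte record: 5 bytes of EBCDIC text followed by 3 packed bytes.
sample=$(mktemp)
printf '\310\305\323\323\326\022\064\134' > "$sample"
text_field "$sample" 0 5; echo    # prints HELLO
hex_field  "$sample" 5 3; echo    # prints 12345c
rm -f "$sample"
```

With a fixed record length, the offset of a field in record N is N times the record length plus the field's offset within the record, which is why the record length is the first thing to get from the customer.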

Hi...

I decided to see which printable ASCII characters would appear in a 'cat /path/to/packed_decimal_filename.ext' and came to the conclusion that without even a hint of a sample there is literally no way of helping...

#!/bin/bash
# EBCDIC.sh

# These are all the PRINTABLE ASCII ONLY characters generated by "EBCDIC" packed numbers.
# Note this does NOT include byte value 0, NULL; Ctrl characters; extended characters above decimal 128.

# Upper 4 bits BCD 2 to 7.
for high_nibble in {2..7}
do
    # Lower 4 bits BCD 0 to 9.
    for low_nibble in {0..9}
    do
        decimal=$(( low_nibble+(high_nibble*16) ))
        printf "Low nibble = %d, high nibble = %d, decimal = %d,    character = " $low_nibble $high_nibble $decimal
        printf \\x$( printf "%02x" "$decimal" )".\n"
    done
done

Results, OSX 10.14.3, default bash terminal:

Last login: Fri Mar 22 20:31:10 on ttys000
AMIGA:amiga~> cd Desktop/Code/Shell
AMIGA:amiga~/Desktop/Code/Shell> ./EBCDIC.sh
Low nibble = 0, high nibble = 2, decimal = 32,    character =  .
Low nibble = 1, high nibble = 2, decimal = 33,    character = !.
Low nibble = 2, high nibble = 2, decimal = 34,    character = ".
Low nibble = 3, high nibble = 2, decimal = 35,    character = #.
Low nibble = 4, high nibble = 2, decimal = 36,    character = $.
Low nibble = 5, high nibble = 2, decimal = 37,    character = %.
Low nibble = 6, high nibble = 2, decimal = 38,    character = &.
Low nibble = 7, high nibble = 2, decimal = 39,    character = '.
Low nibble = 8, high nibble = 2, decimal = 40,    character = (.
Low nibble = 9, high nibble = 2, decimal = 41,    character = ).
Low nibble = 0, high nibble = 3, decimal = 48,    character = 0.
Low nibble = 1, high nibble = 3, decimal = 49,    character = 1.
Low nibble = 2, high nibble = 3, decimal = 50,    character = 2.
Low nibble = 3, high nibble = 3, decimal = 51,    character = 3.
Low nibble = 4, high nibble = 3, decimal = 52,    character = 4.
Low nibble = 5, high nibble = 3, decimal = 53,    character = 5.
Low nibble = 6, high nibble = 3, decimal = 54,    character = 6.
Low nibble = 7, high nibble = 3, decimal = 55,    character = 7.
Low nibble = 8, high nibble = 3, decimal = 56,    character = 8.
Low nibble = 9, high nibble = 3, decimal = 57,    character = 9.
Low nibble = 0, high nibble = 4, decimal = 64,    character = @.
Low nibble = 1, high nibble = 4, decimal = 65,    character = A.
Low nibble = 2, high nibble = 4, decimal = 66,    character = B.
Low nibble = 3, high nibble = 4, decimal = 67,    character = C.
Low nibble = 4, high nibble = 4, decimal = 68,    character = D.
Low nibble = 5, high nibble = 4, decimal = 69,    character = E.
Low nibble = 6, high nibble = 4, decimal = 70,    character = F.
Low nibble = 7, high nibble = 4, decimal = 71,    character = G.
Low nibble = 8, high nibble = 4, decimal = 72,    character = H.
Low nibble = 9, high nibble = 4, decimal = 73,    character = I.
Low nibble = 0, high nibble = 5, decimal = 80,    character = P.
Low nibble = 1, high nibble = 5, decimal = 81,    character = Q.
Low nibble = 2, high nibble = 5, decimal = 82,    character = R.
Low nibble = 3, high nibble = 5, decimal = 83,    character = S.
Low nibble = 4, high nibble = 5, decimal = 84,    character = T.
Low nibble = 5, high nibble = 5, decimal = 85,    character = U.
Low nibble = 6, high nibble = 5, decimal = 86,    character = V.
Low nibble = 7, high nibble = 5, decimal = 87,    character = W.
Low nibble = 8, high nibble = 5, decimal = 88,    character = X.
Low nibble = 9, high nibble = 5, decimal = 89,    character = Y.
Low nibble = 0, high nibble = 6, decimal = 96,    character = `.
Low nibble = 1, high nibble = 6, decimal = 97,    character = a.
Low nibble = 2, high nibble = 6, decimal = 98,    character = b.
Low nibble = 3, high nibble = 6, decimal = 99,    character = c.
Low nibble = 4, high nibble = 6, decimal = 100,    character = d.
Low nibble = 5, high nibble = 6, decimal = 101,    character = e.
Low nibble = 6, high nibble = 6, decimal = 102,    character = f.
Low nibble = 7, high nibble = 6, decimal = 103,    character = g.
Low nibble = 8, high nibble = 6, decimal = 104,    character = h.
Low nibble = 9, high nibble = 6, decimal = 105,    character = i.
Low nibble = 0, high nibble = 7, decimal = 112,    character = p.
Low nibble = 1, high nibble = 7, decimal = 113,    character = q.
Low nibble = 2, high nibble = 7, decimal = 114,    character = r.
Low nibble = 3, high nibble = 7, decimal = 115,    character = s.
Low nibble = 4, high nibble = 7, decimal = 116,    character = t.
Low nibble = 5, high nibble = 7, decimal = 117,    character = u.
Low nibble = 6, high nibble = 7, decimal = 118,    character = v.
Low nibble = 7, high nibble = 7, decimal = 119,    character = w.
Low nibble = 8, high nibble = 7, decimal = 120,    character = x.
Low nibble = 9, high nibble = 7, decimal = 121,    character = y.
AMIGA:amiga~/Desktop/Code/Shell> _

As one can see, unless we have the EXACT format of the file there is no way on earth to decode it.
Packed bytes can also land on 'Ctrl' characters, on binary 0 (NULL), and on a few values in the _extended_ASCII_ region, none of which are printable.