How to convert expanded hex decimal value in fields into CHAR FOR BIT DATA or CHAR type

sandjlee · November 11, 2020, 11:15pm

I have a data-file exported from a db2 table,
Some of the columns in db2 table column type is CHAR FOR BIT DATA.
I can export this db2 table as is and the problem is that the exported data-file contains un-redable characters because of column type is CHAR FOR BIT DATA.
see the below sample of exported data-file1:

"001234      ","  	PPtD   ",," 9WPX   ",20040327,"Y01"

to make this readable, I use hex function when export this table.
see below exported data-file2 using hex function:

"001234      ","20200916155016507444000000",,"20180614143957501758000000",20040327,"Y01"

Could you please advise how to convert back to data-file1 from data-file2?

Thanks,

John

vgersh99 · November 12, 2020, 1:11am

@sandjlee have you tried Google? Nothing promising?
Once again, I'd ask you to format your posts properly with the markdown tags. You'll get better "traction" with the correctly formatted posts - otherwise you're on your own.

jgt · November 12, 2020, 4:55pm

Assuming that you do not intend to modify the hex fields and then replace them back into the db2 source, just carry the hex fields forward into data-file2, as fields 7 and 8.

sandjlee · November 12, 2020, 5:04pm

Hello, vgersh99,

Thanks for your comment about the The Markdown Guide!
I've been trying to follow its guide and still tends to forget.
I will try my best I can.

For my quest to look for >How to convert hex decimal value in fields into CHAR() FOR BIT DATA or CHAR < , Title >How to skip ASCII special characters in fields when process records< may be better to explain my quest.

Let me explain again what my problem is:

After I export data from one of db2 table, I've noticed that some of columns in db2 table defined as CHAR() FOR BIT DATA which is ended up ACSII special characters in exported fields .
see below sample exported record:
-"001742 "," PPtD ",," 9WPX ",20040327,"Y01"

the hex decimal representation of field2 and field4 of above is:
-field2: "20200916155016507444000000"
-field4: "20180614143957501758000000"

After I ran some matching and deleting process, it generates record of output-file as below.
as you can see field2 is truncated from the starting position of hex value 0x00 in field2 and at the end of record, LF is shown.
-"001742 "," PPtD

and cat -vet output-file shown as below:
-"001742 "," ^I^V^UP^VPtD$

Expected output:
-"001742 "," PPtD ",," 9WPX ",20040327,"Y01"

Here is what I am looking for your advise and how to do it:

how to ignore or skipping field 2 and field 4 when to process and write record as is including field2 and field4 and as well as the rest of fields.

-input:
"001742 "," PPtD ",," 9WPX ",20040327,"Y01"
-output:
"001742 "," PPtD ",," 9WPX ",20040327,"Y01"

how to load back hex decimal fields field2 and field4 into `CHAR() FOR BIT DATA' type column in db2 table?

I thought I need to convert hex decimal to ASCII special characters as you can see in my sample input record.

-field2:
"20200916155016507444000000" to " PPtD "
-field4:
"20180614143957501758000000" to " 9WPX "

Is there any other way?

Thanks in advance!

sandjlee · November 12, 2020, 5:10pm

hello jgt,

Could you please explain why field 2 and field 4 carries forward as field7 and field8?
I could not understand what you mean.

John

jgt · November 12, 2020, 5:32pm

while IFS="," read f1 f2 f3 f4 f5 f6
do
echo f1, unhex(f2),f3 unhex(f4), f5, f6, f2, f4
done

The problem though might be that the input fields occasionally translate to a line feed character and cause anomalous results.

sandjlee · November 12, 2020, 8:14pm

hello jgt,
Could you please explain about echo f1, unhex(f2),f3 unhex(f4), f5, f6, f2, f4 and
what unhex(f2) does?

I think you might not fully understand issues I describe on my original post.
I described issues more in detail on 3rd post.
Could you please revisit it and let me know you advise again?

More precisely, this is what I should say about below some matching and deleting process:

After I ran some matching and deleting process,

awk -F, ' {sub(/\r$/,"")} FNR==NR {A[$1,$2,$3,$4,$5,$6,$7];next} !(($44,$6,$7,$1,$5,$12,$20) in A)' key-file.del data-file.del > output-file.del,

it generates record of output-file as below.
as you can see field2 is truncated from the starting position of hex value 0x00 in field2 and at the end of record, \n (LF) is shown.
-"001742 "," PPtD

Thanks in advance!

jgt · November 12, 2020, 10:29pm

Does this help. https://bytes.com/topic/db2/answers/182124-function-convert-bit-data-column-string

sandjlee · November 12, 2020, 10:49pm

Hi jgt,

As I've already mentioned on my first post, I generated two different outputs.

just as is. see below as an example.
"001234 "," PPtD ",," 9WPX ",20040327,"Y01"
using hex function on field2 and field4 whihc is on the link provide by you. see below.
"001234 ","20200916155016507444000000",,"20180614143957501758000000",20040327,"Y01"

Please revisit on my original post and 3rd post for more info .
I have two questions!

jgt · November 14, 2020, 2:36am

I am beginning to regret my original answer. I do not have any experience with db2. What I do have is experience with unusual data formats.
My suggestion to you is to write your application in a language that understands the data type you are dealing with. To the best of my knowledge, unix utilities do not support this data type.
You may be able to resolve this using cobol, pl1, rpg, sql or a odbc connector.

sandjlee · November 14, 2020, 4:26pm

Dear jgt,

First, I didn't look for help about db2. I need to know how to convert hex to ascii char prior to import or load data file into db2 table.

Second, you don't need to regret your original answer.

your mentioned unhex function lead for me to search the info about it and I have found a awk program posted at unhex.awk: Convert hex encoded strings to ASCII characters which I need to modify for my need.

The below is the source code posted I've found:

########################################################
# AWK script to convert hex encoded ASCII strings back #
# in to the printable ASCII characters that they       #
# represent.                                           #
# Non-printable characters are converted to a '?'.     #
#                                                      #
# unhex.awk v2013.04.01                                #
# https://malwaremusings.com/scripts/unhex-awk          #
#                                                      #
# usage: awk -f unhex.awk <inputfilename>              #
#   where inputfilename is a text file containing hex  #
#   encoded (0x...) strings.                           #
########################################################
  
/0x/ {
  while (s = match($0,"0[Xx][0-9A-Fa-f]+",matcharray) > 0) {
    chrstr = "";
    hexdigits = "0123456789abcdef";
    word = tolower(matcharray[0]);
    value = 0;
    for (nibble = 3;nibble <= length(word);nibble++) {
      char = substr(word,nibble,1);
      idx = index(hexdigits,char) - 1;
      value += idx * ((nibble % 2) == 1?16:1);
      if (idx == -1) printf("WARNING: Invalid hex digit %c\n",char) >"/dev/stderr";
      if (nibble % 2 == 0) {
        if ((value >= 0x20) && (value < 0x7f)) {
          chrstr = chrstr sprintf("%c",value);
        } else {
          chrstr = chrstr "?";
        }
        value = 0;
      }
    }
    $0 = substr($0,1,RSTART - 1) "0x\"" chrstr "\"" substr($0,RSTART + RLENGTH);
  }
  print;
}

My problem is that I am not experienced well enough to modify the above source code to my need.

I want to modify the above unhex.awk to apply some fields which contains expanded hex decimals ,such as "20200916155016507444000000", in a file which is comma delimited, instead of applying it to a whole record.

for example,

the first 2 digit 20 need to convert to 0x20 which is sp in acsii char
the last 2 digit 00 need to convert to 0x00 which is nul in ascii char
etc.

any advise will be appreciated greatly and thanks in advance.

John

jgt · November 14, 2020, 4:50pm

while read a h1 c h2 d e f
do
   u1=$((echo $h1|awk -f unhex.awk))
   u2=$((echo $h2|awk -f unhex.awk))
   echo $a $u1 $c $u2 $d $e $f
done

sandjlee · November 14, 2020, 4:56pm

Wow!, it is fast!
Is this a sample shell script I can use?
Where should I add this? How to use this?

Could you please explain what each line is for? (I am learning...)
Thanks!

jgt · November 14, 2020, 5:15pm

For the purpose of this example, I have ignored the fact that the input is a comma delimited file, and used a space separated file.

Line 1 reads the input
Line 2 starts a loop for each line in the input file
Line 3.  set u1 to the output of the command inside the double brackets.
             the command inside the brackets uses h1 as input into the awk process
Line 4 does the same for u2
line 5 outputs the original fields and the decoded fields
line 6 completes the looping process.

You have to modify the read statement to suit comma delimited input, and the output to suit your needs

sandjlee · November 14, 2020, 5:23pm

Adding IFS="," just before while read a h1 c h2 e f is good enough?

and

my actual data-file has 74 fields in a record.
question is "should I provide all 74 fields in read statement?

jgt · November 14, 2020, 5:41pm

My example shows $((.....)), this should be $(.....) not double brackets.
If your read statement contains fewer variables than the number of variables in the input, the last variable contains the data for the balance of the input. Presumably you should only have to define up to the last hex field plus one if the output file has the same format as the input.

sandjlee · November 15, 2020, 12:59am

I ran the following script:
nohup Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh in-data-file-hex-2-ascii.del > out-data-file-hex-2-ascii.del

the source of script Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh:

#!/bin/sh
# Convert exported hex decimal CHARS to ascii chars

IFS=","
while read $01 $02 $03 $04 $05 $06 $07 $08 $09 $10
           $11 $12 $13 $14 $15 $16 $17 $18 $19 $20
           $21 $22 $23 $24 $25 $26 $27 $28 $29 $30
           $31 $32 $33 $34 $35 $36 $37 $38 $39 $40
           $41 $42 $43 $44 $45 $46 $47 $48 $49 $50
do
  u02=$(echo $02|awk -f unhex.awk)
  u04=$(echo $04|awk -f unhex.awk)
  u19=$(echo $19|awk -f unhex.awk)
  u21=$(echo $21|awk -f unhex.awk)
  u43=$(echo $43|awk -f unhex.awk)
  u45=$(echo $45|awk -f unhex.awk)
  u49=$(echo $49|awk -f unhex.awk)

echo $01  $u02 $03  $u04 $05  $06 $07 $08 $09  $10
     $11  $12  $13  $14  $15  $16 $17 $18 $u19 $20
     $u21 $22  $23  $24  $25  $26 $27 $28 $29  $30
     $31  $32  $33  $34  $35  $36 $37 $38 $39  $40
     $41  $42  $u43 $44  $u45 $46 $47 $48 $u49 $50
done

the source of unhex.awk:

/0x/ {
  while (s = match($0,"0[Xx][0-9A-Fa-f]+",matcharray) > 0) {
    chrstr = "";
    hexdigits = "0123456789abcdef";
    word = tolower(matcharray[0]);
    value = 0;
    for (nibble = 3;nibble <= length(word);nibble++) {
      char = substr(word,nibble,1);
      idx = index(hexdigits,char) - 1;
      value += idx * ((nibble % 2) == 1?16:1);
      if (idx == -1) printf("WARNING: Invalid hex digit %c\n",char) >"/dev/stderr";
      if (nibble % 2 == 0) {
#       if ((value >= 0x20) && (value < 0x7f)) {
        if ((value >= 0x00) && (value < 0x09)) {
          chrstr = chrstr sprintf("%c",value);
        } else {
          chrstr = chrstr "?";
        }
        value = 0;
      }
    }
    $0 = substr($0,1,RSTART - 1) "0x\"" chrstr "\"" substr($0,RSTART + RLENGTH);
  }
  print;
}

the result:

./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh[5]: ./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh1: is not an identifier
./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh[6]: in-data-file-hex-2-ascii.del1:  not found
./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh[7]: 1:  not found
./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh[8]: 1:  not found
./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh[9]: 1:  not found

Questios:

How to define input and output file? I ran as below, is it wrong?
nohup Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh in-data-file-hex-2-ascii.del > out-data-file-hex-2-ascii.del
Could you please explain messages returned?

jgt · November 15, 2020, 1:39am

remove the $ signs from the read statement. The $ sign is only used on the right side of an assignment statement.
So:

a="abc"
b=$a
echo $a $b

Also, variable names should start with a letter. Numeric variables are command line variables.
Replace all the $ characters on the read startment with "i" and replace the lines with unhex.awk
with:

u02=$(echo $i02|awk -f unhex.awk)

sandjlee · November 15, 2020, 11:02pm

I ran again after fixing upon your advise, this is the result:

 syntax error The source line is 2.
 The error context is
		  while (s = >>>  match($0,"0[Xx][0-9A-Fa-f]+", <<< matcharray) > 0) {
 awk: The statement cannot be correctly parsed.
 The source line is 2.
 syntax error The source line is 22.
./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh: Broken pipe
./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh9 9 9

Questions:

Could you explain the following sodes?

/0x/ {                                                         # what is this `/0x/` for?
  while (s = match($0,"0[Xx][0-9A-Fa-f]+",matcharray) > 0) {   # is this correct?
.
.
.
  }
  print;
}

immediatrly after /ox/ why { is there?
why ( present after read while ?
in this "0[Xx][0-9A-Fa-f]+", what + is for?

I've been trying to decode the first 2 lines of code but I could not find any clue or example.
Could you please explain about the first two line of code?

Thanks in advance.

vgersh99 · November 16, 2020, 1:04am

syntax error The source line is 2.
 The error context is
		  while (s = >>>  match($0,"0[Xx][0-9A-Fa-f]+", <<< matcharray) > 0) {
 awk: The statement cannot be correctly parsed.
 The source line is 2.
 syntax error The source

What's your OS?
Is your awk actually a gawk?
What dates awk --version return?