How to convert expanded hex decimal value in fields into CHAR FOR BIT DATA or CHAR type

sandjlee · November 21, 2020, 5:18pm

I ran and returned as below:

echo $'\0' | cat -vet
$^@$

echo -e '\0' | cat -vet
-e ^@$

/usr/bin/printf '\0\n' | cat -vet
^@$

What does it mean?

vgersh99 · November 21, 2020, 5:22pm

@sandjlee in your shell what does the following return echo $0?

sandjlee · November 21, 2020, 5:27pm

do yo mean the result of this echo $0 | cat -vet?

echo $0 | cat -vet
-ksh$

ran after this 'awk -v nul="$(printf '\000')" ' is replace with awk -v nul="$'\000'" '
the result is shown $' and no warning message is returned.

cat -vet out-data-file-zero.del
"$'^A^B^C^D^E^F^G^H^I","^P 0@P`p",,,$
"M-^@M-^AM-^BM-^CM-^DM-^EM-^FM-^GM-^HM-^I","M-^PM-^QM-^RM-^SM-^TM-^UM-^VM-^WM-^XM-^Y",,,$

ran after this #!/bin/bash is replaced with #!/usr/bin/ksh93,
the result is still not shown nul: that means using /bash or /ksh93, the result is same.
-and awk -v nul="$(printf '\0')" ' is used.
-no warning message is returned.

cat -vet out-data-file-zero.del
"^A^B^C^D^E^F^G^H^I","^P 0@P`p",,,$
"M-^@M-^AM-^BM-^CM-^DM-^EM-^FM-^GM-^HM-^I","M-^PM-^QM-^RM-^SM-^TM-^UM-^VM-^WM-^XM-^Y",,,$

vgersh99 · November 21, 2020, 5:41pm

That's good enough, thanks.
Could we try one more thing. Change the she-bang line (in the latest script) to #!/usr/bin/ksh93 and run the script again.
And drop nohup from the execution line.

sandjlee · November 21, 2020, 6:06pm

for #!/usr/bin/ksh93, see the result in 2. of my previous post
let me explain what I am trying to achieve here. To describe better, I need to post an image because if I post the record, it shows not as it displayed in notepad++.

the original data: this is just a sample there are 74 fields in a record.
![alt text]
image823×170 17.1 KB

these ascii characters converted to hex decimal format as below to resolve issues caused by special characters in it:
"20200916154948266375000000","20180614143957501758000000"

after that, process these records and change some field's value and / or delete some records if matching condition is met and write it back to original record format as seen above image posted.

I've managed with help from vgersh99, MadeInGermany and jgt, to this far and my final challenge is to replace hex 00 to ASCII character NUL as seen from the image of original data.

vgersh99 · November 21, 2020, 8:30pm

well... I guess my bright idea wasn't as bright as I'd hoped - happens to me quite a bit lately (yes, it's a lame attempt at being funny - thought I'd mention it).

Seems like we cannot trick AiX awk - let's try to post process awk's output and trick it that way.
Change:

awk -v nul="$(printf '\000')" '

to

awk -v nul="$(printf '%b' '\350')" '

execute the script as:

./myScriptName.sh myFile2process | tr '\350' '\0'|cat -vet

Let's see what comes out

sandjlee · November 21, 2020, 9:19pm

Wow! this time is something different..

ran ./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh in-data-file-zero.del | tr '\350' '\0'|cat -vet after I replace with this awk -v nul="$(printf '%b' '\350')" ' and the result is:

"^@^A^B^C^D^E^F^G^H^I","^P 0@P`p",,,$
"M-^@M-^AM-^BM-^CM-^DM-^EM-^FM-^GM-^HM-^I","M-^PM-^QM-^RM-^SM-^TM-^UM-^VM-^WM-^XM-^Y",,,$

in awk -v nul="$(printf '%b' '\350')" ', what %b and \350 is for?
I could not find an info about %b but my understanding is replace whatever %b is to \350, correct?
what | tr '\350' '\0'|cat -vet is for? is it to replace \350 to \0?
what's next?
FYI, I've got the same result using the following but I wansn't sure what the meaning of ^@ is :

.
.

    if (i%2==0) {
      x=px*16+x
      if (x==0 && px==0) { v=v "^@" }
      else { v=(v sprintf("%c",x?x:nul)) }
      }
    else { px=x }
.
.

as the original data posted in my previous post, it is shown as NUL for hex00.
Question is "does these two represent same ASCII char?

Thanks in advance!

vgersh99 · November 21, 2020, 9:54pm

sandjlee:

Wow! this time is something different..

ran ./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh in-data-file-zero.del | tr '\350' '\0'|cat -vet after I replace with this awk -v nul="$(printf '%b' '\350')" ' and the result is:
"^@^A^B^C^D^E^F^G^H^I","^P 0@P`p",,,$
"M-^@M-^AM-^BM-^CM-^DM-^EM-^FM-^GM-^HM-^I","M-^PM-^QM-^RM-^SM-^TM-^UM-^VM-^WM-^XM-^Y",,,$
in awk -v nul="$(printf '%b' '\350')" ', what %b and \350 is for?
I could not find an info about %b but my understanding is replace whatever %b is to \350, correct?

do man printf to find out the meaning of the %b formatting option:

       %b     ARGUMENT  as  a string with '\' escapes interpreted, except that
              octal escapes are of the form \0 or \0NNN

tr '\350' '\0' - replace octal 350 with octal 0. For the rest, refer to man tr
cat -vet should be self-explanatory by now...

I think/hope (a joke) there's no "next" as the initial Q has been covered in more details than initially expected. The "take away" I think is for you to understand the pieces to the final solution and what issue we (ALL of us) had to overcome.
I think this thread came be closed with the proper post marked as a solution. I'll leave it up to the OP/@sandjlee to work out the closing logistics.

You're welcome.

In the nut shell. The last hurdle we had to overcome was the AiX awk inability to deal with the "null byte" as indicated by:

./Convert_HEXDEC_CHAR_2_ASCII_CHAR.sh: line 2: warning: command substitution: ignored null byte in input$

As we determined that AiX awk couldn't deal with ASCII NUL, we (ALL of us) decide to encode the ASCII NUL as ASCII octal 350 (probably could be any octal over 255 ASCII range) coming out of the script and convert it back to ASCII NUL (octal 0) by something other than awk that we hoped no such limitation. Seemed like AiX tr had no issue dealing with the conversion.
Bottom line, mnemonic approach dealing with ASCII NUL:
encode ASCII NUL as something else by awk -> decode the encoded char(s) back to ASCII NUL
NOTE: was thinking we could eclipse the number of posts for a single thread record set by Share a Classic Rock Music Video from the 1960s and 70s, but we've come very close. I don't think we need a recount

sandjlee · November 21, 2020, 10:17pm

Dear vgersh99,

Thanks for your help with your expert knowledge in awk and shell script on this thread.

I am waiting two more answer from you:

when I ran upon your advice, ./myscript.sh .... was used. what is ./ and when does it need?
Could you please answer my question 6. on my previous post?

Thanks in advance!

vgersh99 · November 21, 2020, 10:43pm

the ./ means find myscript.sh in the current working directory. If you run the script withOUT leading ./, shell will try to locate myscript.sh using the env var $PATH - which might NOT include the current working directory (./). I'll let you Google for a more detailed explanation.

sandjlee:

Could you please answer my question 6. on my previous post?
as the original data posted in my previous post, it is shown as NUL for hex00.
Question is "does these two represent same ASCII char?

ASCII NUL is 0 (or 00 or 00000) in any representation: decimal, octal, hex or any other N-based representation. Do ascii from the shell to see the ASCII table. Google for the relative terms for the more detailed explanation.

sandjlee · November 21, 2020, 10:48pm

I owe you my soul for your help!

Thanks and stay safe wherever you are.

John

sandjlee · December 1, 2020, 11:31pm

Update.

When I performed regression test with a big file, noticed that quotation in a middle of field causing an issue. it is hex decimal value 22 (x=34) and it need to be converted as ascii character "" (double quotations) because when it is imported back, "" is loaded as a single quotation (").
therefore, I changed the function hex2str as below if (x==34):

# return the unhexed h string
function hex2str(h,  i,x,px,v) {
  h=tolower(h); sub(/^0x/,"",h)
  for(i=1;i<=length(h);i++) {
    x=index("0123456789abcdef",substr(h,i,1))-1
    if (i%2==0) {
      x=px*16+x
      if (x==0 && px==0) { v=v nul }
      else
        if (x==34) { v=(v sprintf("%c",x?x:nul)); 
                     v=(v sprintf("%c",x?x:nul))
                   }
        else { v=(v sprintf("%c",x?x:nul)) }
      }
    else { px=x }
  }
  return v
}

vgersh99 · December 2, 2020, 12:48am

To be honest, I don't understand:

What "hex decimal value 22" means. It's either decimal OR hex, it cannot be "hex decimal"
"because when it is imported back, "" is loaded as a single quotation (")." Is it single quote or one double quote? I don't follow your explanation.

But either way... If you have double quotes in the cab file, it usually means that you can potentially have an embedded comma in the field that needs to be distiguishéd from the cab field separator (CSV).

By simply removing double quotes could potentially break whatever data you're importing.

I'd be carefully by not simply "making it work" by tweaking the code, but rather considering the end result and the integrity of the imported data.

The integrity of the massaged data has already been discussed in this 72-posts thread in the number of posts already. No need to go over it again.

I'll let you reread the above and think about it and make whatever call you feel comfortable with.

Good luck with that.

sandjlee · December 4, 2020, 9:04pm

I may need to clarify about below:

it is hex decimal value 22 (x=34) and it need to be converted as ascii character "" (double quotations) because when it is imported back, "" is loaded as a single quotation (" )

you are correct. It should be read as hex value 22 not as hex decimal 22.
My issue was, after converted to a single quotation (") , I tried to import / load using db2 utility using a field contains a single quotation into one of the column in db2 table, and is caused a problem to resolve this issue, hex 22 need to converted to double quotation.

It may be depends on the need but for my case, double quotation is the solution. -eom-

system · February 2, 2021, 9:05pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.