Creating data delimited by ASCII code 1

dasun · June 20, 2019, 3:16am

<columnname1(binary1)columnvalue(binary1)columnname2(binary1)columnvalue(binary1)columnname3(binary1)columnvalue... 1st row/>
<columnname1(binary1)columnvalue(binary1)columnname2(binary1)columnvalue(binary1)columnname3(binary1)columnvalue.. second row/>

jim_mcnamara · June 20, 2019, 8:16am

Pure shell script does NOT work well with binary data. The main reason is that binary data will usually contain the ASCII 0 (NUL) character, which the shell treats as the end of a string. As a starting point, I would try using the dd command instead.

What OS do you have?
Are the binary "fields" all exactly the same length, e.g. 4 bytes, 8 bytes, etc.?
Can you write simple C code, or Perl? That may be another option for you.
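To illustrate the NUL problem described above, here is a minimal sketch that compares the bytes a command writes with what survives in a shell variable. The variable names are made up for the demonstration, and the exact result depends on the shell: bash drops the NUL byte, while other shells may cut the string short instead.

```shell
#!/bin/sh
# Five bytes go down the pipe: a b NUL c d
raw_len=$(printf 'ab\000cd' | wc -c | tr -d ' ')

# Capture the same bytes in a variable. Newer bash versions print a
# warning about the ignored NUL byte, so stderr is silenced here.
{ var=$(printf 'ab\000cd'); } 2>/dev/null
var_len=${#var}

echo "bytes written: $raw_len, characters kept in the variable: $var_len"
```

The second number comes out smaller than the first, because the NUL byte cannot be stored in the variable.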

wisecracker · June 20, 2019, 1:26pm

Hi dasun...
As a starting point you could try something like this, which is fully POSIX compliant...
Longhand on OSX 10.14.3, default bash terminal calling dash purely as a demonstration:

Last login: Thu Jun 20 17:47:09 on ttys000
AMIGA:amiga~> dash
AMIGA:\u\w> CSV=$( printf "\001" )
AMIGA:\u\w> echo '<text1'${CSV}'text2'${CSV}'text3'${CSV}'text4/>' > /tmp/CSV
AMIGA:\u\w> echo '<text1'${CSV}'text2'${CSV}'text3'${CSV}'text4/>' >> /tmp/CSV
AMIGA:\u\w> hexdump -C /tmp/CSV
00000000  3c 74 65 78 74 31 01 74  65 78 74 32 01 74 65 78  |<text1.text2.tex|
00000010  74 33 01 74 65 78 74 34  2f 3e 0a 3c 74 65 78 74  |t3.text4/>.<text|
00000020  31 01 74 65 78 74 32 01  74 65 78 74 33 01 74 65  |1.text2.text3.te|
00000030  78 74 34 2f 3e 0a                                 |xt4/>.|
00000036
AMIGA:\u\w> cat /tmp/CSV
<text1text2text3text4/>
<text1text2text3text4/>
AMIGA:\u\w> exit
AMIGA:amiga~> _

This ASSUMES you want the whole of the (binary1) string replaced.
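Because cat shows the fields run together, it can help to swap the 0x01 bytes for a visible character when eyeballing the data. This is only a sketch: the temporary file from mktemp and the choice of '|' as the stand-in character are assumptions made for the example, not part of the session above.

```shell
#!/bin/sh
# Scratch file for the demonstration (mktemp is common but not in POSIX).
file=$(mktemp)

# Build one row with Ctrl-A (octal 001) between the fields.
sep=$(printf '\001')
printf '%s\n' "<text1${sep}text2${sep}text3${sep}text4/>" > "$file"

# Translate every 0x01 byte to '|' for display only; the file is unchanged.
visible=$(tr '\001' '|' < "$file")
echo "$visible"

rm -f "$file"
```

This prints <text1|text2|text3|text4/>, which makes it easy to confirm that every separator landed where it should.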

jim_mcnamara · June 21, 2019, 12:10pm

What happens in the nice example above when there is a 00 character? I do not think it will work as required. I do not have your system, but on mine (openSUSE 13, bash and dash) I get short fields (fewer bytes).

wisecracker · June 21, 2019, 1:10pm

Hi Jim M...

This is purely a demonstration; it shows that it can be done, and with modifications the idea might suit the OP.
However, the OP specifically asked for Ctrl-A, 0x01...
So, for ANY 8-bit field separator:

#!/usr/local/bin/dash
: > /tmp/FILE
CSV()
{
    # The OP specifically asked for Ctrl-A, 0x01 but for any binary...
    printf "\000"
} >> /tmp/FILE

# First line...
printf '<' >> /tmp/FILE
for TEXT in text1 text2 text3
do
    printf '%s' "${TEXT}" >> /tmp/FILE
    CSV
done
echo 'text4/>' >> /tmp/FILE

# Second line...
printf '<' >> /tmp/FILE
for TEXT in text5 text6 text7
do
    printf '%s' "${TEXT}" >> /tmp/FILE
    CSV
done
echo 'text8/>' >> /tmp/FILE
# And so on...

# Check it works...
hexdump -C /tmp/FILE

# How it views using 'cat'...
cat /tmp/FILE

Results, OSX 10.14.3, default bash terminal calling dash.

Last login: Fri Jun 21 17:28:12 on ttys000
AMIGA:amiga~> cd Desktop/Code/Shell
AMIGA:amiga~/Desktop/Code/Shell> chmod 755 ZERO_CSV.sh
AMIGA:amiga~/Desktop/Code/Shell> ./ZERO_CSV.sh
00000000  3c 74 65 78 74 31 00 74  65 78 74 32 00 74 65 78  |<text1.text2.tex|
00000010  74 33 00 74 65 78 74 34  2f 3e 0a 3c 74 65 78 74  |t3.text4/>.<text|
00000020  35 00 74 65 78 74 36 00  74 65 78 74 37 00 74 65  |5.text6.text7.te|
00000030  78 74 38 2f 3e 0a                                 |xt8/>.|
00000036
<text1text2text3text4/>
<text5text6text7text8/>
AMIGA:amiga~/Desktop/Code/Shell> _
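The NUL bytes are in the file even though cat does not show them. As a sketch of a check that does not need hexdump, the bytes can be counted with tr and wc. The scratch file and variable names are invented for this example, and the rows are written with a single printf each rather than with the script above.

```shell
#!/bin/sh
# Scratch file for the demonstration (mktemp is common but not in POSIX).
file=$(mktemp)

# Two rows with NUL (octal 000) between the fields. printf writes the
# bytes straight to the file, so no shell variable ever holds a NUL.
printf '<text1\000text2\000text3\000text4/>\n' > "$file"
printf '<text5\000text6\000text7\000text8/>\n' >> "$file"

# Delete everything that is not NUL, then count what is left.
nul_count=$(tr -cd '\000' < "$file" | wc -c | tr -d ' ')
row_count=$(wc -l < "$file" | tr -d ' ')

echo "NUL separators: $nul_count, rows: $row_count"
rm -f "$file"
```

Three separators per row and two rows give six NUL bytes in total.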

jim_mcnamara · June 21, 2019, 1:55pm

Thanks for the correction!

MadeInGermany · June 21, 2019, 3:17pm

A function can hide most complexity:

#!/bin/sh
csvline(){
  _s="<"
  for _i
  do
    printf '%s%s' "$_s" "$_i"
    _s=$sep
  done
  printf "/>\n"
}

sep=$(printf "\001")
{
csvline text1 text2 text3 text4 
csvline text5 text6 text7 text8 
} > outputfile
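One caveat with keeping the separator in a variable is that it cannot hold a NUL byte. A possible variant, offered only as a sketch, passes the separator as an octal code and lets printf emit the byte directly. The function name row, its argument order, and the temporary output file are all assumptions made for this example.

```shell
#!/bin/sh
# row OCTAL FIELD... : hypothetical helper, not taken from the thread.
# OCTAL is the separator as three octal digits, e.g. 001 or 000.
row() {
  _oct=$1
  shift
  _first=1
  printf '<'
  for _f
  do
    # Emit the separator byte before every field except the first.
    [ "$_first" -eq 1 ] || printf "\\$_oct"
    printf '%s' "$_f"
    _first=0
  done
  printf '/>\n'
}

out=$(mktemp)
{
  row 001 text1 text2 text3 text4
  row 000 text5 text6 text7 text8
} > "$out"

# Count each kind of separator byte to confirm both rows were written.
ctrl_a=$(tr -cd '\001' < "$out" | wc -c | tr -d ' ')
nul=$(tr -cd '\000' < "$out" | wc -c | tr -d ' ')
echo "0x01 bytes: $ctrl_a, NUL bytes: $nul"
rm -f "$out"
```

Mixing two separators in one file is done here only to show that both codes work; real data would normally use a single one.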