bash: read file line by line (lines have '\0') - full line is not read???

I am using a while-loop to read a file.
The file has lines containing null-terminated strings (words, actually).
All that reading gives me is the first word, up to the '\0'!
I need the whole string up to the newline (LF, 10#10, 16#A).

What am I doing wrong?

   #make file 'grb' with '\0's :
--0223-112518:~/develop/src> printf \
> "hello\0 word\0 done\0\n"\
> "this\0 next\0 line\0\n"\
> "last\0 ln\n"\
> >grb
--0223-112602:~/develop/src>
  # now reading it line by line:
--0223-112603:~/develop/src> while IFS= read ln; do echo "$ln"; done < grb
hello
this
last
  # - only first words are printed ?!?!
--0223-112714:~/develop/src> cat grb
hello word done
this next line
last ln
--0223-112738:~/develop/src>

How do I get the whole line inside the loop?

Thanks!

I'm not sure if bash can handle null bytes (usually they don't belong in text files).
As a quick fix I would use another tool for parsing files containing null bytes.

By the way, some shells (I tried with zsh and pdksh using read -r) seem to handle it.

tr -d '\0' < textfile > newfile

I have checked it here (Solaris) and -r does not do the trick.

Yes, I could, but that feels like giving up on the problem :slight_smile:
For now I do it in Perl, and that works well enough, but I'd like to know how to handle such a situation in the shell.
What I do not like about 'tr' is the need to create another file. Also, removing the nulls (-d) is not useful, as I need to read positioned fields, but replacing them with spaces works.

 > cat grb | tr '\0' ' ' | while IFS= read ln; do echo $ln; done;
hello word done
this next line
last ln
 >

I am not sure about that way with the 'cat ...': it is, again, done on the whole file, isn't it?
(And it seems to me there is some glitch in bash-2.05 when a pipe feeds a while loop; I ran into something like that about half a year ago. It seemed to be something with assigning variables...)
So that is another reason why I do not like the 'tr ...' solution.
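
For later readers: if your bash is new enough to support process substitution (the bash-2.05 mentioned above may predate reliable support for it), both objections can be avoided at once: no temporary file is created, and the loop runs in the current shell, so its variable assignments survive. A minimal sketch, not from the original posts:

# tr streams its output straight into the loop: no temporary file,
# and no subshell for the while loop, so $count is still set afterwards
count=0
while IFS= read -r ln; do
    echo "$ln"
    count=$((count + 1))
done < <(tr '\0' ' ' < grb)
echo "read $count lines"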

I realise this thread is over a month old, but I'll add my input even if it's no longer useful to the original poster, if only for others browsing. I realise the following code is far from elegant (ugly would be a good word), but it "works" (namely, it allows you to read full lines including nulls into a string for processing, one line at a time, without losing the nulls, using only Bash builtins).

Bash treats strings the same way C does (null-terminated), so it is obviously impossible to store true nulls in a shell string. The following code uses "read -d ''" so that each read stops at a null byte, and reassembles the pieces into a buffer until a newline is seen, which marks the end of a logical line. It optionally adds an escaped null (\0) back into the string between the pieces, so that printing the buffer with "printf" restores the null bytes on output.

Generally it is far less painful to use something like Perl for this, but if you really are stuck in Bash and need a solution without external tools, maybe this will help.

## recreate the test file; the '%b' format expands the backslash escapes
## (including \0) in each argument, so the file really contains null bytes
printf '%b' \
  "hello\0word\0done\0\n" \
  "this\0next\0line\0\n" \
  "last\0ln\n" \
  > grb
buffer=""
xtra=""
## each read stops at a null byte; the "|| [[ -n $ln ]]" keeps the final
## chunk, which ends at end-of-file rather than at a null
while IFS= read -r -d '' ln || [[ -n "$ln" ]]; do
  buffer+="$xtra"
  ## If you wish to re-include the nulls as \0, which will work
  ## when you output with "printf", do this
  if [[ -n "$buffer" ]]; then
    buffer+="\0"
    ## otherwise
    #buffer+=" "
  fi
  buffer+="${ln%%$'\n'*}"
  xtra="${ln#*$'\n'}"
  if [[ "${ln/$'\n'}" != "$ln" ]]; then
    ## USE "$buffer" HERE HOWEVER YOU WISH
    printf "${buffer}\n" ## ..for example
    ## ...TILL HERE
    buffer=""
  else
    xtra=""
  fi
done <grb
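
A side note for anyone reproducing this: since a terminal displays null bytes as nothing, od is a handy way to confirm that the test file (and the loop's output) really do contain them:

od -c grb    # shows the \0 bytes that cat and the terminal hide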

When a loop reads from a pipe, bash runs it in a subshell, so any variables you assign within the subshell will disappear after the command/loop which reads from the pipe finishes. It's not a "glitch", it's a feature. Using shell redirection instead avoids this. For example:

blob=""
while read temp; do
  blob+="$temp"
done < filename
echo "$blob"

will work, but:

blob=""
cat filename | while read temp; do
  blob+="$temp"
done
echo "$blob"

will not.
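
For later readers on a newer bash: since 4.2 there is also "shopt -s lastpipe", which runs the last element of a pipeline in the current shell, so assignments made in the loop survive. A sketch, assuming bash 4.2+ (lastpipe only takes effect when job control is off, which is the default in scripts):

#!/bin/bash
shopt -s lastpipe    # bash 4.2+: run the last pipeline element in this shell
blob=""
cat filename | while read -r temp; do
  blob+="$temp"
done
echo "$blob"         # now prints the value accumulated in the loop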

Very nice rowanthorpe.. I try to use bash builtins to spawn as few subshells and processes with fd's as possible.. I'm always on some server or other.

The pipe vs. redirection question is one I've been trying to figure out too.. One of the best ways I've found to handle it is by using exec manually on fds. Consider this dos2unix clone and its alternate way of determining input. N6=/dev/null is a personal preference..

dos2unixx ()
{
    [[ $# -eq 0 ]] && exec tr -d '\015\032' || [[ ! -f "$1" ]] && echo "Not found: $1" && return;
    for f in "$@";
    do
        [[ ! -f "$f" ]] && continue;
        tr -d '\015\032' < "$f" > "$f.t" && cmp "$f" "$f.t" > $N6 && rm -f "$f.t" || ( touch -r "$f" "$f.t" && mv "$f" "$f.b" && mv "$f.t" "$f" && rm -f "$f.b" ) >&$N6;
    done
}
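
A quick usage illustration (mine, not from the original post; the file names are only placeholders). With arguments it converts the named files in place; with no arguments the exec turns the shell it runs in into a plain tr filter, so the no-argument form is only safe inside a pipeline (where the function already runs in a subshell), not invoked bare in an interactive shell:

dos2unixx notes.txt report.txt                  # convert the named files in place
cat dosfile.txt | dos2unixx > unixfile.txt      # filter form; exec only replaces the pipeline's subshell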

And strangely enough, earlier today I was doing some work on my own builtin MORE command; basically I wanted a cat-style pager. This does pretty well, but I've only had it a day..

shmore ()
{
    local l L M="`echo;tput setab 4 && tput setaf 7||echo -en \"\e[34;01m\"`   --- SH More ---   `tput sgr0||echo -e \"\e[m\"`";
    L=1;
    while read l; do
        echo "${l}";
        ((L++));
        [ "$L" == "${LINES:-80}" ] && {
            L=1;
            read -p"$M" -u1
        };
    done
}
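
A usage sketch (my addition): because the prompt is read from fd 1 (normally the terminal), the pager still pauses correctly when the data arrives on stdin:

shmore < /etc/services       # page a file
ls -l /usr/bin | shmore      # page another command's output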

Finally, here's the shcat I use.. and if you do $ cat file | shcat | head, you get an error from the pipe issue you talk about. However you can work around it with an $ exec 2>&1 in the correct place.

shcat ()
{
    local l f e IFS="";
    e=0;
    if [ $# -eq 0 ]; then
        while read -r l; do
            echo "${l}";
        done;
    else
        for f in "$@";
        do
            if [ -r "${f}" ]; then
                while read -r l; do
                    echo "${l}";
                done < "${f}";
            else
                 < "${f}";
                e=1;
            fi;
        done;
        return $e;
    fi
}
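
For completeness, a couple of usage examples (my addition): with no arguments it filters stdin, otherwise it prints each readable file named on the command line:

shcat /etc/hosts /etc/resolv.conf    # print the named files in order
ls /usr/bin | shcat                  # filter form, reading stdin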

Also, here are 2 aliases I have created over time that work very well for stuff like this.

alias cata='exec 2>&1 cat -A'
alias cate='exec 2>&1 cat -v | sed s/\\^\\[/\\\\033/g'


seq -s`echo -ne \012` --format=%03g 0 128
seq -s`echo -ne \\012` --format=%03g 0 128
seq -s`echo -ne "\\012"` --format=%03g 0 128
seq -s`echo -ne "\\011"` --format=%03g 0 128
seq -s`echo -ne "\\010"` --format=%03g 0 128
seq -s`echo -ne "\\009"` --format=%03g 0 128
seq -s`echo -ne "\\002"` --format=%03g 0 128
seq -s`echo -ne "\\02"` --format=%03g 0 128
seq -s`echo -ne "\\2"` --format=%03g 0 128
seq -s`tput cols`` --format=%03g 0 128
seq -s`tput cols` --format=%03g 0 128
seq -s`tput sgr` --format=%03g 0 128
seq -s`tput eol` --format=%03g 0 128
seq -s`tput erase` --format=%03g 0 128
seq -s`tput bs` --format=%03g 0 128
seq -s`tput kbs` --format=%03g 0 128
seq -s'`tput kbs` ' --format=%03g 0 128
seq -s"`tput kbs` " --format=%03g 0 128
seq -s" `tput kbs` " --format=%03g 0 128
seq -s" `tput kbs`" --format=%03g 0 128
seq -s" `tput kbs` " --format=%03g 0 128

Part of my shell session from earlier today... I was actually trying to make the separator a null.. it might be useful to know there are several ways to output nulls..
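
As an aside (my note, not part of the original posts): here are a few ways to produce a single null byte from bash, easy to verify with od. A literal null can never actually be passed as seq's -s separator, though, because command-line arguments are C strings and stop at the first null byte, which is presumably why those attempts could not work.

printf '\0'     | od -c    # 0000000  \0
printf '\x00'   | od -c
echo -ne '\0'   | od -c
echo -ne '\x00' | od -c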

aa_print_ascii_chart ()
{
    local i;
    for i in `seq ${1:-0} ${2:-256}`;
    do
        echo -e "\\0$(( $i/64*100 + $i%64/8*10 + $i%8 ))";
    done
}
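
A small worked check of the arithmetic (my addition): the expression builds the three octal digits of i, so 65 becomes 101 and echo -e "\0101" prints 'A'. Asking for the range 65 to 70 therefore prints the letters A through F:

 > aa_print_ascii_chart 65 70
A
B
C
D
E
F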

Those are great scripts, AskApache! I will read them in more depth when I get online later. Glancing at your shcat script, and the mention of the broken pipe problem, I remembered a thread over at the gnulib bug mailing list, particularly this bit:

I tried running your shcat with the echos replaced with printfs, as below:

shcat ()
{
   local l f e IFS="";
   e=0;
   if [ $# -eq 0 ]; then
       while read -r l; do
           printf "${l}\n"
       done;
   else
       for f in "$@";
       do
           if [ -r "${f}" ]; then
               while read -r l; do
                   printf "${l}\n"
               done < "${f}";
           else
                < "${f}";
               e=1;
           fi;
       done;
       return $e;
   fi
}

but on my shell it still had the error... Strangely, when I used the non-builtin printf like so:

shcat ()
{
   local l f e IFS="";
   e=0;
   if [ $# -eq 0 ]; then
       while read -r l; do
           /bin/printf "${l}\n"
       done;
   else
       for f in "$@";
       do
           if [ -r "${f}" ]; then
               while read -r l; do
                   /bin/printf "${l}\n"
               done < "${f}";
           else
                < "${f}";
               e=1;
           fi;
       done;
       return $e;
   fi
}

it worked perfectly (but ridiculously slowly...).
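
A hedged guess at why (my addition): the external /bin/printf forks and execs a new process for every single line, which accounts for the slowness, and when the downstream head exits it is that external printf which gets killed by SIGPIPE, whereas with the builtin it is bash itself that takes the failed write and reports the error. The fork cost is easy to see:

# rough illustration of the per-line fork cost (numbers will vary)
time ( for i in {1..1000}; do printf 'x\n'; done >/dev/null )        # builtin: quick
time ( for i in {1..1000}; do /bin/printf 'x\n'; done >/dev/null )   # external: much slower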