Strange Phenomena with records filed in variable

Cochise · September 25, 2014, 8:40am

Trying to find out whether there is a limit on the number of records that can be stored in a variable, I set up this small script:

#!/usr/bin/ksh 
for ((i = 1; i < 21; i++)) 
do 
  n=$(($i*100)) 
  echo "Trying $n records:" 
  recs=$(head -$n error.log) 
  echo "$recs" | wc 
done

Strangely, somewhere beyond 1500 records, the counts of bytes/words/records no longer match the data that was read. The output looks like this:

Trying 100 records: 
     100    1701   14803 
Trying 200 records: 
     200    3405   29497 
Trying 300 records: 
     300    5105   44207 
Trying 400 records: 
     400    6802   58874 
Trying 500 records: 
     500    8654   74208 
Trying 600 records: 
     600   10464   89113 
Trying 700 records: 
     700   12309  104241 
Trying 800 records: 
     800   14067  119066 
Trying 900 records: 
     900   15913  134357 
Trying 1000 records: 
    1000   17700  149270 
Trying 1100 records: 
    1100   19556  164703 
Trying 1200 records: 
    1200   21441  180414 
Trying 1300 records: 
    1300   23329  195869 
Trying 1400 records: 
    1400   25225  211358 
Trying 1500 records: 
    1500   27204  227203 
Trying 1600 records: 
    1603   29132  243107 
Trying 1700 records: 
    1703   31039  258401 
Trying 1800 records: 
    1815   32952  274627 
Trying 1900 records: 
    1915   34816  289880 
Trying 2000 records: 
    2015   36687  305311

What is going on here?

Scrutinizer · September 25, 2014, 9:29am

What happens when you change:

recs=$(head -$n error.log) 
echo "$recs" | wc

to

head -$n error.log | wc

Cochise · September 25, 2014, 11:48am

Changed script:

for ((i = 1; i < 21; i++))
do
  n=$(($i*100))
  echo "Trying $n records:"
  head -$n error.log | wc
  recs=$(head -$n error.log)
  echo "$recs" | wc
done

New output:

Trying 100 records:
     100    1701   14803
     100    1701   14803
Trying 200 records:
     200    3405   29497
     200    3405   29497
Trying 300 records:
     300    5105   44207
     300    5105   44207
Trying 400 records:
     400    6802   58874
     400    6802   58874
Trying 500 records:
     500    8654   74208
     500    8654   74208
Trying 600 records:
     600   10464   89113
     600   10464   89113
Trying 700 records:
     700   12309  104241
     700   12309  104241
Trying 800 records:
     800   14067  119066
     800   14067  119066
Trying 900 records:
     900   15913  134357
     900   15913  134357
Trying 1000 records:
    1000   17700  149270
    1000   17700  149270
Trying 1100 records:
    1100   19556  164703
    1100   19556  164703
Trying 1200 records:
    1200   21441  180414
    1200   21441  180414
Trying 1300 records:
    1300   23329  195869
    1300   23329  195869
Trying 1400 records:
    1400   25225  211358
    1400   25225  211358
Trying 1500 records:
    1500   27204  227203
    1500   27204  227203
Trying 1600 records:
    1600   29129  243110
    1603   29132  243107
Trying 1700 records:
    1700   31036  258404
    1703   31039  258401
Trying 1800 records:
    1800   32937  274642
    1815   32952  274627
Trying 1900 records:
    1900   34801  289895
    1915   34816  289880
Trying 2000 records:
    2000   36672  305326
    2015   36687  305311

Scrutinizer · September 25, 2014, 12:37pm

OK, so the problem could be in the command substitution, since trailing newlines are discarded (so if there are empty lines at the end of the head output, they will be removed).

Try:

recs=$(head -$n error.log; printf x)
echo "${recs%x}" | wc

But that will not be your problem here, since there appear to be more lines, not fewer.

There can be an additional problem because you are using echo, whose handling of backslash sequences is not standardized, so it may interpret special characters. Instead, you could try:

recs=$(head -$n error.log; printf x)
printf "%s\n" "${recs%x}" | wc
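To see the trailing-newline stripping in isolation, here is a minimal sketch. The sample data and the variable names (data, kept, plain_len, kept_len) are made up for illustration and are not taken from error.log.

```shell
#!/bin/sh
# Hypothetical sample: two lines followed by two empty lines.
data=$(printf 'a\nb\n\n\n')             # command substitution drops ALL trailing newlines
kept=$(printf 'a\nb\n\n\n'; printf x)   # the sentinel "x" protects them
kept=${kept%x}                          # remove the sentinel again

plain_len=${#data}
kept_len=${#kept}
printf 'plain: %s chars, with sentinel: %s chars\n' "$plain_len" "$kept_len"
# prints: plain: 3 chars, with sentinel: 6 chars
```

Without the sentinel the variable can only lose data, never gain lines, which is why this effect alone cannot explain the extra lines in the output above.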

Cochise · October 1, 2014, 12:20pm

Sorry for the late response; Shellshock kept me busy the past few days.

I found the problem: echo interprets \n as a newline, so a line in the file like ...

C:\Users\AD~1\AppData\Local\Temp\notes32C5CD\lnw01.gif

becomes

C:\Users\AD~1\AppData\Local\Temp
otes32C5CD\lnw01.gif

As shown above in my previous posts, the number of lines and words increases, while the number of bytes decreases.

What can be done so that such escape sequences are taken literally by printf or echo?

Corona688 · October 1, 2014, 1:32pm

This is bad:

printf "${some_arbitrary_variable}\n"

This is how it's intended to be used:

printf "%s\n" "${some_arbitrary_variable}"
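A minimal sketch of the difference, assuming a hypothetical one-line string that contains a backslash followed by "n" (the names line, safe_lines and so on are illustrative only):

```shell
#!/bin/sh
# Hypothetical sample: one line containing the two characters "\" and "n".
line='dir\name'

# Data passed as an argument to %s is printed literally.
safe_lines=$(printf '%s\n' "$line" | wc -l | tr -d ' ')
safe_bytes=$(printf '%s\n' "$line" | wc -c | tr -d ' ')

# Data used as the format string: printf expands \n into a newline.
unsafe_lines=$(printf "$line\n" | wc -l | tr -d ' ')
unsafe_bytes=$(printf "$line\n" | wc -c | tr -d ' ')

printf 'safe:   %s line(s), %s bytes\n' "$safe_lines" "$safe_bytes"
printf 'unsafe: %s line(s), %s bytes\n' "$unsafe_lines" "$unsafe_bytes"
# prints:
# safe:   1 line(s), 9 bytes
# unsafe: 2 line(s), 8 bytes
```

The unsafe form gains a line and loses a byte, the same pattern as in the wc output earlier in the thread: two characters (\ and n) collapse into a single newline.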

Scrutinizer · October 1, 2014, 2:32pm

See also answer #4

Cochise · October 2, 2014, 5:23am

Thank you Scrutinizer and Corona688.

printf "%s\n" "${some_arbitrary_variable}"

... does the trick.

Btw,

print -r "${some_arbitrary_variable}"

does the same in ksh.

Is there any special reason why you use

"${some_arbitrary_variable}"

instead of

"$some_arbitrary_variable"

?

Don_Cragun · October 2, 2014, 2:36pm

In this case, they are synonymous. The braces must be used if the character following the variable name is a valid character in a variable name, or when referencing a multi-digit positional parameter. Some people always use braces around shell variable names to emphasize which characters are in the variable name.

Note that both:

${1}1
$11

expand to the contents of the 1st positional parameter followed by the character "1" (even if there are eleven or more positional parameters), but both:

${X11}
$X11

expand to the contents of the variable X11 even if X1 is defined and X11 has not been defined.
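These rules can be checked with a small sketch. The function name show_params and the sample arguments are hypothetical, and the script assumes it is not run under set -u, since it expands an unset variable on purpose.

```shell
#!/bin/sh
# Hypothetical helper: records how $11 and ${11} expand inside a function
# that is called with eleven arguments.
show_params() {
    short="$11"     # parsed as $1 followed by the literal character "1"
    long="${11}"    # the eleventh positional parameter
}
show_params one two three four five six seven eight nine ten eleven

X1=foo
unset X11
named="[$X11]"      # looks up X11, never X1 followed by "1"

printf '%s\n' "$short" "$long" "$named"
# prints:
# one1
# eleven
# []
```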

Corona688 · October 3, 2014, 9:51am

ksh has printf, too. More importantly, printf "%s\n" "string" works the same everywhere, while "print" is not available in every shell.

What if you want to print $some_string_postfix where _postfix is not part of the variable name? That will go wrong, because the shell looks up a variable called some_string_postfix. ${some_string}_postfix will work. Using { } is just a good habit.
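A quick sketch of that case, using a made-up value for some_string and assuming the script is not run under set -u:

```shell
#!/bin/sh
some_string=report               # hypothetical value for illustration
unset some_string_postfix        # make sure no such variable exists

wrong="$some_string_postfix"     # expands the unset variable some_string_postfix
right="${some_string}_postfix"   # expands some_string, then appends "_postfix"

printf 'wrong=[%s] right=[%s]\n' "$wrong" "$right"
# prints: wrong=[] right=[report_postfix]
```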