Dear all,
I use awk quite a bit for data wrangling ... today I found weird behavior that I cannot wrap my head around.
If I execute the following command (simplified to illustrate the behavior; it has nothing to do with the real command):
bash-3.2$ awk 'BEGIN{for(i=1;i<=100000000;i++){for(j=1;j<=10;j++){s="a";}}}'
things are fine: memory usage stays relatively constant (a couple of K). If I modify it slightly to
bash-3.2$ awk 'BEGIN{for(i=1;i<=100000000;i++){for(j=1;j<=10;j++){s="a"j;}}}'
I can watch this command starting to take gigabytes of memory (I am looking at it using "top").
It seems that concatenating a string and the number j causes the problem, although in principle I store only one variable at any point in time. I am pretty sure that this never happened before. This is happening on MacOS.
If anyone has seen this, please do let me know what to do!
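For anyone who wants a number rather than watching top, here is a minimal sketch of a helper that samples the resident set size of a command with ps while it runs. The name peak_rss and the 0.1-second sampling interval are illustrative choices, not something from the thread. It assumes bash, a ps that accepts -o rss= -p PID, and a sleep that accepts fractional seconds, which should hold on macOS and most Linux systems.

```shell
# peak_rss (hypothetical helper): run a command in the background and
# print the largest resident set size (in KB) observed while it runs.
# The command's own stdout is discarded so only the number is printed.
peak_rss() {
    "$@" >/dev/null &
    pid=$!
    peak=0
    # kill -0 only checks whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ') || rss=""
        if [ -n "$rss" ] && [ "$rss" -gt "$peak" ]; then
            peak=$rss
        fi
        sleep 0.1   # assumption: sleep accepts fractional seconds
    done
    wait "$pid" 2>/dev/null || true
    echo "$peak"
}

# Example invocation (loop bound reduced so it finishes quickly):
#   peak_rss awk 'BEGIN{for(i=1;i<=10000000;i++){for(j=1;j<=10;j++){s="a"j;}}}'
```

Because this only samples, it can miss short spikes, and a command that finishes within one interval may report 0. It should still be enough to tell steady usage from unbounded growth.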
Interesting. I could reproduce this, it is quite strange. It seems like a bug in BSD awk. I noticed gawk and mawk do not display this behavior, but BSD awk does...
An alternative way of doing it does not display the behavior:
awk 'BEGIN{for(i=1;i<=100000000;i++) for(j=1;j<=10;j++) s=sprintf("a%d",j) }'
but
awk 'BEGIN{for(i=1;i<=100000000;i++) for(j=1;j<=10;j++) s=sprintf("a%s",j) }'
does!
--
Exploring this further, the problem seems to be in the number-to-string conversion:
Problem:
awk 'BEGIN{for(i=1;i<=100000000;i++) for(j=1;j<=10;j++) s=2 "a" }'
No problem:
awk 'BEGIN{for(i=1;i<=100000000;i++) for(j=1;j<=10;j++) s="2" "a" }'
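To make the conversion explicit: when a number is concatenated with a string, awk first converts the number to a string. Per POSIX, integral values are formatted as with %d, and other values use CONVFMT (default "%.6g"). The small sketch below (the function name show_conversions and the variables are illustrative only) suggests that "a" j and sprintf("a%d", j) give the same string for an integer j, but that the %d form truncates non-integers, so it is only a drop-in replacement when the values are whole numbers.

```shell
# show_conversions (hypothetical name): compare implicit number-to-string
# conversion with the sprintf("a%d", ...) workaround.
show_conversions() {
    awk 'BEGIN {
        j = 7             # integer value
        x = 3.14159265    # non-integer value
        print "a" j               # implicit conversion of an integer
        print sprintf("a%d", j)   # workaround, same result for integers
        print "a" x               # implicit conversion goes through CONVFMT
        print sprintf("a%d", x)   # workaround truncates the fraction
    }'
}

show_conversions
# prints:
#   a7
#   a7
#   a3.14159
#   a3
```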
--
$ awk --version
awk version 20070501
$ uname -rs
Darwin 16.0.0
MacOS Sierra 10.12
Dear Scrutinizer,
thanks for the workaround(s). The sprintf trick also works for me without the memory problem. I will have to check how this plays out in more complicated scripts.
In case you ever stumble upon an explanation, please do let me know. I would be more than interested in understanding this.
Thanks again!
P.S. just for comparability
bash-3.2$ awk --version
awk version 20070501
bash-3.2$ uname -rs
Darwin 15.4.0
bash-3.2$
I downloaded and compiled the latest version of "One True Awk" (using the Apple Xcode command line tools), and with the resulting binary the problem did not occur, so apparently this was fixed somewhere along the line...
$ awk-latest --version
awk version 20121220
I filed a bug report with Apple.
---
https://developer.apple.com/download/more
GitHub - danfuzz/one-true-awk: Archive and history of One True Awk
--
FWIW, here are the steps I followed:
I had to make one adjustment in the makefile (comment out one of the YACC lines) to make it work:
YACC = bison -d -y
#YACC = yacc -d -S
then run:
make
sudo cp a.out /usr/local/bin/awk-latest
sudo cp awk.1 /usr/local/share/man/man1/awk-latest.1
--
BTW: this version also fixes the -vsomevar=someval issue (there needs to be a space after -v) that plagues the 20070501 version.
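For scripts that have to run on both the old and the new version, the safe spelling is to always put a space between -v and the assignment. That is the form POSIX specifies, and it should be accepted by every awk. A minimal sketch (the function name print_somevar is illustrative; somevar and someval are the placeholders from the post):

```shell
# print_somevar (hypothetical name): pass a variable with the portable
# "-v name=value" form, option and assignment as separate arguments.
print_somevar() {
    awk -v somevar=someval 'BEGIN { print somevar }'
}

print_somevar
# prints: someval
```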