Out of memory message

I have written a script to generate trigrams. It runs perfectly on a relatively small file, but when I give it a large file, around 700 KB, I get this message:

Counting trigrams ... 1614375 lines done.gawk32: trigrams.gk:12: (FILENAME=urmono.txt FNR=1614376) fatal: newnode: nextfree: can't allocate memory (No error)

I have Windows 10 with 32 GB of RAM. I believe DOS cannot access all of that RAM and therefore gives the above message.
In case my program has an error, here it is:

{
    # $0 = tolower($0)
    gsub(/[.,:;!?"<>\[\]#(){}]/, "")    # strip punctuation
    for (i = 1; i <= NF; i++) {
        trigram = word1 " " word2 " " $i
        word1 = word2
        word2 = $i
        count[trigram]++
    }
    printf "\rCounting trigrams ... %6d lines done.", NR > "CON"
}
END {
    for (w in count)
        print count[w], "\t" w
}

Many thanks for any solution to this problem. Truncating the file or dividing it into small chunks and running the script on each chunk is not very satisfactory: the same trigrams then appear in several outputs, and I have to combine all of them with another awk script, which makes things cumbersome.
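
If the file did have to be processed in chunks, the combining step could be kept fairly small. What follows is only a minimal sketch, assuming each partial output has the form the script above prints (the count, a space and a tab, then the trigram). The function name merge_counts is made up for illustration. It is written as a POSIX shell function wrapping awk, so on Windows the awk program between the quotes would have to be saved to its own file and run with -f instead.

```shell
# merge_counts (hypothetical helper): sum the counts of identical trigrams
# across any number of partial output files.
# Assumed input line format: <count><optional space><TAB><trigram>
merge_counts() {
    awk '
    {
        n = $1                      # first field is the partial count
        sub(/^[0-9]+ ?\t/, "")      # drop count and separator, leaving the trigram in $0
        total[$0] += n
    }
    END {
        for (t in total)
            print total[t] "\t" t   # output order is unspecified
    }' "$@"
}
```

Under those assumptions it would be called with the partial files as arguments, for example merge_counts part1.txt part2.txt (placeholder file names).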

Is 700 KB a mistake?
That doesn't sound like a large file to me...

Can you show the input and the required output (a small portion, of course)?

Which 'DOS' are you referring to, and which awk are you using in your Windows 10 environment (gnutools, cygwin, WSL)?

Regards
Peasant.

Hi,

I just checked your script, with the output statements removed, on a Linux system, using a file with about a third of a million words in it (file size 2700 KB; I used this file: wordlist.xz).

It takes less than 0.5 seconds and uses about 100 MB of RAM.

{
    gsub(/[.,:;!?"<>\[\]#(){}]/, "")
    for (i = 1; i <= NF; i++) {
        trigram = word1 " " word2 " " $i
        word1 = word2
        word2 = $i
        count[trigram]++
    }
}

/usr/bin/time awk -f t.awk wordlist
0.39user 0.02system 0:00.41elapsed 99%CPU (0avgtext+0avgdata 101800maxresident)k
0inputs+0outputs (0major+25315minor)pagefaults 0swaps

System Info:

CPU: Quad Core Intel Core i7-2600 (-MT MCP-) speed/max: 3051/3800 MHz Kernel: 4.19.0-8-amd64 x86_64 Up: 4h 08m 
Mem: 9670.3/15995.6 MiB (60.5%) Storage: 931.51 GiB (89.1% used) Procs: 284 Shell: bash 5.0.3 inxi: 3.0.32 

In a Windows 10 virtual machine with 4 GB of RAM and GNU awk 3.1.6 (downloaded as a compiled binary), it takes only a little more time.

So it boils down to the question Peasant asked already...

Many thanks to all. I was using an older version of AWK. I installed the new version and got the results.
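
For anyone who runs into the same error, it may be worth checking first which awk is actually being run. A small sketch, assuming a POSIX shell (in a Windows command prompt, running gawk --version directly does the same job for GNU awk). The helper name awk_version is made up for illustration.

```shell
# awk_version (hypothetical helper): print the first line of the version
# banner of whichever awk comes first on PATH.
# GNU awk accepts --version; "-W version" is the fallback for mawk-style builds.
awk_version() {
    v=$(awk --version 2>/dev/null | head -n 1)
    if [ -z "$v" ]; then
        v=$(awk -W version 2>&1 | head -n 1)
    fi
    printf '%s\n' "$v"
}

awk_version
```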