I have written a gawk script that counts trigrams (sequences of three consecutive words) in a text file. It runs perfectly on a relatively small file, but when I give it a large file, around 700 KB, I get this message:
Counting trigrams ... 1614375 lines done.gawk32: trigrams.gk:12: (FILENAME=urmono.txt FNR=1614376) fatal: newnode: nextfree: can't allocate memory (No error)
My machine runs Windows 10 with 32 GB of RAM. I believe DOS cannot access all of that RAM, which is why I get the message above.
In case my program has an error, here is the script (trigrams.gk):
{
    # $0 = tolower($0)
    gsub(/[.,:;!?"<>\[\]#(){}]/, "")     # strip punctuation
    for (i = 1; i <= NF; i++) {
        trigram = word1 " " word2 " " $i
        word1 = word2
        word2 = $i
        count[trigram]++                 # one array entry per distinct trigram
    }
    printf "\rCounting trigrams ... %6d lines done.", NR > "CON"
}
END {
    for (w in count)
        print count[w] "\t" w            # count, tab, trigram
}
Truncating the file, or dividing it into small chunks and running the script on each of them, is not very satisfactory: the same trigrams appear in different outputs, so I have to combine all of them with another awk script, which makes things cumbersome. Many thanks for any solution to this problem.
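A minimal sketch of one possible workaround, assuming a Unix-like shell with awk, sort and uniq available (for example WSL or Cygwin on Windows 10; the native Windows sort command behaves differently). Instead of keeping every distinct trigram in the count array, awk only prints each trigram on its own line and sort followed by uniq -c does the counting; GNU sort writes intermediate data to temporary files when the input is large, so memory use should stay bounded. The function names count_trigrams and merge_counts are hypothetical. merge_counts applies the same sort-then-sum idea to combining per-chunk outputs, so awk keeps only one trigram at a time in memory.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any library).
# Assumes a POSIX shell with awk, sort and uniq, e.g. under WSL or Cygwin.

# count_trigrams FILE
# Prints "count<TAB>trigram" lines without holding all trigrams in memory:
# awk only emits one trigram per line, sort groups identical lines
# (spilling to temporary files on large input), and uniq -c counts them.
count_trigrams() {
    awk '{
        gsub(/[.,:;!?"<>#(){}]/, "")   # strip punctuation, as in the original
        gsub(/\[/, ""); gsub(/]/, "")  # strip square brackets
        for (i = 1; i <= NF; i++) {
            # words carry over between lines, as in the original;
            # unlike the original, skip the two incomplete trigrams at the start
            if (n >= 2) print w1 " " w2 " " $i
            w1 = w2; w2 = $i; n++
        }
    }' "$1" |
    LC_ALL=C sort |
    uniq -c |
    awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }'  # "  N tri" -> "N<TAB>tri"
}

# merge_counts FILE...
# Sums the counts of identical trigrams across several "count<TAB>trigram"
# files (e.g. per-chunk outputs). After sorting on the trigram, equal
# trigrams are adjacent, so awk only keeps the current one in memory.
merge_counts() {
    cat "$@" |
    LC_ALL=C sort -t "$(printf '\t')" -k 2 |
    awk 'BEGIN { FS = "\t" }
         $2 != prev { if (NR > 1) print sum "\t" prev; prev = $2; sum = 0 }
         { sum += $1 }
         END { if (NR > 0) print sum "\t" prev }'
}
```

For example, after sourcing the functions, count_trigrams urmono.txt > trigrams.txt counts the whole file, and merge_counts out1.txt out2.txt > trigrams.txt combines chunk outputs. Unlike the original script, the output is sorted by trigram rather than printed in arbitrary for-in order.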