Split file by number of words

aavv · September 23, 2010, 9:56am

Dear all
I am trying to divide a file using the number of words as a condition. Alternatively, I would at least like to be able to retrieve the first x words of a given file. Any tips?
Thanks in advance.

jim_mcnamara · September 23, 2010, 10:12am

To get a fixed number of words:

wcnt=200 # get first 200 words
awk -v w="$wcnt" '{ for (i = 1; i <= NF && cnt < w; i++) {
                        printf("%s ", $i); cnt++
                    }
                    print ""    # end the current output line
                    if (cnt >= w) { exit } }' inputfile > outputfile

aavv · September 23, 2010, 10:19am

Thanks for the reply. I got an error:

awk: syntax error at source line 1
context is
{for(i=1;i<=NF && cnt<w; >>> i++, <<<
awk: illegal statement at source line 1
awk: illegal statement at source line 1

Thank you

jim_mcnamara · September 23, 2010, 10:42am

Made a change: my awk accepted what I wrote, but I forgot that other awks may not. My bad. See the edited post above.

If you are on Solaris, use nawk, not awk.

frans · September 23, 2010, 11:12am

N=200
while read L
do
  for W in $L
  do
    ((i++))
    echo "$W"
    ((i>=N)) && break 2
  done
done <file.txt

aavv · September 23, 2010, 11:55am

Thank you for your replies. jim_mcnamara's suggestion worked best. frans's approach also works, but unfortunately it breaks the original text formatting (it writes one word per line), which I'd like to retain. Thank you both, though.

frans · September 23, 2010, 12:07pm

Just modify line 7 as follows:

      echo -n "$W "

and add an echo between lines 9 and 10.
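
With both changes applied, the loop might look roughly like the sketch below. It is wrapped in a function so the word count and file name can be passed in; the name first_words and its argument order are illustrative and not from the thread. It also uses printf and $((...)) in place of echo -n and ((...)), on the assumption that these behave the same in more shells.

```shell
# first_words N FILE: print the first N words of FILE, keeping line breaks.
# (function name and arguments are hypothetical, chosen for illustration)
first_words() {
    n=$1
    i=0
    while read -r L
    do
        for W in $L                 # unquoted on purpose: split line into words
        do
            i=$((i + 1))
            printf '%s ' "$W"       # same effect as: echo -n "$W "
            if [ "$i" -ge "$n" ]; then
                break 2             # leave both loops once N words are out
            fi
        done
        echo                        # the added echo: end of each input line
    done < "$2"
}
```

For example, first_words 200 file.txt > first200.txt. Each word is followed by a space, and the last line is left without a trailing newline when the count is reached in the middle of a line.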

rdcwayx · September 23, 2010, 5:08pm

xargs -n1 <infile |head -200
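
For the first half of the question, splitting the whole file into pieces of a fixed number of words, the word-counting idea from the awk answer could be extended along these lines. This is only a sketch: the function name split_words, its argument order, and the numbered output names (PREFIX1, PREFIX2, ...) are assumptions made for illustration.

```shell
# split_words N FILE PREFIX: copy FILE into PREFIX1, PREFIX2, ... with at
# most N words per piece, keeping the original line breaks where possible.
# (function name, arguments, and output naming are hypothetical)
split_words() {
    awk -v w="$1" -v prefix="$3" '
    BEGIN { part = 1; cnt = 0; out = prefix part }
    {
        sep = ""
        for (i = 1; i <= NF; i++) {
            if (cnt == w) {                    # current piece is full
                if (i > 1) printf("\n") > out  # finish a line cut mid-way
                close(out)
                part++; cnt = 0; sep = ""
                out = prefix part
            }
            printf("%s%s", sep, $i) > out
            sep = " "
            cnt++
        }
        printf("\n") > out                     # keep the input line break
    }' "$2"
}
```

For example, split_words 200 file.txt piece. would write piece.1, piece.2, and so on. Runs of spaces between words are collapsed to a single space, as in the awk answer above.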