awk for splitting file in constant chunks

Hi gurus,
I want to split a main file into 20 files with 2500 lines in each file. My main file contains 2500*20 lines in total. I wrote the following awk, but it breaks with an error.

awk '{ for (i = 1; i <= 20; i++)  { starts=2500*$i-1; ends=2500*$i; NR>=starts && NR<=ends {f=My$i".txt"; print >> f; close(f)}  } }' main_file

I am new to awk, but I would like to do the split with awk. This is the error:

awk: syntax error at source line 1
 context is
    { for (i = 1; i <= 20; i++)  { starts=2500*$i-1; ends=2500*$i; NR>=starts && NR<=ends >>>  { <<< 
awk: illegal statement at source line 1
awk: illegal statement at source line 1

Hi

 awk '!(NR%2500){i++;}{print > "f"i;}' i=1 file

Guru

Can't you just use the split command, which was designed for exactly this kind of task?

man split

Remember that the entire outer code block runs once per line. 'print' doesn't cause it to read another line, so even without the syntax errors your code would just print the same line over and over.

Also, $ doesn't mean "variable", $ means "field" (column). If you just want the value of the variable, you don't need $.
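
To make the difference concrete, here is a small sketch. The three-word input line is made up for the example; with i set to 2, plain i gives the number 2 while $i gives the second field of the line.

```shell
# $i is field number i of the current line; plain i is the variable's value.
out=$(printf 'alpha beta gamma\n' | awk '{ i = 2; print i; print $i }')
printf '%s\n' "$out"
```

The first line printed is the value of i, the second is the field that i points at.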

Here is how the program above works:

# If NR is a multiple of 2500, increment i.
!(NR%2500){i++;}
# Print the line into the file named "f" followed by the value of i (f1, f2, ...).
{print > "f"i;}

Thanks guys, I understood the concept. But this still ends in an error:

awk '!(NR%2500){i++;}{print > "f"i;}' i=1 file
awk: syntax error at source line 1
 context is
    !(NR%2500){i++;}{print > >>>  "f"i <<< ;}
awk: illegal statement at source line 1
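
A note on that message: some awk versions reject an unparenthesized concatenation after >, because it is ambiguous where the file name ends. Putting the name in parentheses is one way around it. Below is a minimal sketch that changes only the output expression and leaves the rest of the one-liner as posted; demo_file, the scratch directory and the chunk size of 2 are stand-ins for the example.

```shell
# Work in a scratch directory so the demo files do not clutter anything.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical 6-line input; chunk size 2 stands in for 2500.
printf '%s\n' 1 2 3 4 5 6 > demo_file

# Parentheses make the target unambiguous: the file name is ("f" i).
awk '!(NR%2){i++}{print > ("f" i)}' i=1 demo_file

ls
```

Building the name in a variable first (f = "f" i, then print > f) avoids the ambiguity in the same way.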

Try:

awk '!(NR%2500){close(f); f="f" i++}{print>f}' i=1 file

--
(or use split as ctsgnb suggested)
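
For the split route, a rough sketch follows, assuming a POSIX split with -l. The prefix part_ is arbitrary, the My1.txt style names are borrowed from the first post, and the input is a 10-line stand-in with a chunk size of 4 instead of 2500. The loop renames the aa, ab, ... pieces to numbered names in case the default suffixes are not wanted.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical stand-in for main_file: 10 lines, chunk size 4 instead of 2500.
awk 'BEGIN { for (n = 1; n <= 10; n++) print "line " n }' > main_file

# split writes part_aa, part_ab, ... ("part_" is an arbitrary prefix).
split -l 4 main_file part_

# Rename the pieces in glob (alphabetical) order to My1.txt, My2.txt, ...
n=1
for p in part_*; do
    mv "$p" "My$n.txt"
    n=$((n + 1))
done

wc -l My*.txt
```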

split is taking a lot of time, so I wanted to try awk. Also, split suffixes its output files with aa, ab, ac, ..., which is not what I want, so I would need to rename them all after splitting.

Dear Scrutinizer,
your suggested code also gives the following error:

awk '!(NR%2500){f="f" i++}{print>f}' i=1 main_file 
awk: null file name in print or getline
 input record number 1, file main_file
 source line number 1

OK, try:

awk '!((NR-1)%2500){close(f); f="f" i++}{print>f}' i=1 main_file
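
Why NR-1: with !(NR%2500) the condition is first true on line 2500, so on line 1 the variable f is still empty, which is the "null file name" complaint. (NR-1)%2500 is 0 on lines 1, 2501, 5001, ..., so a file name is set before the first print and each file gets exactly 2500 lines. A small sketch of the same idea, with n=3 standing in for 2500 and a made-up 7-line demo_file:

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical 7-line input; n=3 stands in for 2500.
printf '%s\n' a b c d e f g > demo_file

# (NR-1)%n is 0 on lines 1, 4, 7, ... so a new file starts exactly there.
awk -v n=3 '
    !((NR - 1) % n) { if (f != "") close(f); f = "f" i++ }   # start the next chunk
    { print > f }                                            # write line to current chunk
' i=1 demo_file

wc -l f1 f2 f3
```

The guard around close(f) only skips the call while no file has been opened yet; the one-liner above works without it.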

Thanks Scrutinizer,
it worked :)

By the way, is your file an ASCII text file, or does it contain some binary or raw data?

It's an ASCII text file.