awk for splitting a file into constant-size chunks

mukesh.lalwani · June 20, 2012, 10:07am

Hi gurus,
I want to split my main file into 20 files with 2500 lines in each. The main file contains 2500*20 lines in total. I wrote the following awk command, but it fails with an error.

awk '{ for (i = 1; i <= 20; i++)  { starts=2500*$i-1; ends=2500*$i; NR>=starts && NR<=ends {f=My$i".txt"; print >> f; close(f)}  } }' main_file

I am new to awk, but I would like to do the split with awk. This is the error:

awk: syntax error at source line 1
 context is
    { for (i = 1; i <= 20; i++)  { starts=2500*$i-1; ends=2500*$i; NR>=starts && NR<=ends >>>  { <<< 
awk: illegal statement at source line 1
awk: illegal statement at source line 1

guruprasadpr · June 20, 2012, 10:24am

Hi

 awk '!(NR%2500){i++;}{print > "f"i;}' i=1 file

Guru

ctsgnb · June 20, 2012, 10:27am

Can't you just use the split command, which was designed specifically for this kind of task?

man split
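
For reference, a minimal sketch of the split approach, assuming the 50000-line input described in the first post. The scratch directory, the generated sample data, and the output prefix My are assumptions made for illustration; the only split features used are the standard -l option and the prefix argument.

```shell
# Work in a scratch directory so nothing real is overwritten (illustrative setup).
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Stand-in for main_file: 50000 numbered lines, i.e. 20 chunks of 2500.
awk 'BEGIN { for (n = 1; n <= 50000; n++) print "line " n }' > main_file

# -l 2500 puts 2500 lines in each piece; "My" is the output name prefix.
split -l 2500 main_file My

ls My* | wc -l      # number of pieces created
wc -l < Myaa        # number of lines in the first piece
```

split names the pieces Myaa through Myat, so giving them other names needs a separate renaming step afterwards.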

Corona688 · June 20, 2012, 10:29am

Remember that the entire outer code block runs once per line. 'print' doesn't cause it to read another line; even without the syntax errors, your code would just print the same line over and over.

Also, $ doesn't mean "variable"; $ means "field" (column). If you just want the value of a variable, you don't need $.

Here is how the program above works:

# If NR is a multiple of 2500, increment i.
!(NR%2500){i++;}
# Print the current line into the file named "f" followed by the value of i.
{print > "f"i;}
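
A quick way to see the difference between i and $i, as a sketch with made-up input (the three-word line and the value 2 are chosen only for illustration):

```shell
# i is an ordinary awk variable; $i is field number i of the current line.
result=$(echo 'alpha beta gamma' | awk '{ i = 2; print i, $i }')
echo "$result"
```

This prints 2 beta: the first value is the variable itself, the second is the second field of the input line.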

mukesh.lalwani · June 21, 2012, 6:14am

Thanks guys, I understand the concept now. But this command ends in an error:

awk '!(NR%2500){i++;}{print > "f"i;}' i=1 file

awk: syntax error at source line 1
 context is
    !(NR%2500){i++;}{print > >>>  "f"i <<< ;}
awk: illegal statement at source line 1

Scrutinizer · June 21, 2012, 6:22am

Try:

awk '!(NR%2500){close(f); f="f" i++}{print>f}' i=1 file

--
(or use split as ctsgnb suggested)

mukesh.lalwani · June 21, 2012, 6:43am

split is taking a long time, so I wanted to try awk instead. Also, split names its output files with the suffixes aa, ab, ac, ..., which I don't want, so I would have to rename every file after splitting.
Dear Scrutinizer,
your suggested code also gives the following error:

awk '!(NR%2500){f="f" i++}{print>f}' i=1 main_file

awk: null file name in print or getline
 input record number 1, file main_file
 source line number 1

Scrutinizer · June 21, 2012, 6:56am

OK, try:

awk '!((NR-1)%2500){close(f); f="f" i++}{print>f}' i=1 main_file
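
Why the (NR-1) version works: with !(NR%2500) the condition is first true on line 2500, so f is still empty when line 1 is printed, which is the "null file name" error above. (NR-1)%2500 is 0 on lines 1, 2501, 5001, ..., so the file name is set before the first print and changes exactly at each chunk boundary. Below is a sketch of the same idea spelled out, assuming the My1.txt ... My20.txt names hinted at in the first post. The scratch directory, the generated sample data, and the size variable are illustrative assumptions and not part of the commands posted above.

```shell
# Work in a scratch directory with generated sample data (illustrative setup).
tmp=$(mktemp -d) && cd "$tmp" || exit 1
awk 'BEGIN { for (n = 1; n <= 50000; n++) print "line " n }' > main_file

awk -v size=2500 '
    # True on lines 1, 2501, 5001, ...: start a new chunk.
    (NR - 1) % size == 0 {
        if (f != "") close(f)      # release the previous chunk file
        i++
        f = "My" i ".txt"          # My1.txt, My2.txt, ...
    }
    # A plain variable after > is parsed the same way by every awk.
    { print > f }
' main_file

ls My*.txt | wc -l     # number of chunk files
wc -l < My1.txt        # number of lines in the first chunk
```

Building the name in a variable first also sidesteps the syntax error seen earlier, since some awk versions do not accept an unparenthesized concatenation such as "f"i after >.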

mukesh.lalwani · June 21, 2012, 7:18am

Thanks, Scrutinizer.
It worked.

ctsgnb · June 21, 2012, 4:20pm

By the way, is your file an ASCII text file, or does it contain binary or raw data?

mukesh.lalwani · June 22, 2012, 2:11am

It's an ASCII text file.