Split a file into 12 different files

OS : RHEL 6.7
Shell : bash

I have a text file with 5.97 million lines.

I want to split this big file into 12 different files (in sequential order) so that each file contains roughly 500K lines. After some googling I tried the following awk command, but it created only 2 files (huge_data.txt11 and huge_data.txt12) from the source file.

Any idea how I can split the file into 12 different files?

$ wc -l huge_data.txt
5970387 huge_data.txt

$ awk -vLN=500000 '{print > ("huge_data.txt" 12-(NR>LN))}' huge_data.txt
$
$ ls -lh
total 6.5G
-rw-rw-r-- 1 appusr appusr 3.3G Dec 16 17:04 huge_data.txt
-rw-rw-r-- 1 appusr appusr 3.0G Dec 19 11:45 huge_data.txt11
-rw-rw-r-- 1 appusr appusr 276M Dec 19 11:45 huge_data.txt12
$
$
$ wc -l huge_data.txt11
5470387 huge_data.txt11
$
$ wc -l huge_data.txt12
500000 huge_data.txt12
$


Hi,

I would advise looking at the man pages for your system. Try man split; it is nearly always there.

To put it all back together, look at man cat for starters.
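As a rough sketch of what that could look like for this case: the example below uses a small generated stand-in file so it can be run anywhere. The names demo_data.txt, demo_data.part and demo_rejoined.txt are made up for illustration, and -d and -a are GNU split options, so check man split on your own system first. For the real job you would substitute huge_data.txt and 500000.

```shell
#!/bin/bash
# Sketch only: work in a scratch directory on a generated stand-in file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 5970 > demo_data.txt           # stand-in for huge_data.txt (5970 lines)

# -l 500   : at most 500 lines per piece
# -d -a 2  : two-digit numeric suffixes (00, 01, ...) instead of aa, ab, ...
split -l 500 -d -a 2 demo_data.txt demo_data.part

ls demo_data.part*                   # demo_data.part00 ... demo_data.part11
wc -l demo_data.part11               # the last piece holds the remaining 470 lines

# Reassemble and verify; the glob expands in sorted order, which here is
# also the original order of the pieces.
cat demo_data.part* > demo_rejoined.txt
cmp demo_data.txt demo_rejoined.txt && echo "round trip OK"
```

With the same ratio as the real file (5970387 lines at 500000 per piece), this gives 12 pieces, the last one shorter than the rest.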

Regards

Gull04

I agree with gull04 that split is a better way to do this (without reinventing the wheel). If you must do it with awk, you might want to try something more like:

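awk -v LN=500000 '
# On the first line of each LN-line chunk, switch to a new output file.
!((NR - 1) % LN) {
	if(NR > 1) close(f)	# close the previous chunk before opening the next
	f = sprintf("huge_data%03d.txt", 1 + int((NR - 1) / LN))
}
{	print > f
}' huge_data.txt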

With the zero-padded, three-digit filenames generated by this script (huge_data001.txt, huge_data002.txt, and so on), you can split a file into up to 999 files and still easily process them in sequential order.

If someone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
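For what it is worth, the original attempt produced only two files because NR>LN evaluates to either 0 or 1, so 12-(NR>LN) can only ever be 12 or 11. Before running the script above on a 3.3G file, it may be worth trying the same logic on a small input. The following is only a sketch under that assumption: demo.txt, the demoNNN.txt output names and LN=10 are made up for the test.

```shell
#!/bin/bash
# Sketch: same chunking logic as the awk script above, on a tiny stand-in file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 25 > demo.txt                  # hypothetical 25-line test file

awk -v LN=10 '
(NR - 1) % LN == 0 {                 # first line of each LN-line chunk
	if (NR > 1) close(f)             # finish the previous output file
	f = sprintf("demo%03d.txt", 1 + int((NR - 1) / LN))
}
{ print > f }' demo.txt

wc -l demo0*.txt                     # 10, 10 and 5 lines, plus a total
cat demo0*.txt | cmp - demo.txt && echo "pieces match the original"
```

The demo0*.txt glob picks up only the numbered pieces, not demo.txt itself, so the concatenation can be compared directly against the source.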

Hi,

Just a quick update, now that I have a bit of time.

-bash-3.2$ ls -l test*
-rw-r-----   1 e415243  other    4037689 Dec 19 14:35 test_file_01.txt
-bash-3.2$ wc -l test*
  139186 test_file_01.txt
-bash-3.2$ split -l 14000 test_file_01.txt
-bash-3.2$ ls x*
xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj
-bash-3.2$ cat xa* >> test_file_02.txt
-bash-3.2$ diff test_file_01.txt test_file_02.txt
-bash-3.2$

As you can see, the file was split by lines and joined back up using cat; adjust the values to suit.

Regards

Gull04

Thank you, Don and gull04.
For some reason, clicking the 'Thanks' button has no visible effect except on the last post by gull04.
I am using Google Chrome; I will try from Firefox later.

I appreciate your thanks even if the Thanks button isn't working.

It seems that the code used to apply Thanks relies on deprecated features that are starting to fail with newer revisions of some browsers. When the Thanks button disappears in a post but your user name doesn't appear in the list of users who have said Thank You, you can sometimes copy the URL that was generated when you hit the Thanks button into a new browser tab and load it to get your Thanks applied to that post.

The code for this site is being upgraded from PHP 5.3.x to PHP 7 (a long, tedious process). When that has been completed, everything should be working again (and running on a new, faster server), but we don't have a completion date for that project yet.