How to cut data from a big file
My file is around 30 GB.
I tried "head -50022172 filename > newfile.txt" followed by "tail -5454283 newfile.txt", and it's slow.
After that I tried sed -n '46467831,50022172p' filename > newfile.txt, which is also slow.
Please recommend a faster command to cut some data from a big file.
Thanks.
Well, a 30 GB file is a *HUGE* file, and any shell command you run on it will take time to process.
Maybe you want to split up the file and then work on the smaller pieces? Run the command:
man split
to see what your options are.
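For instance, splitting by line count keeps records intact; the chunk size and the "chunk_" prefix here are just examples, shown on a small stand-in file:

```shell
# Small stand-in file; with the real data you'd point split at the 30 GB file.
seq 1 100 > bigfile.txt

# Split into pieces of 40 lines each, named chunk_aa, chunk_ab, chunk_ac, ...
split -l 40 bigfile.txt chunk_

wc -l chunk_*
```

For a 30 GB file you'd pick a much larger line count (or split -b for byte-sized chunks), then run your extraction on only the chunks covering the range you want.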
tyler_durden
You can try awk:
# time awk 'NR >= 46467831 && NR <= 50022172' big_file > new_big_file
real 0m46.536s
user 0m43.761s
sys 0m1.487s
# wc -l < new_big_file
3554342
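One small tweak that may help on very large files: as written, awk keeps scanning after the last wanted line. Telling it to exit once past the range avoids reading the rest of the file. A runnable sketch on a small sample (the real equivalent would be awk 'NR > 50022172 { exit } NR >= 46467831' big_file):

```shell
# Sample file standing in for the 30 GB original.
seq 1 1000 > sample.txt

# Print lines 101-200, then stop reading instead of scanning to end-of-file.
awk 'NR > 200 { exit } NR >= 101' sample.txt > range.txt

wc -l < range.txt
```

The saving depends on where the range sits: near the start of the file the early exit skips almost the whole scan, near the end it saves little.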
Thank you danmero.
That's much faster.
My file is 18 GB:
time nawk 'NR >= 77930597 && NR <= 86671221' bigfile > newfile
real 2m41.942s
user 2m10.469s
sys 0m18.560s
but this command consumes about 25% CPU.
Any method that works on variable-length records has to scan the data to find each record boundary, and that scan will always incur a CPU penalty. But if your records are a fixed size, you can calculate the byte offsets of the beginning and end of the section of interest and use more efficient ways of copying the data that avoid the CPU-intensive part of the operation.
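With fixed-size records, dd can seek straight to the computed offset and copy only the bytes needed, with no line scanning at all. A sketch using a demo file of hypothetical 8-byte records (for the thread's range with, say, 100-byte records, the numbers would be bs=100 skip=46467830 count=3554342):

```shell
# Build a demo file of 100 fixed-size records, 8 bytes each ("rec0001\n" ...).
for i in $(seq 1 100); do printf 'rec%04d\n' "$i"; done > records.txt

# Copy records 11 through 15: block size = record size,
# skip the first 10 records, then copy exactly 5 records.
dd if=records.txt of=slice.txt bs=8 skip=10 count=5 2>/dev/null

cat slice.txt
```

Note that skip and count are measured in blocks of bs bytes, so setting bs to the record size lets you address the file directly by record number.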