How to cut data from a big file
My file is around 30 GB.
I tried "head -50022172 filename > newfile.txt" followed by "tail -5454283 newfile.txt", and it's slow.
After that I tried sed -n '46467831,50022172p' filename > newfile.txt, which is also slow.
Please recommend a faster command to cut some data from a big file.
Thanks.
Well, a 30 GB file is a *HUGE* file, and any shell command you run on it will take time to process.
Maybe you want to split up the file and then work on the smaller pieces? Run the command:
man split
to see what your options are.
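For instance, splitting by line count keeps records intact; the chunk size and the "chunk_" prefix here are just examples, shown on a small stand-in file:

```shell
# Small stand-in file; with the real data you'd point split at the 30 GB file.
seq 1 100 > bigfile.txt

# Split into pieces of 40 lines each, named chunk_aa, chunk_ab, chunk_ac, ...
split -l 40 bigfile.txt chunk_

wc -l chunk_*
```

For a 30 GB file you'd pick a much larger line count (or split -b for byte-sized chunks), then run your extraction on only the chunks covering the range you want.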
tyler_durden
You can try awk:
# time awk 'NR >= 46467831 && NR <= 50022172' big_file > new_big_file
real 0m46.536s
user 0m43.761s
sys 0m1.487s
# wc -l < new_big_file
3554342
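One small tweak that may help on very large files: as written, awk keeps scanning after the last wanted line. Telling it to exit once past the range avoids reading the rest of the file. A runnable sketch on a small sample (the real equivalent would be awk 'NR > 50022172 { exit } NR >= 46467831' big_file):

```shell
# Sample file standing in for the 30 GB original.
seq 1 1000 > sample.txt

# Print lines 101-200, then stop reading instead of scanning to end-of-file.
awk 'NR > 200 { exit } NR >= 101' sample.txt > range.txt

wc -l < range.txt
```

The saving depends on where the range sits: near the start of the file the early exit skips almost the whole scan, near the end it saves little.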
Thank you danmero.
That's much faster.
My file is 18 GB:
time nawk 'NR >= 77930597 && NR <= 86671221' bigfile > newfile
real 2m41.942s
user 2m10.469s
sys 0m18.560s
but this command consumes about 25% CPU.
Any method that works on variable-length records has to scan the data to find each record boundary, and that scan will always incur a CPU penalty. But if your records are a fixed size, you can calculate the byte offsets of the beginning and end of the section of interest and use more efficient ways of copying the data that avoid the CPU-intensive part of the operation.
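With fixed-size records, dd can seek straight to the computed offset and copy only the bytes needed, with no line scanning at all. A sketch using a demo file of hypothetical 8-byte records (for the thread's range with, say, 100-byte records, the numbers would be bs=100 skip=46467830 count=3554342):

```shell
# Build a demo file of 100 fixed-size records, 8 bytes each ("rec0001\n" ...).
for i in $(seq 1 100); do printf 'rec%04d\n' "$i"; done > records.txt

# Copy records 11 through 15: block size = record size,
# skip the first 10 records, then copy exactly 5 records.
dd if=records.txt of=slice.txt bs=8 skip=10 count=5 2>/dev/null

cat slice.txt
```

Note that skip and count are measured in blocks of bs bytes, so setting bs to the record size lets you address the file directly by record number.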