edit sed command

SkySmart · May 29, 2012, 7:45am

How can I make these sed commands run faster?

sed '51000000,51347442!d' file

and

sed '51347442,$ !d' file

The file is 9 GB in size.

It runs on SunOS 5.10 and Red Hat Linux 6 servers, and I use bash.

Scrutinizer · May 29, 2012, 8:02am

Perhaps:

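tail +51000000 < file | sed 347443q

(That is the traditional Solaris /usr/bin/tail syntax. The POSIX form, which GNU tail on Linux and /usr/xpg4/bin/tail on Solaris accept, is tail -n +51000000.)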

Or

{ head -51000000 >/dev/null; sed 347443q; } < file

Additionally, you could try ksh instead of bash; that might make a difference too.
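A minimal sketch of the tail-plus-q idea as a reusable function, assuming bash or another POSIX shell (for the $((...)) arithmetic) and a tail that accepts -n +N (GNU tail, or /usr/xpg4/bin/tail on Solaris 10). The name extract_range is hypothetical. The count handed to sed's q is the inclusive length of the range: 51347442 - 51000000 + 1 = 347443.

```shell
#!/bin/sh
# extract_range START END FILE  (hypothetical helper, not a standard command)
# Skips to line START with tail, then lets sed quit after the last wanted
# line, so nothing after END is read.
extract_range() {
    start=$1 end=$2 file=$3
    # Inclusive line count, e.g. 51347442 - 51000000 + 1 = 347443
    count=$((end - start + 1))
    tail -n "+$start" "$file" | sed "${count}q"
}

# Usage with the numbers from the question:
# extract_range 51000000 51347442 file
```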

--
Or maybe even this:

sed -n '51000000,51347442p;51347442q' file

or

sed '1,50999999d;51347442q' file
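The gain from q over !d can be checked without a 9 GB file by feeding sed an endless stream from yes: the q form returns as soon as it has printed the range, while sed '3,5!d' on the same stream would run until interrupted. A sketch; sed_range is a hypothetical helper name.

```shell
#!/bin/sh
# sed_range START END [FILE...]  (hypothetical helper)
# Prints lines START..END, then q makes sed exit at END, so it stops
# reading there instead of scanning to the end of the input.
sed_range() {
    start=$1 end=$2
    shift 2
    sed -n "${start},${end}p;${end}q" "$@"
}

# Endless input: this still returns after reading five lines.
yes 'some line' | sed_range 3 5
```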

Franklin52 · May 29, 2012, 8:10am

Or you could try:

awk 'NR==51000000{p=1}; p; NR==51347442{exit}' file
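A regex such as /51000000/ matches lines whose text contains that digit string, not line number 51000000; selecting by position means testing awk's NR variable. A sketch with the bounds passed in through -v (awk_range is a hypothetical name). On Solaris 10, nawk or /usr/xpg4/bin/awk is the safer choice, since the old /usr/bin/awk may not support -v.

```shell
#!/bin/sh
# awk_range START END FILE  (hypothetical helper)
# Prints lines START..END by line number (NR) and exits at END,
# so the rest of the file is never read.
awk_range() {
    awk -v s="$1" -v e="$2" 'NR >= s { print } NR >= e { exit }' "$3"
}

# awk_range 51000000 51347442 file
```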

SkySmart · May 29, 2012, 10:33am

Thank you, guys. I tried all the commands provided, but all of them still stall. I don't get a command prompt back until I press Ctrl-C.

Basically, if I can't get the last 500,000 lines of the file, I would like whatever amount is feasible and closest to that number.

200,000 or 100,000: I'll be happy with either of these.

Corona688 · May 29, 2012, 11:54am

Of course it "stalls": line numbers can only be found by counting newlines from the start, so each of these commands has to read through nearly all 9 GB before it reaches the lines you want, and a tool that collects the last 500,000 lines while reading forward has to hold all of them in memory at once. Processing 9 GB of data is going to take a while no matter how you cut it. To get the last 500,000 lines, a tool has to figure out where the end is, then back up from there...
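One way to "back up from the end" and still get an amount close to what was asked for: take a fixed number of bytes off the end with tail -c, drop the first (probably partial) line of that chunk, and keep at most N lines. On a regular file, GNU tail seeks to the end instead of reading everything before it. This sketch assumes GNU coreutils; older Unix tail implementations may cap how far back from the end they can reach. last_lines_approx and the byte budget are assumptions. If the chunk holds fewer than N lines you simply get fewer, which fits the 200,000 or 100,000 fallback. Assuming the file holds a little over 51.3 million lines, as the line numbers in the question suggest, 9 GB works out to about 175 bytes per line, so 500,000 lines need on the order of 90 MB.

```shell
#!/bin/sh
# last_lines_approx N BYTES FILE  (hypothetical helper, assumes GNU tail)
last_lines_approx() {
    n=$1 bytes=$2 file=$3
    # tail -c takes only the final $bytes bytes of the file.
    # sed 1d discards the first line of that chunk, which is probably cut
    # off mid-line; tail -n then keeps at most $n complete lines.
    tail -c "$bytes" "$file" | sed 1d | tail -n "$n"
}

# last_lines_approx 500000 100000000 file > /tmp/last500k
```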

Some implementations of tac, however, work by seeking, which avoids reading the first 99% of the file. They deliver the lines in reverse order, so you can use head to take the number of lines you want and then reverse them again. I'm not sure Sun's does, but it's worth a shot:

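# last 500,000 lines, newest first
tac filename | head -n 500000 > /tmp/$$
# reverse them back into the original order
tac /tmp/$$ > /tmp/$$-forward
# remove the intermediate file
rm -f /tmp/$$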

If Sun's tac doesn't do that, maybe you can install GNU tac.
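The temporary file isn't strictly needed: the two tac passes can be chained in one pipeline. A sketch assuming GNU tac (from coreutils), which reads a regular file backwards from the end; once head has its N lines it exits and the first tac is stopped by SIGPIPE, so the top of the file is never read. last_lines_tac is a hypothetical name.

```shell
#!/bin/sh
# last_lines_tac N FILE  (hypothetical helper, assumes GNU tac)
last_lines_tac() {
    n=$1 file=$2
    # First tac: last line first. head keeps n lines and exits.
    # Second tac: restores the original top-to-bottom order.
    tac "$file" | head -n "$n" | tac
}

# last_lines_tac 500000 filename > /tmp/last500k
```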

binlib · May 29, 2012, 2:01pm

If your file has fixed-length lines (every line is exactly the same number of bytes), you can seek straight to the byte offset of the target line and read the desired number of lines.
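A sketch of the fixed-length case using dd, assuming every line (including its newline) is exactly reclen bytes. With bs set to the record length, dd's skip= and count= are measured in lines, and on a regular file GNU dd seeks past the skipped blocks rather than reading them. fixed_record_lines and its parameters are hypothetical, and the record length is an assumption you would need to check, for example with head -n 1 file | wc -c.

```shell
#!/bin/sh
# fixed_record_lines START COUNT RECLEN FILE  (hypothetical helper)
fixed_record_lines() {
    start=$1 count=$2 reclen=$3 file=$4
    # bs = one line, so skip the first (start - 1) lines and copy count lines.
    # 2>/dev/null hides dd's "records in/out" summary.
    dd if="$file" bs="$reclen" skip=$((start - 1)) count="$count" 2>/dev/null
}

# fixed_record_lines 51000000 347443 176 file   # 176 is only an example length
```

A small bs means one read per copied line, but for a few hundred thousand lines that should still be far cheaper than scanning 9 GB from the top.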