I have a large file with around 100k+ lines, and I want to retain only the last 1000 lines. One way I thought of was:
tail -1000 filename > filename1
mv filename1 filename
But there should be a better solution. Is there a way I can use sed or some such command to change the same file, so that just the last 1000 lines are retained and we don't need to copy the output to another file and rename it back?
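For what it's worth, a slightly safer sketch of the two-step approach above, using mktemp so a stray temp-file name can't clobber anything (tail -n 1000 is the portable spelling of tail -1000):

```shell
# Keep only the last 1000 lines of "filename".
# The && chain replaces the original only if tail succeeded,
# so a failure leaves the file untouched.
tmp=$(mktemp) &&
tail -n 1000 filename > "$tmp" &&
mv "$tmp" filename
```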
Many Thanks
No, you can't use a command to change the same file. You will have to put the desired output, after applying the proper filters, in a separate file and then rename it to the original one... :D:D
Thank you Reebot !
But is there any operation faster than tail that can help out here?
The approach will depend on the Operating System and the related tools, whether the file is currently open by a process (like syslogd), and whether it is a normal unix text file suitable for processing with shell tools.
It's worth checking the actual size of 1000 records, because most versions of "tail" are limited in how much data they will buffer (you could, for example, ask for 1000 lines and only get 300).
If we assume that the file is a static normal unix text file and not open by another process, my first inclination would be to use "wc" to count the records then "sed" to output the required number of records to a temporary file.
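That wc-then-sed idea might look like this (a sketch, assuming a plain text file; the start line is total minus 999 because sed ranges are inclusive):

```shell
# Count the lines, then print only the last 1000 to a temp file.
total=$(wc -l < filename)
start=$(( total > 1000 ? total - 999 : 1 ))
sed -n "${start},\$p" filename > filename.tmp
mv filename.tmp filename
```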
Try the following:
sed -e :a -e '$q;N;1001,$D;ba' file_name
Hope this works for you....
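To see what that sed loop is doing, here is the same idiom scaled down to keep the last 5 lines of a 10-line stream (6 plays the role of 1001):

```shell
# N appends the next input line to the pattern space,
# D drops the oldest buffered line once more than 5 are held,
# and $q quits (printing the buffer) on the last line.
seq 10 | sed -e :a -e '$q;N;6,$D;ba'
# prints 6 7 8 9 10, one per line
```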
I'm thinking your issue is more I/O-related than a problem with tail; the onus may be on the write speed rather than on accessing the last 1000 lines of the file. You could even read the file into memory and then write it back out to overwrite the original, but be careful, as that may bite you if the file is too big.
Otherwise, tail is pretty fast, since the alternatives have to scan the whole file and apply line-counting logic, while tail is very simple in its approach, conceptually $((EOF - integer)):
-> wc -l /opt/worklibs/tmp/i*114054.dat
434641 /opt/worklibs/tmp/i_PPGLBL_20081231114054.dat
-> time sed -n "400000,401000p" /opt/worklibs/tmp/i_PPGLBL_20081231114054.dat >/dev/null
real 0m7.91s
user 0m3.96s
sys 0m3.94s
-> time tail -1000 /opt/worklibs/tmp/i_*114054.dat >/dev/null
real 0m0.03s
user 0m0.02s
sys 0m0.01s
-> time awk 'NR>=400000 && NR<=401001 {print $0;}' /opt/worklibs/tmp/i*114054.dat >/dev/null
real 0m23.77s
user 0m23.06s
sys 0m0.71s
-> time nawk 'NR>=400000 && NR<=401001 {print $0;}' /opt/worklibs/tmp/i*114054.dat >/dev/null
real 0m4.57s
user 0m4.06s
sys 0m0.50s
Or else you can use the two commands in a single stretch:
sed -e :a -e '$q;N;1001,$D;ba' orig_file >copy_file ;mv copy_file orig_file
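A small hardening of that one-liner: chaining with && instead of ; means the original is only replaced if the sed filter actually succeeded:

```shell
# Replace orig_file only when the filter exits successfully;
# on failure, orig_file is left untouched.
sed -e :a -e '$q;N;1001,$D;ba' orig_file > copy_file &&
mv copy_file orig_file
```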
If you go for "tail", first run a quick test to see whether your tail command is suitable:
tail -1000 filename | wc -l
Hopefully the answer is 1000.
Thank you very much, Curleb, for that detailed explanation and info. I'll be going with tail; that will solve my purpose.
Also, thank you Reebot and Methyl for your answers.
Many Thanks & Regards.
You can edit a file in place with ed. To delete all but the last 1000 lines:
ed -s file <<'EOF'
1,-1000d
wq
EOF
or, more clumsily:
printf '1,-1000d\nwq\n' | ed -s file
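A quick way to convince yourself the ed version does what it claims, on a throwaway file (the /tmp path is just for illustration):

```shell
# Build a 2000-line test file, trim it in place with ed,
# then check that only lines 1001-2000 remain.
seq 2000 > /tmp/edtest
printf '1,-1000d\nwq\n' | ed -s /tmp/edtest
wc -l < /tmp/edtest    # 1000
head -n 1 /tmp/edtest  # 1001
```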
Regards,
Alister