I have a large file with around 100k+ lines, and I want to retain only the last 1000 lines. One way I thought of was:
tail -1000 filename > filename1
mv filename1 filename
But there should be a better solution. Is there a way I can use sed or some such command to change the same file, so that just the last 1000 lines are retained and we don't need to copy the output to another file and rename it back?
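For what it's worth, a slightly safer sketch of the two-step approach above, using mktemp so a stray temp-file name can't clobber anything (tail -n 1000 is the portable spelling of tail -1000):

```shell
# Keep only the last 1000 lines of "filename".
# The && chain replaces the original only if tail succeeded,
# so a failure leaves the file untouched.
tmp=$(mktemp) &&
tail -n 1000 filename > "$tmp" &&
mv "$tmp" filename
```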
Many Thanks
No, you can't use a command to change the same file. You will have to put the desired output, after applying the proper filters, in a separate file and then rename it to the original one... :D:D
Thank you Reebot !
But is there any operation faster than tail that can help out here?
The approach will depend on the Operating System and the related tools, whether the file is currently open by a process (like syslogd), and whether it is a normal unix text file suitable for processing with shell tools.
It's worth checking the actual size of 1000 records, because most versions of "tail" are limited in how much data they will buffer (you could, for example, ask for 1000 lines and only get 300).
If we assume that the file is a static normal unix text file and not open by another process, my first inclination would be to use "wc" to count the records then "sed" to output the required number of records to a temporary file.
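That wc-then-sed idea might look like this (a sketch, assuming a plain text file; the start line is total minus 999 because sed ranges are inclusive):

```shell
# Count the lines, then print only the last 1000 to a temp file.
total=$(wc -l < filename)
start=$(( total > 1000 ? total - 999 : 1 ))
sed -n "${start},\$p" filename > filename.tmp
mv filename.tmp filename
```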
Try the following:
sed -e :a -e '$q;N;1001,$D;ba' file_name
Hope this works for you....
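To see what that sed loop is doing, here is the same idiom scaled down to keep the last 5 lines of a 10-line stream (6 plays the role of 1001):

```shell
# N appends the next input line to the pattern space,
# D drops the oldest buffered line once more than 5 are held,
# and $q quits (printing the buffer) on the last line.
seq 10 | sed -e :a -e '$q;N;6,$D;ba'
# prints 6 7 8 9 10, one per line
```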
I'm thinking your issue is more I/O-related than a problem with tail; the onus may be on the write speed rather than on accessing the last 1000 lines of the file. You could even read the file into memory and then write it back out to overwrite the original, but be careful, as that may bite you if the file is too big.
Otherwise, tail is pretty fast, since the alternatives have to scan the whole file and apply line-counting logic, while tail is very simple in its approach, conceptually $((EOF - integer)):
-> wc -l /opt/worklibs/tmp/i*114054.dat
434641 /opt/worklibs/tmp/i_PPGLBL_20081231114054.dat
-> time sed -n "400000,401000p" /opt/worklibs/tmp/i_PPGLBL_20081231114054.dat >/dev/null
real 0m7.91s
user 0m3.96s
sys 0m3.94s
-> time tail -1000 /opt/worklibs/tmp/i_*114054.dat >/dev/null
real 0m0.03s
user 0m0.02s
sys 0m0.01s
-> time awk 'NR>=400000 && NR<=401001 {print $0;}' /opt/worklibs/tmp/i*114054.dat >/dev/null
real 0m23.77s
user 0m23.06s
sys 0m0.71s
-> time nawk 'NR>=400000 && NR<=401001 {print $0;}' /opt/worklibs/tmp/i*114054.dat >/dev/null
real 0m4.57s
user 0m4.06s
sys 0m0.50s
Or else you can use the two commands in a single stretch:
sed -e :a -e '$q;N;1001,$D;ba' orig_file >copy_file ;mv copy_file orig_file
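A small hardening of that one-liner: chaining with && instead of ; means the original is only replaced if the sed filter actually succeeded:

```shell
# Replace orig_file only when the filter exits successfully;
# on failure, orig_file is left untouched.
sed -e :a -e '$q;N;1001,$D;ba' orig_file > copy_file &&
mv copy_file orig_file
```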
If you go for "tail", first run a quick test to see whether your tail command is suitable:
tail -1000 filename | wc -l
Hopefully the answer is 1000.
Thank you very much, Curleb, for that detailed explanation and info. I'll be going with tail; that will solve my purpose.
Also, thank you Reebot and Methyl for your answers.
Many Thanks & Regards.
You can edit a file in place with ed. To delete all but the last 1000 lines:
ed -s file <<'EOF'
1,-1000d
wq
EOF
or, more clumsily:
printf '1,-1000d\nwq\n' | ed -s file
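A quick way to convince yourself the ed version does what it claims, on a throwaway file (the /tmp path is just for illustration):

```shell
# Build a 2000-line test file, trim it in place with ed,
# then check that only lines 1001-2000 remain.
seq 2000 > /tmp/edtest
printf '1,-1000d\nwq\n' | ed -s /tmp/edtest
wc -l < /tmp/edtest    # 1000
head -n 1 /tmp/edtest  # 1001
```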
Regards,
Alister