Delete first 100 lines from a BIG File

Hi,

I need a Unix command to delete the first n (say 100) lines from a log file, without using any temporary file. I found that sed -i is a useful command for this, but it's not supported in my environment (AIX 6.1). The file size is approximately 100 MB.

Thanks in advance.

Hi.

Most versions of sed that support "-i" will use a temporary file:

`-i[SUFFIX]'
`--in-place[=SUFFIX]'
     This option specifies that files are to be edited in-place.  GNU
     `sed' does this by creating a temporary file and sending output to
     this file rather than to the standard output.(1).

     This option implies `-s'.

     When the end of the file is reached, the temporary file is renamed
     to the output file's original name. 

excerpt from info sed, q.v.

One way to really re-write in place is to hold the data in memory. Here's one such solution with a short, no-frills perl script. The driving shell script compares inodes, first using sed and then perl. The sed run shows that the file is different because of the rename, whereas the perl script keeps the same inode:

#!/usr/bin/env bash

# @(#) s1	Demonstrate REAL re-write in place if enough memory, perl.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C sed perl

FILE=data1
cp sacred $FILE

pl " Short perl code, p1:"
cat p1
pe "# --- end of perl code"

pl " Results sed:"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
head $FILE

sed -i '1,2d' $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
cat $FILE

# perl
cp sacred $FILE

pl " Results perl:"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
cat $FILE

./p1 1,2 $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
cat $FILE

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
sed GNU sed version 4.1.5
perl 5.10.0

-----
 Short perl code, p1:
#!/usr/bin/env perl

# @(#) p1	Demonstrate feature (minimal).

use strict;
use warnings;

my ( $debug, $f, $file, @all, $first, $last, $i );

$debug = 1;
$debug = 0;

my ($lines_to_delete) = shift || die " Must supply line numbers.\n";
( $first, $last ) = split( /,/, $lines_to_delete );
$first-- ; $last--;    # convert 1-based line numbers to 0-based array indices
print " delete lines $first,$last\n" if $debug;
$file = shift || die " Must supply file name.\n";

open( $f, "<", $file ) || die " Cannot open file $file for input.\n";

@all = <$f>;
close($f);

open( $f, ">", $file ) || die " Cannot open file $file for output.\n";
for ( $i = 0; $i <= $#all; $i++ ) {
  print " working on line $i\n" if $debug;
  next if ( $i >= $first and $i <= $last );
  print $f $all[$i];
}

exit(0);

# --- end of perl code

-----
 Results sed:

-----
 Data file data1 before (inode:  334057):
Now is the time
for all good men
to come to the aid
of their country.

-----
 Data file data1 after  (inode:  334060):
to come to the aid
of their country.

-----
 Results perl:

-----
 Data file data1 before (inode:  334060):
Now is the time
for all good men
to come to the aid
of their country.

-----
 Data file data1 after  (inode:  334060):
to come to the aid
of their country.

Both solutions delete lines 1,2 of the file. The sed result is a new file created from the temporary. The perl solution uses the same file, but requires enough memory to hold the entire file.
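
For the OP's case the call would be along these lines (a sketch; it assumes the 100 MB file fits comfortably in memory, since p1 slurps the whole file):

./p1 1,100 /path/logfile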

Best wishes ... cheers, drl

ed or ex can delete the lines in place for you.
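
For example, a minimal sketch with ed (both editors are standard on AIX; /path/logfile is a placeholder):

printf '1,100d\nw\nq\n' | ed -s /path/logfile

The -s option suppresses ed's byte-count chatter; the commands delete lines 1 through 100, write the buffer back to the same file, and quit.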

Regards,
Alister

Hi.

@alister

The GNU ed code:

ed GNU Ed 0.7

seems to use a scratch file. Using strace, one can see even for a 4-line file:

open("/tmp/tmpfXhei5v", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlink("/tmp/tmpfXhei5v")               = 0

and it then goes on to use that file, positioning within it with lseek calls rather than with open/close.
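
One way to reproduce the observation (a sketch; on newer kernels the call may show up as openat rather than open):

printf '1,2d\nw\nq\n' | strace -e trace=open,unlink ed -s data1 2>&1 | grep tmp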

Whether this counts as "without using any temporary file" in the OP's view is unknown ... cheers, drl

Excellent observation, drl.

Regards,
Alister

Hi.

Here is a generalization of the perl script I posted earlier. It does not do any sed work; it expects text on STDIN, holds it in memory, and at EOF writes it to the file of one's choice. Not very exciting, but the key idea is that it allows any utility to write in place. It was inspired by sponge, part of moreutils:

#!/usr/bin/env bash

# @(#) s2	Demonstrate REAL re-write in place if enough memory, perl.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C sed perl absorb-memory

SACRED=${1-sacred}
FILE=data1
cat -n $SACRED > $FILE

pl " Short perl code, absorb-memory $( wc -l < ~/bin/absorb-memory) lines:"
cat ~/bin/absorb-memory | sanitize
pe "# --- end of perl code"

pl " Results sed:"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

time sed -i '1,2d' $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

# perl
cat -n $SACRED > $FILE

pl " Results sed (no -i) into absorb-memory (perl):"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

time sed '1,2d' $FILE | absorb-memory $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

rm -f $FILE
exit 0

producing:

% ./s2 /tmp/100-mb.txt

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
sed GNU sed version 4.1.5
perl 5.10.0
absorb-memory - ( local: RepRev 1.7, ~/bin/absorb-memory, 2012-06-17 )

-----
 Short perl code, absorb-memory 59 lines:
#!/usr/bin/perl

# @(#) absorb-memory	Read STDIN to memory, write to file at EOF, work-alike for sponge.
# $Id: absorb-memory,v 1.7 2012/06/17 19:01:49 drl Exp drl $

## Modification history: when / who / what: most recent at top.
#  Relocate to end if grows too long, or re-sequence.
#
# 2012.06.17 / drl / Version that uses memory only.
#
# 2011.07.15 / drl / Do initial test for write permission, abort
# if output file lacks it.
#
# 2010.11.09 / drl / Rename to absorb, avoiding conflict with
# sponge itself.
#
# 2009.04.06 / drl / original.

use warnings;
use strict;
use Carp;

my ($debug);
$debug = 1;
$debug = 0;

# Avoid hang for argument matching "-version","--version", etc.
exit(0) if @ARGV && $ARGV[0] =~ /-version/;

$/ = 0777;    # note: overridden below, where the slurp localizes $/

my ( $file, $f, $memory );
if ( !$ARGV[0] ) { $ARGV[0] = "-"; }
$file = shift;

# Preliminary basic tests on output file.
if ( $file ne "-" ) {
  if ( not -f $file ) {
    croak("not a plain file, $file");
  }
}

$memory = do { local $/; <> };

my ($len) = length($memory);
print STDERR " Length of file in memory variable: $len\n" if $debug;

if ( $file eq "-" ) {
  open( $f, ">-" ) || die " Cannot open STDOUT for writing.\n";
}
else {
  open( $f, ">", $file ) || die " Cannot open file \"$file\" for write.\n";
}
print " debug write - file is :$file:\n" if $debug;

print $f "$memory";

END { close(STDOUT) || die "can't close stdout: $!" }
exit(0);
# --- end of perl code

-----
 Results sed:

-----
 Data file data1 before (inode:  334060):
Edges: 3:0:3 of 1777700 lines in file "data1"
     1	Preliminary Matter.  
     2	
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

real	0m13.351s
user	0m1.384s
sys	0m11.189s

-----
 Data file data1 after  (inode:  334061):
Edges: 3:0:3 of 1777698 lines in file "data1"
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
     4	It was prepared by Professor Eugene F. Irey at the University of Colorado.
     5	Any subsequent copies of this data must include this notice  
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

-----
 Results sed (no -i) into absorb-memory (perl):

-----
 Data file data1 before (inode:  334061):
Edges: 3:0:3 of 1777700 lines in file "data1"
     1	Preliminary Matter.  
     2	
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

real	0m2.890s
user	0m1.212s
sys	0m1.516s

-----
 Data file data1 after  (inode:  334061):
Edges: 3:0:3 of 1777698 lines in file "data1"
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
     4	It was prepared by Professor Eugene F. Irey at the University of Colorado.
     5	Any subsequent copies of this data must include this notice  
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

This is a busy output, but there are some items of interest. First, one can use essentially the same sed command (omitting the "-i") as for a sed that knows about "-i". Second, the times are noticeably better for the in-memory version. Third, note that the inodes differ in the sed case, proving that a temporary file was used and then renamed to the input file's name. For the absorb-memory case, the inode is the same.

The same perl code can make any standard utility (one that reads STDIN and writes STDOUT) into a utility that can do in-place processing.
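
For instance (placeholder file names, with absorb-memory assumed to be on PATH):

# delete the first 100 lines of a log in place
sed '1,100d' big.log | absorb-memory big.log

# the same trick works with any filter, e.g. grep
grep -v 'DEBUG' app.log | absorb-memory app.log

Because absorb-memory writes only after it has seen EOF on STDIN, the target file is not touched until the filter has finished.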

Best wishes ... cheers, drl

It seems to me any solution should always make use of a temporary intermediate file, for safety reasons. If we read the whole file into memory and then write it back to the same file, we run the risk of losing the original in case of a power failure during the write-back phase.

With a temporary file there is only mv involved, which is only a rename if the temporary file is in the same directory on the same file system, so a temporary file in the same directory (instead of /tmp, for example) may be preferable. If we do use /tmp for the intermediate file, a temporary rename of the original to .bak until the move from /tmp completes may be required, and safest will probably be to keep the .bak until the user deletes it.
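
A sketch of both variants (hypothetical names; $$ expands to the shell's PID and merely makes the temporary name unique):

# temp file in the same directory: the final mv is an atomic rename
sed '1,100d' file > file.tmp.$$ && mv file.tmp.$$ file

# temp file on another file system (e.g. /tmp): keep a .bak until the move lands
sed '1,100d' file > /tmp/file.tmp.$$
mv file file.bak
mv /tmp/file.tmp.$$ file    # cross-filesystem mv is a copy, not a rename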

Hi, Scrutinizer.

When was the last time you know of that happening to anyone? Most important *nix boxes will be backed up with a UPS. I think it is far more likely to fall prey to plain old user error.

However, I sometimes allow myself the luxury of following:

-- Unix philosophy - Wikipedia, the free encyclopedia

Best wishes ... cheers, drl :slight_smile:

Well that is probably because nobody does it that way :wink:

Backed up with a UPS hey? I have seen systems go down or heard of systems going down because...

  • Two power cords of two were plugged into the same phase of the UPS...
  • Two power cords of three were plugged into the same phase of the UPS, and when that group failed the remaining power supply could not handle it
  • The second power supply had been broken, and so was the monitoring system, so when the other one failed...
  • Someone took out both power cords
  • Someone knocked out both power cords
  • Hard disks failed and not all logical volumes were mirrored
  • Controllers failed and the other channel was on the same controller
  • Memory died
  • A power failure during maintenance of the UPS
  • The UPS ran out and the diesel generator ran out of diesel
  • The UPS ran out and the power relay that was supposed to switch the generator on did not function, because the battery of the tiny UPS that served only that relay was way overdue and had gone dead
  • An uneven current draw on one phase took the UPS down, and it took everything with it
  • A high availability cluster failover, for whatever reason
  • Systems with officially no SPOFs had SPOFs
  • etcetera
  • and so forth

Indirectly, most of this is human error of course, for example because manuals or procedures were ignored, but that is the reality: most teams do not consist entirely of the cream of the crop, and Murphy is having a ball in data centers. :smiley:

@unohu
Please explain why you do not want to use a workfile, and the context in which this logfile is created.

If the log file is held open by a process, there is no method to shorten the file without using a workfile (and even that method risks losing new log entries). However, it is possible to retain the same inode (is that your issue?).

Check whether the file is open with the fuser command.
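
For example (placeholder path):

fuser /path/logfile

fuser prints the process IDs of any processes that have the file open; no IDs means the file is not in use.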

If I create a new file with the same name, it remains a blank file.

@unoho
Sounds like your logfile is open by an application.
The usual technique when a logfile is open (as with some of the system logs) is to copy the file, then null it. This retains any file permissions and usually does not upset the application, but be aware that there is a small time window in which you might lose a message.

# Copy the logfile
cp -p /path/logfile /path/newname
# Null the logfile
> /path/logfile

This technique does not work for data files.
It is however much better to do logfile maintenance while the application is turned off, but in the case of some system processes this may not be possible.

If your system has the logrotate command, check it out.
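
For reference, a minimal logrotate entry built on the same copy-then-null idea might look like this (hypothetical path; copytruncate is the directive that implements it):

/path/logfile {
    size 100M
    rotate 4
    copytruncate
}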

Hi, methyl.

I wonder if unoho meant that he did something like:

sed 'mumble' infile > infile

which would truncate infile to zero length before sed even runs. Hard to guess, though, if he's not forthcoming and specific about what he did and what error message or condition he got ... cheers, drl
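
A quick demonstration of that pitfall (any small file will do):

printf 'a\nb\nc\n' > infile
sed '1d' infile > infile
wc -c infile        # reports 0: the redirection truncated infile before sed ran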

Hi methyl
Yes, the log file is opened by another application. If I change the inode of the log file while taking the backup, the application cannot detect the new inode; as a result I get a blank file. The easiest solution may be to execute a script (such as the one you suggested) when the application is down.

The logrotate command is not available.

@drl I tried two or three possibilities, more or less the same as given below.
My logfile is named test.log and has (say) 100 lines; the backup file is back.log. The problem is that test.log remains blank even when the application is active and should be writing log statements.

cp test.log back.log
sed '1,100d' test.log > temp.log
mv temp.log test.log

Hi.

Given that information, I would try:

cp test.log back.log
sed '1,100d' test.log > temp.log
# mv temp.log test.log
cp temp.log test.log

(The commented-out mv, "#", is only for illustration and is not needed in the sequence.)
That should preserve the inode for test.log ... cheers, drl
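
One can verify with ls -i that the inode survives (illustrative session):

ls -i test.log       # note the inode number
cp temp.log test.log
ls -i test.log       # same inode: cp truncates and rewrites the existing file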

Truncating an app's logfile only works if the application supports it. If it doesn't, you're stuck with restarting the application.

Does cp explicitly guarantee the same inode? Looking at the documentation for cp, it explains how open will be called on the target file, which would imply the same inode.

IMHO @drl's solution in post #15 nearly fulfils the requirement.

Reading between the lines, we are again presented with a faulty solution without a definition of the actual problem.

Taking into account Corona688's comment about inodes, this code would be safer on most Unix/Linux operating systems:

cp test.log back.log
sed '1,100d' test.log > temp.log
cat temp.log > test.log

I have used a similar code construct (albeit with grep -v) to remove persistent irrelevant entries from a system log where they were causing unwanted alerts (pending a fix).
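
A sketch of that grep -v variant (hypothetical log name and pattern):

cp system.log system.log.bak
grep -v 'known harmless message' system.log > temp.log
cat temp.log > system.log    # preserves the inode
rm temp.log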

Thanks for your reply. The inode problem is resolved.