Delete specific parts in a .txt file

Hi all,

I desperately need a small script that deletes everything in a particular .txt file from where "Abs = {" appears up to "},", and likewise from where "B-1 = {" appears up to "},".

I would like all the text between those markers to be deleted, while all other text is left unedited (kept as it is).

Could anyone help?

Thank you!

I think it will be better if you can post sample data and desired output in code tags.

Hi,
I just joined today to ask a question in another forum (AIX), so I thought I'd give something back to unix.com to help out.

Here's a starter, given the limited details you provided.

text.txt:

line1
line2
Abs = {start marker}
line3
line4
B-1 = {end marker}
line5
line6

read.pl

#!/usr/bin/perl -w

use strict;
use warnings;
use diagnostics;

print_inside(); 
print "\n";
print_outside(); 
 
sub print_inside {
     
    my $filename = 'text.txt';
    open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
    my $output = 0;
    
    while (<$fh>) {
        chomp;
        # print "$_ \n";
        
        if ( /Abs.*=.*\{(.*)\}/i ) {
            print "===S===> " . $1 . "\n";
            $output = 1;
        } elsif ( /B-1.*=.*\{(.*)\}/i ) {
            print "===E===> " . $1 . "\n";
            $output = 0;
        } elsif ($output) {
            print $_ . "\n"   
        }
        
    }
}

sub print_outside {

    my $filename = 'text.txt';
    open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
    my $output = 1;
     
    while (<$fh>) {
        chomp;
        # print "$_ \n";
        
        if ( /Abs.*=.*\{(.*)\}/i ) {
            $output = 0;
        } elsif ( /B-1.*=.*\{(.*)\}/i ) {
            $output = 1;
        } elsif ($output) {
            print $_ . "\n"   
        }
        
    }
}

output:

 $ ./read.pl
===S===> start marker
line3
line4
===E===> end marker

line1
line2
line5
line6
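For comparison, the same start/end-marker idea can be sketched in awk with a single flag variable. This is only an illustration: print_inside and print_outside here are hypothetical shell functions named after the Perl subs, they assume the markers look like the sample text.txt, they match case-sensitively (POSIX awk has no /i), and they do not echo the "===S===>"/"===E===>" marker contents.

```shell
# Hypothetical awk counterparts of read.pl's print_inside/print_outside.
# Assumption: start marker line contains "Abs ... = ... {" and end marker
# line contains "B-1 ... = ... {", as in the sample text.txt.

# Print only the lines strictly between the two marker lines.
print_inside() {
    awk '
        /Abs.*=.*[{]/ { inside = 1; next }   # start marker: switch on, skip it
        /B-1.*=.*[{]/ { inside = 0; next }   # end marker: switch off, skip it
        inside                               # print only while switched on
    ' "$1"
}

# Print everything except the marker lines and what lies between them.
print_outside() {
    awk '
        /Abs.*=.*[{]/ { inside = 1; next }
        /B-1.*=.*[{]/ { inside = 0; next }
        !inside                              # print only while switched off
    ' "$1"
}
```

Usage would be print_inside text.txt or print_outside text.txt. The bracket expression [{] is used instead of a backslash escape because it means a literal brace in every awk.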

Data:

@article{Test1:2011aa,
    Abs = {Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.},
    Author = {Lorem Ipsum and Ipsum Lorem},
    Date-Added = {2013-02-27 2:55:51 +0200},
    Title = {Geometric analogue of holographic reduced representation},
    Year = {2009},
    B-1 = {http://www.google.com}}

@book{Test2:2012bb,
    Abs = {Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.},
Author = {Lorem Ipsum and Ipsum Lorem},
    Date-Added = {2012-02-27 3:44:41 +0400},
    Year = {2010},
    B-1 = {http://www.google.com}}

Desired output:

@article{Test1:2011aa,
    Author = {Lorem Ipsum and Ipsum Lorem},
    Date-Added = {2013-02-27 2:55:51 +0200},
    Title = {Geometric analogue of holographic reduced representation},
    Year = {2009}}

@book{Test2:2012bb,
    Author = {Lorem Ipsum and Ipsum Lorem},
    Date-Added = {2012-02-27 3:44:41 +0400},
    Year = {2010}}

awk '!/^[ \t]+Abs/ && !/^[ \t]+B-1/' filename
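The one-liner drops every line whose first non-blank text is Abs or B-1, so it assumes each field fits on a single line. The original request was literally "delete from Abs = { up to },", which an awk range pattern can express directly even when a value wraps over several lines. A sketch, assuming a field ends on the first line that ends in "}," or "}}" (B-1 in the sample ends in "}}"); strip_fields is a hypothetical name. Like the one-liner, it leaves the trailing comma on the new last field.

```shell
# Hypothetical helper: delete each Abs/B-1 field from its opening
# "Abs = {" / "B-1 = {" line through the first line ending in "}," or "}}".
# Assumption: no line inside a field value ends in "}," or "}}".
strip_fields() {
    awk '
        # range pattern: start line .. end line (may be the same line)
        /^[ \t]*(Abs|B-1)[ \t]*=[ \t]*[{]/, /[}][,}][ \t]*$/ { next }
        { print }                            # everything else is kept as-is
    ' "$1"
}
```

If a field opens and closes on the same line, awk ends the range on that very line, so single-line fields are handled too.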

Thanks! But one more thing... after deleting:

B-1 = {http://www.google.com}}

I would like the last closing brace to be moved up to the previous row, and the comma at the end of that previous row to be deleted, so that the last line ends up like this:

Year = {2010}}

Note that the last line can be "Year" or any other field, but B-1 always comes at the end, and I would like the same thing to happen in every case (the last brace moved up to the previous row, and the comma in that row deleted).

Thanks!

awk ' $0 ~ /^[ \t]+Abs.*/  {
                next
} $0 !~ /^[ \t]+B-1.*/ {
                p = (p=="")?RS $0:p RS $0
} $0 ~ /^[ \t]+B-1.*/ {
                sub(/,$/,"}",p)
                print p
                p = ""
} ' file
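The awk answer above collects every line since the previous B-1 in a buffer and fixes up the buffer when B-1 arrives. Because it always prefixes RS, its output starts with an empty line and gets an extra blank line between entries. The sketch below keeps the same idea but only inserts a newline between buffered lines, using a counter. strip_entries, buf and n are hypothetical names, and it assumes, as the poster said, that B-1 is always the last field of an entry.

```shell
# Hypothetical helper: buffer each entry, drop Abs, and when the B-1 line
# arrives turn the buffered entry's trailing comma into the closing brace.
strip_entries() {
    awk '
        /^[ \t]+Abs/ { next }                   # drop the abstract line
        /^[ \t]+B-1/ {                          # last field of an entry
            sub(/,[ \t]*$/, "}", buf)           # "Year = {2010}," -> "Year = {2010}}"
            print buf
            buf = ""; n = 0                     # start a fresh entry
            next
        }
        { buf = (n++ ? buf "\n" : "") $0 }      # append, newline only between lines
        END { if (n) print buf }                # flush text after the last B-1
    ' "$1"
}
```

Blank separator lines between entries are buffered like any other line, so they survive in the output.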

Give this a shot:

#!/usr/bin/perl

use strict;
use warnings;
use diagnostics;

strip_file();

sub strip_file {

    my $filename = 'text.txt';
    open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";

    my $last;    # previous kept line, held back one step
    while (my $cur = <$fh>) {
        chomp $cur;

        # drop the Abs line entirely
        next if $cur =~ /^\s*Abs\s*=\s*\{/;

        if ( $cur =~ /^\s*B-1\s*=\s*\{/ ) {
            # B-1 is always the last field: drop it, and turn the trailing
            # comma of the line before it into the entry's closing brace
            if ( defined $last ) {
                $last =~ s/,\s*$/}/;
                print "$last\n";
            }
            undef $last;
            next;
        }

        print "$last\n" if defined $last;
        $last = $cur;
    }
    print "$last\n" if defined $last;    # flush anything after the last B-1
    close($fh);
}
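The Perl script holds one line back ($last) and only decides what to do with it after seeing the next line. The same one-line look-behind can be sketched in awk; strip_lookbehind, prev and have are hypothetical names. Unlike the awk answer that buffers a whole entry, this keeps just a single line in memory.

```shell
# Hypothetical helper: one-line look-behind, mirroring the $last idea.
# Assumption: B-1 is always the last field of an entry.
strip_lookbehind() {
    awk '
        /^[ \t]*Abs[ \t]*=/ { next }            # never keep the abstract line
        /^[ \t]*B-1[ \t]*=/ {                   # entry closes here
            if (have) { sub(/,[ \t]*$/, "}", prev); print prev }
            have = 0
            next
        }
        {
            if (have) print prev                # previous line is now safe to print
            prev = $0; have = 1
        }
        END { if (have) print prev }            # flush the final held line
    ' "$1"
}
```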

I saved the script on my Desktop as script.awk and then tried to execute it, but I get an error like this:

awk -f script.awk 
awk: syntax error at source line 1 source file script.awk
 context is
	awk >>>  ' <<< 
awk: bailing out at source line 1

This is what I have inside script.awk:

awk '!/^[ \t]+Abs/ && !/^[ \t]+B-1/' random.txt

Help?

If you are using SunOS or Solaris, use nawk instead of awk:

nawk ' $0 ~ /^[ \t]+Abs.*/  {
                next
} $0 !~ /^[ \t]+B-1.*/ {
                p = (p=="")?RS $0:p RS $0
} $0 ~ /^[ \t]+B-1.*/ {
                sub(/,$/,"}",p)
                print p
                p = ""
} ' filename

OR

nawk '!/^[ \t]+Abs/ && !/^[ \t]+B-1/' filename
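On Solaris, /usr/bin/awk is the old awk, which is why nawk is suggested there; other systems usually ship only awk. A small helper can pick whichever one exists. This is a sketch: pick_awk and the AWK variable are hypothetical names, and it relies only on the POSIX command -v builtin.

```shell
# Hypothetical helper: prefer nawk when it is installed, otherwise use awk.
pick_awk() {
    if command -v nawk >/dev/null 2>&1; then
        echo nawk
    else
        echo awk
    fi
}

AWK=$(pick_awk)   # e.g. "nawk" on Solaris, "awk" where nawk is missing
```

After that, "$AWK" '!/^[ \t]+Abs/ && !/^[ \t]+B-1/' filename runs the same program on either kind of system.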

I use Mac OS X, and if I type nawk I get: -bash: nawk: command not found

If I type awk, I get:

usage: awk [-F fs] [-v var=value] [-f progfile | 'prog'] [file ...]

Oh, my bad. I didn't notice you were running the program from a file!

Edit your file script.awk and put only the code fragment in it (no awk '...' wrapper and no file name):

!/^[ \t]+Abs/ && !/^[ \t]+B-1/

To run your program:

awk -f script.awk your_input_file
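The error above comes from awk -f reading the whole file as awk program text, so a shell command line such as awk '...' random.txt inside it gets parsed as awk code and the quote is a syntax error. A minimal sketch of writing script.awk so that it contains only the program, assuming a POSIX shell; the quoted 'EOF' delimiter stops the shell from expanding anything inside the here-document.

```shell
# Write only the awk program (no "awk", no quotes, no file name) into script.awk.
cat > script.awk <<'EOF'
!/^[ \t]+Abs/ && !/^[ \t]+B-1/
EOF

# Then run it with:  awk -f script.awk your_input_file
```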

Follow the same approach for the other code fragment, putting only this in script.awk:

$0 ~ /^[ \t]+Abs.*/  {
                next
} $0 !~ /^[ \t]+B-1.*/ {
                p = (p=="")?RS $0:p RS $0
} $0 ~ /^[ \t]+B-1.*/ {
                sub(/,$/,"}",p)
                print p
                p = ""
}

To run your program:

awk -f script.awk your_input_file

Thanks a lot, you're a life saver! I may come back to this post again; I hope you will find time to help me. Cheers!