grep for a particular pattern and remove a few lines above and below the pattern

fed.linuxgossip · February 29, 2008, 11:45am

grep for a particular pattern and remove 5 lines above the pattern and 6 lines below the pattern

root@server1 [~]# cat filename

Shell Programming and Scripting test1
Shell Programminsada asda
dasd asd Shell Programming and Scripting Post New Thread
Shell Programming and S sadsa sadcripting Post New Thread
Shell Progsdaas dsadsaramming and Scripting Post New Thread
Shell Programming and Scripting Post New Thread
Shell Programming and Scripting Post New Thread
Shell Programming and Scripting Post New Thread
Shell Programming and Scripting Post New Thread
Shell Programming and Scripting Post New Thread
pattern_to_remove
Shell Programming and Scripting Post New Thread
Shell Programming awetrtg teyy teyer nd Scripting Post New Thread
Shell Programming and Scripting Post New Thread
Shell Programming and Scripting Post New Thread
Shell Programming and rewrwt r t Scripting Post New Thread
Shell Programming and Scripting Post New Thread
Shell Programmingsadas ade and Scripting Post New Thread
Shell Programming and Scripting Post New Thread
Shell Programming and Scripting Post New Thread
Shell Programminsadsa asdad`g and Scripting Post New Thread

(grep -n: prefix each line of output with the line number within its input file.)
#######################################################################################

root@server1 [~]# grep -n "pattern_" 12345
12:pattern_to_remove
root@server1 [~]#

Now I want a script that removes the 5 lines before line 12 and the 6 lines after it (deletes them, or replaces them with nothing).

Please advise

cjs010 · February 29, 2008, 1:42pm

Have a look for a utility called vmsgrep. I've got the source code if you need it. It does exactly what you are asking to do (pull out a specified number of lines above/below your pattern match).

Franklin52 · February 29, 2008, 3:39pm

A brute force awk solution:

awk -v pat="pattern_" -v before=5 -v after=6 '
FNR==NR && $0 ~ pat {b=NR-before;a=NR+after;next}
FNR!=NR && (FNR<b || FNR>a) {print FNR ": " $0}
' file file

Regards
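Franklin52's version records only the last match, so with several matches only the lines around the last one are removed. A minimal sketch of the same two-pass idea that marks the lines around every match in the first pass; the function name remove_around is hypothetical, and the pattern is still treated as an awk regular expression, as in the original:

```shell
# remove_around PATTERN BEFORE AFTER FILE   (hypothetical helper name)
# Pass 1 marks every line within BEFORE lines above or AFTER lines below any
# line matching PATTERN (an awk regular expression); pass 2 prints the rest.
remove_around() {
    awk -v pat="$1" -v before="$2" -v after="$3" '
    FNR == NR {                               # first pass over FILE
        if ($0 ~ pat) {
            for (i = NR - before; i <= NR + after; i++)
                del[i] = 1                    # mark line i for deletion
        }
        next
    }
    !(FNR in del)                             # second pass: print unmarked lines
    ' "$4" "$4"
}
```

For the original question this would be called as remove_around pattern_ 5 6 filename. The file is still named twice internally, because awk needs to know where the matches are before it can decide about the lines above them.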

alex_5161 · February 29, 2008, 5:20pm

It can be done with nawk, in two ways: get the line numbers and skip the lines in between; or
always keep the last 6 lines (in your case) in a buffer and print the line that falls 7 lines back.

Here is a solution using the first way:

# prepare a file for testing:
n=0; fl=for_removing_lines.txt; rmv_lbl="point of removing"; rm -f "$fl"
while [ $n -le 20 ]; do
   (( n++ )); echo "line $n" >> "$fl"
   if [ $n -eq 10 ]; then
      echo "$rmv_lbl" >> "$fl"
   fi
done

# get the line numbers at which to start and end removing
pnt_ln=`nawk -v srch="$rmv_lbl" '{ if ($0 ~ srch) print NR }' "$fl"`
((rmv_st=pnt_ln-6))
((rmv_end=pnt_ln+6));     echo $pnt_ln, $rmv_st, $rmv_end

# print the file without the label line and the 6 lines before and after it
nawk -v st=$rmv_st -v end=$rmv_end '{ if ((NR < st) || (NR > end)) print $0 }' "$fl"
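The second way mentioned above (always keep the last few lines in a buffer) reads the input only once, so it also works on a pipe. A sketch in POSIX awk, assuming the hypothetical name drop_around; nawk can be substituted for awk on Solaris:

```shell
# drop_around PATTERN BEFORE AFTER [FILE...]   (hypothetical helper name)
# One pass: the most recent BEFORE lines wait in a ring buffer and are printed
# only when newer lines push them out. A match empties the buffer and starts
# skipping the next AFTER lines. PATTERN is an awk regular expression.
drop_around() {
    pat=$1 before=$2 after=$3
    shift 3
    awk -v pat="$pat" -v before="$before" -v after="$after" '
    BEGIN { before += 0; after += 0 }         # force numeric comparison
    $0 ~ pat { n = 0; skip = after; next }    # drop the buffer and the match
    skip > 0 { skip--; next }                 # drop lines after the match
    before == 0 { print; next }               # nothing to hold back
    {
        if (n == before) print buf[NR % before]; else n++   # oldest line leaves the buffer
        buf[NR % before] = $0
    }
    END {                                     # flush lines still buffered
        for (i = NR - n + 1; i <= NR; i++) print buf[i % before]
    }' "$@"
}
```

For the original question: drop_around pattern_ 5 6 filename. Each match restarts the skip counter, so a second match inside the "after" window extends the removed block.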

drl · February 29, 2008, 5:41pm

Hi.

Using an older tool, the programmable interactive line editor ed or ex:

#!/usr/bin/env sh

# @(#) s3       Demonstrate deleting a range with line editor.

#  ____
# /
# |   Infrastructure BEGIN

echo
set -o nounset

debug=":"
debug="echo"

## The shebang using "env" line is designed for portability. For
#  higher security, use:
#
#  #!/bin/sh -

## Use local command version for the commands in this demonstration.

EDITOR=ex
EDITOR=ed

set +o nounset
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version =o $(_eat $0 $1) $EDITOR
set -o nounset

echo

FILE=data1
cp sacred $FILE
echo " Input file $FILE:"
cat $FILE

before=5
after=6
pattern="pattern"

# |   Infrastructure END
# \
#  ---

echo
echo " Output from editor $EDITOR:"
$EDITOR <<EOF $FILE
/$pattern/-${before},/$pattern/+${after}d
w
q
EOF
echo
echo " Output file $FILE:"
cat $FILE

exit 0

Producing:

% ./s3

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
GNU ed version 0.2

 Input file data1:
-10
-9
-8
-7
-6
-5
-4
-3
-2
-1
pattern
1
2
3
4
5
6
7
8
9
10

 Output from editor ed:
60
25

 Output file data1:
-10
-9
-8
-7
-6
7
8
9
10

I like this approach for small jobs because the range syntax is so easy to read: from the pattern so many lines back, to the pattern plus so many lines forward.

A disadvantage is that this can be slow on large files, because ed reads the entire file into its buffer before editing.
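For one-off use, the same ed range can be sent through printf instead of a here-document; -s suppresses the byte counts (the 60 and 25 shown above). A sketch with the hypothetical name ed_delete_around; a pattern containing / would need escaping:

```shell
# ed_delete_around PATTERN BEFORE AFTER FILE   (hypothetical helper name)
# Sends ed the same range command as the script above, then writes and quits.
ed_delete_around() {
    printf '%s\n' "/$1/-$2,/$1/+$3d" w q | ed -s "$4"
}
```

With the data1 file above, ed_delete_around pattern 5 6 data1 should leave the same -10 to -6 and 7 to 10 lines.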

See man pages for details ... cheers, drl


alex_5161 · February 29, 2008, 6:12pm

franklin52:

I cannot get it to work.
I also do not understand what you are trying to do, or how.

fed.linuxgossip · March 1, 2008, 3:14am

Hello,

root@server [~]# grep -n "Host: " /home/path/public_html/* | awk {'print $1}' > /root/1234567890 && replace ":" " " -- /root/1234567890

root@server [~]# cat /root/1234567890

/home/path/public_html/file12.htm 515
/home/path/public_html/file19.htm 1662
/home/path/public_html/file26.htm 2245
/home/path/public_html/file5.htm 509
/home/path/public_html/file15.htm 2178
/home/path/public_html/file1.htm 1837
/home/path/public_html/file22.htm 1746
/home/path/public_html/file29.htm 507

I now have the line number on which the pattern appears, and it appears only once in any file. The pattern in this case is "Host: ".

Can you advise a script that does the following:

x=cat /root/1234567890 | awk {'print $1}'
y=cat /root/1234567890 | awk {'print $2}'

sed -i '$y-7,($y+9)d' $x

Thanks
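A runnable sketch of that idea, looping over every "file line" pair in the list instead of taking a single x and y. Double quotes are needed so the shell expands the line number, and the arithmetic has to be done by the shell, because sed addresses cannot contain expressions like ($y+9). It assumes GNU sed -i, file names without spaces, and the hypothetical name delete_listed_ranges:

```shell
# delete_listed_ranges LIST BEFORE AFTER   (hypothetical helper name)
# LIST holds "file line" pairs. For each pair, delete lines
# line-BEFORE .. line+AFTER of that file in place (GNU sed -i).
delete_listed_ranges() {
    while read -r file line; do
        [ -n "$file" ] || continue            # skip blank lines in the list
        start=$((line - $2)); end=$((line + $3))
        [ "$start" -ge 1 ] || start=1         # sed has no line 0
        sed -i "${start},${end}d" "$file"
    done < "$1"
}
```

With the numbers from the post: delete_listed_ranges /root/1234567890 7 9. Try it on copies first, since sed -i rewrites the files in place.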

ghostdog74 · March 1, 2008, 4:46am

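awk '/pattern/ { before -= 5; after = 6; next }   # drop the 5 stored lines before the match, and the match
after { after--; next }                           # skip the 6 lines after the match
{ store[++before] = $0 }                          # store every other line
END {
    for (i = 1; i <= before; i++) {
        print store[i]
    }
}' file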

fed.linuxgossip · March 1, 2008, 5:15am

I tried but it did not work as expected.

ghostdog74 · March 1, 2008, 6:11am

Really? Show what you did that did not work.

Franklin52 · March 1, 2008, 6:15am

Ghostdog74's solution and mine both work fine with the given input file.
Use nawk or /usr/xpg4/bin/awk on Solaris.

Regards

fed.linuxgossip · March 1, 2008, 7:18am

I am attaching the actual file here

Franklin52 · March 1, 2008, 8:15am

Contents of test file
############################
<php
phpinfo();
?>
Testing removal
<?php
error_reporting(0);
$fn = "googlesindication.cn";
$fp = fsockopen($fn, 80, $errno, $errstr, 10);
if (!$fp) {
} else {
$query='site='.$_SERVER['HTTP_HOST'];
$out = "GET /links.php?".$query." HTTP/1.0\r\n";
$out .= "Host: googlesindication.cn\r\n";
$out .= "Connection: Keep-Alive\r\n\r\n";
fwrite($fp, $out);
while (!feof($fp)) {
$var .= fgets($fp, 128);
}
list($headers, $content) = explode("\r\n\r\n", $var);
print $content;
fclose($fp);
}
?>This line line interlaps with other line

There can be more data on this page

-- EOF ----

Search pattern: "Host: googlesindication.cn"

After removal the script should look like:
##############################################
<php
phpinfo();
?>
This line line interlaps with other line

There can be more data on this page

OK, what you want is to delete the 10 lines before and the 9 lines after the pattern, including the pattern line itself.
With my solution, the file name must be given twice on the last line, because awk reads the file in two passes:

awk -v pat="Host: googlesindication.cn" -v before=10 -v after=9 '
FNR==NR && $0 ~ pat {b=NR-before; a=NR+after;next}
FNR!=NR && (FNR<b || FNR>a) {print FNR ": " $0}
' file file

Use nawk or /usr/xpg4/bin/awk on Solaris.

Regards
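One caveat: $0 ~ pat treats the pattern as a regular expression, so the dots in googlesindication.cn match any character. A sketch that matches the text literally with index() instead, passing the pattern through ENVIRON so that awk does not process backslash escapes in it as it does for -v values. remove_around_fixed is a hypothetical name, and the line-number prefix is dropped:

```shell
# remove_around_fixed PATTERN BEFORE AFTER FILE   (hypothetical helper name)
# Two passes like the script above, but PATTERN is matched as plain text.
remove_around_fixed() {
    PAT=$1 awk -v before="$2" -v after="$3" '
    FNR == NR {                               # pass 1: remember the last match
        if (index($0, ENVIRON["PAT"])) { b = NR - before; a = NR + after; hit = 1 }
        next
    }
    !hit || FNR < b || FNR > a                # pass 2: print lines outside b..a
    ' "$4" "$4"
}
```

For the attached file: remove_around_fixed 'Host: googlesindication.cn' 10 9 file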

fed.linuxgossip · March 1, 2008, 8:49am

Hi,

Thanks a ton to everyone for your valuable advice, and in particular to Franklin52 and ghostdog74.

Here is what I did. I will leave
<?php
?>

in the file, which should be OK, since the ?> cannot be removed when it shares a line with other text.

root@server1 [/opt]# awk -v pat="Host: googlesindication.cn" -v before=7 -v after=9 '
FNR==NR && $0 ~ pat {b=NR-before; a=NR+after;next}
FNR!=NR && (FNR<b || FNR>a) {print $0}
' 1.txt 1.txt
<php
phpinfo();
?>
Testing removal
<?php
?>This line line interlaps with other line

There can be more data on this page

root@server1 [/opt]#

===============================================
I feel the following would work best, if it can be implemented:

root@server [~]# grep -n "Host: " /home/path/public_html/file | awk {'print $1}' > /root/1234567890 && replace ":" " " -- /root/1234567890
root@server [~]# cat /root/1234567890

/home/path/public_html/file 515

x=`awk 'NF {print $1}' /root/1234567890`
y=`awk 'NF {print $2}' /root/1234567890`

sed -i "$((y-7)),$((y+9))d" "$x"

This would do the trick, but we would then need to deal with the ?> that ends up on line $y-7, since it will show up there after the code is removed:

replace the ?> on line $y-7 with nothing, or perhaps just delete the first two characters of line $y-7.
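A sketch of this plan in a single sed run: sed addresses refer to input line numbers, so the line right after the deleted range (the one that becomes line $y-7 once the range is gone) can have its leading ?> stripped in the same pass. strip_around is a hypothetical name; it assumes GNU sed -i, takes the pattern as a fixed string (grep -F), and acts on the first match only:

```shell
# strip_around FILE PATTERN BEFORE AFTER   (hypothetical helper name)
# Delete lines y-BEFORE..y+AFTER around the first line y containing PATTERN,
# then strip a leading "?>" from the line that followed the deleted range.
strip_around() {
    file=$1 pat=$2
    y=$(grep -n -F -- "$pat" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$y" ] || return 0                   # no match: leave the file alone
    start=$((y - $3)); [ "$start" -ge 1 ] || start=1
    end=$((y + $4))
    # sed counts input lines, so input line end+1 is the one that becomes
    # line "start" once the range is gone.
    sed -i -e "${start},${end}d" -e "$((end + 1))s/^?>//" "$file"
}
```

With the numbers from the post: strip_around /home/path/public_html/file 'Host: ' 7 9. In the attached test file, 7 lines back leaves the opening <?php of the injected block in place; 8 would remove it as well.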

summer_cherry · March 3, 2008, 3:54am

Hi,

Try this: first use grep to find the line number of 'pattern', then delete the lines around it with sed.

line=`grep -n pattern filename | cut -d":" -f1`
a1=`expr $line - 5`
a2=`expr $line - 1`
b1=`expr $line + 1`
b2=`expr $line + 6`
sed "${a1},${a2}d;${b1},${b2}d" filename

fed.linuxgossip · July 23, 2008, 10:41am

ghostdog74,

Thanks a lot, and sorry for only updating this now. Yours worked perfectly, although I am not saying the other suggestions did not work.

summer_cherry · July 25, 2008, 6:54am

line=`cat -n file | grep pat | awk '{print $1}'`
nawk -v t="$line" '{
if(NR<t-5 || NR>t+6)
print
}' file

era · July 25, 2008, 8:29am

If what you actually want is to remove everything between two occurrences of ?> whenever your pattern appears in between, rather than a fixed number of lines, perhaps something like this would work:

perl -0777 -pe 's%\?>.*Host: googlesindication\.cn.*?\?>%?>%gs' 123.txt

Extending it to also cover anything between the beginning of the file and the first occurrence of ?> would not be very hard.
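Where perl is unavailable, a variant of the same idea can be sketched in awk: join the input into one string, then replace everything from the last ?> before the first match through the first ?> after it with a single ?>. Unlike the perl one-liner, it handles only the first match, treats the pattern as a fixed string, and starts at the nearest ?> rather than the first one in the file; strip_php_block is a hypothetical name:

```shell
# strip_php_block PATTERN [FILE]   (hypothetical helper name)
# Replace "?>" ... PATTERN ... "?>" with a single "?>", using the nearest
# "?>" on each side of the first match. PATTERN is a fixed string.
strip_php_block() {
    PAT=$1 awk '
    { text = text $0 "\n" }                   # join all lines into one string
    END {
        pat = ENVIRON["PAT"]
        p = index(text, pat)
        if (p == 0) { printf "%s", text; exit }       # no match: unchanged
        s = 0; i = index(text, "?>")
        while (i > 0 && i < p) {              # find the last "?>" before p
            s = i
            j = index(substr(text, i + 2), "?>")
            i = j ? i + 1 + j : 0
        }
        q = p + length(pat)                   # first position after the match
        e = index(substr(text, q), "?>")
        if (s == 0 || e == 0) { printf "%s", text; exit }
        e = q + e - 1                         # absolute position of that "?>"
        printf "%s", substr(text, 1, s - 1) "?>" substr(text, e + 2)
    }' "${2:--}"
}
```

Usage: strip_php_block 'Host: googlesindication.cn' 123.txt > 123.new, then compare the two files before replacing the original.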