How to delete all lines before a particular pattern when the pattern is defined in a variable?

I have a file

Line 1 a
Line 22
Line 33
Line 1 b
Line 22
Line 1 c
Line 4
Line 5

I want to delete all lines up to and including the last occurrence of a line which contains something that is defined in a variable. Say a variable var contains 'Line 1'; then I need the following in the output.

Line 4
Line 5
cat /dev/null >outfile
var="line 1"
while read line
do
   echo "$line" >>outfile
   if [ "$var" = "$line" ]
       then
       cat /dev/null >outfile
   fi
done <inputfile

Thx.. But this will not work. The lines are not just 'Line 1'; they contain some more text also. And I need to delete all lines before the last occurrence.

I could not construct a simple sed or awk script.

Hi,
with your example file:

$ cat /tmp/bob2 
Line 1 a
Line 22
Line 33
Line 1 b
Line 22
Line 1 c
Line 4
Line 5
$ XX="Line 1"
$ awk -vRS="$XX(\n| [^\n]+\n)" -vORS="" 'END{print}' /tmp/bob2
Line 4
Line 5
$ XX="Line 2"
$ awk -vRS="$XX(\n| [^\n]+\n)" -vORS="" 'END{print}' /tmp/bob2
Line 1 a
Line 22
Line 33
Line 1 b
Line 22
Line 1 c
Line 4
Line 5
$ XX="Line 22"
$ awk -vRS="$XX(\n| [^\n]+\n)" -vORS="" 'END{print}' /tmp/bob2
Line 1 c
Line 4
Line 5

Regards.

I tried this but I am not getting any output!! FYI, I am using ksh. I tried sh also, with the same result.

There are far better ways to put variables in awk than that, and cramming the pattern into RS is liable to produce gigantic records that will be truncated.

PAT="Line 1"
awk 'NR==FNR { if(match($0, PAT)) P=NR ; next } FNR > P' PAT="Line 1" inputfile inputfile

Note that the input file is given twice, once to find the last pattern, the second time to print everything after it.

If this doesn't work for you, please show exactly how you used it, word for word, letter for letter, keystroke for keystroke.

Now I get some compilation error:

ksh: cat e
Line 1
Line 22
Line 33
Line 1
Line 22
Line 1
Line 4
Line 5
ksh:
ksh: PAT="Line 1"
ksh: awk 'NR==FNR { if(match($0, PAT)) P=NR ; next } FNR > P' PAT="Line 1" e e
awk: syntax error near line 1
awk: illegal statement near line 1
ksh:

Ok,
What's your operating system?

SunOS

Could you try the awk solutions with /usr/xpg4/bin/awk?
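For reference, the old /usr/bin/awk on Solaris predates builtins such as match(), which would explain the syntax error above. A sketch of the same two-pass command with the XPG4 awk (the path is the usual Solaris location, an assumption here; it falls back to plain awk elsewhere, and the sample file is recreated for illustration):

```shell
# Old /usr/bin/awk has no match() builtin; prefer the XPG4 awk on Solaris.
AWK=/usr/xpg4/bin/awk
[ -x "$AWK" ] || AWK=awk        # fall back where that path does not exist

# Sample data from the thread
printf '%s\n' 'Line 1 a' 'Line 22' 'Line 33' 'Line 1 b' 'Line 22' \
              'Line 1 c' 'Line 4' 'Line 5' > e

# Pass 1 records the line number of the last match; pass 2 prints what follows.
"$AWK" 'NR==FNR { if (match($0, PAT)) P = NR; next } FNR > P' PAT="Line 1" e e
```

Because the second pass prints only FNR > P, the matching line itself is excluded, as the expected output requires.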

Hi.

Here we restart saving at each pattern match, so discarding everything saved before the last match, then print what remains at the end.

#!/usr/bin/env bash

# @(#) s1       Demonstrate deleting all lines before the last matching line, awk, gawk.

PATTERN=${1-"Line 1"}

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk

pl " Pattern: \"$PATTERN\""

FILE=data1
N=${FILE//[A-Za-z]/}
E=expected-output$N

pl " Input data file $FILE:"
cat $FILE

pl " Expected output:"
cat $E

pl " Results:"
awk -vPATTERN="$PATTERN" '
BEGIN   { i = 0 }
$0 ~ PATTERN    { i = 0; next }
                { i++ ; a[i] = $0 }
END     { for (j=1; j<=i; j++) print a[j] }
' $FILE |
tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C f1 $E || ( pe; pe " Results cannot be verified." ) >&2

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.9 (jessie) 
bash GNU bash 4.3.30
awk GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2-p3, GNU MP 6.0.0)

-----
 Pattern: "Line 1"

-----
 Input data file data1:
Line 1 a
Line 22
Line 33
Line 1 b
Line 22
Line 1 c
Line 4
Line 5

-----
 Expected output:
Line 4
Line 5

-----
 Results:
Line 4
Line 5

-----
 Verify results if possible:

-----
 Comparison of 2 created lines with 2 lines of desired results:
 Succeeded -- files (computed) f1 and (standard) expected-output1 have same content.

Although this was run in Linux, our Solaris has gawk available, so I would expect similar results:

OS, ker|rel, machine: SunOS, 5.11, i86pc
Distribution        : Solaris 11.3 X86
gawk GNU Awk 3.1.8

Best wishes ... cheers, drl

This was exactly jgt's solution (see post #2), and it got disregarded, probably without any test. Your solution will perhaps meet the same fate.

Note also that the data shown is not the real data, the expression is not a real expression, and something tells me that everything else is different too, including the clock spinning backwards. Good luck writing scripts to solve unknown requirements on unknown data with unknown restrictions.

bakunin


Hi.

Yes, same idea, but using memory rather than files -- not a big difference for small files, but for very large files awk will probably be faster, up to a memory limit beyond which the shell solution could be a winner just for not using too much memory. That would be a big file.

The sample suggests not-huge data.

Apologies to jgt for not recognizing the same solution, thanks to bakunin for pointing it out... cheers, drl

jgt's solution has a UUOC (useless use of cat): a plain >outfile truncates the file just as well as cat /dev/null >outfile.
The following should be more efficient, and uses a partial *glob* match.
Yet untested; I hope that exec works like that in all shells:

var="Line 1"
exec 3>outfile
while read line
do
  case $line in
  *$var*) exec 3>outfile
  ;;
  *) echo "$line" >&3
  ;;
  esac
done <inputfile

May not be the most efficient:

tac file | sed -n "/$var/q; p;" | tac
Line 4
Line 5
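A portability note on the above: tac is a GNU coreutils tool and may well be absent on SunOS. A hypothetical stand-in reverses the lines with awk (available everywhere), keeping the same quit-at-last-match idea; the helper name and sample file here are illustrative:

```shell
var="Line 1"
printf '%s\n' 'Line 1 a' 'Line 22' 'Line 33' 'Line 1 b' 'Line 22' \
              'Line 1 c' 'Line 4' 'Line 5' > file

# Reverse, print until the first (i.e. originally last) match, reverse back.
rev_lines() { awk '{ a[NR] = $0 } END { for (i = NR; i >= 1; i--) print a[i] }' ; }
rev_lines < file | sed -n "/$var/q; p" | rev_lines
```

Unlike tac, this buffers the whole file in awk's memory, so it loses the seek-from-the-end advantage discussed later in the thread; it is only a fallback where tac is missing.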

Just for fun (with gnu awk) :

$ XX="Line 1"
$ awk  'BEGIN{X=0;ARGV[ARGC++]=ARGV[ARGC-1]}FNR==NR && /'"$XX"'/ {X=FNR} FNR !=NR && FNR > X' file
Line 4
Line 5

Works like Corona688's idea.
Regards.

As a very different approach, how about:-

Req_Line="^Line 1 "                   # Note the leading caret to anchor to the beginning of the line (if that's what you want)
                                      # and the trailing space to avoid matching Line 11.
                                      # This is used as an Extended Regular Expression, so you can adjust it to suit your needs.

IFS=":" read lastline rest < <(grep -En "$Req_Line" filename |tail -1)
((lastline=$lastline+1))

sed -n "$lastline,\$p" filename

For a large file, this has the overhead of perhaps reading the file twice, so you would have to trial it for performance.

It would be sensible to add some error checking, such as what to do if the expression does not match. As it is, this would display the whole file, which might not be what you want.
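One way to sketch that check, assuming the same grep/sed approach (the file name and sample data here are illustrative, and `${last%%:*}` replaces the bash-only `read < <(...)` so it also runs in plain sh):

```shell
Req_Line="^Line 1 "
printf '%s\n' 'Line 1 a' 'Line 22' 'Line 33' 'Line 1 b' 'Line 22' \
              'Line 1 c' 'Line 4' 'Line 5' > filename     # illustrative data

# "N:matched line" for the last match, or empty when nothing matched
last=$(grep -En "$Req_Line" filename | tail -1)
if [ -z "$last" ]; then
    echo "Pattern '$Req_Line' not matched; not printing anything" >&2
    exit 1
fi
lastline=$(( ${last%%:*} + 1 ))     # first line after the last match
sed -n "$lastline,\$p" filename
```

With the sample data the last match is on line 6, so the sed prints lines 7 onwards; with a non-matching pattern the script now stops instead of printing the whole file.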

I hope that this helps, or at least gives an alternate.
Robin


Hi.

Perhaps not the most efficient, but, in the absence of requirements from OP, it certainly is simple. I like simple ... cheers, drl

Thanks all.

Disedorgue's 'just for fun' solution worked for me with nawk.

This approach may be more efficient than the others for large files with the pattern found towards the end of the file, as tac opens the file and reads from the end (output of strace):

.
.
.
open("TMPFILE", O_RDONLY)               = 3
lseek(3, 0, SEEK_END)                   = 37790
.
.
.

and, if sed finds the pattern and exits, tac quits due to a broken pipe,

.
.
.
write(1, "sr/bin/gpgsplit\n26696\t/usr/bin/g"..., 4096) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=14645, si_uid=1000} ---

NOT reading the entire input file.