Remove certain lines from file based on start of line except beginning and ending

nwalsh88 · February 20, 2013, 5:08am

Hi, I have multiple large files which consist of the below format:

I am trying to write an awk or sed script to remove all occurrences of the 00 record except the first and remove all of the 80 records except the last one.

Any help would be greatly appreciated.

pamu · February 20, 2013, 5:31am

Is this what you want...?

awk 'NR==1
s=0
NR>1 && !/^00/
/^00/{s=1}
END{if(s){print}}' file

nwalsh88 · February 20, 2013, 5:40am

Thanks for your reply.

Yes, that is along the lines of what I need.

That script removes the 00 records except the first.

I also need it to remove all the 80 records but leave the last one as it is.

Any ideas....?

pamu · February 20, 2013, 6:22am

Forgot to add this condition.

Try:

awk 'NR==1
s=0
NR>1 && !/^00|^80/
/^80/{s=1}
END{if(s){print}}' file

nwalsh88 · February 20, 2013, 6:29am

It's getting there now.

That edit to the script removes all the 80 records, including the last one.

How could I get it to leave the last 80 record there?

pamu · February 20, 2013, 6:38am

Try now

awk '!s && /00/{s=1;print}
/80/ && a{K=$0;for(i=1;i<=a;i++){print X[i]};a=0}
!/^00|80/{X[++a]=$0}
END{if(K){print K}
for(i=1;i<=a;i++){print X[i]}
}' file

nwalsh88 · February 20, 2013, 6:50am

Same again pamu

Everything is perfect apart from the final line of the file, which should be the 80 record but is the 70 record.

pamu · February 20, 2013, 7:02am

Please check below

1) I am assuming we need to print only the first occurrence of a 00 entry and remove all the rest.
2) Remove all instances of an entry starting with 80 except the last one.
3) Keep all other entries as they are.
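
The three rules above can also be sketched as a two-pass awk that needs no buffering. This is only an illustration, not the script posted in this thread: the function name keep_first00_last80 is hypothetical, it assumes the record type is the first two characters of each line, and it assumes the input is a regular file that can be read twice (so it will not work on a pipe). The last 80 record stays at its original position.

```shell
#!/bin/sh
# keep_first00_last80 FILE   (hypothetical helper name, for illustration only)
# Pass 1 (NR == FNR): note the line number of the last record starting with 80.
# Pass 2: print the first 00 record, the last 80 record, and every other record.
keep_first00_last80() {
    awk '
        NR == FNR { if (/^80/) last80 = FNR; next }   # pass 1: locate the last 80
        /^00/     { if (!seen00++) print; next }      # pass 2: keep only the first 00
        /^80/     { if (FNR == last80) print; next }  # keep only the last 80
                  { print }                           # all other records unchanged
    ' "$1" "$1"
}
```

Usage would be something like keep_first00_last80 file > file.new. Reading the file twice trades extra I/O for constant memory, which may matter for large files.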

check..

$ cat file
00..................
06..................
06..................
06..................
06..................
70..................
80..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................

$ awk '!s && /00/{s=1;print}
/80/ && a{K=$0;for(i=1;i<=a;i++){print X[i]};a=0}
!/^00|80/{X[++a]=$0}
END{if(K){print K}
for(i=1;i<=a;i++){print X[i]}
}' file

00..................
06..................
06..................
06..................
06..................
70..................
06..................
06..................
06..................
06..................
70..................
06..................
06..................
06..................
06..................
70..................
80..................

Please let me know if I need to correct anything.

pamu

---------- Post updated at 05:32 PM ---------- Previous update was at 05:29 PM ----------

For a harder input:

$ cat file
06..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................
06..................

$ awk '!s && /00/{for(i=1;i<=a;i++){print X[i]};s=1;a=0;print}
/80/ && a{K=$0;for(i=1;i<=a;i++){print X[i]};a=0}
!/^00|80/{X[++a]=$0}
END{if(K){print K}
for(i=1;i<=a;i++){print X[i]}
}' file

06..................
00..................
06..................
06..................
06..................
06..................
70..................
06..................
06..................
06..................
06..................
70..................
06..................
06..................
06..................
06..................
70..................
80..................
06..................

drl · February 20, 2013, 9:04am

Hi.

An alternative awk solution:

#!/usr/bin/env bash

# @(#) s1	Demonstrate filter for first and last matches.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk

FILE=${1-data1}

lines=$( wc -l < $FILE )
pl " Input data file edges of $lines lines in $FILE:"
head -3 $FILE ; pe "..." ; tail -3 $FILE

awk '
BEGIN	{ zz = ""; ee = "" }
$0 ~ /^00/ && zz == ""	{ zz = $0 ; print ; next }
$0 ~ /^80/	{ ee = $0 ; next }
$0 !~ /^00/	{ print }
END	{ print ee }
' $FILE > f1
lines=$( wc -l < f1 )
pl " Output data file edges of $lines lines in f1:"
head -3 f1 ; pe "..." ; tail -3 f1

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Input data file edges of 23 lines in data1:
00.first............
06..................
00..................
...
80..................
70..................
80.last.............

-----
 Output data file edges of 17 lines in f1:
00.first............
06..................
06..................
...
06..................
70..................
80.last.............

For production, just extract the awk code. The other code is support for data and version display, etc.
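
A minimal sketch of what that extraction might look like as a stdin filter. The function name first00_last80_deferred is hypothetical, and the have80 guard is an addition that is not in the script above (it avoids printing an empty line when the input has no 80 record). Note that this approach emits the last 80 record at the very end of the output, so it keeps the original position only when the file ends with an 80 record.

```shell
#!/bin/sh
# first00_last80_deferred   (hypothetical name): single-pass filter on stdin.
first00_last80_deferred() {
    awk '
        /^00/ { if (!seen00++) print; next }     # keep only the first 00 record
        /^80/ { last80 = $0; have80 = 1; next }  # remember the most recent 80 record
              { print }                          # pass every other record through
        END   { if (have80) print last80 }       # emit the final 80 record last
    '
}
```

It could be used as first00_last80_deferred < data1 > f1.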

Best wishes ... cheers, drl

nwalsh88 · February 20, 2013, 9:16am

Thanks Guys,

Both of the above solved my issue.

Best Regards to both of you

alister · February 20, 2013, 3:50pm

Compared to awk and sed, ed's a much more suitable tool for this task (memory permitting).

printf '%s\n' '/^00/+1,$ g//d' 1 '1,?^80?-1 g//d' w q | ed -s infile >/dev/null

Regards,
Alister