Hi, I have multiple large files in the format below:
I am trying to write an awk or sed script to remove all occurrences of the 00 record except the first and remove all of the 80 records except the last one.
Any help would be greatly appreciated.
pamu
February 20, 2013, 5:31am
2
Is this what you want...?
awk 'NR==1
s=0
NR>1 && !/^00/
/^00/{s=1}
END{if(s){print}}' file
Thanks for your reply.
Yes, that is along the lines of what I need.
That script removes the 00 records except the first.
I also need it to remove all the 80 records except the last one, which should stay as it is.
Any ideas?
pamu
February 20, 2013, 6:22am
4
I forgot to add this condition.
Try:
awk 'NR==1
s=0
NR>1 && !/^00|^80/
/^80/{s=1}
END{if(s){print}}' file
It's getting there now.
That edit to the script removes all the 80 records, including the last one.
How could I get it to leave the last 80 record in place?
pamu
February 20, 2013, 6:38am
6
Try now
awk '!s && /00/{s=1;print}
/80/ && a{K=$0;for(i=1;i<=a;i++){print X[i]};a=0}
!/^00|80/{X[++a]=$0}
END{if(K){print K}
for(i=1;i<=a;i++){print X[i]}
}' file
Same again, pamu.
Everything is perfect apart from the final line of the file, which should be the 80 record but is the 70 record.
pamu
February 20, 2013, 7:02am
8
Please check below.
1) I am assuming we need to print only the first occurrence of a 00 record and remove all the others.
2) Remove all records starting with 80 except the last one.
3) Keep all other records as they are.
Check:
$ cat file
00..................
06..................
06..................
06..................
06..................
70..................
80..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................
$ awk '!s && /00/{s=1;print}
/80/ && a{K=$0;for(i=1;i<=a;i++){print X[i]};a=0}
!/^00|80/{X[++a]=$0}
END{if(K){print K}
for(i=1;i<=a;i++){print X[i]}
}' file
00..................
06..................
06..................
06..................
06..................
70..................
06..................
06..................
06..................
06..................
70..................
06..................
06..................
06..................
06..................
70..................
80..................
Please let me know if I need to correct anything.
pamu
---------- Post updated at 05:32 PM ---------- Previous update was at 05:29 PM ----------
For a harder input:
$ cat file
06..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................
00..................
06..................
06..................
06..................
06..................
70..................
80..................
06..................
$ awk '!s && /00/{for(i=1;i<=a;i++){print X[i]};s=1;a=0;print}
/80/ && a{K=$0;for(i=1;i<=a;i++){print X[i]};a=0}
!/^00|80/{X[++a]=$0}
END{if(K){print K}
for(i=1;i<=a;i++){print X[i]}
}' file
06..................
00..................
06..................
06..................
06..................
06..................
70..................
06..................
06..................
06..................
06..................
70..................
06..................
06..................
06..................
06..................
70..................
80..................
06..................
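The three rules listed above can also be sketched without buffering lines in an array, by reading the file twice. This is only a sketch under stated assumptions: the function name filter_records is made up for illustration, the input must be a regular, non-empty file (not a pipe) because it is read twice, and records are identified by the anchored patterns ^00 and ^80.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): filter_records FILE
# Reads FILE twice, so FILE must be a regular file, not a pipe.
filter_records() {
    awk '
        # Pass 1: remember the line number of the last 80 record.
        NR == FNR { if (/^80/) last80 = FNR; next }
        # Pass 2: drop every 00 record after the first one.
        /^00/ && seen00++ { next }
        # Drop every 80 record except the one found in pass 1.
        /^80/ && FNR != last80 { next }
        # Print everything else unchanged.
        { print }
    ' "$1" "$1"
}
```

Usage would be something like `filter_records file > file.new`. The kept 80 record stays at its original position, so any records after it still follow it in the output.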
drl
February 20, 2013, 9:04am
9
Hi.
An alternative awk solution:
#!/usr/bin/env bash
# @(#) s1 Demonstrate filter for first and last matches.
# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk
FILE=${1-data1}
lines=$( wc -l < $FILE )
pl " Input data file edges of $lines lines in $FILE:"
head -3 $FILE ; pe "..." ; tail -3 $FILE
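awk '
BEGIN { zz = ""; ee = "" }
# Print only the first 00 record and remember that it has been seen.
$0 ~ /^00/ && zz == "" { zz = $0 ; print ; next }
# Hold back every 80 record; ee ends up holding the last one.
$0 ~ /^80/ { ee = $0 ; next }
# Pass all other records through; later 00 records are dropped.
$0 !~ /^00/ { print }
# Emit the last 80 record, if there was one.
END { if (ee != "") print ee }
' "$FILE" > f1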
lines=$( wc -l < f1 )
pl " Output data file edges of $lines lines in f1:"
head -3 f1 ; pe "..." ; tail -3 f1
exit 0
producing:
% ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
awk GNU Awk 3.1.5
-----
Input data file edges of 23 lines in data1:
00.first............
06..................
00..................
...
80..................
70..................
80.last.............
-----
Output data file edges of 17 lines in f1:
00.first............
06..................
06..................
...
06..................
70..................
80.last.............
For production, just extract the awk code. The other code is support for data and version display, etc.
Best wishes ... cheers, drl
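As a rough illustration of extracting just the awk code, the filter could be wrapped as a small shell function that reads files or standard input. This is a sketch, not drl's exact code: the function name first_00_last_80 and the variable names are assumptions made for the example. Like the script above, it prints the last 80 record at the very end of the output, which only matches the original position if nothing follows that record in the input.

```shell
#!/bin/sh
# Hypothetical wrapper (name is illustrative); reads the named files or stdin.
first_00_last_80() {
    awk '
        # Print the first 00 record only; skip all later ones.
        /^00/ { if (!seen00++) print; next }
        # Hold back each 80 record; the last one seen wins.
        /^80/ { last80 = $0; have80 = 1; next }
        # Everything else passes through unchanged.
        { print }
        # Emit the saved 80 record, if any, as the final line.
        END { if (have80) print last80 }
    ' "$@"
}
```

Because it is a single pass, it works in a pipeline, for example `first_00_last_80 < data1 > f1`.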
Thanks, guys.
Both of the above solved my issue.
Best regards to both of you.
alister
February 20, 2013, 3:50pm
11
Compared to awk and sed, ed's a much more suitable tool for this task (memory permitting).
printf '%s\n' '/^00/+1,$ g//d' 1 '1,?^80?-1 g//d' w q | ed -s infile >/dev/null
Regards,
Alister
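For readers unfamiliar with ed, the command above can be unpacked roughly as follows. The wrapper name ed_first_00_last_80 is made up for illustration, and the sketch assumes the file contains at least one 80 record and at least one line after the first 00 record; otherwise ed may stop with an error before writing anything.

```shell
#!/bin/sh
# Hypothetical wrapper (name is illustrative): edits FILE in place.
ed_first_00_last_80() {
    # 1. '/^00/+1,$ g//d' : after loading, the current line is the last
    #    line, so the forward search wraps around and finds the first 00
    #    record. From the line after it to the end of the file, every
    #    line matching the same pattern (^00) is deleted.
    # 2. '1'              : make line 1 the current line again.
    # 3. '1,?^80?-1 g//d' : ?^80? searches backwards from line 1 and wraps
    #    around to the last 80 record. Every 80 record from line 1 up to
    #    the line before it is deleted.
    # 4. 'w' and 'q'      : write the buffer back to the file and quit.
    printf '%s\n' '/^00/+1,$ g//d' 1 '1,?^80?-1 g//d' w q |
        ed -s "$1" >/dev/null
}
```

Unlike the awk filters, this changes the file itself, so it is worth trying on a copy first.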