amir07
October 27, 2008, 12:28pm
AWK help:
I have a file in the following format. I need to remove any entries whose first 3 characters are repeated. So from the following file I need to remove all entries starting with "mas".
mas01bct
mas02bct
mas03bct
mas01bct
mas01bct
mas01bct
mas11bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
pas00abc
mrk01abc
lbc02mis
So the output file should contain:
pas00abc
mrk01abc
lbc02mis
Thanks, I appreciate your help.
joeyg
October 27, 2008, 12:36pm
> cut -c1-3 file01 | sort | uniq -c | awk '$1<2 {print "^" $2}' >file01k
> egrep -f file01k file01
pas00abc
mrk01abc
lbc02mis
Use nawk or /usr/xpg4/bin/awk on Solaris:
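awk 'END {
  # print the stored lines whose 3-character prefix occurred only once
  while (++i <= c)
    if (f[substr(s[i],1,3)] < 2)
      print s[i]
}
{
  f[substr($0,1,3)]++   # count each 3-character prefix
  s[++c] = $0           # keep every line, in input order
}' infile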
Another one:
awk '
NR==FNR{a[substr($0,1,3)]++;next}
a[substr($0,1,3)]<2' file file
Use nawk or /usr/xpg4/bin/awk on Solaris.
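A minimal sketch of the same two-pass idea with the prefix length passed in as a variable instead of being hardcoded as 3. The function name `drop_repeated_prefixes` and the awk variable `n` are hypothetical. It assumes a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris) and a regular file, since the file is read twice.

```shell
#!/bin/sh
# drop_repeated_prefixes FILE [N]   (hypothetical helper name)
# Prints the lines of FILE whose first N characters (default 3) occur
# on exactly one line. FILE is passed twice because awk reads it twice.
drop_repeated_prefixes() {
    awk -v n="${2:-3}" '
        # Pass 1 (NR == FNR): count how often each N-character prefix occurs.
        NR == FNR { count[substr($0, 1, n)]++; next }
        # Pass 2: print the line only if its prefix was seen exactly once.
        count[substr($0, 1, n)] < 2
    ' "$1" "$1"
}

# Usage: drop_repeated_prefixes infile 3
```

Reading the file twice avoids keeping every line in an array, at the cost of not working on a pipe.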
Regards
amir07
October 27, 2008, 2:00pm
Thanks, but I only want to remove one specific repeated prefix that I choose, e.g. "mas", not every repeated one. So I need to hardcode "mas".
Thanks.
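awk 'END {
  while (++i <= c)
    # drop lines starting with p only when p occurred more than once
    if (!t || s[i] !~ "^" p)
      print s[i]
}
$0 ~ "^" p && f[p]++ { t = 1 }   # t is set on the second line starting with p
{ s[++c] = $0 }' p=mas infile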
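A two-pass alternative for a single chosen prefix, sketched under the assumption of a POSIX awk and a regular file (it is read twice, so nothing is stored in memory). The function name `drop_prefix_if_repeated` is hypothetical. It uses `index()` instead of a `"^" p` regex, so the prefix is compared literally even if it contains characters such as `.` or `*`.

```shell
#!/bin/sh
# drop_prefix_if_repeated FILE PREFIX   (hypothetical helper name)
# Removes every line that starts with PREFIX, but only when more than one
# line starts with it; otherwise FILE is printed unchanged.
drop_prefix_if_repeated() {
    awk -v p="$2" '
        # Pass 1: count the lines that begin with p (literal comparison).
        NR == FNR { if (index($0, p) == 1) hits++; next }
        # Pass 2: keep a line unless p repeats and the line begins with p.
        hits < 2 || index($0, p) != 1
    ' "$1" "$1"
}

# Usage: drop_prefix_if_repeated infile mas
```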
amir07
October 28, 2008, 11:01am
Thanks, but I am getting an error:
$ ./remove_device_type.mas
awk: syntax error near line 3
awk: illegal statement near line 3
awk: syntax error near line 6
awk: bailing out near line 6
Did you try nawk or /usr/xpg4/bin/awk (assuming Solaris OS)?
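Errors like the ones above are typical of the old /usr/bin/awk on Solaris. A small sketch for picking a newer awk when one exists, assuming the usual Solaris locations; the variable name `AWK` is just a convention chosen here. It uses `type` rather than `command -v` because the older Solaris /bin/sh may lack `command`.

```shell
#!/bin/sh
# Prefer the POSIX awk on Solaris, then nawk, then whatever "awk" is.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif type nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk
fi

# Then call "$AWK" instead of plain awk, for example:
# "$AWK" 'NR==FNR{a[substr($0,1,3)]++;next} a[substr($0,1,3)]<2' infile infile
```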
I suspect he is using Solaris, so I don't think its version of uniq supports it (I may be wrong).
It could be, but it's good to know anyway (for the OP, for me and for the other readers of this thread). And it's available on Linux for sure.