Removing items with repeated first 3 characters

AWK help:

I have a file in the following format. I need to remove all entries whose first 3 characters are repeated. So from the following file I need to remove every entry that starts with "mas".

mas01bct
mas02bct
mas03bct
mas01bct
mas01bct
mas01bct
mas11bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
pas00abc
mrk01abc
lbc02mis

So the output file should contain:

pas00abc
mrk01abc
lbc02mis

Thanks, I appreciate your help.

> cut -c1-3 file01 | sort | uniq -c | awk '$1<2 {print "^" $2}' >file01k
> egrep -f file01k file01
pas00abc
mrk01abc
lbc02mis

Use nawk or /usr/xpg4/bin/awk on Solaris:

awk 'END {
  while (++i <= c)
    if (f[substr(s[i],1,3)] < 2)
      print s[i]
  }
{
  f[substr($0,1,3)]++
  s[++c]=$0
  }' infile

Another one:

awk '
NR==FNR{a[substr($0,1,3)]++;next}
a[substr($0,1,3)]<2' file file

Use nawk or /usr/xpg4/bin/awk on Solaris.

Regards
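
For readers unfamiliar with the "file file" idiom in the second one-liner: awk reads the same file twice, and NR==FNR is true only during the first read. A minimal sketch of the same idea wrapped in a shell function (the function name is made up for illustration, and on Solaris you would presumably swap awk for nawk or /usr/xpg4/bin/awk):

```shell
# Hypothetical helper (name and argument are illustrative).
# $1 = input file; it is read twice.
drop_repeated_prefixes() {
  awk '
    # Pass 1: NR == FNR holds only while the first file argument is read.
    NR == FNR { count[substr($0, 1, 3)]++; next }
    # Pass 2: print lines whose 3-character prefix was seen exactly once.
    count[substr($0, 1, 3)] < 2
  ' "$1" "$1"
}
```

Called as "drop_repeated_prefixes file01", it should keep the input order of the surviving lines, since nothing is sorted.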

Thanks, but I only want to remove one chosen repeated prefix, e.g. "mas" (or whichever one I pick), not all repeated prefixes. So I need to hardcode "mas".

Thanks.

awk 'END {
  while (++i <= c)
    if (!t || s[i] !~ "^"p)
      print s[i]
  }
$0 ~ "^"p && f[p]++ { t = 1 }
{ s[++c] = $0 }' p=mas infile

Thanks, but I am getting an error:

$ ./remove_device_type.mas
awk: syntax error near line 3
awk: illegal statement near line 3
awk: syntax error near line 6
awk: bailing out near line 6

Did you try nawk or /usr/xpg4/bin/awk (assuming Solaris OS)?
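
In case it helps, here is a two-pass sketch of the hard-coded prefix variant that avoids buffering the whole file in an array. It assumes a POSIX-style awk (nawk or /usr/xpg4/bin/awk on Solaris), and the function name and argument order are only illustrative. It uses index() rather than a regex, so the prefix is matched literally:

```shell
# Hypothetical helper (name and arguments are illustrative).
# $1 = literal prefix to check, $2 = input file (read twice).
drop_prefix_if_repeated() {
  awk -v p="$1" '
    # Pass 1: count the lines that start with the prefix.
    NR == FNR { if (index($0, p) == 1) n++; next }
    # Pass 2: keep a line unless the prefix is repeated and the line starts with it.
    n < 2 || index($0, p) != 1
  ' "$2" "$2"
}
```

For example, "drop_prefix_if_repeated mas infile" should drop the "mas" lines only when more than one of them exists, and leave every other prefix alone even if it repeats.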

If your uniq supports the -w option (GNU uniq does) and the repeated lines are adjacent, as in the sample:

uniq  -uw 3 file
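
One caveat: uniq only compares neighbouring lines, so if entries with the same prefix are scattered through the file, they need to be grouped first. A rough sketch, assuming GNU uniq and that losing the original line order is acceptable (the function name is made up):

```shell
# Hypothetical helper (name is illustrative). Requires GNU uniq for -w.
# $1 = input file. Output comes back in sorted order, not input order.
unique_by_prefix() {
  LC_ALL=C sort "$1" | uniq -u -w 3
}
```

Sorting in the C locale keeps lines with the same first 3 characters next to each other, which is what -w 3 relies on.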

Nice!

I suspect he is using Solaris, so I don't think its version of uniq supports it (I may be wrong).

It could be, but it's good to know anyway (for the OP, for me and for the other readers of this thread). And it's available on Linux for sure. :)