Removing items with repeated first 3 characters

amir07 · October 27, 2008, 12:28pm

AWK help:

I have a file in the following format. I need to remove every entry whose first 3 characters are repeated in another entry. So from the following file I need to remove all entries that start with "mas".

mas01bct
mas02bct
mas03bct
mas01bct
mas01bct
mas01bct
mas11bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
mas01bct
pas00abc
mrk01abc
lbc02mis

So the output file should contain:

pas00abc
mrk01abc
lbc02mis

Thanks and appreciate your help.

joeyg · October 27, 2008, 12:36pm

> cut -c1-3 file01 | sort | uniq -c | awk '$1<2 {print $2}' >file01k
> egrep -f file01k file01
pas00abc
mrk01abc
lbc02mis
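
A note on the pipeline above: the patterns written to file01k are not anchored, so a unique prefix such as "pas" would also match a line like mas01pas. A minimal sketch of an anchored variant follows, assuming the prefixes contain no spaces or regex metacharacters; the sample data and the temporary file names are made up for illustration.

```shell
# Made-up sample data; file names are placeholders.
tmpdir=$(mktemp -d)
printf '%s\n' mas01pas mas02bct pas00abc mrk01abc > "$tmpdir/file01"

# Write one anchored pattern per prefix that occurs exactly once.
cut -c1-3 "$tmpdir/file01" | sort | uniq -c |
  awk '$1 < 2 { print "^" $2 }' > "$tmpdir/patterns"

# grep -E is the POSIX spelling of egrep.
anchored_result=$(grep -E -f "$tmpdir/patterns" "$tmpdir/file01")
printf '%s\n' "$anchored_result"
```

With this sample the line mas01pas is dropped, because "^pas" only matches at the start of a line.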

radoulov · October 27, 2008, 1:00pm

Use nawk or /usr/xpg4/bin/awk on Solaris:

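awk 'END {
  # after reading everything: print each stored line whose prefix occurred once
  while (++i <= c)
    if (f[substr(s[i],1,3)] < 2)
      print s[i]
      }
{
  f[substr($0,1,3)]++   # count lines per 3-character prefix
  s[++c]=$0             # remember the lines in input order
  }' infile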

Franklin52 · October 27, 2008, 1:22pm

Another one:

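awk '
# first pass over the file: count lines per 3-character prefix
NR==FNR{a[substr($0,1,3)]++;next}
# second pass: print lines whose prefix was counted only once
a[substr($0,1,3)]<2' file file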

Use nawk or /usr/xpg4/bin/awk on Solaris.

Regards

amir07 · October 27, 2008, 2:00pm

Thanks, but I only want to remove one specific repeated item that I choose, e.g. "mas", not every repeated item. So I need to hardcode "mas".

Thanks.

radoulov · October 27, 2008, 3:27pm

awk 'END {
  while (++i <= c)
    if (t && s[i] !~ "^"p)
      print s[i]
      }
f["^"p]++ { t = 1 }
{ s[++c] = $0 }' p=mas infile
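
For the "hardcode one prefix" requirement, here is a minimal two-pass sketch. It is not the code posted above but an alternative under stated assumptions: the function name remove_repeated_prefix and the sample data are hypothetical, the prefix is compared literally with substr() rather than as a regex, and awk must accept -v (on Solaris that means nawk or /usr/xpg4/bin/awk).

```shell
# Hypothetical helper: print file $2 without the lines that start with
# prefix $1, but only if that prefix occurs on more than one line.
remove_repeated_prefix() {
  awk -v p="$1" '
    # First pass: count the lines that begin with the prefix.
    NR == FNR { if (substr($0, 1, length(p)) == p) n++; next }
    # Second pass: skip prefix lines when the prefix is repeated.
    !(n > 1 && substr($0, 1, length(p)) == p)
  ' "$2" "$2"
}

# Sample run on made-up data.
sample=$(mktemp)
printf '%s\n' mas01bct mas02bct pas00abc mrk01abc > "$sample"
remove_repeated_prefix mas "$sample"   # prints pas00abc and mrk01abc
```

Calling it with a prefix that occurs only once (for example pas) leaves the file unchanged, since only repeated prefixes are removed.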

amir07 · October 28, 2008, 11:01am

Thanks, but I am getting an error:

$ ./remove_device_type.mas
awk: syntax error near line 3
awk: illegal statement near line 3
awk: syntax error near line 6
awk: bailing out near line 6

radoulov · October 28, 2008, 11:07am

Did you try nawk or /usr/xpg4/bin/awk (assuming Solaris OS)?
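
One way to avoid editing every script is to choose the awk binary once at the top. This is only a sketch: the variable names AWK and awk_check are made up, the paths are the usual Solaris locations, and it assumes a POSIX shell (ksh, bash, or /usr/xpg4/bin/sh on Solaris).

```shell
# Pick a POSIX-capable awk: prefer the XPG4 one, then nawk, then plain awk.
if [ -x /usr/xpg4/bin/awk ]; then
  AWK=/usr/xpg4/bin/awk
elif [ -x /usr/bin/nawk ]; then
  AWK=/usr/bin/nawk
else
  AWK=awk
fi

# Quick check that the chosen awk accepts -v and substr().
awk_check=$(printf 'mas01bct\n' | "$AWK" -v n=3 '{ print substr($0, 1, n) }')
printf '%s\n' "$awk_check"
```

The scripts in this thread can then call "$AWK" instead of awk.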

ghostdog74 · October 28, 2008, 12:43pm

If your uniq supports the -w option (GNU uniq does):

uniq -uw 3 file
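
Keep in mind that uniq only compares neighbouring lines, so when entries with the same prefix are scattered through the file it helps to sort first. A small sketch, assuming GNU uniq (for -w) and made-up sample data; the result comes out in sorted order, not in the original file order.

```shell
# Made-up sample in which the "mas" lines are not adjacent.
tmpdir=$(mktemp -d)
printf '%s\n' mas01bct pas00abc mas02bct mrk01abc > "$tmpdir/file"

# sort groups equal prefixes together; -w 3 compares only the first
# 3 characters and -u keeps the lines that have no such neighbour.
uniq_result=$(sort "$tmpdir/file" | uniq -u -w 3)
printf '%s\n' "$uniq_result"
```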

radoulov · October 28, 2008, 12:45pm

Nice!

ghostdog74 · October 28, 2008, 1:06pm

I suspect he is using Solaris, so I don't think its version of uniq supports it (I may be wrong).

radoulov · October 28, 2008, 4:05pm

It could be, but it's good to know anyway (for the OP, for me and for the other readers of this thread). And it's available on Linux for sure.