Numbering duplicates

Hi,

I have a large file that sometimes contains duplicates, and I want to find them and work out how many there are.

So I have a file with multiple columns and the last column (9) has the duplicates.

e.g.

yan
tar
tar
man
ban
tan
tub
tub
tub

Basically, what I want to do is label non-duplicates as "0", duplicates as "0" and "1", and, in the case of triplicates, "0", "1" and "2".

So the output file will look like this:

yan 0
tar 0
tar 1
man 0
ban 0
tan 0
tub 0
tub 1
tub 2

thanks

Kylle

awk '
NF >= 9 { word[$9]++ }
END {
    for (w in word)
        print w, word[w]
}
' inputfile


You didn't say which delimiter you have, but try this:

awk '{print $9,word[$9]++}' yourfile

malcomex999 is a better reader :), use his version.
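
For anyone who wants to check the one-liner against the sample from the first post, here is a small sketch. The filler columns f1..f8 and the helper names make_sample and number_dups are made up for illustration; only the words in column 9 come from the thread.

```shell
# Build nine tab-separated columns per line; f1..f8 are hypothetical filler,
# column 9 carries the sample words from the first post.
make_sample() {
    for w in yan tar tar man ban tan tub tub tub; do
        printf 'f1\tf2\tf3\tf4\tf5\tf6\tf7\tf8\t%s\n' "$w"
    done
}

# word[$9]++ returns how often $9 has been seen so far (0 the first time)
# and only then increments the counter, so repeats get 1, 2, ...
number_dups() {
    awk '{ print $9, word[$9]++ }' "$@"
}

make_sample | number_dups
```

The counter is per word, so the numbering restarts for every distinct value and does not depend on the duplicates being adjacent.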

Hi, it's tab delimited.

Thanks, but I'm not sure that does what I want it to do. It counted how many are unique and how many are replicates. Basically, what I want it to do is this:

Before...

yan
tar
tar
man
ban
tan
tub
tub
tub

After...

yan unique
tar unique
tar duplicate
man unique
ban unique
tan unique
tub unique
tub duplicate
tub triplicate

thanks

Assuming you're using the 9th column:

awk '{print $9, (a[$9]++ ? "duplicate" : "unique")}' file
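
Note that the one-liner above labels every repeat as "duplicate", including the third occurrence, while the requested output says "triplicate" there. A possible variant that looks the label up by occurrence count is sketched below. The function name label_dups and the "copy N" fallback for a fourth or later occurrence are my own assumptions, since the thread does not say what should happen beyond triplicates.

```shell
# Label each line by how many times its 9th (tab-separated) field has
# appeared so far: 1st = unique, 2nd = duplicate, 3rd = triplicate.
label_dups() {
    awk -F '\t' '
        BEGIN { n = split("unique duplicate triplicate", label, " ") }
        {
            c = ++seen[$9]    # occurrence number of this word, starting at 1
            # Assumed fallback: beyond the known labels, print "copy N".
            print $9, (c <= n ? label[c] : "copy " c)
        }
    ' "$@"
}
```

Usage would be something like `label_dups file`, or piping data in on standard input.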

Did you try malcomex999's version?
It gives this result:

yan 0
tar 0
tar 1
man 0
ban 0
tan 0
tub 0
tub 1
tub 2

That is exactly what you asked for in your first post. Your field delimiter is tab, which is one of the default delimiters. If your fields can also contain spaces, then you need to set FS explicitly:

awk -F "\t" '{print $9,word[$9]++}' yourfile
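
To see why the explicit separator matters, here is a minimal sketch with one made-up record whose second field contains a space. The helper name sample_row and the field values are invented for the illustration.

```shell
# One hypothetical record: nine tab-separated fields, the second is "b c".
sample_row() {
    printf 'a\tb c\td\te\tf\tg\th\ti\tlast\n'
}

# Default splitting breaks on spaces as well as tabs, so "b c" becomes two
# fields and everything after it shifts by one position.
sample_row | awk '{ print $9 }'            # prints "i"

# Tab-only splitting keeps "b c" together, so $9 is the real ninth column.
sample_row | awk -F '\t' '{ print $9 }'    # prints "last"
```

With the default separator the counts would silently be taken on the wrong column for any line that has a space in an earlier field.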