Sort & Uniq -u

Antony_Ankrose · May 21, 2015, 2:04pm

Hi All,

Below is the actual file that I would like to sort and run uniq -u on:

/opt/oracle/work/Antony/Shell_Script> cat emp.1st
2233|a.k. shukula               |g.m.           |sales          |12/12/52       |6000
1006|chanchal singhvi           |director       |sales          |03/09/38       |6700
1265|s.n. dasguptha             |manager        |sales          |12/09/63       |5600
2476|anil aggarwal              |manager        |sales          |01/05/59       |5000
9876|jai sharma                 |director       |production     |12/03/50       |7000
5678|sumit chakrobarty          |d.g.m          |marketing      |19/04/43       |6000
6521|lalit chowdury             |director       |marketing      |09/26/45       |8200
2365|barun sengupta             |director       |personnel      |11/05/47       |7800
5423|n.k gupta                  |chairman       |admin          |30/08/56       |5400
6213|karuna ganguly             |g.m.           |accounts       |05/06/62       |6300
4290|jayant choudhury           |executive      |production     |07/09/50       |8200
3213|shyam saksena              |d.g.m.         |accounts       |12/12/55       |6000
3564|sudhir Ahagwal             |executive      |personnel      |06/07/47       |7500
2345|j.b. saxena                |g.m.           |marketing      |12/03/45       |8000
0110|v.k agarwal                |g.m.           |marketing      |31/12/40       |9000

When I issue the command below, it gives me this output:

/opt/oracle/work/Antony/Shell_Script> cut -d'|' -f4 emp.1st | sort | uniq -u
admin
marketing
sales

Whereas I am supposed to get:

admin
marketing
sales
production     
marketing 
personnel      
Accounts

Where have I made a mistake? Please advise.

senhia83 · May 21, 2015, 2:25pm

What is your objective? Do you want a unique list of the 4th field?

If you go step by step:

1. cut -d'|' -f4 file

Then, depending on what you want:

If you want a sorted unique list:

cut -d'|' -f4 file | sort -u

If you want to keep the original order and get a unique list of the 4th field:

awk -F"|" '!a[$4]++{print $4}' file

If you want to collapse runs of adjacent identical values into one (values that repeat non-adjacently will still appear more than once):

cut -d'|' -f4 file | uniq
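To make the difference between these options (and the uniq -u from the original question) concrete, here is a minimal sketch on a small, hypothetical, whitespace-free sample rather than the real emp.1st. The awk line is the same idiom as above, with the array renamed to seen. Note that uniq -u prints only lines that are never repeated, so with clean data the original sort | uniq -u pipeline would print just the departments that occur exactly once; in emp.1st as listed, that is only admin.

```shell
#!/bin/sh
# Hypothetical, whitespace-free sample; field 4 is the department.
sample() {
    printf '%s\n' \
        '1|a|x|sales' \
        '2|b|x|marketing' \
        '3|c|x|sales' \
        '4|d|x|admin' \
        '5|e|x|marketing'
}

echo '--- sort -u: one copy of every value, sorted'
sample | cut -d'|' -f4 | sort -u

echo '--- sort | uniq -u: only values that occur exactly once'
sample | cut -d'|' -f4 | sort | uniq -u

echo '--- uniq without sort: collapses only adjacent repeats'
sample | cut -d'|' -f4 | uniq

echo '--- awk: first occurrence of each value, original order'
sample | awk -F'|' '!seen[$4]++ { print $4 }'
```

Here sort -u gives admin, marketing, sales; sort | uniq -u gives only admin; plain uniq prints all five values because no two identical values are adjacent; and the awk version gives sales, marketing, admin in input order.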

Antony_Ankrose · May 21, 2015, 2:28pm

Thanks for your suggestion.

Even when I sort with sort -u, why am I getting sales and marketing twice?

/opt/oracle/work/Antony/Shell_Script> cut -d'|' -f4 emp.1st | sort -u
accounts
admin
marketing
marketing
personnel
production
sales
sales

senhia83 · May 21, 2015, 2:35pm

I'm getting the desired result, so I'm not sure what is going on with your dataset.
Do you have \r (carriage return) characters in it?

 cut -d'|' -f4 tmp | sort -u
accounts
admin
marketing
personnel
production
sales
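One quick way to test the \r theory: a file saved with DOS/Windows line endings has a carriage return before every newline. It is invisible in normal output but shows up as \r in od -c, and it makes "sales\r" and "sales" different values. A minimal sketch, using hypothetical printf input with mixed line endings; with the real file the equivalent would be tr -d '\r' < emp.1st | cut -d'|' -f4 | sort -u. LC_ALL=C is set so the comparison is byte-wise in every locale.

```shell
#!/bin/sh
# Compare bytes, not locale collation, so the result is the same everywhere.
LC_ALL=C
export LC_ALL

# Hypothetical sample: lines 1 and 3 end in \r\n (DOS), line 2 in \n (Unix).
crlf_sample() {
    printf '1|a|x|sales\r\n2|b|x|sales\n3|c|x|admin\r\n'
}

# Without stripping, "sales\r" and "sales" are kept as separate values.
crlf_sample | cut -d'|' -f4 | sort -u | od -c

# tr -d '\r' deletes every carriage return before the field is cut.
crlf_sample | tr -d '\r' | cut -d'|' -f4 | sort -u
```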

Scrutinizer · May 21, 2015, 2:36pm

There may be extra spaces or hidden characters in your input file.

senhia83 · May 21, 2015, 2:40pm

Can you run

cut -d'|' -f4 emp.1st | sort -u | od -c

and let us know what you get?

Antony_Ankrose · May 21, 2015, 2:44pm

/opt/oracle/work/Antony/Shell_Script> cut -d'|' -f4 emp.1st | sort -u | od -c
0000000    a   c   c   o   u   n   t   s  \t  \n   a   d   m   i   n  \t
0000020   \t  \n   m   a   r   k   e   t   i   n   g  \t  \n   m   a   r
0000040    k   e   t   i   n   g      \t  \n   p   e   r   s   o   n   n
0000060    e   l  \t  \n   p   r   o   d   u   c   t   i   o   n  \t  \n
0000100    s   a   l   e   s  \t  \t  \n   s   a   l   e   s      \t  \t
0000120   \n
0000121
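Reading the dump: every value is followed by one or more \t characters, and on one marketing line and one sales line there also appears to be a plain space (the blank column) before the tabs. Because sort and uniq compare whole lines, each whitespace variant counts as a different value. That most likely also explains the original uniq -u output: a whitespace variant of marketing and of sales each occurred only once, so uniq -u kept it. A small sketch reproducing this with hypothetical input (the function name padded is illustrative only):

```shell
#!/bin/sh
# Compare bytes, not locale collation, so the result is the same everywhere.
LC_ALL=C
export LC_ALL

# Hypothetical values shaped like the od -c dump above:
# the same word followed by different trailing whitespace.
padded() {
    printf 'sales\t\t\nsales \t\t\nsales\t\t\n'
}

# sort -u keeps two lines: "sales<TAB><TAB>" and "sales <TAB><TAB>".
padded | sort -u | od -c

# uniq -c shows how many copies of each variant exist (2 and 1).
padded | sort | uniq -c

# uniq -u keeps only the variant that occurs once: "sales <TAB><TAB>".
padded | sort | uniq -u
```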

senhia83 · May 21, 2015, 2:48pm

Please use code tags from now on.
Can you see the extra tabs in your data?

Removing the tabs in the pipeline should give you a unique list:

cut -d'|' -f4 emp.1st | sed 's/\t//g' | sort -u
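Two caveats on the sed command above: \t inside a sed regular expression is a GNU sed extension that other sed implementations may not recognize, and the od -c dump also seems to show a plain space before the tabs on some lines, which deleting only tabs would leave behind. A more portable sketch, assuming the goal is simply a clean list of the 4th field, trims all leading and trailing whitespace with the POSIX [[:space:]] class. The function name dept_list and the sample input are illustrative only.

```shell
#!/bin/sh
# Compare bytes so sort -u gives the same result in every locale.
LC_ALL=C
export LC_ALL

# Print each distinct value of field 4 once, with leading and trailing
# whitespace (spaces and tabs) removed. Reads the data on standard input.
dept_list() {
    cut -d'|' -f4 |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        sort -u
}

# Hypothetical input mimicking the padding seen in the od -c dump.
printf '1|a|x|sales\t\t\n2|b|x|sales \t\t\n3|c|x|marketing\t\n4|d|x|marketing \t\n' | dept_list
```

This prints marketing and sales once each. With the real file it would be run as dept_list < emp.1st.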

Antony_Ankrose · May 21, 2015, 3:09pm

This works, thanks for giving me the solution. Now I understand that the issue is the extra whitespace in the data.

You are awesome