Help with removing duplicate content and keeping only the first occurrence

Input

data_10 SSA
data_2 TYUE
data_3 PEOCV
data_6 SSAT
data_21 SSA
data_19 TYUEC
data_14 TYUE
data_15 SSA
data_32 PEOCV
.
.

Desired Output

data_10 SSA
data_2 TYUE
data_3 PEOCV
data_6 SSAT
data_19 TYUEC
.
.

From the above data, if the value in column two is the same (e.g. data_10, data_21, and data_15 all have SSA), I would like to keep only the line that appears first (e.g. keep data_10 SSA, and remove data_21 SSA and data_15 SSA).
Thanks.

cat input_file | cut -f2 | uniq | while read line
do
    grep "$line" input_file | head -1 >> output_file
done

awk '{if (!a[$2]) print; a[$2]++}' input_file
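For reference, a commented sketch of how that one-liner behaves (assuming the space-separated sample above): a[$2] is unset, hence false, the first time a column-two value is seen, so the line prints; the increment then makes it non-zero, so later lines with the same value are skipped.

# keep only the first line carrying each column-two value
awk '{ if (!a[$2]) print; a[$2]++ }' input_file

On the sample data this keeps data_10 SSA, data_2 TYUE, data_3 PEOCV, data_6 SSAT and data_19 TYUEC, matching the desired output.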


Hi ROHON,
I just tried it out.
It seems I can't get the desired output?
Thanks.

---------- Post updated at 05:14 AM ---------- Previous update was at 05:05 AM ----------

Thanks for your awk command.
It is able to remove the duplicate lines in column two successfully.
Unfortunately, the column-one details of those duplicates still seem to be kept in the data?

I didn't get you. If you are looking for a different output, please post the expected output.

cat input_file | cut -f2 | uniq | while read line
do
   grep " ${line}$" input_file | head -1 >> output_file
done
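A note on this version, assuming the space-separated sample above: anchoring the pattern with a leading space and a trailing $ stops a key like SSA from also matching SSAT, but cut still splits on its default tab delimiter (it would need -d' ' here), and uniq only collapses adjacent duplicates, so a key such as SSA that reappears further down is looked up again and the same first match is appended more than once. A minimal pure-shell sketch that avoids both issues while preserving first-occurrence order (assuming two space-separated fields per line and keys without embedded spaces):

while read -r name key
do
    case " $seen " in
        *" $key "*) ;;                        # key already seen: drop this line
        *)  printf '%s %s\n' "$name" "$key"   # first occurrence: keep it
            seen="$seen $key" ;;
    esac
done < input_file > output_file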

Hi singh,

I just edited my question.
Hopefully it is clearer now.
Thanks for your advice.

Or even:

awk '!_[$2]++' infile

To the OP: please elaborate more on how the output from anurag.singh's command is wrong.


I believe the command in post #3 is doing the same thing.
@radoulov, that's a shorter/better command.


Yes,
I just wanted to show that you can post-increment and test in a single expression :)
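For anyone reading along, a sketch of how that single expression evaluates (using a plain array name instead of the underscore; both behave identically, and infile stands for the sample input):

# the pattern is the whole program: when it is true, the default action prints the line
awk '!seen[$2]++' infile
# first line with a given column-two value: seen[$2] is 0, the post-increment returns the
# old value 0, !0 is true, and the line prints; every later line with that value sees a
# non-zero count, the negation is false, and the line is skipped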
