Request to check: remove duplicates and write systematically

manigrover · July 23, 2012, 10:13pm

Hi all

I have a file with the following input:

It contains 5 columns: gene name, drug, drug ID, disease, approved
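Several fields in the sample rows later in this thread contain spaces (for example "1,3-Beta-Glucan synthase" and "Fungal infections"), so awk's default whitespace splitting would not see exactly 5 columns. Assuming the file is tab-separated (the thread does not say), a minimal sketch like this reports any line that does not have 5 fields. The file name sample.tsv and its second row are hypothetical.

```shell
# Hypothetical sample data; assumes the columns are separated by tabs.
printf '1,3-Beta-Glucan synthase\tCaspofungin\tDAP000547\tFungal infections\tApproved\n' > sample.tsv
printf 'geneX\tdrugY\n' >> sample.tsv   # deliberately short row (hypothetical)

# -F'\t' splits on tabs only, so spaces inside a field stay in that field.
# Print the line number and field count of every row without exactly 5 fields.
awk -F'\t' 'NF != 5 { print NR ": " NF " fields" }' sample.tsv > bad_rows.txt
cat bad_rows.txt
```

If the real file uses a different separator, only the -F value needs to change.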

The same gene is repeated many times, with different data in columns 2, 3, 4 and 5.

I want to arrange the data so that each gene appears only once in the first column (no repeated entries), while columns 2, 3, 4 and 5 remain as they are.

So the output should be like this:

Kindly let me know how to script this.

jim_mcnamara · July 23, 2012, 11:21pm

awk '!arr[$0]++' inputfile | sort > outputfile

Please try that.
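The reply relies on a common awk idiom. In '!arr[$0]++' the array is keyed on the whole line ($0); the post-increment returns the previous count (0 the first time a line is seen), and '!' makes the pattern true only for that first occurrence, so awk's default action prints each distinct line once. Because the key is the entire line, rows that share a gene name but differ in columns 2 to 5 all survive, which matches what the follow-up post reports. A small demonstration on hypothetical data:

```shell
# Hypothetical sample: one exact duplicate row plus a second row for the same gene.
printf 'geneA\tdrug1\ngeneA\tdrug1\ngeneA\tdrug2\n' > demo.tsv

# Keep only the first occurrence of each complete line, in input order.
awk '!arr[$0]++' demo.tsv > dedup.tsv
cat dedup.tsv
```

Only the exact duplicate disappears; both "geneA" rows with different drugs remain.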

manigrover · July 24, 2012, 11:18am

Hi all

Sorry, the output should contain each entry in the first column only once, like this:

---------- Post updated at 10:32 PM ---------- Previous update was at 10:22 PM ----------

Hi Jim

The output still contains repeated entries; it is just sorted alphabetically. Using this code,

it shows:

I want the output to be:

1,3-Beta-Glucan synthase Anidulafungin DAP000546 Fungal infections Approved
Caspofungin DAP000547 Fungal infections Approved
Cilofungin DCL000331 Candida infections Discontinued
Eraxis/Vfend DCL000522 Beta-D Glucan Synthase Inhibitor, Cyp P450 Mediated Alpha-lanosterol Demethylation Phase III
Micafungin DAP000548 Fungal infections Approved
16S rRNA
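One way to get the layout requested above is to bring all rows for a gene together and then blank the gene name on every row after the first. This is a minimal sketch, assuming tab-separated columns in the order given in the question; the rows for "16S rRNA" (DrugX, ID0001, DiseaseY) are made-up placeholders. It sorts on column 1 only, and awk remembers the previous gene name and empties column 1 whenever it repeats, leaving columns 2 to 5 unchanged.

```shell
# Hypothetical tab-separated sample: gene name, drug, drug ID, disease, approved.
printf '%s\t%s\t%s\t%s\t%s\n' \
  '1,3-Beta-Glucan synthase' 'Micafungin' 'DAP000548' 'Fungal infections' 'Approved' \
  '16S rRNA' 'DrugX' 'ID0001' 'DiseaseY' 'Approved' \
  '1,3-Beta-Glucan synthase' 'Anidulafungin' 'DAP000546' 'Fungal infections' 'Approved' \
  > genes.tsv

tab=$(printf '\t')

# LC_ALL=C gives a byte-order sort that does not depend on the locale.
# -k1,1 sorts on the gene name only, so all rows for a gene end up adjacent.
LC_ALL=C sort -t "$tab" -k1,1 genes.tsv |
awk 'BEGIN { FS = OFS = "\t" }
     {
       # Same gene as the previous row: blank column 1; otherwise remember it.
       if ($1 == prev) { $1 = "" } else { prev = $1 }
       print
     }' > grouped.tsv

cat grouped.tsv
```

The blank first field keeps every row at 5 tab-separated columns. If continuation rows should start directly with the drug name, as in the sample above, that leading tab can be stripped, but those rows would then have only 4 columns.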

---------- Post updated 07-24-12 at 10:18 AM ---------- Previous update was 07-23-12 at 10:32 PM ----------
