Search for specific names in a file and fetch specific entries

Hi all,

I have 2 files. The first file contains data like this:

FHIT 
CS 
CHRM1 
PDE3A 
PDE3B 
HSP90AA1 
PTK2 
HTR1A 
ESR1 
PARP1 
PLA2G1B

These names appear in the second file (please see the attached second file) under headings of the form

# Drug_Target_X_Gene_Name

where X can be any number (1-1000).

The second file contains data like this, with these entries (Drug_Target_X_Gene_Name) inside each drug card:

#BEGIN_DRUGCARD DB0xxxx (0001- 8820)

# Drug_Target_X_Gene_Name

Description

#END_DRUGCARD DB0xxxx (0001- 8820)

So, if any entry of the first file matches a

Drug_Target_X_Gene_Name

entry in the second file, I want to write the following entry of that drug card to a separate file:

# Description:

So, if CHRM1 from the first file is present in drug card 00001 of the second file under # Drug_Target_X_Gene_Name,
the output should be

CHRM1       (Description next to it, like this for example: Lepirudin is identical to natural hirudin except for substitution of .....)

It is also possible that CHRM1 is present in more than one drug card. In that case there will be two different descriptions from two different drug cards:

CHRM1       (Description next to it, like this for example: Lepirudin is identical to natural hirudin except for substitution of .....)
CHRM1       (Description next to it, like this for example: Dornase alfa is a biosynthetic form of human deoxyribunuclease I (DNase I) enzyme. It is produced in genetically modified Chinese hamster ovary .....)

In the same way, for every entry of the first file that is present in any drug card of the second file, I need the description from that drug card.

Any help will be really appreciated.

Have you tried using

grep -f file1 file2
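As a rough sketch of what this suggestion gives you, the function below wraps `grep -f` with a few safety steps. The function name `list_gene_lines` is made up for illustration, and it assumes each gene name in the second file sits alone on its own line (as in the sample later in the thread). It strips trailing blanks from the first file and drops empty lines first, because an empty pattern makes `grep -f` match every line.

```shell
#!/bin/sh
# list_gene_lines GENES_FILE CARDS_FILE   (hypothetical helper name)
# Prints "lineno:gene" for every line of CARDS_FILE that is exactly
# one of the names listed in GENES_FILE.
list_gene_lines() {
    patterns=$(mktemp) || return 1
    # Strip trailing whitespace and drop empty lines from the name list.
    sed 's/[[:space:]]*$//' "$1" | grep -v '^$' > "$patterns"
    # -F: fixed strings, -x: the whole line must match, -n: show line numbers.
    grep -n -x -F -f "$patterns" "$2"
    status=$?
    rm -f "$patterns"
    return $status
}
```

Called as `list_gene_lines file1 file2`, this only reports where the names occur. It does not print the Description of the surrounding drug card, so on its own it does not answer the question.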

Hi

Thanks for the reply. I need to match a specific entry and, for each drug card where it matches, fetch only the details under the Description heading. That is why I think I have to use some other commands as well.

Kindly guide if possible.

Mani

I think you have given the wrong files. There is no match between those two files.

Hi

Thanks for the reply. Both files are big. The first file contains 137 entries, but the second file is very big, so I have attached only part of it here.

If you look at page 11 of the attached second file, you will see

# Drug_Target_1_Gene_Name:
F2

These entries have to be matched against the first file. If one matches, I have to fetch the #description, #indication and # pharmacology headings of the drug card in which it matched, because each drug card contains more than one

# Drug_Target_X_Gene_Name

but only one each of the #description, #indication and # pharmacology headings.

Kindly guide if possible.

I have already told you that I am not able to find any relation between those two files.
And I think others can't either. :smiley:

So it's better to give an example in which the two files are related to each other; that will make it much easier to understand. Please post your input files (which are related to each other) and the desired output.

It's not possible to give a solution by assuming everything.

Hope this helps you...:slight_smile:

Hi

I have attached a small second file,

and the first file is like this:

F2
CHRM1
TLS3
CPS3

Now I want to search for the entries of the first file under the # Drug_Target_X_Gene_Name (X = 1-1000) headings (shown on page 11 of the second file as a sample).

If one matches, as F2 does here, then the following entries should be fetched

#description, #indication and # pharmacology from each matching drug card

So the output will be

F2   #description Lepirudin is identical to natural hirudin except for substitution of leucine for isoleucine at the N-terminal end of the molecule and the absence of a sulfate group on the tyrosine at position 63. It is produced via yeast cells.

#indication   For the treatment of heparin-induced thrombocytopenia


# pharmacology Lepirudin is used to break up clots and to reduce thrombocytopenia. It binds to thrombin and prevents thrombus or clot formation. It is a highly potent, selective, and essentially irreversible inhibitor of thrombin and clot-bond thrombin. Lepirudin requires no cofactor for its anticoagulant action. Lepirudin is a recombinant form of hirudin, an endogenous anticoagulant found in medicinal leeches.

Note: the Drug_Target_X_Gene_Name entries are inside the drug cards, and each drug card starts and ends with

#BEGIN_DRUGCARD DB0xxxx (0001- 8820)

#END_DRUGCARD DB0xxxx (0001- 8820)

Hope it helps

Please let me know about the questions above. Even getting only #description in the output would be fine, as long as the entries of the first file match the Gene_Name entries of the second file.

Kindly guide if possible.
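One possible way to do this, sketched with `awk` under several assumptions that are not confirmed by the attached files. It assumes each heading such as `# Description:` is on its own line with its text on the following lines, and that each `# Drug_Target_X_Gene_Name:` heading is followed by the gene name alone on the next line. It also assumes every card is wrapped in `#BEGIN_DRUGCARD` / `#END_DRUGCARD` lines. The function name `fetch_card_fields` and the tab-separated output layout are choices made for this example only. Headings are compared in lower case, so `# Description:` and `# description:` are treated alike.

```shell
#!/bin/sh
# fetch_card_fields GENES_FILE CARDS_FILE   (hypothetical helper name)
# For every drug card that lists a gene from GENES_FILE, prints one line:
#   gene <TAB> description <TAB> indication <TAB> pharmacology
fetch_card_fields() {
    awk -v keep="description indication pharmacology" '
        BEGIN {
            nk = split(keep, order, " ")
            for (i = 1; i <= nk; i++) wanted_field[order[i]] = 1
        }
        # First file: remember each gene name, ignoring trailing blanks.
        NR == FNR { gsub(/[ \t\r]+$/, ""); if ($0 != "") want[$0] = 1; next }

        # Start of a card: forget everything collected for the previous one.
        /^#BEGIN_DRUGCARD/ { split("", text); split("", seen); n = 0; field = ""; next }

        # End of a card: print one line per matching gene found in it.
        /^#END_DRUGCARD/ {
            for (i = 1; i <= n; i++) {
                line = hits[i]
                for (k = 1; k <= nk; k++) line = line "\t" text[order[k]]
                print line
            }
            field = ""; next
        }
        # A heading line such as "# Description:" switches the current field.
        /^# / {
            field = tolower($0)
            sub(/^# */, "", field); sub(/:[ \t\r]*$/, "", field)
            next
        }
        /^[ \t\r]*$/ { next }   # skip blank lines
        # Text under a wanted heading: join multi-line text with spaces.
        (field in wanted_field) {
            text[field] = (text[field] == "" ? $0 : text[field] " " $0)
            next
        }
        # Value under a gene-name heading: record it if it is in the first file.
        field ~ /^drug_target_[0-9]+_gene_name$/ {
            gsub(/[ \t\r]+$/, "")
            if (($0 in want) && !($0 in seen)) { seen[$0] = 1; hits[++n] = $0 }
        }
    ' "$1" "$2"
}
```

Usage would be something like `fetch_card_fields file1 file2 > output.txt`. The card is buffered until `#END_DRUGCARD` because the Description heading may appear before the gene-name headings, so the match is not known yet when the description is read. A gene that occurs in several drug cards produces one output line per card. If only the description is needed, passing `keep="description"` in the `awk -v` assignment should be enough.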