Hi
I have a lookup file with seven words:
CD
HT
CAD
HT
T1D
T2D
BD
Another file contains data like this:
CHRM1 P11229 Pirenzepine DAP000492 Peptic ulcer disease Approved T2D
CHRM1 P11229 Glycopyrrolate DAP001116 Anesthetic Approved T2D
CHRM1 P11229 Clidinium DAP001117 Abdominal/stomach pain Approved T2D
CHRM1 P11229 Dicyclomine DAP001118 Irritable bowel syndrome Approved T2D
CHRM1 P11229 Ethopropazine DAP001119 Parkinson's disease Approved T2D
CHRM1 P11229 Cycrimine DAP001120 Parkinson's disease Approved T2D
CHRM1 P11229 Benztropine DAP001121 Parkinson's disease Approved T2D
CHRM1 P11229 Trihexyphenidyl DAP001122 Parkinson's disease Approved T2D
CHRM1 P11229 Propantheline DAP001123 Excessive sweating (hyperhidrosis) Approved T2D
CHRM1 P11229 Oxyphenonium DAP001124 Spasm Approved T2D
CHRM1 P11229 Biperiden DAP001125 Parkinson's disease Approved T2D
CHRM1 P11229 Talsaclidine isomer DCL000268 Alzheimer's disease Discontinued T2D
CHRM1 P11229 Sabcomeline hydrochloride DCL000279 Cardiovascular diseases Phase IIa T2D
CHRM1 P11229 Talsaclidine fumarate DCL000303 Alzheimer's disease Discontinued T2D
CHRM1 P11229 Xanomeline tartrate DCL000328 Alzheimer's disease Phase II T2D
CHRM1 P11229 GSK573719 DCL000381 Chronic Obstructive Pulmonary Disease (COPD) Phase II T2D
CHRM1 P11229 GSK961081 DCL000397 Chronic Obstructive Pulmonary Disease (COPD) Phase II completed T2D
CHRM1 P11229 GSK1034702 DCL000402 Schizophrenia, Dementia Phase I completed T2D
CHRM1 P11229 Darotropium DCL000514 COPD Suspended in Phase II in GSK 2009 Report T2D
CHRM1 P11229 Darotropium + 642444 DCL000515 COPD Phase III T2D
CHRM1 P11229 Revatropate DCL000957 Chronic obstructive pulmonary disease Discontinued in Phase I T2D
FLT1 P17948 Sorafenib DAP000006 Advanced renal cell carcinoma Launched CAD
FLT1 P17948 Sorafenib DAP000006 Hepatocellular carcinoma, NSCLC, melanoma Phase III CAD
FLT1 P17948 Sorafenib DAP000006 Myelodyspalstic syndrome, AML, head & neck cancer, breast, colon, ovarian, pancreatic cancer Phase II CAD
FLT1 P17948 Ranibizumab DAP001260 Age-related macular degeneration Approved CAD
FLT1 P17948 Ranibizumab DAP001260 Diabetic macular edema and retinal vein occlusion Phase III CAD
FLT1 P17948 Telbermin DCL001016 Diabetic foot ulcers Discontinued in Phase II CAD
KDR P35968 Sunitinib DAP000005 Advanced renal cell carcinoma Launched CAD,CD,CD
KDR P35968 Sunitinib DAP000005 Advanced renal cell carcinoma Phase II CAD,CD,CD
KDR P35968 Pazopanib HCl DAP001550 Renal cell carcinoma Approved CAD,CD,CD
KDR P35968 CYC116 DCL000010 Solid Tumors Terminated in Phase I CAD,CD,CD
KDR P35968 XL999 DCL000011 Advanced Malignancies Phase I CAD,CD,CD
KDR P35968 CT-322 DCL000096 Cancer/Tumors Phase I CAD,CD,CD
KDR P35968 CT-322 DCL000096 Macular Degeneration Preclinical CAD,CD,CD
KDR P35968 XL647 DCL000263 Cancer Phase I completed CAD,CD,CD
KDR P35968 XL647 DCL000263 Carcinoma, Non-Small-Cell Lung Phase II completed CAD,CD,CD
KDR P35968 XL880 DCL000265 Solid Tumors Phase I CAD,CD,CD
KDR P35968 XL880 DCL000265 Gastric Cancer, Renal Cell Carcinoma, Squamous Cell Cancer of the Head and Neck Phase II CAD,CD,CD
KDR P35968 SU-6668 DCL000342 Advanced solid tumours Discontinued CAD,CD,CD
I am using the following code:
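awk -F'\t' 'FNR==NR{a[$0]=1;next} {  # lookup file: store each word as a key of a
gsub(/Approved */,"",$6)              # remove "Approved " from field 6
n=split($6,b,",")                     # split what is left on commas
$6=""                                 # blank field 6 (an empty column remains)
for(i=1;i<=n;i++)
if(b[i] in a)
print $0, "Approved" > ("file_" b[i] ".txt")   # always appends the word Approved
}' OFS='\t' lookupfile mainfile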
I am receiving seven files, but the output does not contain all of the data from the second input file.
For example, one part of the output in the T2D file is:
CHRM1 P11229 Pirenzepine DAP000492 Peptic ulcer disease Approved
CHRM1 P11229 Glycopyrrolate DAP001116 Anesthetic Approved
CHRM1 P11229 Clidinium DAP001117 Abdominal/stomach pain Approved
CHRM1 P11229 Dicyclomine DAP001118 Irritable bowel syndrome Approved
CHRM1 P11229 Ethopropazine DAP001119 Parkinson's disease Approved
CHRM1 P11229 Cycrimine DAP001120 Parkinson's disease Approved
CHRM1 P11229 Benztropine DAP001121 Parkinson's disease Approved
CHRM1 P11229 Trihexyphenidyl DAP001122 Parkinson's disease Approved
CHRM1 P11229 Propantheline DAP001123 Excessive sweating (hyperhidrosis) Approved
CHRM1 P11229 Oxyphenonium DAP001124 Spasm Approved
CHRM1 P11229 Biperiden DAP001125 Parkinson's disease Approved
But the expected output is:
CHRM1 P11229 Pirenzepine DAP000492 Peptic ulcer disease Approved
CHRM1 P11229 Glycopyrrolate DAP001116 Anesthetic Approved
CHRM1 P11229 Clidinium DAP001117 Abdominal/stomach pain Approved
CHRM1 P11229 Dicyclomine DAP001118 Irritable bowel syndrome Approved
CHRM1 P11229 Ethopropazine DAP001119 Parkinson's disease Approved
CHRM1 P11229 Cycrimine DAP001120 Parkinson's disease Approved
CHRM1 P11229 Benztropine DAP001121 Parkinson's disease Approved
CHRM1 P11229 Trihexyphenidyl DAP001122 Parkinson's disease Approved
CHRM1 P11229 Propantheline DAP001123 Excessive sweating (hyperhidrosis) Approved
CHRM1 P11229 Oxyphenonium DAP001124 Spasm Approved
CHRM1 P11229 Biperiden DAP001125 Parkinson's disease Approved
CHRM1 P11229 Talsaclidine isomer DCL000268 Alzheimer's disease Discontinued
CHRM1 P11229 Sabcomeline hydrochloride DCL000279 Cardiovascular diseases Phase IIa
CHRM1 P11229 Talsaclidine fumarate DCL000303 Alzheimer's disease Discontinued
CHRM1 P11229 Xanomeline tartrate DCL000328 Alzheimer's disease Phase II
CHRM1 P11229 GSK573719 DCL000381 Chronic Obstructive Pulmonary Disease (COPD) Phase II
CHRM1 P11229 GSK961081 DCL000397 Chronic Obstructive Pulmonary Disease (COPD) Phase II completed
CHRM1 P11229 GSK1034702 DCL000402 Schizophrenia, Dementia Phase I completed
CHRM1 P11229 Darotropium DCL000514 COPD Suspended in Phase II in GSK 2009 Report
CHRM1 P11229 Darotropium + 642444 DCL000515 COPD Phase III
CHRM1 P11229 Revatropate DCL000957 Chronic obstructive pulmonary disease Discontinued in Phase I
So the output shows only the lines that contain the word "Approved" on the right-hand side, but the other lines should be there too.
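A quick way to see why the other lines disappear is to print field 6 before and after the same `gsub`. This is only a diagnostic sketch: it assumes, as the `gsub` on `$6` in the script suggests, that the file is tab-separated and that field 6 holds the status followed by a space and the disease code(s). `show_field6` is a hypothetical helper name.

```shell
# Hypothetical helper: show what the original gsub leaves in field 6.
# Assumes a tab-separated file where field 6 is "<status> <codes>".
show_field6() {
  awk -F'\t' '{
    f = $6
    gsub(/Approved */, "", f)        # same substitution as the original script
    print "[" $6 "] -> [" f "]"
  }' "$1"
}
```

For an `Approved T2D` line the `gsub` leaves a bare `T2D`, which is a key of `a`. For a line such as `Discontinued in Phase I T2D` nothing is removed, so the whole string is looked up in `a`, the test fails, and the line is never written. The script also prints the literal word `Approved` for every line it does write, so the real status would be lost even for lines that matched.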
---------- Post updated 09-10-12 at 05:29 AM ---------- Previous update was 09-09-12 at 11:56 PM ----------
Hi
Could I get the right result just by editing the word "Approved" in the code above? I would have to add many other words there to make it work, which hardly seems worthwhile.
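One way to avoid listing every status word is to treat the last space-separated word of field 6 as the list of codes and everything before it as the status. The sketch below assumes, as the original `gsub` on `$6` implies, that the file is tab-separated and that field 6 holds the status, a space, then comma-separated codes. It wraps the `awk` call in a hypothetical shell function `split_by_disease`, and it skips repeated codes such as the second `CD` in `CAD,CD,CD`, so a KDR line is written to `file_CD.txt` only once.

```shell
# Hypothetical helper name; call it as: split_by_disease lookupfile mainfile
split_by_disease() {
  awk -F'\t' -v OFS='\t' '
    # First file: remember every lookup word (duplicates such as HT are harmless).
    FNR == NR { want[$0] = 1; next }
    {
      # Assumption: field 6 is "<status> <codes>", e.g. "Phase II completed T2D"
      # or "Launched CAD,CD,CD": the codes are the last space-separated word.
      codes = $6
      sub(/.* /, "", codes)          # keep only the last word
      status = $6
      sub(/ [^ ]*$/, "", status)     # drop the last word, keep the status text
      $6 = status                    # rebuild the record without the codes

      n = split(codes, c, ",")
      split("", done)                # reset the per-line duplicate tracker (portable)
      for (i = 1; i <= n; i++) {
        if ((c[i] in want) && !(c[i] in done)) {
          done[c[i]] = 1
          print > ("file_" c[i] ".txt")
        }
      }
    }' "$1" "$2"
}
```

Usage: `split_by_disease lookupfile mainfile`. Each output line keeps its real status (`Approved`, `Discontinued`, `Phase II completed` and so on) in the last column instead of a hard-coded `Approved`. `split("", done)` is used rather than `delete done` because wholesale array deletion is not supported by every older awk.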