I need to extract the bold headings mentioned in my first post as columns in my expected output.
This input file is a sample of my big input file, which contains many drug cards.
So, my output should be like the following six columns:
F2 Lepirudin Refludan Approved "sentence under Indication" "sentence under Mechanism of Action"
As my real input file is big, there will be many rows with different names like this, and with my current code these rows are overlapping!
For the sample input attached above, the expected output is 6 columns like this:
F2 Lepirudin Refludan Approved For the treatment of heparin-induced thrombocytopenia Lepirudin forms a stable non-covalent complex with alpha-thrombin, thereby abolishing its ability to cleave fibrinogen and initiate the clotting cascade. The inhibition of thrombin prevents the blood clotting cascade.
Here the 6 columns represent the following headings from my input file:
The above command gives me only one line of output from my big input file, and it looks like this:
bash-3.2$ awk 'BEGIN {cnt = split ("# Drug_Target_.*_Gene_Name|# Brand_Name|# Generic_Name|# Drug_Type|# Indication|# Mechanism_Of_Action", SA, "|")}
> {for (i=1; i<=cnt; i++) if (match ($1, SA[i])) Out[i]=$2}
> END {for (i=1; i<=cnt; i++) printf "%-28s", SA[i]; printf "\n";
> for (i=1; i<=cnt; i++) printf "%-28s", Out[i]; printf "\n" }
> ' FS="\n" RS="\n\n" drugbank.txt
# Drug_Target_.*_Gene_Name # Brand_Name # Generic_Name # Drug_Type # Indication # Mechanism_Of_Action
CFTR Kalydeco Ivacaftor Approved For the treatment of cystic fibrosis (CF) in patients age 6 years and older who have a G551D mutation in the CFTR gene.Cystic fibrosis is caused by any one of several defects in a protein, cystic fibrosis transmembrane conductance regulator, which regulates fluid flow within cells and affects the components of sweat, digestive fluids, and mucus. The defect, which is caused by a mutation in the individual's DNA, can be in any of several locations along the protein, each of which interferes with a different function of the protein. One mutation, G551D, lets the CFTR protein reach the epithelial cell surface, but doesn't let it transport chloride through the ion channel. Ivacaftor is a potentiator of the CFTR protein. The CFTR protein is a chloride channel present at the surface of epithelial cells in multiple organs. Ivacaftor facilitates increased chloride transport by potentiating the channel-open probability (or gating) of the G551D-CFTR protein.
bash-3.2$
I tried another similar file and it also gave me only one result.
It seems to me the error is related to checking the whole file with many similar entries!
This is EXACTLY what you requested; there is NO error. Your sample file had one single record only. You did not provide a representative sample file, and no hint was given on how records can be separated, identified, or checked for completeness, nor on how data fields are interrelated. Please use the code example in my post and extend/improve it to meet your expanded requirement.
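One possible way to extend it, as a minimal sketch only. The posted command prints solely in its END block, so for each heading only the last value seen in the whole file survives, which is why a big file still yields one row. The sketch below prints one row per card instead. It rests on assumptions that your posts do not confirm: that every card is wrapped in "#BEGIN_DRUGCARD" / "#END_DRUGCARD" lines, that each "# Heading:" line is followed by its value on the next non-blank line, and that only the first value per heading (e.g. the first target gene) is wanted. The function name extract_cards is made up for illustration.

```shell
# Hypothetical helper: print one tab-separated row per drug card.
# Assumes cards are wrapped in "#BEGIN_DRUGCARD" / "#END_DRUGCARD" lines and
# each "# Heading:" line is followed by its value on the next non-blank line.
extract_cards() {
  awk '
    BEGIN {
      # Column order follows the expected output, not the file order.
      n = split("Drug_Target_.*_Gene_Name|Generic_Name|Brand_Name|Drug_Type|Indication|Mechanism_Of_Action", pat, "|")
    }
    # New card: forget the values of the previous one.
    /^#BEGIN_DRUGCARD/ { for (i = 1; i <= n; i++) val[i] = ""; cur = 0; next }
    # End of card: emit the collected values as one row.
    /^#END_DRUGCARD/ {
      row = val[1]
      for (i = 2; i <= n; i++) row = row "\t" val[i]
      print row
      cur = 0
      next
    }
    # Heading line: remember which column (if any) the next value belongs to.
    /^# / {
      cur = 0
      for (i = 1; i <= n; i++)
        if (val[i] == "" && $0 ~ ("^# " pat[i] ":")) { cur = i; break }
      next
    }
    # First non-blank line after a wanted heading is taken as its value.
    cur && NF { val[cur] = $0; cur = 0 }
  ' "$@"
}
```

It could be called as `extract_cards drugbank.txt > drugs.tsv`. Tabs are used between columns because the Indication and Mechanism_Of_Action values contain spaces, so fixed-width printf columns would run into each other. If your cards are delimited differently, the two `#BEGIN_DRUGCARD` / `#END_DRUGCARD` patterns are the part to adapt.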