Find matches and write the data before it

Hi all

I am here for help once again

I have two files

The first file has one column, like this:

F2
B2
CAD
KGM
HTC
CSP

The second file has 5 columns, like this. Its first column sometimes contains entries from the first file, separated by spaces, mixed with other entries:

F2 XYZ CDT CAD          it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

KGM HTC CSP      it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

If there is a match, I have to split the line like this, one line per matched entry, still in 5 columns:

F2  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 
CAD  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 


KGM it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

HTC  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

CSP  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 


Please help me out.

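gawk '{
if(NR==FNR){
	_[$1] = 1			# first file: remember each entry
}
else{
	for(i=1;i<=NF;i++){
		if(_[$i] == 1){		# field i is an entry from the first file
			for(j=i;j<=NF;j++){
				printf "%s ", $j	# fixed format, so a % in the data is harmless
			}
			print ""
		}
	}
}
}
' a b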

Thank you very much.

It seems like good code, but it is not working completely. My output looks like this when F2 or HTC matches:

F2 XYZ CDT CAD          it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

KGM HTC CSP      it is part of agriculture    it is part of university   it is part of ...             it is used for...

But I want to remove the other, non-matched entries from the first column, so that the output will be:

F2  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 
CAD  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 


KGM it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

HTC  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

CSP  it is part of agriculture    it is part of university   it is part of ...             it is used for....

That is, only the matched entry should appear in the first column of the output.

Please guide me if possible.

Hi,

Try this one,

awk 'BEGIN{FS=OFS="\t";}NR==FNR{a[$0]=1;next;}{n=split($1,f," ");for(i=1;i<=n;i++){p=f[i];if(a[p]==1){print p,$2,$3,$4,$5;}}}' file1 file2

Assumptions:

  1. The field separator is a tab (\t).
  2. The number of fields is fixed (5 fields).

Cheers,
Ranga :)
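
The second assumption can be relaxed. As a minimal sketch, still assuming that the second file is tab-separated and that its first column holds space-separated names, the loop below prints each matched name followed by all remaining columns, however many there are. The helper name print_matches, the file names and the sample data are illustrative only and do not come from this thread.

```shell
# print_matches is a hypothetical helper name; file names are placeholders.
print_matches() {
    awk 'BEGIN { FS = OFS = "\t" }
    NR == FNR { seen[$1]; next }            # first file: remember every name
    {
        n = split($1, name, " ")            # names inside column 1
        for (i = 1; i <= n; i++)
            if (name[i] in seen) {
                line = name[i]
                for (j = 2; j <= NF; j++)   # append every remaining column
                    line = line OFS $j
                print line
            }
    }' "$1" "$2"
}

# Illustrative data only: two names, one row with six tab-separated columns.
printf 'F2\nCAD\n' > names.txt
printf 'F2 XYZ CAD\tc2\tc3\tc4\tc5\tc6\n' > data.txt
print_matches names.txt data.txt
```

With this sample it prints one line for F2 and one for CAD, each followed by the five remaining columns. XYZ is dropped because it is not listed in names.txt.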

Hi

Thanks for the reply.

But this time the output file is completely blank! :(

Also, the second input file actually has more than 5 columns. What I want is simply to write whatever is next to the matched entry exactly as it is, in the same columns as the input.

And I checked the previous output file: it has no columns at all. Instead, the entries of the 5 columns appear row-wise.

Regarding tab separation, the entries are like this, where each colour represents one column, so the input file has 8 columns.

So if FCGR2A is present in the first file, then the output will be

Hmm, this seems complex!

Considering your inputs from post 1, this should work:

awk 'NR==FNR{X[$1]=$0;next}{n=split($1,P," ");sub($1,"",$0);for(i=1;i<=n;i++){if(X[P[i]]){print P[i],$0}}}' file1 FS="  +" file2

If not, please provide real inputs from your files.

pamu

No, it didn't work: the output is just the first file.

The sample I provided above is taken exactly from the real file.

Shall I attach the file?

Let me know, but it is the same as the sample I provided above.

I don't know what you are trying.

Please check..

$ cat file1
F2
B2
CAD
KGM
HTC
CSP
$ cat file2
F2 XYZ CDT CAD          it is part of agriculture    it is part of university   it is part of ...             it is used for....

KGM HTC CSP      it is part of agriculture    it is part of university   it is part of ...             it is used for....
$ awk 'NR==FNR{X[$1]=$0;next}{n=split($1,P," ");sub($1,"",$0);for(i=1;i<=n;i++){if(X[P[i]]){print P[i],$0}}}' file1 FS="  +" file2
F2           it is part of agriculture    it is part of university   it is part of ...             it is used for....
CAD           it is part of agriculture    it is part of university   it is part of ...             it is used for....
KGM       it is part of agriculture    it is part of university   it is part of ...             it is used for....
HTC       it is part of agriculture    it is part of university   it is part of ...             it is used for....
CSP       it is part of agriculture    it is part of university   it is part of ...             it is used for....

Is this what you want..?

I hope this helps:)

pamu

Hi

There seems to be some error.

I am now attaching both files, BD (first) and 1diseasedrug (second), plus the output file (see).

Please check them.

These are just parts, as the files are big. I also got one error:

bash-3.2$ awk 'NR==FNR{X[$1]=$0;next}{n=split($1,P," ");sub($1,"",$0);for(i=1;i<=n;i++){if(X[P[i]]){print P[i],$0}}}' BD FS="  +" diseasedrugbank >see
awk: (FILENAME=diseasedrugbank FNR=471) fatal: Unmatched ( or \(: /DRD2 ADRA1A  Droperidol      DHBP    DRD2 ADRA1A     The exact mechanism of action is unknown, however, droperidol causes a CNS depression at subcortical levels of the brain, midbrain, and brainstem reticular formation. It may antagonize the actions of glutamic acid within the extrapyramidal system. It may also inhibit cathecolamine receptors and the reuptake of neurotransmiters and has strong central antidopaminergic action and weak central anticholinergic action. It can also produce ganglionic blockade and reduced affective response. The main actions seem to stem from its potent Dopamine(2) receptor antagonism with minor antagonistic effects on alpha-1 adrenergic receptors as well.      A butyrophenone with general properties similar to those of haloperidol. It is used in conjunction with an opioid analgesic such as fentanyl to maintain the patient in a calm state of neuroleptanalgesia with indifference to surroundings but still able to cooperate with the surgeon. It is also used as a premedicant, as an antiemetic, and for the control of agitation in acute psychoses. (From Martindale, The Extra/
bash-3.2$ 

So the output is not as expected for these files.
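
The fatal error above most likely comes from sub($1,"",$0). sub() treats its first argument as a regular expression, and on that line $1 apparently spans almost the whole record, including the unbalanced "(" in "(From Martindale, The Extra". A minimal sketch that cuts the first column off literally with substr() instead is shown below. It assumes the columns are tab-separated, which is an assumption because the attached files are not shown here. strip_first and the file names are hypothetical.

```shell
# strip_first is a hypothetical helper: it prints each name from column 1 that
# is listed in the first file, followed by the untouched rest of the line.
strip_first() {
    awk -F '\t' '
    NR == FNR { seen[$1]; next }            # first file: remember every name
    {
        rest = substr($0, length($1) + 1)   # literal cut, no regex involved
        n = split($1, name, " ")
        for (i = 1; i <= n; i++)
            if (name[i] in seen)
                print name[i] rest          # rest still starts with the tab
    }' "$1" "$2"
}

# Illustrative data only, with an unbalanced parenthesis in the last column.
printf 'DRD2\n' > names.txt
printf 'DRD2 ADRA1A\tDroperidol\t(From Martindale, The Extra\n' > drugs.txt
strip_first names.txt drugs.txt
```

Because no part of the data is ever compiled as a pattern, parentheses, dots and other regex metacharacters in the records are harmless.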

From your sample file, it looks like your file2 has no consistent pattern that would give the required result.

Still, try using "\t" as the field separator:

awk 'NR==FNR{X[$1]=$0;next}{n=split($1,P," ");sub($1,"",$0);for(i=1;i<=n;i++){if(X[P[i]]){print P[i],$0}}}' file1 FS="\t" file2

Hi Pamu

This seemed to be good code. I applied it to many other files, and suddenly I realised it didn't work for many of them, so my hard work went to waste!

That is because it was matching only the first entry of the second file.

Kindly help me, as I have to run it again on all those files.

I have attached one of those files.

Maybe it happened because the first column of the second file contains entries separated by commas.

Sorry for the inconvenience.

Kindly guide me.

It doesn't work even on these ones:

first file pHAMRGKBT2D

second file Pharmgkbdrugdisease3.txt
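
If the first column of the second file really is comma-separated, as suspected above, then split($1,P," ") leaves the commas attached to the names, and most of them can no longer match. A minimal sketch that splits on commas and/or spaces is shown below. It assumes tab-separated columns, and both that and the comma format are assumptions because the attached files are not shown here. split_names and the file names are hypothetical.

```shell
# split_names is a hypothetical helper; the separators are assumptions.
split_names() {
    awk -F '\t' '
    NR == FNR { seen[$1]; next }            # first file: remember every name
    {
        rest = substr($0, length($1) + 1)   # everything after column 1
        n = split($1, name, /[ ,]+/)        # commas and/or spaces between names
        for (i = 1; i <= n; i++)
            if (name[i] != "" && (name[i] in seen))
                print name[i] rest
    }' "$1" "$2"
}

# Illustrative data only: mixed ", " and "," separators in column 1.
printf 'KGM\nCSP\n' > names.txt
printf 'KGM, HTC,CSP\tit is part of agriculture\tit is used for\n' > data.txt
split_names names.txt data.txt
```

With this sample it prints one line for KGM and one for CSP, each followed by the two remaining columns. HTC is skipped because it is not listed in names.txt.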