I have two files, file1 and file2.
File1:
Protein1 Protein2
Streb.10G021010.1 : Streb.9G023710.1
Streb.9G019140.1 : Streb.7G013440.1
Streb.10G021010.1 : Streb.9G023710.1
Streb.2G015700.1 : Streb.9G023710.1
Streb.3G019820.1 : Streb.7G013440.1
Streb.3G008920.1 : Streb.1G025210.1
Streb.9G019140.1 : Streb.3G014030.1
Streb.1G034750.1 : Streb.9G009640.1
Streb.1G035920.1 : Streb.3G016240.1
Streb.2G040440.1 : Streb.7G013440.1
Streb.1G041180.1 : Streb.7G013440.1
Streb.2G035340.1 : Streb.10G024960.1
Streb.3G008920.1 : Streb.9G028230.1
Streb.1G040670.1 : Streb.2G014140.1
Streb.3G019820.1 : Streb.3G014030.1
Streb.1G000350.1 : Streb.2G032090.1
Streb.2G022000.1 : Streb.4G006020.1
Streb.10G022300.1 : Streb.9G018870.1
Streb.1G040670.1 : Streb.2G014140.1
File2:
|protein|domain|
|Streb.9G000290.1|PF00574.18|
|Streb.9G000290.1|PF01343.13|
|Streb.9G025660.1|PF00069.20|
|Streb.9G025660.1|PF07714.12|
|Streb.9G011140.1|PF00388.14|
|Streb.9G011140.1|PF00387.14|
|Streb.9G011140.1|PF00168.25|
|Streb.9G011140.1|PF09279.6|
|Streb.9G011140.1|PF03998.8|
|Streb.9G023250.1|PF13976.1|
|Streb.9G023250.1|PF00665.21|
|Streb.9G024400.1|PF03619.11|
|Streb.9G014700.1|PF05078.7|
|Streb.9G014700.1|PF12430.3|
|Streb.9G008200.1|PF01926.18|
|Streb.9G008200.1|PF02421.13|
|Streb.9G008200.1|PF08701.6|
|Streb.9G008200.1|PF00009.22|
|Streb.9G008200.1|PF03193.11|
Expected result:
I need to extract the protein-domain entries for every Protein1:Protein2 pair, looked up by Protein1. Protein1 may have more than one domain in File2, and all of them have to be extracted. I have used this awk command:
awk 'FNR==NR {a[$1]++; next} a[$1]' protein protein-domain > me.txt
The final result comes out sorted and the Protein1:Protein2 interactions get disturbed. I want the domains for Protein1 without disturbing the protein-protein interactions.
Kindly help me with this.