Last update: Sep 2005

We would appreciate it if you could email us any enquiries that are not covered in this FAQ.

1. What is a protein-protein contrast?
2. How did you construct the BioContrasts database?
3. What is the definition of 'CONTRAST_OBJ' in a presupposed property?
4. What is a domain-domain contrast?
5. How can I download the BioContrasts database?
6. How do I cite BioContrasts?
7. What are the statistics of the BioContrasts database?

1. What is a protein-protein contrast?

A protein-protein contrast (PPC) is a contrast between two proteins. The BioContrasts database contains contrasts between Swiss-Prot protein entries. Each contrast consists of three elements: a positive protein, a negative protein, and a biological activity, and indicates that the positive protein, but not the negative protein, is involved in the activity. We call the contrasted proteins 'focused proteins' and the activity the 'presupposed property'. For example, the contrast between eIF4A (the positive protein) and eIF4E (the negative protein) with respect to binding to NAT1 indicates that eIF4A, but not eIF4E, binds to NAT1.
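The three elements of a PPC can be pictured as a small record type. The sketch below is only an illustration: the class name `ProteinContrast`, its field names, and the `reading` method are hypothetical and are not part of the BioContrasts distribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinContrast:
    """A protein-protein contrast: positive protein, negative protein, activity.

    Hypothetical representation for illustration; the field names are not
    taken from the BioContrasts files.
    """
    positive: str  # protein involved in the activity
    negative: str  # protein contrasted with it (not involved)
    activity: str  # the presupposed property, phrased as a predicate

    def reading(self) -> str:
        # Render the contrast in the 'A but not B' form it expresses.
        return f"{self.positive} but not {self.negative} {self.activity}"


ppc = ProteinContrast(positive="eIF4A", negative="eIF4E", activity="binds to NAT1")
print(ppc.reading())  # eIF4A but not eIF4E binds to NAT1
```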

2. How did you construct the BioContrasts database?

We automatically extracted protein-protein contrasts by matching contrastive negation patterns, such as 'A but not B', against sentences from MEDLINE abstracts. We then cross-linked the focused protein names, i.e., the names matched to the variables A and B, to Swiss-Prot entries. Finally, we identified the presupposed property of each contrast as the rest of the sentence, excluding the focused protein names.
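The pattern-matching step can be sketched with a regular expression. This is a deliberately simplified illustration, not the extraction system used to build BioContrasts: the pattern `CONTRAST_PATTERN` and the function `match_contrast` are hypothetical, focused names are assumed to be single tokens, only the 'A but not B' pattern is covered, and the cross-linking of names to Swiss-Prot is omitted.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical, simplified pattern: a one-token name A, an optional comma,
# "but not", then a one-token name B. Real protein names can span several
# tokens, and other contrastive negation patterns exist.
CONTRAST_PATTERN = re.compile(r"\b([A-Za-z0-9-]+),? but not ([A-Za-z0-9-]+)\b")


class ContrastMatch(NamedTuple):
    positive: str  # name matched to A
    negative: str  # name matched to B
    presupposed_property: str  # sentence with the 'A but not B' span replaced


def match_contrast(sentence: str) -> Optional[ContrastMatch]:
    m = CONTRAST_PATTERN.search(sentence)
    if m is None:
        return None
    # Replace the matched 'A but not B' span with the CONTRAST_OBJ variable
    # and drop the trailing period to obtain the presupposed property.
    rest = sentence[: m.start()] + "CONTRAST_OBJ" + sentence[m.end():]
    return ContrastMatch(m.group(1), m.group(2), rest.rstrip(". "))


print(match_contrast("NAT1 binds eIF4A but not eIF4E."))
```

For the hypothetical sentence "NAT1 binds eIF4A but not eIF4E." this yields eIF4A, eIF4E, and the property "NAT1 binds CONTRAST_OBJ". On the MEDLINE sentence shown in question 5 it picks out Sar1 and Arf1 but leaves "or Rab GTPases" in the property, which illustrates why a single regular expression is not enough in practice.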

3. What is the definition of 'CONTRAST_OBJ' in a presupposed property?

'CONTRAST_OBJ' in a presupposed property is a variable that can be instantiated by the positive protein but not by the negative protein. For example, the presupposed property "NAT1 binds CONTRAST_OBJ" of the contrast between eIF4A (the positive protein) and eIF4E (the negative protein) indicates that NAT1 binds eIF4A but not eIF4E.
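Reading a presupposed property amounts to substituting each focused protein for the variable: the positive protein yields a statement that holds, the negative protein one that does not. A minimal sketch, in which the helper names `instantiate` and `read_property` are hypothetical:

```python
from typing import Tuple

PLACEHOLDER = "CONTRAST_OBJ"  # variable name used in presupposed properties


def instantiate(presupposed_property: str, protein: str) -> str:
    """Substitute a protein name for every occurrence of the variable."""
    return presupposed_property.replace(PLACEHOLDER, protein)


def read_property(presupposed_property: str, positive: str, negative: str) -> Tuple[str, str]:
    """Return (statement that holds, statement that does not hold)."""
    return (
        instantiate(presupposed_property, positive),
        instantiate(presupposed_property, negative),
    )


holds, does_not_hold = read_property("NAT1 binds CONTRAST_OBJ", "eIF4A", "eIF4E")
print("holds:", holds)
print("does not hold:", does_not_hold)
```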

4. What is a domain-domain contrast?

A domain-domain contrast (DDC) is a contrast between two domain sets, inferred from a protein-protein contrast. For example, if a protein with two domains {D1, D2} is contrasted with another protein with three domains {D2, D3, D4}, we can infer from this protein-protein contrast a contrast between the domain sets {D1} and {D3, D4}. Thus, a DDC is a pair of domain sets, each containing the domains of one contrasted protein that the other protein does not have. Note that one or both domain sets of a DDC can be empty, i.e., {}. Furthermore, if a protein has both a parent domain and its child domain, a DDC derived from a contrast involving this protein includes only the parent domain.
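The inference described above is essentially a pair of set differences. The sketch below is an illustration under stated assumptions, not the procedure actually used for BioContrasts: the function names are hypothetical, the domain hierarchy is supplied as a hypothetical child-to-parent dictionary (only direct parents are considered), and the parent-only rule is assumed to be applied to each protein's domains before the two sets are compared.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

DomainSet = FrozenSet[str]


def keep_parents_only(domains: Set[str], parent_of: Dict[str, str]) -> Set[str]:
    """Drop a domain when its direct parent is also present in the same protein."""
    return {d for d in domains if parent_of.get(d) not in domains}


def domain_domain_contrast(
    positive_domains: Iterable[str],
    negative_domains: Iterable[str],
    parent_of: Optional[Dict[str, str]] = None,
) -> Tuple[DomainSet, DomainSet]:
    """Infer a DDC from the domain sets of the two contrasted proteins."""
    hierarchy = parent_of or {}
    pos = keep_parents_only(set(positive_domains), hierarchy)
    neg = keep_parents_only(set(negative_domains), hierarchy)
    # Each side keeps only the domains the other protein does not have;
    # either side may come out empty.
    return frozenset(pos - neg), frozenset(neg - pos)


print(domain_domain_contrast({"D1", "D2"}, {"D2", "D3", "D4"}))
```

With the example domains from the answer above, this returns {D1} and {D3, D4}; a protein whose domains are a subset of the other's produces an empty set on its side.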


5. How can I download the BioContrasts database?

You can download the database here. An example protein-protein contrast is shown below.

.BEGIN PROTEIN-PROTEIN CONTRAST // Beginning of a PPC
.POSITIVE SWISS-PROT ENTRY: GAP1_SCHPO // Swiss-Prot entry for the positive protein in the PPC
.NEGATIVE SWISS-PROT ENTRY: ARF1_SCHPO // Swiss-Prot entry for the negative protein in the PPC
.EVIDENCE START // Beginning of evidence(s) from MEDLINE abstract(s)
.PMID: 11422940 // MEDLINE ID
.INPUT SENTENCE: mSec12 promotes efficient guanine nucleotide exchange on Sar1, but not Arf1 or Rab GTPases.
// The sentence in the abstract from which the PPC is extracted
.POSITIVE PROTEIN NAME: sar1 // The name of the positive protein in the sentence
.NEGATIVE PROTEIN NAME: arf1 // The name of the negative protein in the sentence
.PRESUPPOSED PROPERTY: // The presupposed property of the PPC
.EVIDENCE END // End of evidence(s) (note that a PPC may have multiple pieces of evidence)
.SHARED GO CODES: // The Gene Ontology codes that the focused proteins share
.SHARED KEGG PATHWAYS: // The KEGG pathway IDs that the focused proteins share
.SHARED INTERPRO DOMAINS: // The InterPro domain IDs that the focused proteins share
.DOMAIN-DOMAIN CONTRAST: IPR008936 IPR011575 vs. IPR001806 IPR006689
// The DDC that is inferred from the PPC
.END PROTEIN-PROTEIN CONTRAST // End of the PPC
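Because every field sits on its own line and starts with a dot, entries in this format are straightforward to read programmatically. The parser below is a sketch under several assumptions: '//' annotations like those above are stripped in case they occur, field values never contain '//', each .PMID line starts a new piece of evidence inside the EVIDENCE START/END block, and the two sides of a domain-domain contrast are separated by 'vs.' with an empty side left blank. The function name `parse_ppc_records` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional

# Fields that belong to one piece of evidence; a new .PMID line starts a new one.
EVIDENCE_KEYS = {
    "INPUT SENTENCE",
    "POSITIVE PROTEIN NAME",
    "NEGATIVE PROTEIN NAME",
    "PRESUPPOSED PROPERTY",
}


def parse_ppc_records(text: str) -> List[Dict]:
    """Parse PPC entries written in the dotted-field format shown above."""
    records: List[Dict] = []
    current: Optional[Dict] = None
    evidence: Optional[Dict] = None
    for raw in text.splitlines():
        # Drop '//' annotations, then skip blank and annotation-only lines.
        line = raw.split("//", 1)[0].strip()
        if not line.startswith("."):
            continue
        key, _, value = line[1:].partition(":")
        key, value = key.strip(), value.strip()
        if key == "BEGIN PROTEIN-PROTEIN CONTRAST":
            current, evidence = {"EVIDENCE": []}, None
        elif key == "END PROTEIN-PROTEIN CONTRAST":
            records.append(current)
            current = None
        elif key in ("EVIDENCE START", "EVIDENCE END"):
            continue  # evidence items are delimited by .PMID lines instead
        elif key == "PMID":
            evidence = {"PMID": value}
            current["EVIDENCE"].append(evidence)
        elif key in EVIDENCE_KEYS:
            evidence[key] = value
        elif key == "DOMAIN-DOMAIN CONTRAST":
            # Split "IDs vs. IDs" into two lists of InterPro IDs.
            left, _, right = value.partition("vs.")
            current[key] = (left.split(), right.split())
        else:
            # Swiss-Prot entries and shared GO/KEGG/InterPro fields, kept raw.
            current[key] = value
    return records
```

Called on the example entry above, it returns one record whose 'POSITIVE SWISS-PROT ENTRY' is 'GAP1_SCHPO' and whose single evidence item has PMID '11422940'.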

6. How do I cite BioContrasts?

Jung-jae Kim, Zhuo Zhang, Jong C. Park, and See-Kiong Ng, BioContrasts: Extracting and Exploiting Protein-Protein Contrastive Relations from Biomedical Literature, Bioinformatics, Advance Access published December 20, 2005.

Jung-jae Kim and Jong C. Park, Extracting Contrastive Information from Negation Patterns in Biomedical Literature, ACM Transactions on Asian Language Information Processing, 2005 (to appear).

7. What are the statistics of the BioContrasts database?

Proteins in BioContrasts

  • are linked to 14,083 entries from release 46 of Swiss-Prot (out of 181,571 entries in total)
  • contain 1,982 human proteins (14.1%)

Protein-Protein Contrasts (PPCs)

  • 41,471 unique PPCs extracted from MEDLINE abstracts
    • 16,601 PPCs between proteins from the same species (ContrastA)
    • 24,857 PPCs between homologous proteins from different species (ContrastB)
  • The PPCs of ContrastA show sequence and/or functional similarities between the focused proteins:
    • sharing of InterPro domains: 7,474 (45.0%)
      • sharing all the domains: 5,015 (30.2%)
    • sharing of Gene Ontology codes: 3,194 (19.2%)
    • sequence homology (NCBI BLASTP, E-value cutoff 1e-10): 1,720 (10.4%)
    • belonging to the same KEGG pathway: 905 (5.4%)
    • binding to the same protein (BIND, DIP): 517 (3.1%)
    • binding to each other (BIND, DIP): 220 (1.3%)

Domain-Domain Contrasts (DDCs)

  • 3,332 unique DDCs deduced from ContrastA
    • 1,756 DDCs are deduced from two or more PPCs
  • 1,732,869 candidate PPCs inferred from the 1,756 DDCs

copyright © I2R & KAIST

NLP & CL Lab, Korea Advanced Institute of Science and Technology (KAIST), South Korea
Knowledge Discovery Department, Institute for Infocomm Research (I2R), Singapore