

2.5 Functional Genome Annotation

Responsible: Dr. Tobias Doerks, EMBL, Heidelberg.

Background:

Complex eukaryotic genomes pose severe obstacles to experimental approaches. Transferring information from model organisms to human is therefore an attractive strategy, at least for conserved functional modules. Here we intend to establish a framework for primary annotation that uses homology-based methods for gene function annotation and combines advanced protein interaction prediction methods with benchmarked information transfer protocols, so that manually curated information on cellular function in several model organisms can be included. The framework will be combined with other information produced within this SMP and offered to the NGFN-2 community.

Planned Work:

Our primary goal is to develop an annotation and quantitative prediction system for protein interactions and functional modules of human genes. This implies the implementation of data structures that can cope with complex network issues as well as the development of methods to predict protein networks and functional modules of human genes. Moreover, it requires the establishment of pipelines for gene annotation beyond current standards (e.g. by utilising orthology and synteny information within metazoans). Integrated data sets established by other groups in this SMP will be normalised and added to the resource. The following goals are envisioned:

  1. Establishment of orthology and synteny identification pipelines. This should allow the mapping of human genes onto a number of important metazoan model organisms. It requires the identification of pseudogenes that hamper orthology identification as well as the identification of recent duplications (inparalogs) in all model organisms included.
  2. Generation of synonyms for gene names in human and several model organisms. Large-scale datasets on protein interactions or other functional features of proteins (e.g. expression, localisation) often refer to different annotation systems, and most of the literature refers to genes only by names that differ from species to species. An extensive mapping task therefore has to be performed to connect human annotation schemes to the supporting literature and to orthologs in other species.
  3. Normalisation of data on protein interactions and other functional associations. A unified scoring system has to be developed to enable the integration of heterogeneous data, e.g. from complex purification, co-expression, yeast two-hybrid screens or computational predictions in different organisms. For this task, several benchmark schemes have to be developed.
  4. Development of protein interaction prediction methods. The groups have previously developed methods to predict functional associations of genes in prokaryotes. Here we want to extend such genomic context analysis to eukaryotes and eventually to human.
  5. Identification of functional modules, and development of relevant data structures. As pairwise interactions can be combined into networks, methods for the extraction of relevant functional modules have to be developed (see preliminary work in Von Mering et al., 2003; Tornow and Mewes, 2003). This requires data structures that allow networks derived by different methods to be visualised and that let users handle modularity within such networks.
  6. Integration of customised datasets from other SMP partners or NGFN-2 participants. These data can be normalised using the established pipeline, and the context of predicted functional associations can then be derived for them.
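
Goals 3 and 5 can be illustrated with a minimal sketch. It assumes that each evidence source has already been calibrated against a benchmark to a probability-like score between 0 and 1. Scores for the same gene pair are combined with a noisy-OR rule, which is one common choice for independent evidence and not necessarily the scheme the project will adopt. Candidate modules are taken to be the connected components of the network that remains after thresholding, which is the simplest possible module definition. All function names, gene identifiers, source labels and scores are hypothetical.

```python
from collections import defaultdict


def combine_scores(scores):
    """Noisy-OR combination of evidence scores, each assumed to lie in [0, 1]."""
    remaining = 1.0
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
        remaining *= 1.0 - score
    return 1.0 - remaining


def integrate(evidence):
    """Merge (gene_a, gene_b, source, score) records into one score per pair.

    Pairs are stored as frozensets so that (a, b) and (b, a) are the same key.
    """
    per_pair = defaultdict(list)
    for gene_a, gene_b, _source, score in evidence:
        per_pair[frozenset((gene_a, gene_b))].append(score)
    return {pair: combine_scores(scores) for pair, scores in per_pair.items()}


def extract_modules(combined, threshold):
    """Return connected components of the network with edges >= threshold."""
    adjacency = defaultdict(set)
    for pair, score in combined.items():
        if score >= threshold and len(pair) == 2:  # skip self-interactions
            gene_a, gene_b = tuple(pair)
            adjacency[gene_a].add(gene_b)
            adjacency[gene_b].add(gene_a)

    seen = set()
    modules = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        modules.append(component)
    return modules


# Hypothetical evidence records: (gene_a, gene_b, source, calibrated score)
EVIDENCE = [
    ("geneA", "geneB", "yeast_two_hybrid", 0.5),
    ("geneB", "geneA", "co_expression", 0.5),
    ("geneB", "geneC", "complex_purification", 0.9),
    ("geneD", "geneE", "computational_prediction", 0.2),
]

if __name__ == "__main__":
    combined = integrate(EVIDENCE)
    for module in extract_modules(combined, threshold=0.7):
        print(sorted(module))
```

In this toy input, two moderate scores for the same pair reinforce each other, so the pair passes a threshold that neither score would pass alone. A real system would replace the connected-components step with a proper clustering method and would need to account for evidence sources that are not independent.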