Quality Management "Generation of Open Reading Frame (ORF) Resources"

Quality management coordinator:
Dr. Stefan Wiemann
German Cancer Research Center (DKFZ)
Div. of Molecular Genome Analysis
Phone: + 49 6221 424702
E-Mail:
s.wiemann@dkfz.de
 
Project leader:
Dr. Ruth Wellenreuther
German Cancer Research Center (DKFZ)
Div. of Molecular Genome Analysis
Phone: + 49 6221 424767
E-Mail:
r.wellenreuther@dkfz.de


The cloning of open reading frames (ORFs) has become a high-throughput operation that requires the transfer of reliable information and high-quality resources between the different partners. Tight quality control of processes and material is therefore one of the most relevant issues in generating ORFs. To achieve and maintain high throughput and quality standards in the ORF-clone resources, our quality management and standardization activities are structured around three main goals:
  • Quality control of molecular biological resources
  • Establishment and optimization of standardized protocols
  • Contribution to the Quality Management initiatives within the NGFN and beyond in order to establish common standards that shall increase the quality and reliability of data and the comparability of deduced information.
     

    Quality Control of Open Reading Frame (ORF) Resources

    The SMP-Cell follows a systematic strategy towards the identification and functional validation of novel genes with the aim to validate proteins of clinical relevance and to select targets for diagnostics and therapy.

    ORFs and splice variants are primarily amplified via RT-PCR from RNA of either cell lines or tissues that have been identified to express the desired ORF/splice variant. In addition, the cDNA clones that were generated within the German Human Genome Project (DHGP) and in the German National Genome Research Network (NGFN) are a rich source for potential full-ORF cDNAs that are utilized whenever possible as templates for amplification (Fig. 1). 

    The PCR-amplified ORFs are recombined into the entry vector using a “clonase” from Invitrogen’s Gateway® cloning system (1). The SOPs are freely available (see below and at the SMP-Cell web site http://www.smp-cell.org/).

    Prior to shuttling into expression vectors, all entry clones are sequence verified. Sequencing of entry clones is an essential step in the cloning process, ensuring that the experimental projects receive only quality-controlled and standardized expression constructs. Only sequencing can rule out errors that may have been introduced during ORF amplification and cloning, of which frameshift mutations are the most deleterious. This type of mutation renders the respective clones useless for downstream experimentation and necessitates re-cloning of the respective ORF. However, base substitutions can also have dramatic effects on the functionality of the encoded proteins and at least require detection and proper annotation.
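    The distinction between frameshifts and base substitutions can be illustrated with a minimal Python sketch. It is not the validation pipeline used by the sequencing partners; the function name `classify_clone` and the returned labels are hypothetical. It compares sequences position by position without alignment, so it only detects a net length change and would miss compensating insertions and deletions.

```python
def classify_clone(reference: str, clone: str) -> dict:
    """Crudely compare a sequenced clone ORF with its reference ORF.

    Assumption: both sequences cover exactly the ORF (start to stop codon).
    No alignment is performed; this is an illustration only.
    """
    reference = reference.upper()
    clone = clone.upper()
    length_diff = len(clone) - len(reference)

    if length_diff != 0:
        # A net length change that is not a multiple of three shifts the frame.
        status = "frameshift" if length_diff % 3 != 0 else "in-frame indel"
        return {"status": status, "substitutions": []}

    # Equal length: list mismatches as (1-based position, reference base, clone base)
    # so that they can be annotated.
    substitutions = [
        (i + 1, r, c)
        for i, (r, c) in enumerate(zip(reference, clone))
        if r != c
    ]
    status = "substitutions" if substitutions else "identical"
    return {"status": status, "substitutions": substitutions}
```

    In this sketch a clone flagged as "frameshift" would be sent back for re-cloning, whereas one flagged as "substitutions" would be kept for annotation of each listed position.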

    Sequence validation of ORF clones is performed at AGOWA (Dagmar Heubner), DKFZ (Stefan Wiemann), GBF (Helmut Blöcker), Medigenomix (Birgit Ottenwälder), Qiagen (André Bahr) and the University of Düsseldorf (Karl Köhrer). Results from sequencing feed back into the ORF cloning project; for instance, error rates of DNA polymerases can be evaluated quantitatively and the optimal enzymes selected for amplification (see below).

    Thus far we have sequenced more than 1,200 entry clones. Another 1,500 ORF clones will be generated and sequence validated within the second funding phase of the National Genome Research Network.

    Only entry clones that have passed the quality control check are shuttled into expression vectors and fed into cellular assays (Wiemann S et al. From ORFeome to biology). ORFs are subcloned into both N- and C-terminally tagged GFP expression vectors, because subcellular localization studies in mammalian cells have shown that both fusion constructs are required to determine the subcellular localization of many proteins unambiguously (Fig. 3).

    Expression clones are quality controlled by restriction digest for the presence of ORFs of the expected size. Over 6,500 different expression constructs have been generated thus far. All expression clones are stored as shown in Fig. 2; plate and filter coordinates are kept in the LIFEdb database for the tracking of individual clones.

    This cloning scheme has been standardized and adopted by all partners within SMP-Cell. The standard protocols are freely available (see below and at the SMP-Cell web site http://www.smp-cell.org/).


    Standardized information flow and LIMS database

    The ORF cloning process is distributed among five partners of SMP-Cell. The DKFZ delivers gene models and, when available, matching templates to the partners. There the ORF cloning takes place, and entry clones and accompanying information are returned (Fig. 2). This requires a defined dataflow between these partners and the DKFZ, where all clones are maintained. 

    We use a Laboratory Information Management System (LIMS) that makes it possible to follow the workflow and to observe the status of the respective ORFs at all levels of processing. This application is tailored for remote use by the different partners in the distributed cloning processes. Bench workers can thus trace every ORF and any construct that has been produced, and decide on the next steps needed to obtain a final product (i.e. entry clone). Any data from cellular functional gene analysis can be related to the original clones and sequences they were based upon. This facilitates the exchange and cross-platform comparison of information obtained by collaboration partners (Fig. 1).

    A standardized nomenclature has proven to be essential for the unambiguous tracing of resources and of data from the subsequent application of clones in cellular functional gene analysis. The LIMS generates PCR IDs, entry clone IDs, and expression clone IDs, and automatically assembles 96-well plates for sequence validation and for further processing (Fig. 2). Thus, a standardized nomenclature is maintained during all steps of the cloning process.
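
    Automatic ID generation and plate assembly of this kind can be sketched roughly as follows. This is not the LIFEdb LIMS code: the ID format (a prefix plus a zero-padded counter), the row-wise filling order, and the names `make_id_generator` and `assemble_plates` are assumptions made purely for illustration.

```python
from itertools import count
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]   # rows A-H of a 96-well plate
COLUMNS = range(1, 13)       # columns 1-12


def make_id_generator(prefix: str, width: int = 6):
    """Return a function that issues unique, sequential IDs (hypothetical format)."""
    counter = count(1)

    def next_id() -> str:
        return f"{prefix}{next(counter):0{width}d}"

    return next_id


def assemble_plates(clone_ids):
    """Assign each clone ID to (plate number, well), filling plates row by row."""
    wells = [f"{row}{column}" for row in ROWS for column in COLUMNS]
    layout = {}
    for index, clone_id in enumerate(clone_ids):
        plate, position = divmod(index, len(wells))
        layout[clone_id] = (plate + 1, wells[position])
    return layout
```

    Because the IDs come from a single counter and the well coordinates are derived from the position in the list, no identifier is typed by hand and no two clones share a well in this sketch.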

    All generated data (experimental data, bioinformatics data) are integrated in a central database and complemented by relevant external data. This information is made publicly available through the web interface of the LIFEdb database (2) at http://www.lifedb.de/. The database serves as a key component for data integration and dissemination in SMP-Cell. The information is queryable across the different databases and comprises:

    • annotation of genes and splice variants, ORFs, and cDNAs.
    • data from ORF cloning, validation, and administration.
    • experimental results from cellular functional gene analysis, i.e. protein localization and cellular assays.
    • results from automated bioinformatic protein analyses (see PCE-S19T08)
    • data from NCBI (Gene, UniGene), EBI (IPI, GO) and SIB (Swiss-Prot)

    The data are uploaded by a specialized client-server application called SCISSORS. SCISSORS serves two purposes and consequently has two interfaces. Firstly, it serves as a tracking database and administration tool for material and products. Secondly, it can export data as XML files to transfer information from the central database to the collaboration partners and back. The SCISSORS application thus enables a user-friendly selection of ORFs for processing by the different partners. Once cloning is completed, results tables are sent back by the partners to the central database, again as XML documents.
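
    An XML round trip of this kind could look like the following Python sketch, which uses only the standard library. The element and attribute names (`orf_batch`, `orf`, `gene`, `status`) and both function names are hypothetical; the actual SCISSORS document format is not described here.

```python
import xml.etree.ElementTree as ET


def export_orfs(orfs):
    """Serialize a list of ORF records to an XML string (hypothetical schema)."""
    root = ET.Element("orf_batch")
    for orf in orfs:
        item = ET.SubElement(root, "orf", id=orf["id"])
        ET.SubElement(item, "gene").text = orf["gene"]
        # Records without a status are exported as "assigned" in this sketch.
        ET.SubElement(item, "status").text = orf.get("status", "assigned")
    return ET.tostring(root, encoding="unicode")


def import_results(xml_text):
    """Parse an XML string in the same hypothetical schema into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": item.get("id"),
            "gene": item.findtext("gene"),
            "status": item.findtext("status"),
        }
        for item in root.findall("orf")
    ]
```

    In this sketch the central database would call `export_orfs` to hand a selection of ORFs to a partner, and `import_results` to read the returned results table after the partner has updated the status fields.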


    Researchers worldwide make use of the resources provided by the German cDNA Consortium (see http://www.genome.org/cgi/content/full/11/3/422 for citations of Reference Wiemann S et al. Towards a Catalogue). The partners within SMP-Cell in particular use the resources in large-scale protein localization (5), high-throughput cellular functional gene analysis (6, 7), and increasingly in single-gene analysis and annotation (8-10). Until now, the initial cDNAs required amplification and subcloning to produce Gateway entry clones; SMP-Cell now generates such entry clones directly. The increase in the throughput of cloning and sequence validation has already enhanced the potential of the experimental projects to screen a larger number of proteins for their disease relevance, and to identify candidate genes and proteins for further analysis.



    Establishment and Optimization of Standards and Standard Protocols

    Several parameters required standardization due to the decentralized generation of the ORF resources and subsequent exploitation in cell-based experiments (Fig. 1):

  • All entry clones and other material are named following a common nomenclature which allows for the tracking of any data that is collected in cellular functional gene analysis back to the ORF and clone they derived from

  • The clone IDs are generated automatically to avoid errors from manual typing. This is done with the help of the LIFEdb LIMS (Bannasch D et al. LIFEdb: A database for functional genomic....). This system ensures that all material receives unique identifiers and thus prevents inconsistencies

  • The quality-controlled entry clones are centrally collected and maintained in the repository of the DKFZ, where they are replicated and stored in 96-well plates and on IsoCode® cards (Schleicher & Schuell) (Fig. 2). Stock keeping is mirrored in silico, again through the LIFEdb database system (Bannasch D et al. LIFEdb: A database for functional genomic....). This system currently contains 3,500 entry clones from 1,400 different ORFs, most of which are sequenced to completion. The entry-clone resource is provided to the partners within SMP-Cell and to collaborating SMPs and KGs. In addition, the clones are commercialized through the RZPD.

    We continually optimize protocols to match the often changing needs within cooperative projects and to take account of improved technical knowledge, tools, and methodologies.
    One illustrative example:

    The fidelity of the DNA polymerase is a key parameter for the successful cloning of intact ORFs. PCR errors are among the major obstacles in the ORF cloning process, and continuous effort is invested to keep their numbers low. With a growing number of high-fidelity DNA polymerases available, we evaluated the fidelity and processivity of a range of these enzymes. We tested 20 high-fidelity DNA polymerases and polymerase mixes using the LacI assay (5). The results of this test were used to optimize the fidelity of the PCR step in our cloning procedure while maintaining the reasonable yield and cost required in our high-throughput experimental set-up.
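
    How the results of such an assay were converted into error rates is not stated here. A formulation often used for lacI-based fidelity assays divides the mutant frequency by the product of the number of detectable target sites and the number of template doublings. The Python sketch below assumes that formulation; the function and parameter names are hypothetical, and all input values must come from the experiment itself.

```python
import math


def template_doublings(input_amount: float, output_amount: float) -> float:
    """Number of template doublings, estimated from the fold amplification.

    Assumption: both amounts are given in the same unit (e.g. ng of target DNA).
    """
    return math.log2(output_amount / input_amount)


def error_rate(mutant_colonies: int, total_colonies: int,
               detectable_sites: int, doublings: float) -> float:
    """Errors per base per doubling, under the assumed formulation.

    mutant frequency = mutant colonies / total colonies
    error rate       = mutant frequency / (detectable sites * doublings)
    """
    mutant_frequency = mutant_colonies / total_colonies
    return mutant_frequency / (detectable_sites * doublings)
```

    Normalizing by the number of doublings allows enzymes with different yields to be compared on the same scale, which matters when both fidelity and yield have to be balanced.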

    Standardized protocols and standards on nomenclature and plasmid expression vectors have been established to standardize the cloning process (see below and the SMP-Cell web page http://www.smp-cell.org/).


    ORF cloning

    1. ORF amplification from cloned material (i.e. cDNA)

    2. Cloning of PCR products (ORFs)

    • BP reaction (recombination of PCR products into entry vector)
    • LR reaction (recombination of ORFs from entry into expression vectors)


    Authors involved in the development of standard protocols:
    Dr. Stefanie Bechtel, Dr. Ruth Wellenreuther, Dr. Stefan Wiemann (German Cancer Research Center (DKFZ), Molecular Genome Analysis, Functional profiling)

    A discussion forum has been implemented in the SMP-Cell intranet pages, where relevant issues on SOPs and quality control are posted and discussed.

    Establishment of common standards that shall increase the quality and reliability of data and the comparability of deduced information.



  • QM&S initiatives
    A consensus on standards has to be reached between the participating groups and projects as a prerequisite for the development of standardized nomenclatures, molecular biological material, and protocols. Common quality criteria for material need to be established, and SOPs have to be worked out and adopted. We actively participate in this process within the NGFN and beyond (Fig. 3), as tight quality control of processes and material is one of the most relevant issues in SMP-Cell, where the production and exploitation of resources are conducted at different locations. Continuous tracking of material and data is needed to collect relevant and reproducible data and information.


    Outlook
    QM&S will be of increasing importance within the NGFN, and for genomic research in general. The cost-efficient generation and sharing of standardized, high-quality material and the integration of data from different functional genomic projects require comparable experimental conditions and data formats. Standardization of assay descriptions and data formats will be the focus of our work in the foreseeable future. SOPs as well as the emerging standards for genomics experiments will continue to be published on the web pages of the QM&S project of SMP-Cell.
    With the switch from random sampling of cDNA libraries to the directed modelling, cloning, and sequence validation of ORF resources, the German cDNA Consortium has again taken a leading role in the international effort to provide the community with sequence-validated full-length cDNA resources. Use of the Gateway system and the principle of selecting specifically the protein coding regions for cloning have greatly fostered the immediate applicability of these cDNAs. Sequence validation of all representative clones for every gene and splice variant, cloned in open and closed forms, generates a highly valuable resource for functional genomics. During NGFN-2 the sequencing capacities of the German cDNA Consortium will guarantee a tight QC of resources and lay the ground for a successful exploitation of the clone resources in SMP-Cell and beyond. The continuously extended physical resource of sequence-verified entry clones that is generated by the joint effort of the German cDNA Consortium in NGFN-2 is centrally stored, maintained and distributed by a repository at the DKFZ. Any new ORFs are centrally subcloned into expression vectors to extend the resource of expression clones for use in the functional analysis pipeline within SMP-Cell. In collaborative efforts with other SMPs and with KGs, a growing number of genes and ORFs are analyzed, and this project again provides the resources for immediate functional exploitation. Once candidate proteins have been identified in cellular functional gene analysis, their detailed functional characterization requires further expression constructs that allow the proteins to be expressed, e.g., under control of another promoter, without a protein tag, or in a different expression system. To this end, further expression vectors will be developed according to the desired application, and the respective ORFs will be subcloned into these vectors.
    Lit.: 1. Bannasch D et al. LIFEdb: A database for functional genomics experiments integrating information from external sources, and serving as a sample tracking system. Nucleic Acids Res. 2004 32(1):D505-8. 2. Boeckmann B et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucl Acids Res. 2003 Jan 1;31(1):D365-70. 3. Maglott D et al. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2005 Jan 1;33(1):D54-8. 4. Maglott D et al. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2005 Jan 1;33(1):D54-8. 5. Rodriguez-Tome P. EBI databases and services. Mol Biotechnol. 2001 Jul;18(3):199-212. 6. Wiemann S et al. From ORFeome to biology: a functional genomics pipeline. Genome Res. 2004 14(10b):2136-44. 7. Arlt DH et al. Functional profiling: from microarrays via cell-based assays to novel tumor relevant modulators of the cell cycle. Cancer Res. 2005 65(17):in press.









  • Lit.: 1. Wiemann S et al. Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs. Genome Res. 2001 Mar;11(3):422-35. 2. Wiemann S et al. From ORFeome to biology: a functional genomics pipeline. Genome Res. 2004 Oct;14(10B):2136-44. 3. Wiemann S et al. The German cDNA network: cDNAs, functional genomics and proteomics. J Struct Funct Genomics. 2003 4(2-3):87-96. 4. Bannasch D et al. LIFEdb: a database for functional genomics experiments integrating information from external sources, and serving as a sample tracking system. Nucleic Acids Res. 2004 Jan 1;32:D505-8. 5. Arlt D et al. Functional profiling: from microarrays via cell-based assays to novel tumor relevant modulators of the cell cycle. Cancer Res. 2005 in press. 6. Starkuviene V et al. High-content screening microscopy identifies novel proteins with a putative role in secretory membrane traffic. Genome Res. 2004 Oct;14(10A):1948-56.
    Lit.: 1. Wiemann S et al. Toward a Catalog of Human Genes and Proteins: Sequencing and Analysis of 500 Novel Complete Protein Coding Human cDNAs. Genome Res. 2001 Mar;11(3):422-35. 2. Wiemann S et al. The German cDNA Network: cDNAs, functional genomics and proteomics. Journal of Structural and Functional Genomics. 2003 4(2-3):87-96. 3. Wiemann S et al. cDNAs in functional genomics and proteomics: The German cDNA Consortium. CRBiologies. 2003 326:1003-9. 4. Wellenreuther R et al. SMART amplification combined with cDNA size fractionation in order to obtain large full-length clones. BMC Genomics. 2004 5:36. 5. Simpson JC et al. Systematic subcellular localization of novel proteins identified by large scale cDNA sequencing. EMBO Rep. 2000 1(3):287-92. 6. Starkuviene V et al. High-content screening microscopy identifies novel proteins with a putative role in secretory membrane traffic. Genome Res. 2004 Oct;14(10):1948-56. 7. Arlt DH et al. Functional profiling: from microarrays via cell-based assays to novel tumor relevant modulators of the cell cycle. Cancer Res. 2005 65(17):in press. 8. Neubrand VE et al. Gamma-BAR, a novel AP-1 interacting protein involved in post-Golgi trafficking. EMBO J. 2005 24:1122-33. 9. Fleischer S et al. PML-Associated Repressor of Transcription (PAROT), a Novel KRAB-ZINC Finger Repressor is Regulated through Association with PML Nuclear Bodies. J Biol Chem. 2005:submitted. 10. Wiemann S et al. Alternative pre-mRNA processing regulates cell-type specific expression of the IL4l1 and NUP62 genes. BMC Biol. 2005 Jul 19;3(1):16. 11. Imanishi T et al. Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones. PLoS Biol. 2004 Apr;2(6):856-75.


    Lit.: 1. Simpson JC et al. Systematic subcellular localization of novel proteins identified by large scale cDNA sequencing. EMBO Rep. 2000 1(3):287-92. 2. Hartley JL et al. DNA cloning using in vitro site-specific recombination. Genome Res. 2000 Mar;10(11):1788-95. 3. Wiemann S et al. From ORFeome to biology: a functional genomics pipeline. Genome Res. 2004 14(10b):2136-44. 4. Wiemann S et al. Toward a Catalog of Human Genes and Proteins: Sequencing and Analysis of 500 Novel Complete Protein Coding Human cDNAs. Genome Res. 2001 Mar;11(3):422-35. 5. Bannasch D et al. LIFEdb: A database for functional genomics experiments integrating information from external sources, and serving as a sample tracking system. Nucleic Acids Res. 2004 32(1):D505-8.


    Website of the SMP Cell involved in this NGFN quality management project: http://www.dkfz.de/smp-cell/cell.org/
