- the design of easily transferable data formats for different proteomics technologies
- explorative data analysis and normalization for a core set of experimental methods
- the installation of the modeling and simulation resource PyBioS for the analysis of in silico experiments and the dynamical characterization of biological models
We will implement a three-level approach with defined interfaces:
1.) Standardization and normalization of experimental data
The rapid development and increasing complexity of proteomics techniques produce large amounts of semi-structured high-throughput data. Integrating these data requires standardized experimental techniques and a common descriptive language for the heterogeneous data formats. To address this, we will develop an XML-based data format that covers an initial set of experiment types and serves as the common format for importing high-throughput data into the analysis platform. Where possible, we will adopt existing international standards. This first level of data integration will provide tools for the normalization and grouping of homogeneous experimental data. The consistency of the primary data will be checked at the level of the individual experiment. Normalization requires systematically identifying the factors that influence the experimental outcome and eliminating technical bias.
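As an illustration of this first level, the following minimal sketch reads a small XML record and removes a run-wide technical bias by median-centering log intensities. The element and attribute names (experiment, run, measurement, protein, intensity) and the sample values are hypothetical and do not represent the format to be developed; median centering is only one of several possible normalization methods.

```python
import math
import statistics
import xml.etree.ElementTree as ET

# Hypothetical record layout; names and values are illustrative only.
SAMPLE_XML = """
<experiment technique="2D-gel">
  <run id="r1">
    <measurement protein="P1" intensity="100"/>
    <measurement protein="P2" intensity="400"/>
    <measurement protein="P3" intensity="1600"/>
  </run>
  <run id="r2">
    <measurement protein="P1" intensity="200"/>
    <measurement protein="P2" intensity="800"/>
    <measurement protein="P3" intensity="3200"/>
  </run>
</experiment>
"""

def parse_runs(xml_text):
    """Return {run_id: {protein: log2 intensity}} parsed from the XML text."""
    root = ET.fromstring(xml_text.strip())
    runs = {}
    for run in root.findall("run"):
        runs[run.get("id")] = {
            m.get("protein"): math.log2(float(m.get("intensity")))
            for m in run.findall("measurement")
        }
    return runs

def median_center(runs):
    """Subtract each run's median log intensity to remove a run-wide scale factor."""
    centered = {}
    for run_id, values in runs.items():
        med = statistics.median(values.values())
        centered[run_id] = {p: v - med for p, v in values.items()}
    return centered

normalized = median_center(parse_runs(SAMPLE_XML))
```

In this toy data set the second run is uniformly twice as intense as the first, so after centering both runs carry the same values per protein.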
2.) Correlation and integration rules
Quality control tools will check the imported data for consistency and redundancy. On this level, heterogeneous data types will be correlated on the basis of predefined integration rules. These rules will be iteratively extended and improved over the course of the project. Using the integration rules, the data will be transformed into condensed, more highly structured secondary data types that combine different primary data types.
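A minimal sketch of what one such integration rule could look like is given below. It assumes, purely for illustration, that two primary data types (expression ratios and interaction records) share a protein identifier; the field names and the rule itself are hypothetical.

```python
from collections import defaultdict

# Hypothetical primary records from two techniques, keyed by a shared protein ID.
expression = [
    {"protein": "P1", "log_ratio": 1.5},
    {"protein": "P2", "log_ratio": -0.3},
    {"protein": "P2", "log_ratio": -0.3},  # redundant duplicate
]
interactions = [
    {"protein": "P1", "partner": "P2"},
    {"protein": "P3", "partner": "P1"},
]

def drop_duplicates(records):
    """Redundancy check: remove exact duplicate records, preserving order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def integrate(expression, interactions):
    """Example rule: join both data types on the protein identifier."""
    merged = defaultdict(lambda: {"log_ratios": [], "partners": []})
    for rec in drop_duplicates(expression):
        merged[rec["protein"]]["log_ratios"].append(rec["log_ratio"])
    for rec in drop_duplicates(interactions):
        merged[rec["protein"]]["partners"].append(rec["partner"])
    return dict(merged)

secondary = integrate(expression, interactions)
```

Each entry of the result combines both primary data types for one protein, which is the kind of condensed secondary structure described above.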
3.) Modeling and simulation
On the third data integration level, the consistency of the different secondary data structures will be checked. The data structures will be integrated into biological processes, for example signaling pathways or regulatory networks. These objects will either be predefined or result from the simulation and modeling approaches of the platform. Here, we will incorporate our modeling and simulation platform, PyBioS, developed in the course of NGFN-1.
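To indicate the kind of dynamical simulation meant here, the sketch below integrates a deliberately small model, the reversible activation of a signaling component R <-> R*, with mass-action kinetics and a forward-Euler scheme. It does not use or reproduce the PyBioS interface; the function name, rate constants and initial concentrations are assumptions chosen for illustration.

```python
def simulate(k_on, k_off, r0, ra0, dt=0.001, t_end=10.0):
    """Forward-Euler integration of R <-> R* with mass-action kinetics.

    Returns a list of (time, inactive, active) tuples.
    """
    r, ra = r0, ra0
    steps = int(round(t_end / dt))
    trajectory = [(0.0, r, ra)]
    for i in range(1, steps + 1):
        # Net activation flux: forward rate minus backward rate.
        flux = k_on * r - k_off * ra
        r -= flux * dt
        ra += flux * dt
        trajectory.append((i * dt, r, ra))
    return trajectory

# Hypothetical parameters: activation twice as fast as deactivation.
traj = simulate(k_on=2.0, k_off=1.0, r0=1.0, ra0=0.0)
t_final, r_final, ra_final = traj[-1]
```

The total amount r + ra is conserved, and the active fraction approaches k_on / (k_on + k_off) at steady state, which provides a simple consistency check on the simulated model.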
The sub-project will be closely linked to the experimental sub-projects, in particular projects 2.2, 3.1, 3.2 and 3.3, through the analysis and modeling of primary data.