Project Description

Introduction

In 2010 the Human Proteome Organisation launched the Human Proteome Project (HPP), aimed at cataloguing the protein information arising from the plethora of proteomics-based studies worldwide. To support complete coverage, one arm of the project will take a gene- or chromosome-centric strategy (C-HPP). Labour in this international effort has been divided by assigning each of the 24 human chromosomes to one or more countries. In this scheme, the Australian/New Zealand consortium has been assigned Chromosome 7, as this chromosome contains various genetic markers associated with diseases relevant to the Australian population.

Although multiple large international biological databases house genomic and protein data, there is currently no single system that integrates up-to-date, pertinent information from each of these repositories and assembles it into a format suitable for a global proteomics effort of the type proposed by the C-HPP.

We have undertaken to produce a data integration and analysis software system for the C-HPP effort and to make data collections from this resource discoverable through ANDS's Research Data Australia. Whilst the software is being designed to be ultimately species and chromosome independent, the initial focus is on the development of a resource for Human Chromosome 7.

During the first phase, up to 4 widely used data sources are being integrated into a web browser interface. The interface is designed to display an overview of the current evidence supporting the identification of gene products across the chromosome, such as protein expression, modification and disease association, with the ability to drill down to the original data. Our design allows for easy addition of both new data sources and new data categories, which would be incorporated in the second phase.
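
As a rough illustration of how new data sources and categories might be added without changing the browser itself, the sketch below uses a small adapter interface and a registry. All class names, source names and record identifiers are hypothetical and are not taken from the project's code base.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical adapter interface: one subclass per external repository."""

    name: str
    category: str  # e.g. "expression", "modification", "disease association"

    @abstractmethod
    def evidence_for(self, gene):
        """Return a list of record identifiers that support this gene."""


class InMemorySource(DataSource):
    """Stand-in for a real repository client, backed by a plain dict."""

    def __init__(self, name, category, data):
        self.name = name
        self.category = category
        self._data = data

    def evidence_for(self, gene):
        return list(self._data.get(gene, []))


class Registry:
    """Holds the registered sources; adding a source needs no other change."""

    def __init__(self):
        self._sources = []

    def register(self, source):
        self._sources.append(source)

    def categories(self):
        # Unique categories in the order they were first registered.
        return list(dict.fromkeys(s.category for s in self._sources))

    def overview(self, gene):
        """Summarise evidence for one gene as {category: {source name: [ids]}}."""
        result = {}
        for source in self._sources:
            hits = source.evidence_for(gene)
            if hits:
                result.setdefault(source.category, {})[source.name] = hits
        return result


# Placeholder sources and records, for illustration only.
registry = Registry()
registry.register(InMemorySource("source_1", "expression", {"GENE_A": ["rec-1"]}))
registry.register(
    InMemorySource("source_2", "disease association", {"GENE_A": ["rec-7", "rec-9"]})
)
print(registry.overview("GENE_A"))
```

Under this assumed design, a Phase 2 source would be one more `DataSource` subclass passed to `register`, and a new category would appear as soon as a source declaring it is registered.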

The Proteome Browser will assist Australian and international efforts in completing a map of the Human Proteome. The mapping of the human proteome, even a partial mapping, will help elucidate biological and molecular function and advance the diagnosis and treatment of diseases. Its approach may also be applied to other animals and plants.


The Purpose (Mission)

The goal of this project is to develop a comprehensive data integration and analysis software tool that provides a snapshot of our current proteomic knowledge and will ultimately assist in the analysis of biological function and the study of human disease.

In conjunction with the proteomics community, the Proteome Browser team will develop an analysis tool that integrates protein data from a number of source systems. It will provide a traffic light representation of proteomic data, in which the X axis lists each gene (i.e. protein), ordered by default as found on the chromosome, and the Y axis lists the types of evidence that support the identification of proteins.

The traffic light system will indicate where different types of data exist or do not exist for a particular protein. Various aspects of the underlying contributing information are envisaged to be available for further analysis using clustering and drill down/through capabilities. The screen will be dynamic, showing relevant information through filtering, and will provide links to external systems where appropriate.
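
A minimal sketch of the matrix described above, assuming a two-state scheme in which a cell is green when at least one record supports it and red otherwise. The gene names, source names and function names are placeholders for illustration, and the colour scheme adopted by the project may differ.

```python
from collections import defaultdict

# Illustrative inputs only: gene names, their order and the records below
# are placeholders, not real Chromosome 7 data.
GENES = ["GENE_A", "GENE_B", "GENE_C"]  # X axis, assumed chromosomal order
EVIDENCE_TYPES = ["expression", "modification", "disease association"]  # Y axis


def build_matrix(records, genes, evidence_types):
    """Map each (evidence type, gene) cell to "green" if at least one
    record supports it, otherwise "red" (an assumed two-state scheme)."""
    supported = defaultdict(set)
    for gene, evidence_type, _source in records:
        supported[evidence_type].add(gene)
    return {
        ev: {g: "green" if g in supported[ev] else "red" for g in genes}
        for ev in evidence_types
    }


def filter_genes(matrix, genes, evidence_type, colour):
    """Return the genes, in axis order, whose cell for one evidence type
    has the given colour."""
    return [g for g in genes if matrix[evidence_type][g] == colour]


records = [
    ("GENE_A", "expression", "source_1"),
    ("GENE_A", "disease association", "source_2"),
    ("GENE_C", "expression", "source_1"),
]
matrix = build_matrix(records, GENES, EVIDENCE_TYPES)

# One row per evidence type, one column per gene.
for ev in EVIDENCE_TYPES:
    print(f"{ev:<20}", " ".join(matrix[ev][g] for g in GENES))
```

Here `filter_genes(matrix, GENES, "expression", "red")` would list the genes that still lack expression evidence, which is the kind of filtering the dynamic screen is meant to support.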

The aim is to produce an “easy to use” data integration and analysis software system for the Australia/New Zealand HPP effort, and to make data collections from this resource discoverable through ANDS’s Research Data Australia.

  • The Proteome Browser project will implement Phase 1 in 2012 with Phase 2 completing in 2013.
  • The outcome will be an online tool for the international research community.
  • HUPO endorses this initiative.

These are the targets we want to meet:

  • Integrate up to 4 data sources into the web interface during Phase 1
  • Integrate potentially 4 additional data sources during Phase 2
  • Register data collections from these sources, and associated parties, activities and services, with ANDS' Research Data Australia
  • Build an extensible platform that could later include other chromosomes and organisms

This is how our community will gain:

  • Assist Australian/NZ and International efforts in completing a map of the Human Proteome
  • A map of the human proteome, even a partial one, will help elucidate biological and molecular function and advance the diagnosis and treatment of diseases
  • Approach may also be applied to other animals and plants

Approach

  • Development of the solution is being led by the Australian/NZ Proteomics community
  • International engagement and consultation are being sought
  • Solution will be owned by the international Proteomics community, not an organisation
  • Solution will be designed and developed so that research institutions can install it relatively easily
  • Developed software will be open source

Deliverables

Phase 1

  Deliverables:
  • Integration of 4 proteomic data sources
  • Traffic Light Matrix
  • Registration of sample RIF-CS collections with RDA

  Measure of success:
  • Acceptance of the release by the Product Owners
  • Majority acceptance by steering committee at the Steering Committee meeting in October
  • Acceptance by ANDS of the sample collections

Phase 2

  Deliverables:
  • Integration of 4 more proteomics data sources
  • Updated Traffic Light Matrix for the 4 additional sources
  • Registration of data collections with RDA
  • Delivery of Demonstration of Value messages by the Research Champions

  Measure of success:
  • Acceptance of the release by the Product Owners
  • Majority acceptance by steering committee at the Steering Committee meeting in December
  • Acceptance by ANDS of the collections and the final progress report
  • Successful delivery of a value message to the Australian HPP consortium and Proteomics Australia by the Research Champion(s)

Project Constraints

  • Only data sources that are in the public domain or those that provide re-use licenses will be used
    How the constraint will be managed: Wherever feasible, public domain (i.e. no constraints) proteomics data sources will be used. If a data source requires a re-use license, this will be obtained as soon as possible.