
News

Latest News about the project

CNIO’s Eurocancercoms project team: building systems for improved data integration & extraction across domains

Wednesday, July 07, 2010

Eurocancercoms partner, the Spanish National Cancer Research Centre (CNIO), is leading efforts to integrate and connect highly diverse information, improving data access and extraction for a wide range of users, including clinicians, oncologists, pathologists, molecular biologists and genomics/systems biologists.

The initial phase of this project has focused on bioinformatics, the science of collecting and analysing complex biological data such as genetic codes. The group has been organising bioinformatics infrastructures to facilitate data integration from the molecular to the clinical level.

The team at the CNIO has also focused on developing a database infrastructure, implementing components specific to information extraction (text mining), constructing a general web service-oriented software infrastructure, and adapting a flexible representation interface with querying and user-provided annotation capacities.

Given the technical complexity of this project, research has initially focused on a small-scale pilot project as a test-bed: the annotation of molecular, pharmacological and clinical information on melanoma, building on a manually curated database that was already available at the time (MMMP project, published in Melanoma Res. 2008).

Key accomplishments to date include:
• The adaptation of a web interface with a user querying and annotation capacity to integrate methods and information from different sources in a simple and dynamic way. The main molecular/genomics repositories have also been integrated.

• The construction of web service-oriented systems for the extraction of information from text and/or biological databases. Part of this infrastructure has been implemented as a metaserver, so that a single query can be run across a set of text mining servers distributed worldwide.

• The compilation and integration of information extraction modules for protein/gene names, protein interactions and gene control relations. Other modules for the extraction of experimental methods, disease symptoms and chemical compounds are also being tested. Various methods for the extraction of clinical data are being assessed.

• The information and structure of data on melanoma-related genes, drugs and medical trials are currently being analysed and categorised. The main features have already been extracted to seed the text mining approaches described above.
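
To make the metaserver idea in the list above more concrete, here is a minimal Python sketch of how one query might be fanned out to several text mining services and the results merged. It is not the CNIO implementation: the function names (`make_dictionary_tagger`, `metaserver_query`), the annotation format and the seed terms are all assumptions chosen for illustration, and the remote servers are replaced by local dictionary-based taggers seeded with a few example terms.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Illustrative seed terms only (assumption); a real system would load
# them from a curated resource rather than hard-code them.
GENE_SEEDS = {"BRAF", "NRAS", "CDKN2A"}
DRUG_SEEDS = {"dacarbazine"}


def make_dictionary_tagger(label, terms):
    """Build a stand-in 'server': a function that tags seed terms in text."""
    alternatives = "|".join(re.escape(t) for t in sorted(terms))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def tag(text):
        return [
            {"type": label, "text": m.group(0), "start": m.start(), "end": m.end()}
            for m in pattern.finditer(text)
        ]

    return tag


def metaserver_query(text, servers, timeout=5):
    """Send the same text to every server in parallel and merge the results.

    `servers` maps a server name to a callable taking the text and
    returning a list of annotation dicts (hypothetical interface).
    """
    merged = []
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in servers.items()}
        for name, future in futures.items():
            try:
                for annotation in future.result(timeout=timeout):
                    merged.append({**annotation, "source": name})
            except Exception as exc:
                # One failing server should not discard the other results.
                merged.append({"type": "error", "source": name, "text": str(exc)})
    # Order annotations by position in the text; errors sort first.
    return sorted(merged, key=lambda a: a.get("start", -1))


servers = {
    "genes": make_dictionary_tagger("gene", GENE_SEEDS),
    "drugs": make_dictionary_tagger("drug", DRUG_SEEDS),
}
hits = metaserver_query(
    "BRAF and NRAS mutations affect response to dacarbazine.", servers
)
```

In a distributed setting each callable would wrap a call to a remote web service; keeping a per-server `source` field on every annotation lets the interface show where each piece of extracted information came from.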

This work has been carried out by Pieroantonio Zocchi (visiting master's student) under the supervision of Martin Krallinger at the CNIO.

The MMMP database is openly accessible here 
See Mocellin S, Rossi CR. The Melanoma Molecular Map Project. Melanoma Res. 2008 Jun;18(3):163-5.
We acknowledge the help of Simone Mocellin.

Next steps
The second phase of the project will focus on turning the melanoma prototype into a working (publishable) system that is potentially useful for the various communities working in this specific domain. The specifications and main software will then be extracted so that they can be applied to other areas (cancer types) and sources of information.