Biological Data Management

From OLSUG Wiki

The standard relational database technology used in discovery research in the pharmaceutical industry is Oracle. This page discusses the challenges of managing biological information.

Biological data management is the discipline of supporting biological scientists as they capture, reduce and store their data. The term data management covers mechanisms for ensuring both data consistency and data quality. In the biological sciences this means enforcing a consistent terminology for each biological discipline. These ontologies must be enforced to ensure that data is reliably captured and can be effectively searched and mined.
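As a rough illustration of terminology enforcement at the point of capture, the sketch below maps free-text parameter names onto a small controlled vocabulary and rejects anything it does not recognise. The vocabulary, the synonyms and the function name `normalise_term` are all hypothetical and chosen only for this example; a real ontology would be far larger and would normally live in the database.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
# The terms and synonyms are illustrative assumptions, not a real ontology.
CONTROLLED_TERMS = {
    "IC50": {"ic50", "ic 50", "ic-50"},
    "EC50": {"ec50", "ec 50", "ec-50"},
    "percent inhibition": {"% inhibition", "pct inhibition", "percent inhibition"},
}


def normalise_term(raw):
    """Return the preferred term for a raw label, or raise ValueError."""
    # Lower-case and collapse whitespace so trivial variations still match.
    key = " ".join(raw.strip().lower().split())
    for preferred, synonyms in CONTROLLED_TERMS.items():
        if key == preferred.lower() or key in synonyms:
            return preferred
    # Rejecting unknown labels is what keeps the captured data searchable.
    raise ValueError(f"'{raw}' is not in the controlled vocabulary")
```

With this in place, `normalise_term(" IC 50 ")` returns `"IC50"`, while an unrecognised label such as `"potency"` is rejected rather than stored as a new spelling.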

Biological LIMS Software

A number of commercial solutions are available for biological data management. They fall into two groups:

Solutions designed for routine testing, such as high-throughput screening (HTS), which works from a pre-defined standard operating procedure (SOP) and is readily automated.

Solutions designed for routine and non-routine work within the same platform.

Electronic Notebooks for Biologists

Many biologists work in less routine areas such as in vivo research. These experiments are subject to change in dimensionality and process during the course of the study. Many pharmaceutical and biotechnology companies are exploring different approaches to address this problem. Some have invested in electronic laboratory notebooks (ELN) to protect the intellectual property generated during these studies. An ELN captures the data in document form, stored in the relational database with only a few metadata tags, such as time, ownership and project information, to identify each document. Such solutions then rely on text mining to explore the data. This approach has the advantage of protecting the intellectual property by moving data into a secure, audited environment where it cannot be tampered with or changed. However, it does not help scientists answer many of the questions that are fundamental to the research process.
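The limitation can be sketched in a few lines. In the example below each document carries only owner, project and timestamp tags, and a result has to be recovered from the free text with a regular expression. The class `ElnDocument`, the pattern, the function `mine_ic50` and the sample documents are all assumptions made up for illustration; they do not describe any particular ELN product.

```python
import re
from dataclasses import dataclass


@dataclass
class ElnDocument:
    """Hypothetical ELN record: a few metadata tags plus free text."""
    owner: str
    project: str
    created_at: str  # ISO 8601 timestamp
    body: str


# Naive pattern for phrases such as "IC50 = 0.8 uM" or "IC50 was 45 nM".
IC50_PATTERN = re.compile(
    r"IC50\s*(?:=|of|was)\s*([0-9]*\.?[0-9]+)\s*(nM|uM)", re.IGNORECASE
)

# Made-up sample documents.
DOCUMENTS = [
    ElnDocument("asmith", "PRJ-1", "2009-03-02T10:15:00",
                "Compound A gave an IC50 = 0.8 uM in the binding assay."),
    ElnDocument("bjones", "PRJ-1", "2009-03-05T14:40:00",
                "Potency for compound B came out at roughly 12 micromolar."),
    ElnDocument("asmith", "PRJ-2", "2009-03-09T09:00:00",
                "IC50 was 45 nM for compound C."),
]


def mine_ic50(documents, project):
    """Filter on the project tag, then text-mine IC50 values from the body."""
    hits = []
    for doc in documents:
        if doc.project != project:
            continue  # the metadata tags are the only structured filter
        for value, unit in IC50_PATTERN.findall(doc.body):
            hits.append((doc.owner, float(value), unit))
    return hits
```

Calling `mine_ic50(DOCUMENTS, "PRJ-1")` finds the first document but silently misses the second, whose author phrased the result differently. That is the kind of question a document store with only a handful of tags struggles to answer reliably.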

A number of commercial ELNs are available.

These document stores will rapidly become very large and, without a relational indexing scheme, will be difficult to mine effectively. Many hope that the semantic web will rescue the situation by providing effective mechanisms for extracting the relationships within the data, so that they can be mined and searched with semantic web technology.

The Open Source Movement

BioRails, in contrast, uses established design patterns to meet this need for flexibility in experimentation. Unlike the ELN approach, it maintains a relational database of information, which makes it easier for biologists to search and report on their data and scales to the research volumes required by the industry.

BioRails is part of the Web 2.0 movement, implemented in Ruby on Rails and making use of the Oracle database platform. It takes a hybrid approach, combining relational data management for structured data with ELN-style storage of unstructured content that is searched through a text index. The link between structured and unstructured data allows traditional relational searches to reach unstructured content.
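The idea of linking structured and unstructured data can be sketched as a single query that filters on a numeric result and on the text of an attached note. This is not the BioRails schema: the table names, columns and sample rows are invented for illustration, SQLite from the Python standard library stands in for Oracle, and a plain `LIKE` match stands in for a real text index.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up experiments, results, notes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE result (
            experiment_id INTEGER NOT NULL REFERENCES experiment(id),
            parameter TEXT NOT NULL,
            value REAL NOT NULL
        );
        CREATE TABLE note (
            experiment_id INTEGER NOT NULL REFERENCES experiment(id),
            body TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO experiment VALUES (?, ?)",
                     [(1, "EXP-001"), (2, "EXP-002"), (3, "EXP-003")])
    conn.executemany("INSERT INTO result VALUES (?, ?, ?)",
                     [(1, "IC50", 0.8), (2, "IC50", 12.5), (3, "IC50", 0.3)])
    conn.executemany("INSERT INTO note VALUES (?, ?)", [
        (1, "Precipitation observed at the top concentration."),
        (2, "No issues noted."),
        (3, "Plate reader recalibrated before run."),
    ])
    return conn


def potent_with_note(conn, max_ic50, phrase):
    """Experiments with IC50 below max_ic50 whose note mentions phrase."""
    return conn.execute("""
        SELECT e.name, r.value, n.body
        FROM experiment e
        JOIN result r ON r.experiment_id = e.id
        JOIN note n ON n.experiment_id = e.id
        WHERE r.parameter = 'IC50' AND r.value < ? AND n.body LIKE ?
        ORDER BY e.name
    """, (max_ic50, f"%{phrase}%")).fetchall()
```

For example, `potent_with_note(build_demo_db(), 1.0, "Precipitation")` returns only EXP-001: the numeric filter comes from the structured table, and the text condition comes from the note joined to the same experiment.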

Unlike traditional solutions in this area, it is open source and embraces a growing movement in IT towards service-orientated offerings, where customers pay for service rather than for access to software.

Time will tell if the open source model will be adopted by discovery research organizations.

2009 update: The Enterprise edition of BioRails is now used in several top 20 pharmaceutical companies for all biological research, from early screening to late-stage in vivo studies. Community uptake of BioRails has yet to establish itself: all known current usage is under the commercial license. It appears that discovery research organisations like open source but also want commercial support, which is why the business model works.

Open source is a common model in the bioinformatics community, which has embraced it from its inception. However, until now this model has remained essentially untried in the area of biological data management.

Other open source discovery informatics projects include:

  • KNIME, a modular data exploration platform

The open source model is increasing in popularity.