Oracle Life Science User Group Conference 2005

May 16-17, Boston MA


EVENTS
Full Schedule PDF
Keynotes
 M 1:30P  Knowledge Architectures for Pharmaceutical R&D Jim Golden, CTO, SAIC Life Science Office
 T 8:45A  The Cancer Biomedical Informatics Grid (caBIG) John Speakman, Memorial Sloan-Kettering Cancer Center
Presentations
 M 8:30A  OLSUG Meeting Introduction John Burke, Data Mgmt. Consultant
 M 8:45A  Oracle’s Platform for Life Sciences and Preview of 10g Release 2 New Features Charlie Berger, Sr. Dir. Prod. Mgmt., Life Sciences and Data Mining, Oracle
 M 9:45A  From Mainframe to the “Grid” – a Real World Experience Marcus Collins, Applera Corporation
 M 10:15A  Database storage of multiple conformations of molecular structures, application to structure determination, molecular docking and dynamics simulations Mark Forster, Syngenta R&D/IS, Jealott's Hill, UK.
 M 11:05A  Oracle's Solutions for Systems Biology Susie Stephens, Principal Product Mgr., Life Sciences, Oracle
 M 11:35A  Biological Traversal Engine (BTE) Shunguang Wang, BG Medicine
 M 12:05P  ISV Lightning Round 1 12:05P EKM Corporation
12:12P BioPoint Solutions
12:19P InforSense
12:26P SciTegic/Accelrys
12:33P Insightful
12:40P Tom Sawyer Software
 M 2:15P  Maintaining Security and Identity Management in Life Sciences Roger Sullivan, VP Business Development Identity Management, Oracle
 M 2:45P  Rapid Application Development using InforSense Open Workflow and Oracle Chemistry Cartridge Technologies Anthony C. Arvanites, Cambria Biosci.
 M 3:15P  Clustering of Protein Space Using Oracle10g Gyorgy Babnigg and Carol S. Giometti, Argonne National Laboratory
 M 4:05P  PhiMS: An Extensible Phenotypic Information Management System for the Electronic Capture and Storage of CRF Data in the Non-Profit Environment Brent Richter, Channing Labs
 M 4:35P  Electronic Lab Notebooks in the Real World: A LABTrack case study documenting productivity gains & the functionality needed to be successful. Richard Stember, CEO, Scientific Div., EKM Corporation
 T 9:45A  Life Science Examples Using Oracle at SDSC Joshua Li and Shankar Subramaniam, SDSC/UCSD
 T 10:15A  Managing a Complex Migration Project – A Standards Based Approach Marcus Collins, Applera Corporation
 T 11:05A  Transforming a DBA group into a Database Center of Excellence - A Case Study Danielle Fleming, Vice President, TeraDBA Consulting
 T 11:35A  TeraGenomics - Migration of VLDB from Teradata to Oracle Eva Mitter, Development & Operations Manager, IMC
 T 12:05P  ISV Lightning Round 2 12:05P Waters Corporation
12:12P Sun Microsystems
12:19P CambridgeSoft
12:26P Thermo Electron
12:33P Symyx IntelliChem, Inc.
12:40P Applied Biosystems
 T 1:30P  Data warehouse development for integrative biomedical informatics research Hai Hu, Dir. Biomedical Informatics,
Michael Liebman, CSO, Windber Research Institute
 T 2:00P  Managing Research Collaboration with Oracle Collaboration Suite Workspaces Alok Srivastava, Sr. Group Mgr, Oracle
 T 2:30P  Enhanced Reporting for Oracle Clinical and Oracle AERS Prasad Inampudi, Oracle
Workshops
 M 9:45A  Text Mining with Oracle Raf Podowski, Sr. Prod. Mgr., Life Sciences, Oracle
 M 9:45A  BLAST and Regular Expression Searches John Burke, Data Management Consultant
Susie Stephens, Principal Prod. Mgr., Life Sciences, Oracle,
Shiby Thomas, Principal Member of Tech. Staff, Oracle
 M 11:05A  HTML DB Richard Landry, Oracle
 M 2:15P  Network Data Model for Biological Pathways Jack Wang, Principal Member of Tech. Staff, Oracle,
Ning An, Sr. Member of Tech. Staff, Oracle,
Brendan Madden, CEO, Tom Sawyer,
Susie Stephens, Principal Prod. Mgr., Life Sciences, Oracle
 M 2:15P  Managing Life Science and Medical Image Repositories Using Oracle interMedia 10g Release 2 Melliyal Annamalai, Principal Member of Tech. Staff, Oracle
 T 10:00A  Oracle Data Mining Carolyn Hamm, Walter Reed Medical Center,
Charlie Berger, Sr. Dir. Prod. Mgmt., Oracle
 T 10:00A  RDF Data Model for the Semantic Web Mike DiLascio, Senior Director, Siderean,
Souri Das, Consulting Member Tech. Staff, Oracle,
Susie Stephens, Principal Prod. Mgr., Life Sciences, Oracle
 T 1:30P  Statistical analysis of gene expression data with Oracle & “R” Pat Hoffman, Sr. Principal Analytical Consultant, Oracle,
Raf Podowski, Sr. Prod. Mgr., Life Sciences, Oracle
 T 1:30P  Loading Life Sciences Data into Oracle Database 10g Ellen Batbouta, Consulting Member of Tech. Staff, Oracle,
Paul Narth, Senior Group Manager, Warehouse Builder Prod. Mgmt., Oracle,
Susie Stephens, Principal Prod. Mgr., Life Sciences, Oracle,
Ray Swonger, Director Software Dev., Oracle
ABSTRACTS
Keynotes

Knowledge Architectures for Pharmaceutical R&D

There are several well-known and pressing needs within pharmaceutical discovery:

  • A decrease in pharmaceutical R&D productivity and the need to reduce compound attrition in drug discovery and development.
  • The need to realize the value of previous investments in new technologies such as genomics, proteomics and systems biology.
  • The need to make sense of the increasing volume of research data and to access and integrate information across internal silos and "data tombs".
  • The need to connect and make sense of information across R&D business units such as target biology, compound discovery, and clinical trials. This includes the need for pharmacovigilance and safety signal detection systems throughout the entire pharma value chain.
  • The need to share and protect IP and knowledge across alliances with other pharmaceutical companies, biotechnology companies and academic labs.

The goal of any IT system within pharma is to enable innovation or enhance productivity. Most internal initiatives in informatics and knowledge management have not yielded an overall IT architecture that enables hypothesis-driven drug discovery, allows researchers to make sense of all the experimental data within the R&D organization, identifies latent IP stored in data warehouses, and enables signal detection and communication across business units.

What is needed is a Knowledge Architecture for Pharmaceutical R&D - a blueprint for a new kind of IT architecture that enables high level reasoning, semantic integration and inference, and alliance and knowledge management across pharmaceutical R&D. By creating a knowledge model of pharmaceutical R&D and a supporting IT architecture, biopharmaceutical companies can enable innovation, enhance productivity and increase safety throughout the drug discovery, development and clinical trial processes. In this presentation I will talk about what this blueprint might look like, discuss the technologies needed to build these systems, and give an overview of how such an architecture would impact the R&D process.



The Cancer Biomedical Informatics Grid (caBIG)

The National Cancer Institute is engaging the community of sixty NCI-designated Cancer Centers in the United States in an ambitious program to collaboratively build the Cancer Biomedical Informatics Grid (caBIG). caBIG is intended to be a virtual web of interconnected data, individuals, and organizations that will redefine how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise.

caBIG will consist of:

  • a common, widely distributed infrastructure to permit the cancer research community to focus on innovation
  • a shared vocabulary, data elements, and data models to facilitate information exchange
  • a collection of interoperable applications developed to a common standard
  • sets of raw published cancer research data available for mining and integration

Domain areas of interest, determined largely by expressed needs in the cancer center community, include translational research, clinical trials management and tissue banking. But the key facet of caBIG that has the capability to transform cancer research and move it to the endgame is its promise of data sharing through syntactic and especially semantic interoperability. caBIG is looking to bring in other organizations to achieve these goals through the broad adoption of compatible standards.


Presentations

Oracle’s Platform for Life Sciences & Preview of 10g Release 2 New Features

Life sciences professionals have the daunting task of accessing, integrating, assimilating, analyzing, and interpreting data and sharing their findings with other life science professionals. Life sciences professionals know that by integrating their data and findings, newly discovered relationships, insights, and patterns can often lead to promising new cures. Life sciences professionals need to collaborate with other researchers, yet maintain the security of their intellectual property. As a result of years of investing in information technology, Oracle has emerged as the leading platform in life sciences with an 85% market share (per IDC). This presentation highlights the features in the Oracle 10g Database, Application Server and Collaboration Suite that have been found to be of most value for life science customers:

  1. Access distributed data (genomic, proteomic, cheminformatic, pathway, and clinical data; external tables; other databases; etc.),
  2. Integrate a variety of data types (files, relational data, unstructured text, XML, images, etc.),
  3. Manage vast quantities of data (grid, real application clusters, streams, data pump, backup, etc.),
  4. Find patterns and insights (statistics, data and text mining, pathways analysis, BLAST, regular expression searches, etc.),
  5. Collaborate securely with other researchers inside and outside an organization (security, file and document sharing, collaborative tools, etc.)

Oracle 10g provides the basis for an integrated information platform for storing, analyzing, and sharing life sciences data and information. A number of customer use cases will be highlighted as examples.



From Mainframe to the “Grid” – a Real World Experience

Applera Corp recently migrated their Life Science application portfolio off a Unix mainframe compute complex to a distributed Linux based “Grid Computing” architecture using commercial off the shelf (COTS) components. The application portfolio consists of a mix of commercial 3rd party applications and life science applications with a large component of in-house developed open source code. This session will review recent developments in hardware architecture, operating system capabilities and Grid Computing infrastructure; detail the key design objectives considered during the design of Grid Computing architectures; describe detailed design decisions around chipset, clustered file systems, processor count and Oracle 9i/10g configurations; and share real-world experiences gained during a complex mainframe to distributed architecture migration.

Objectives

  • Gain an understanding of recent developments in hardware architectures and Grid Computing.
  • Gain an understanding of distributed architecture system design objectives.
  • Gain an understanding of distributed architecture migration techniques and lessons learnt.

Outline

  • Design Objectives of Distributed Architectures
    1. Server Provisioning and Reuse
    2. Server and Storage Capacity on Demand
    3. Storage Presentation on Demand
    4. Application Architecture (Service Oriented Architecture)
  • Detailed Design
    1. Chipset and CPU/Memory Ratios
    2. Cluster File Systems
    3. High Availability (Clustering)
    4. Virtual Machines (VMware)
    5. Oracle 9i/10g Configurations
  • Migration Techniques / Lessons Learnt
    1. Migration Techniques
    2. Lessons Learnt



Database storage of multiple conformations of molecular structures, application to structure determination, molecular docking and dynamics simulations

Investigation of the molecular conformation of small molecules and/or biological macromolecules is a key activity in drug discovery and other aspects of life science research. Multiple conformations (i.e. multiple coordinate sets) can be produced by structure determination methods such as X-ray and NMR, as well as by computational tools for exploring small molecule and/or macromolecule conformation. The amount of data generated can be large even for individual molecular systems or models.

The output from the vast majority of available computational tools is often stored as a number of flat files. This eventually leads to data management problems, whereby the methods and series of conformations used in a set of calculations become unclear with time. In addition, flat file storage is typically wasteful in that certain information is needlessly stored over and over again. Even when database schemas for storing molecular data are available, they are often aimed at storing just a single set of coordinates per molecule.

We will discuss a data model aimed at storing multiple instances of molecular conformations and apply this to data derived from structure determination methods, protein/ligand docking, molecular dynamics simulations and other tools for exploring or optimising molecular conformation. The data model will be described in UML and SQL, with examples drawn from the above experimental and computational methods.
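The abstract does not give the schema itself, so the following is only a minimal sketch of the general idea: molecule and atom records are stored once, and each conformation contributes only a new set of coordinates. All table and column names are hypothetical, the coordinate values are made up for illustration, and SQLite (through Python's standard sqlite3 module) stands in for Oracle so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the presenter's data model.
SCHEMA = """
CREATE TABLE molecule (
    molecule_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE atom (
    atom_id     INTEGER PRIMARY KEY,
    molecule_id INTEGER NOT NULL REFERENCES molecule(molecule_id),
    element     TEXT NOT NULL
);
CREATE TABLE conformation (
    conformation_id INTEGER PRIMARY KEY,
    molecule_id     INTEGER NOT NULL REFERENCES molecule(molecule_id),
    method          TEXT NOT NULL
);
CREATE TABLE coordinate (
    conformation_id INTEGER NOT NULL REFERENCES conformation(conformation_id),
    atom_id         INTEGER NOT NULL REFERENCES atom(atom_id),
    x REAL NOT NULL,
    y REAL NOT NULL,
    z REAL NOT NULL,
    PRIMARY KEY (conformation_id, atom_id)
);
"""


def build_demo():
    """Create an in-memory database holding one molecule with two conformations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO molecule VALUES (1, 'water')")
    # Atom identity is stored once, however many conformations exist.
    conn.executemany(
        "INSERT INTO atom VALUES (?, 1, ?)", [(1, "O"), (2, "H"), (3, "H")]
    )
    conn.executemany(
        "INSERT INTO conformation VALUES (?, 1, ?)",
        [(1, "X-ray"), (2, "molecular dynamics")],
    )
    # Illustrative coordinates only; each conformation adds one row per atom.
    coords = [
        (1, 1, 0.00, 0.00, 0.0), (1, 2, 0.96, 0.00, 0.0), (1, 3, -0.24, 0.93, 0.0),
        (2, 1, 0.00, 0.00, 0.0), (2, 2, 0.97, 0.00, 0.0), (2, 3, -0.25, 0.94, 0.0),
    ]
    conn.executemany("INSERT INTO coordinate VALUES (?, ?, ?, ?, ?)", coords)
    return conn


def conformation_counts(conn):
    """Return a mapping of molecule name to number of stored conformations."""
    rows = conn.execute(
        """
        SELECT m.name, COUNT(c.conformation_id)
        FROM molecule m
        JOIN conformation c ON c.molecule_id = m.molecule_id
        GROUP BY m.name
        """
    )
    return dict(rows.fetchall())
```

Calling `conformation_counts(build_demo())` reports two conformations for the single molecule, while the atom table still holds only three rows. Avoiding that kind of repeated storage is the saving the abstract contrasts with flat files.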



Oracle's Solutions for Systems Biology

In order to maximize success in drug discovery it is necessary to integrate all available data that relates to biological systems and disease. This is a considerable challenge due to the quantities of data being generated, the many data types, and the plethora of data sources. These integration challenges are compounded by the dynamic nature of scientific data and the lack of a mature naming convention. This presentation focuses on techniques that can be used to integrate data, and thereby allow new patterns and insights to be discovered.



BTE (Semantic Data Integration)

BG Medicine Inc (BGM), the leading company in the field of Systems Pharmacology, uses Oracle technology for its information infrastructure. Building around an Oracle database, we designed and developed the Biological Traversal Engine™ (BTE), a bioinformatics application tool for data discovery and information integration. BTE allows one to find the information about sets of biological entities among an array of databases based upon either rule-based or graph-based strategies. BTE illustrates how one can take advantage of the novel Oracle technologies, such as Network Data Model, that are provided for the life science community.
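The abstract does not describe how BTE traverses its data, so the sketch below only illustrates what a graph-based strategy can look like in general: a breadth-first walk over links between biological entities, limited to a maximum number of hops. The entity identifiers, the `LINKS` mapping and the `reachable` function are all hypothetical, and the code uses neither BTE nor the Oracle Network Data Model.

```python
from collections import deque

# Hypothetical cross-references between entities held in different databases.
LINKS = {
    "gene:G1": ["protein:P1"],
    "protein:P1": ["pathway:W1", "gene:G1"],
    "pathway:W1": ["protein:P2"],
    "protein:P2": [],
}


def reachable(links, start, max_hops):
    """Breadth-first search: map each entity within max_hops of start to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in links.get(node, []):
            if neighbor not in hops:  # first visit gives the shortest hop count
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]  # report only the entities found, not the starting point
    return hops
```

With the toy data above, `reachable(LINKS, "gene:G1", 2)` finds the protein at one hop and the pathway at two, and stops before the second protein.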



Maintaining Security and Identity Management in Life Sciences

Identity is the foundation of any audit and compliance system. However, knowing someone's identity is only part of the solution. This presentation will explore the breadth of Identity Management solutions toward a better understanding of the relevant technologies, key solution requirements, and strategic directions in Identity Management. The presentation will describe Oracle's approach to our customers' security and Identity Management requirements, outlining how these technologies are integrated within the Oracle product suites and how recent acquisitions have positioned Oracle as a leader in the Identity Management market space. The presentation will also outline related Identity Management standards initiatives such as SAFE and the Liberty Alliance and propose how these activities could significantly affect companies over the coming years.



Rapid Application Development using InforSense Open Workflow and Oracle Chemistry Cartridge Technologies

At innovative biotechnology companies the challenge is to provide researchers with an appropriate information environment that promotes discovery and improves productivity. Achieving this goal requires effective collaborative access to a very heterogeneous information environment - typically consisting of disparate data types and sources, and software tools and applications in multiple locations and from a variety of vendors. The lack of appropriate solutions to this problem can hinder internal collaboration, the capture and dissemination of best practices, and the fostering of innovation. A workflow-centric Rapid Application Development environment based on Oracle Database and Chemistry Cartridge technologies provides the ideal informatics solution to meet the requirements of multi-disciplinary teams within a typical biotechnology research organization.

I will describe how Cambria Biosciences will combine InforSense Open Workflow technology with Oracle Chemistry Cartridge technology to rapidly develop and deploy solutions to solve problems in a number of key research areas. Particular attention will be given to:

  • Integrative cross domain processes (e.g. Hit-to-Lead processes that combine chemistry and biology).
  • Chemistry Cartridge integration featuring structure handling, fingerprint calculation and chemical searching.
  • In-Oracle analytics
  • Advanced informatics workflow constructs

The resultant production level workflows can be deployed by a variety of means, including web services and portals, to the research organization, thus providing a mechanism for enabling the required seamless and collaborative access to data, tools and applications.



Clustering of Protein Space Using Oracle10g

  • Attribute calculation for a large number of sequences using Oracle10g and partitioned tables
  • Cluster computing and the use of Java stored procedures created by JDeveloper10g (new symbolic math functions) and the use of Oracle10g for load balancing
  • Similarity searches using Oracle10g and a Linux cluster
  • Sequence clustering using Oracle10g and a Linux cluster (use of partitioned tables allows the clustering of hundreds of millions of relationships within hours)
  • Visualization of clustering results using an Oracle-based solution
  • Attribute importance overlay on cluster result



PhiMS: An Extensible Phenotypic Information Management System for the Electronic Capture and Storage of CRF Data in the Non-Profit Environment

Data collection, processing, and storage of case reports and questionnaires throughout the life-cycle of clinical trials, observational, and epidemiological studies is a challenging prospect, but especially so within the non-profit biomedical research institution. In addition to the complexity of the types of data under collection, semantic inconsistency across studies, lack of metadata standards, and a changing collection protocol, the non-profit research institution faces a non-trivial constraint on resources due to the nature of publicly grant-funded research. This constraint is felt not only by the informatics and software systems professionals but also by the project's research assistants, coordinators, and data managers. A system is needed, therefore, that can be efficiently built using existing open-source frameworks and third-party tools, enables data integration at all levels, and increases productivity not only for the informatics and software implementors but also for the project's study personnel and overall project budget.

Here we present PhiMS, the Phenotypic Information Management System, which is under development and currently supports a distributed clinical trial project in nutrition at the Channing Laboratory. Core architecture and infrastructure components of PhiMS use the open-source frameworks of XML and web services, publicly accessible ontologies for controlling vocabulary and semantics, along with specific, inexpensive technologies used for deployment. PhiMS employs ontology-aware, XSD-governed study design, data capture, and data harvesting infrastructure where all project data is captured via XML documents, encrypted, and securely transferred to a central Oracle repository for integration, archiving, and analysis. Development tools assist investigators in specifying data capture instruments and target data structures using standard vocabularies for methodological concepts (e.g., study design or assay methods) and substantive concepts (e.g., specific anatomical or functional properties defining study outcomes). Underlying representations have been chosen to permit very flexible rendering of data capture instruments and real-time validation of input data against XSD schemata, and to allow efficient adaptation of intake/database/data repository infrastructure to changes in study instrumentation.



Electronic Lab Notebooks in the Real World: A LABTrack case study documenting productivity gains & the functionality needed to be successful.

EKM's LABTrack Electronic Lab Notebook has been on the market for over 8 years and boasts a customer base of over 200 companies. This presentation describes a recent independent productivity study performed on an actual LABTrack implementation. The study found a 15% productivity gain achieved with carefully crafted user interface design and functionality.

  • What makes an Electronic Lab Notebook?
  • Maintaining a legally accepted document.
  • What ELN functions can actually improve productivity?
  • What features are needed for Research, R&D, QC and Services laboratories?
  • How does integration with existing systems impact productivity?
  • What features are important outside of the laboratory?



Life Science Examples Using Oracle at SDSC

We will show several examples using Oracle database and application servers in the Molecule Pages Database, AfCS data center and BISTI projects. The Molecule Pages Database project will be used to illustrate how we use Oracle database and OC4J server. Putting a Java package inside our database as stored procedures and using a text index in the database will be covered. The BISTI project will be used to show how to integrate the SMD Oracle schema and the VAMP MySQL schema into one microarray database in Oracle 10g.



Managing a Complex Migration Project – A Standards Based Approach

Applera Corp recently migrated their Life Science application portfolio off a Unix mainframe compute complex to a distributed Linux based “Grid Computing” architecture using commercial off the shelf (COTS) components. One major reason for the success of this project was the use of structured approaches for both the overall project and for the detailed design phases. Systems Development Life Cycle (SDLC) was the methodology used for the overall project and an expanded RASM approach (sometimes known simply as RAS) was used during the detailed design phases. This paper will outline the structured methodology (SDLC) used to plan and execute the migration project; the structured design approach for detailed design, with a focus on Grid Computing; and real-world experiences of using these approaches during a complex migration project.

Objectives

  • Gain an overview of a structured methodology (SDLC).
  • Gain an overview of a structured design approach for Grid Computing systems (RASM).
  • Gain an understanding of how these approaches work in a real world migration project.

Outline

  • Structured Methodology (SDLC)
    1. Waterfall Model
    2. Initiation, Design, Development, Transition, Disposal
  • Structured Design (RASM)
    1. Reliability
    2. Availability
    3. Scalability
    4. Manageability
    5. Performance
    6. Measurability (Monitoring)
  • Implementation / Lessons Learnt
    1. Implementation
    2. Lessons Learnt



Transforming a DBA group into a Database Center of Excellence - A Case Study

One of the most valuable resources in any organization is the Database Administrator - the guardian of your corporation’s data. When these same valuable resources spend all of their time on non value-added activities, you're wasting them.

Creating a Database Center of Excellence delivers value to the organization, reduces TCO, retains your high value professionals, allows ITIL adoption, improves morale and teamwork throughout the organization and allows you to fully engage in those standardization and consolidation plans that you've had on the 'to do' list.

Hear how it is done.

What will be learnt?

  • You'll learn how to transform a group of individual DBAs into a fully functioning, team-oriented Database Center of Excellence.
  • You'll learn techniques on how to reduce your operational costs and TCO.
  • You'll learn how to deliver value to the organization.
  • You'll learn how to retain your specialized, talented professionals.



TeraGenomics - Migration of VLDB from Teradata to Oracle

More than three years ago IMC, Inc., started to develop the TeraGenomics solution on top of the Teradata database. TeraGenomics is a high-performance data warehousing solution for managing, analyzing, and sharing Affymetrix® GeneChip® gene expression microarray data. To extend the potential customer base, IMC recently decided to enable that solution to run on top of other databases that can effectively support very large data sets. Oracle is the first database IMC selected for the migration effort.

The presentation will focus on the issues and challenges the development team has had to overcome. IMC currently hosts the TeraGenomics solution as an ASP. The database contains more than 7,000 microarrays, which translates to more than 4 billion rows of data to be managed. The solution can scale to support hundreds of users analyzing data from tens of thousands of GeneChip arrays.

TeraGenomics has a MIAME-compliant metadata structure comprised of 140 data elements within controlled vocabularies (e.g., for organism, anatomy, and disease type) accessed through drop-down menus. Microarray data in the warehouse can be queried by any of these elements.

TeraGenomics supports probe level pair-wise comparisons among dozens, hundreds, or even thousands of chips through a rapid point-and-click interface, and stores the comparison results in the warehouse for reuse. Among other methods, TeraGenomics also supports RMA analysis with no limits on the number of chips included.

The application is a browser-based thin client solution that can be securely deployed over the web or on an Intranet. The users can analyze their data using a variety of visualization approaches (clustering, Venn diagrams, plots, etc.). TeraGenomics is integrated with the Affymetrix GeneChip Operating Software (GCOS) platform to support seamless uploads of experiments.

TeraGenomics is an ideal solution for scientists in multiple locations who perform experiments with GeneChip brand arrays to consistently manage their data and share it for collaborative research. The presentation will also mention some of the recent articles published in the scientific journals that came from research efforts using TeraGenomics.

TeraGenomics supports easy exporting of data to popular desktop tools such as GeneSpring® and Spotfire®, and to powerful 3-D neurogenomic visualization tools from Neurome, as well as in both MAGE-ML and CSV formats.



Data warehouse development for integrative biomedical informatics research

Windber Research Institute is conducting integrative high throughput research involving clinical, genomic, and proteomic platforms to produce terabyte levels of data. Working with Walter Reed Army Medical Center and other medical institutions we enroll subjects into studies and gather data in about 500 fields ranging from demographics to pathological tissue annotations. High throughput research is conducted using DNA, RNA, and protein extracted from the collected blood, breast and other specimens. The major platforms include gene expression, genotyping, comparative genomic hybridization, protein isolation, and mass spectrometry protein identification. We have a data tracking system for most of those platforms, with additional modules being developed. Recently the need for image handling has surfaced.

To prepare data on such a large scale, as well as the relevant public data, for biomedical informatics research, we envisioned a data warehouse as the solution, and we opted for a hybrid structure. In the last two years we have contracted the data warehouse to a company using the Teradata hardware and RDBMS. We have now decided that to best meet our research needs we should re-design the data warehouse ourselves, using a patient-centric and object-based structure, and that the underlying database should be Oracle. This presentation will discuss this new design of the data warehouse as well as the development of visualization and analysis tools.



Managing Research Collaboration with Oracle Collaboration Suite Workspaces

Research Collaboration sits at the heart of Life Sciences research. Given the current tools for collaboration, researchers split their time between organizing research data, sharing it with fellow researchers, complying with rules and regulations and spending actual time analyzing it. This activity prolongs the time needed to conclude the research. It has been shown time and again that research organizations spend enormous amounts of money in funding new research and studies. Any reduction in the time and effort needed to reach a conclusion directly adds to savings. Better organization of research projects not only saves money, it also helps encourage re-use of information and learning from one project to another.

Oracle Collaboration Suite Workspaces provide a number of capabilities that can help simplify the management of research projects. Research information and activities can be organized with minimal effort. Workspaces enable researchers to work with their favorite tools and organize their research data behind the scenes. A number of content management features, and the ability to add on records management capabilities for all content, help researchers meet regulatory compliance requirements with relatively small effort. Oracle Workspaces provide the following benefits to the Life Sciences research community:

  • Enable team collaboration within the context of a project complete with documents, discussions, meetings, tasks, notifications and announcements
  • Reduce unwanted copies of any information
  • Named links to establish relationship among documents and activities around them
  • Task assignments and links to documents
  • Privileged access to different content
  • Default content management services transparently applied to workspace content including:
    • Auto-attribution
    • Auto-versioning
    • Default workflows
    • Other document policies
  • Participation via email
  • Access to relevant workspace content from their personal productivity tools such as:
    • Documents from Windows-mounted OracleDrive with in-place editing and off-line access to documents
    • INBOX and Discussions accessible from an IMAP-compliant mail client
    • Workspace meetings in personal calendar
    • Optional email notification with summary and quick links
  • Quickly capture best-practices from a workspace in a template and use them to quickly create new workspaces
  • View to create focused access to workspace content
  • Full access to workspace APIs to enable integration into custom applications and tools without any duplication of information. This allows access to all the same information from tools provided with Collaboration Suite.



Enhanced Reporting for Oracle Clinical and Oracle AERS

This presentation will discuss how users can enhance reporting and analysis capabilities of their Oracle Clinical and Oracle AERS applications. Using Oracle Discoverer, personalized dashboards may be designed to enable clinical data management and pharmacovigilance users to interactively analyze clinical and safety information. This powerful tool can provide valuable insights into data management and pharmacovigilance processes, thereby enabling customers to expedite clinical trials through efficient tracking of problems in data management, optimize study design and improve regulatory compliance.

Examples of reporting and analysis for Oracle Clinical data include:

  • Trial status analysis (e.g. enrollment, progress, etc.)
  • Discrepancy listing analysis by study, site and patient
  • CRF page analysis
  • Study Design analysis (e.g. DCI, DCM, procedures, etc.)
  • Global Library analysis (e.g. studies that used a parameterized question, question group, or particular DVG subset)
  • Lab test analysis (e.g. lab ranges, etc.)
  • TMS analysis (e.g. dictionary mapping, uncoded terms, etc.)

Examples of reporting and analysis for Pharmacovigilance data include:

  • Case status analysis (e.g. case progress)
  • Regulatory agency reporting metrics
  • Case origin analysis (e.g. by country, by product, etc.)


Workshops

Text Mining with Oracle

Oracle Technology is known for its ability to store vast quantities of data and bring data mining algorithms directly to the data. This capacity extends to unstructured data, such as literature, which is of great importance in life sciences. Oracle encompasses a set of capabilities designed specifically for text analysis, as well as more generic data mining algorithms which can be applied in text mining.

This workshop will present the use of some of the Oracle Text and Oracle Data Miner (ODM) features for text mining of MEDLINE. Practical methods of document searching, clustering and classification will be demonstrated, along with enhancements through the use of ontologies, thesauri and the Oracle Text knowledge base. We will also show how to combine unstructured with structured data, along with a few simple methods for visualizing results.



BLAST and Regular Expression Searches

In this workshop we discuss two database features in Oracle Database 10g: Oracle Data Mining (ODM) BLAST and Regular Expression Searches. Scientists can use ODM BLAST to perform sequence homology searches and Regular Expression Searches to perform pattern matching, inside the database. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allows pre-filtering and post-processing of data sets, and enables data to remain in a secure, highly available environment. These new database features enable scientists to take advantage of a new analytical paradigm to simplify data management in research.
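
As a rough illustration of the kind of pattern matching described above, the sketch below scans protein sequences for the PROSITE N-glycosylation motif N-{P}-[ST]-{P}. It is written in Python rather than SQL and is not taken from the workshop; the example sequences and the `find_motif` helper are made up for illustration. Inside Oracle Database 10g, a comparable pattern would be passed to the SQL regular expression functions such as REGEXP_LIKE or REGEXP_INSTR.

```python
import re

# PROSITE-style motif N-{P}-[ST]-{P} as a regular expression.
# The lookahead wrapper lets overlapping occurrences be reported too.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence):
    """Return (1-based start position, matched residues) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence)]


# Hypothetical sequences, used only to exercise the pattern.
sequences = {
    "seq1": "MKNGSAALNPTQ",
    "seq2": "MKLVPAAG",
}

for name, seq in sequences.items():
    print(name, find_motif(seq))
```

Running the pattern next to the data, rather than exporting sequences to an external script as done here, is the point the abstract makes about in-database analysis.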



HTML DB

Oracle HTML DB - a no-cost feature of the Oracle Database 10g - is a declarative web-based application development & deployment environment. With it, you can quickly create and deploy secure, scalable web applications.



Network Data Model for Biological Pathways

The Oracle Spatial Network Data Model (NDM) feature enables graph modeling and analysis in Oracle Database 10g. NDM explicitly stores and maintains connectivity (nodes, links, and paths) within networks and provides network analysis capabilities such as shortest-path and connectivity analyses. NDM includes a PL/SQL API package for network data query and management, and a Java API for network creation, representation, editing and analysis. This workshop will give an overview of the architecture of NDM, and provide customer use cases. Demos of the functionality will be shown using Cytoscape and Tom Sawyer.
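
To make the node/link/path vocabulary concrete, here is a minimal sketch of a shortest-path query over a small directed network, using plain breadth-first search in Python. It does not use the NDM PL/SQL or Java APIs; the `shortest_path` function and the single-letter node names are hypothetical stand-ins for pathway entities.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Return the fewest-links path from start to goal, or None if unreachable.

    links is an iterable of directed (source, target) pairs.
    """
    graph = {}
    for src, dst in links:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical pathway: two routes from A to C, then on to F.
links = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("E", "C"), ("C", "F")]
print(shortest_path(links, "A", "F"))
```

Breadth-first search treats every link as equal cost; a weighted network would need a cost-aware algorithm such as Dijkstra's instead.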



Managing Life Science and Medical Image Repositories Using Oracle interMedia 10g Release 2

Oracle interMedia extends the benefits of the database to media data. Images can be stored and managed in the database for security, ease of maintenance, scalability, and other advantages. interMedia functionality includes metadata management, image processing, interfaces for media upload and retrieval, and tools for easy application development involving media. Oracle interMedia 10g Release 2 (now in beta) provides support for metadata management and DICOM (the medical image standard). Metadata support for Life Science applications is based on XMP, and the XMP framework allows users to write metadata into images based on any user-defined schema. The metadata is therefore part of the image and cannot be lost or disconnected from the image itself. DICOM support allows applications to read metadata such as patient information from medical images. All metadata and images can be stored securely in the database using interMedia.

The workshop will start with a 10-minute overview of interMedia. This will be followed by a closer look at two sample applications. The first application will walk through the steps of storing images in the database and managing the metadata associated with images in the database. The application will create a table of images in the database, create thumbnails of those images, retrieve images from the table, extract metadata from the images, add new metadata, define and add an application-defined schema to the image metadata, and search the image metadata. It will also include some processing functionality. The second application will walk through the steps of storing DICOM images in the database and will extract and manage DICOM metadata in the database. Segments of PL/SQL and Java code from both applications will be presented.



Oracle Data Mining

This workshop covers applications of data mining technology using several life sciences examples. Participants will learn how to find the factors associated with certain pathogens. The workshop will present the attribute importance algorithm, which is used to find the attributes (variables) that most influence a user-specified target attribute, e.g. positive outcomes, high-risk patients, or a disease. Several classification algorithms, including Naïve Bayes, Adaptive Bayes Networks, Support Vector Machines, and Decision Trees, will be presented to build models that can be used to make predictions, e.g. which patients are likely to respond to treatment, or whether a tissue sample is healthy or unhealthy based on gene expression data. Decision Trees, Association Rules, and Clustering techniques will be presented to help uncover hidden factors and associations that can be used to improve quality of life, outcomes, or clinical care. Lastly, this workshop will show how “unstructured data”, e.g. a physician’s notes or a laboratory report, can be used to extract additional information and build better predictive models.



RDF Data Model for the Semantic Web

The Semantic Web initiative by the W3C is a project focused on true semantic interoperability. The idea behind the Semantic Web is to create a globally distributed database by adding definition tags to information that is available on the Web and linking the tags in such a way that computers can discover data more efficiently and form new associations between pieces of information. The aim is that applications will be able to take advantage of distributed data and incorporate data that users were not aware existed. Semantic tagging is unlike most other data integration approaches in that it allows relationships to be discovered. For example, gene definitions can become functionally linked to inherited diseases, and their position within a biochemical pathway can be interwoven. Resource Description Framework (RDF) is the W3C's recommended standard for the common data format required to make data available in the Semantic Web. The basic RDF statement is a triple, subject-predicate-object, which allows data to become connected piece by piece and link by link. This workshop will describe the RDF Data Model in Oracle Database 10g, and provide a demo of the capabilities.
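
The subject-predicate-object triple can be illustrated with a small in-memory sketch in Python. This is not the Oracle RDF Data Model API; the `ex:` identifiers, the predicate names, and the `match` and `pathways_for_disease` helpers are all hypothetical. It shows how separately stated triples can be chained to link a disease to a pathway through a gene and its protein.

```python
# Each statement is one (subject, predicate, object) triple.
# All identifiers below are made up for illustration.
triples = {
    ("ex:geneA", "ex:encodes", "ex:proteinA"),
    ("ex:proteinA", "ex:participatesIn", "ex:pathwayX"),
    ("ex:geneA", "ex:associatedWith", "ex:diseaseY"),
    ("ex:geneB", "ex:encodes", "ex:proteinB"),
    ("ex:proteinB", "ex:participatesIn", "ex:pathwayX"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def pathways_for_disease(store, disease):
    """Follow disease <- gene -> protein -> pathway links across triples."""
    found = set()
    for gene, _, _ in match(store, p="ex:associatedWith", o=disease):
        for _, _, protein in match(store, s=gene, p="ex:encodes"):
            for _, _, pathway in match(store, s=protein, p="ex:participatesIn"):
                found.add(pathway)
    return sorted(found)


print(pathways_for_disease(triples, "ex:diseaseY"))
```

No triple states the disease-to-pathway relationship directly; it emerges from joining three independent statements, which is the kind of discovery the abstract refers to.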



Statistical analysis of gene expression data with Oracle & “R”

With Oracle Database 10g, most statistics, informatics, and traditional machine learning can be done on all types of data completely inside the database. The Oracle Data Miner interface is a Java-based GUI that supports most of the traditional data mining functions. However, many specialized analyses, such as gene-to-gene correlation and t-statistic calculations, are best done with Oracle's SQL or PL/SQL API. Traditional programming languages such as Java, C++, and C# can also be used to access these database functions.

This talk/workshop will demonstrate database techniques for this type of informatics. SQL and PL/SQL code and packages will be demonstrated, analyzed and made available for many specific gene expression and other high-throughput life science analyses.

Some of these techniques will include:

  • Converting Affymetrix gene expression data from a wide 2D layout to transactional or nested table format - This allows many thousands of genes to be attributes or columns in machine learning clustering and predictive analysis.
  • Efficient use of correlations, t-statistics, F-statistics and Minimum Description Length (MDL) in determining the most significant genes in an expression experiment.
  • Various statistical significance tests - Bonferroni, FDR, etc.
  • Other analysis techniques (protein arrays, mass spectrometry, QSAR, etc.) within Oracle Database 10g.
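
Three of the techniques listed above can be sketched briefly in Python. This is not the workshop's SQL or PL/SQL code: the function names, the sample data, and the p-values are hypothetical. The t-statistic shown is Welch's unequal-variance form, chosen here as an assumption because the abstract does not say which variant is used.

```python
from math import sqrt
from statistics import mean, variance


def wide_to_transactional(wide_rows):
    """Turn one-row-per-sample, one-column-per-gene data into
    (sample_id, gene, value) rows."""
    return [
        (row["sample_id"], gene, value)
        for row in wide_rows
        for gene, value in row.items()
        if gene != "sample_id"
    ]


def welch_t(group_a, group_b):
    """Welch's t-statistic for one gene measured in two sample groups."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def bonferroni(p_values, alpha=0.05):
    """Flag p-values that stay significant after dividing alpha by the test count."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical expression values for two genes in two samples.
wide = [
    {"sample_id": "s1", "gene1": 1.0, "gene2": 2.0},
    {"sample_id": "s2", "gene1": 3.0, "gene2": 4.0},
]
print(wide_to_transactional(wide))
print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(bonferroni([0.01, 0.02, 0.04]))
```

The transactional layout sidesteps the per-table column limit that a one-column-per-gene design would run into with many thousands of genes.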


Loading Life Sciences Data into Oracle Database 10g

This workshop provides an overview of a range of techniques for loading data into Oracle Database 10g. Features to be covered include SQL*Loader, Data Pump and Oracle Warehouse Builder.

SQL*Loader loads data into Oracle tables from operating system files. It can load data from multiple datafiles during the same session and can load data into multiple tables. The data can be loaded from disk, tape, or named pipe. SQL*Loader contains a powerful data parsing engine that places few limitations on the format of the data in the datafile.

Data Pump is a high speed, parallel infrastructure that enables quick movement of data and metadata from one Oracle database to another. This technology is the basis for Oracle's new data movement utilities - Data Pump Import and Data Pump Export.

Oracle Warehouse Builder is a tool to enable the design and deployment of Business Intelligence applications, data warehouses and data marts. Warehouse Builder enables users to design their own Business Intelligence application from start to finish. Dimensional design, ETL process design, extraction from disparate source systems, extensive metadata reporting and integration with Oracle Discoverer, Oracle Workflow and Oracle Enterprise Manager enable an integrated Business Intelligence solution with Warehouse Builder at the core.