Cloud Based Resource for Data Hosting, Visualization and Analysis Using UCSC Cancer Genomics Browser

Award Type: 
U24
Award Year: 
2013
PI Name(s): 
David Haussler
Institution(s): 
University of California, Santa Cruz
Status: 
Active
Description: 

The Cancer Analysis Virtual Machine (CAVM) project will leverage cloud technology, the UCSC Cancer Genomics Browser, and the Galaxy analysis workflow system to provide investigators with a flexible, scalable platform for hosting, visualizing, and analyzing their own genomic data. At the core of the platform is a network of cloud-enabled virtual machines (CAVMs), each consisting of three tightly integrated components: a data server for high-performance data storage and retrieval, the UCSC Cancer Genomics Browser for data visualization, and the Galaxy workflow system pre-packaged with UCSC's suite of tools for next-generation sequencing analysis and pathway inference. Users can upload their data into their private CAVM and integrate it with datasets from other CAVMs through a common API. UCSC will host a public CAVM containing the publicly accessible data in UCSC's cancer genomics data repository (data from TCGA, CCLE, LINCS, etc.). The system allows the dynamic formation of new virtual datasets composed of data slices from multiple sources. Combining data into larger sample sizes will provide the statistical power needed for discoveries that would otherwise not be possible. The user can analyze data with Galaxy, visualize the results with the UCSC Cancer Genomics Browser, store the results in a CAVM, and share them under either public or protected data access. New analysis tools can be integrated via Galaxy, and the data server is modular, so it can provide data independently to third-party applications. The virtual machines can be easily started in a commercial cloud environment or installed within the user's institution. Users will be able to turn on a CAVM as needed to ingest, visualize, and analyze their own data. The simplicity of copying and saving VMs enables reproducible science and will allow easy distribution of software updates.
CAVM provides researchers with scalable data storage, federated data integration, and high-performance computing resources, without the cost and overhead of installing and administering a separate system in-house.

Reporter URL:  https://projectreporter.nih.gov/project_info_details.cfm?aid=9330115&icde=49012551