About Workspaces

A workspace is a cloud-based compute cluster built on Hadoop/AWS EMR that you can interact with directly in your browser. Once you have registered for the AHA Precision Medicine Platform, you can provision a workspace from the My Workspace page. Provisioning takes approximately 1 hour, after which the My Workspace page displays a button that you can click to log in to your workspace.

The AHA Precision Medicine Platform provides a friendly web UI that allows you to write code in various languages (for example, Python, R, Scala), execute the code, and view the results as they are processed.

The AHA Precision Medicine Platform UI is based around the concept of notebook files, where each notebook contains one or more code blocks (called cells). All of the content you write in your notebook is saved, even if you pause your workspace and come back to it later.

We have included Spark libraries for various languages (e.g. PySpark, if you're writing Python code) so that you can leverage the full parallel computing power of the Hadoop/AWS EMR platform. You can find examples and code snippets in the sample notebooks that are included on each workspace you provision.
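
As a rough illustration of how a notebook cell might hand work to the cluster through PySpark, the sketch below computes a sum of squares twice: once in plain Python and once with Spark's RDD API. The function names and the application name are hypothetical, chosen only for this example, and the sample notebooks on your workspace remain the better reference for how Spark is actually configured there.

```python
def square(x):
    """Pure function that Spark applies to each element on the executors."""
    return x * x


def sum_of_squares_local(values):
    """Single-process reference result, useful for checking the Spark job."""
    return sum(square(v) for v in values)


def sum_of_squares_spark(values, app_name="sum-of-squares-sketch"):
    """Compute the same sum with Spark's RDD API.

    PySpark is imported lazily so that this cell still loads in an
    environment where PySpark is not available.
    """
    from pyspark.sql import SparkSession

    # getOrCreate() reuses an existing session if the kernel already has one.
    spark = SparkSession.builder.appName(app_name).getOrCreate()
    rdd = spark.sparkContext.parallelize(values)  # distribute the data
    return rdd.map(square).sum()                  # map on executors, then add up


# Local check that needs no cluster: 1 + 4 + 9
print(sum_of_squares_local([1, 2, 3]))  # 14
```

Calling `sum_of_squares_spark(list(range(1, 1001)))` in a Python cell should give the same value as `sum_of_squares_local(list(range(1, 1001)))`. On an input this small the Spark version will be slower, since the benefit of distributing work only appears with data too large for a single machine.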

Web Applications:

  • Jupyter Notebooks (v4.4.1) - The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
  • RStudio (v1.0.153) - RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

List of Jupyter Kernels:

The following kernels/languages are pre-installed in the researcher workspace:

Language  Version
Python    2.7.12
Python3   3.5.1
Spark     2.2.0
R         3.4.1
Julia     0.6.0
Scala     2.12.3
Bash      4.2.46
Torch     5.3.0
Java      1.8.0
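
Because both a Python 2 and a Python 3 kernel are listed, it can be useful to confirm which interpreter a given notebook is attached to before running code that depends on one or the other. The following is a minimal sketch using only the standard library; `kernel_summary` is a hypothetical helper name, not something provided by the platform.

```python
import platform
import sys


def kernel_summary():
    """Return basic details of the interpreter running this cell."""
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython"
        "version": platform.python_version(),                # e.g. "3.5.1"
        "is_python3": sys.version_info[0] >= 3,
    }


print(kernel_summary())
```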

List of R and Python packages:

The following packages are available in the Production Workspace. They were installed using standard automated methods, so the installed versions are the latest that were available at the time the workspace was provisioned.
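
Since the installed versions depend on when a workspace was provisioned, you may want to record them from inside a notebook. The sketch below is one possible way to do that for Python packages; the helper names are hypothetical, and it assumes each package exposes a `__version__` attribute, which is a common convention but not guaranteed. The syntax avoids newer language features so that it should run on either Python kernel.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def report_versions(module_names):
    """Map each import name to the result of installed_version()."""
    return {name: installed_version(name) for name in module_names}


# Import names for a few of the listed packages (scikit-learn imports as
# "sklearn"); adjust the list to the packages you rely on.
for name, version in sorted(report_versions(["numpy", "pandas", "sklearn", "boto3"]).items()):
    print("{:<10} {}".format(name, version if version is not None else "not installed"))
```

Saving this output alongside your results makes it easier to reproduce an analysis on a workspace provisioned at a later date.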

Python Packages
Machine Learning
  • Scikit-learn - Open source machine learning and visualization library
  • Theano - Efficiently evaluate mathematical expressions involving multi-dimensional arrays
  • Keras - Deep Learning library for Theano and TensorFlow
  • Nilearn - Machine learning for neuroimaging in Python
  • TensorFlow - An open-source software library for Machine Intelligence
Scientific Computing
  • Numpy - Fundamental package for scientific computing using N-dimensional arrays
  • Scipy - Open source library for scientific computing
  • Numexpr - Fast numerical array expression evaluator
Data Analysis
  • Pandas - Data analysis library
  • Ez_setup - Installation helper library
  • Boto3 - AWS SDK for Python
  • Ggplot - Plotting system for Python based on R's ggplot2
  • Matplotlib - 2D plotting library
  • Autovizwidget - An automatic visualization library for pandas DataFrames
R Packages
  • RJSONIO - Serialize R objects to JSON
  • Itertools - Tools for creating iterators, based on Python equivalents
  • Digest - A function to create hash of R objects
  • Rcpp - Provides seamless integration between R and C++
  • Functional - A higher-order functions library
  • Httr - Tools for working with URLs and HTTP
  • Stringr - Wrapper for common string operations
  • RJava - Simple R-to-Java interface
  • DBI - For communication between R and RDBMS systems
  • Devtools - Tools to make developing R packages easier
  • R.methodsS3 - Methods that simplify the setup of S3 generic functions and methods
  • Memoise - A method to cache the results of functions
  • Rjson - Converts R objects to JSON and vice versa
  • Curl - A modern and flexible web client for R
  • PbdZMQ - Interface to Zero MQ messaging system
  • Uuid - Tools for generating and handling UUIDs
  • Htmltools - Tools for HTML generation and output
  • Repr - String and binary representations of objects
  • IRdisplay - Interface to rich display capabilities of Jupyter frontend
  • Evaluate - Parsing and Evaluation Tools that Provide More Details than the Default
  • Crayon - Colored terminal output
Data Handling
  • Reshape2 - Flexibly restructure and aggregate data
Statistical Tools
  • CaTools - Basic statistical utility functions
  • FUnitRoots - Environment for teaching "financial engineering and computational finance"
  • Vars - Collection of statistical functions
  • E1071 - Misc functions from department of statistics, probability theory group
  • Ggplot2 - Create elegant and complex plots
  • Shiny - Web application framework for R
  • Corrplot - Graphical display of correlation matrix
  • Plotly - Graphing library makes interactive, publication-quality graphs online
  • ROCR - Visualizes the performance of scoring classifiers (for example, ROC curves)
  • Shiny Dashboard - Create web-based dashboards
  • Rattle - A GNOME-based GUI for data mining
  • Rpart.plot - Plot 'rpart' models
Data Analysis
  • Hmisc - Contains many data analysis functions
  • Aod - Functions to analyze overdispersed data
  • Tseries - Time series analysis and computational finance
  • Markdown - Turns analysis into high quality documents, reports
  • Plyr - A set of tools to break down large problems into small manageable pieces
  • Dplyr - A set of tools to work with data-frame-like objects, both in memory and out of memory
  • FSelector - Functions for selecting attributes from dataset
  • Party - A computational toolbox for recursive partitioning
Machine Learning
  • RandomForest - Classification and regression library
  • Arm - Helper functions for regression
  • C50 - C5.0 decision trees and rule-based models for pattern recognition
  • DT - R Interface to DataTables library
  • Ipred - Improved predictive models
  • Caret - ML functions for regression and classification

Genomics and Bioinformatics Tools

Tool          Category  Version
ADAM          Genomics  v0.22
SnpEff        Genomics  Installed
Picard        Genomics  Installed
SAMTools      Genomics  v1.5
BWA           Genomics  v0.75.15
Bioconductor            Installed
Hail                    Installed
PLINK                   v1.07
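
Several tools above are listed only as "Installed", so one way to see what is reachable from a notebook is to look for their executables on the PATH. This is a minimal sketch under stated assumptions: the executable names `samtools`, `bwa` and `plink` are guesses at how the command-line tools are exposed, `locate_tools` is a hypothetical helper, and `shutil.which` requires the Python3 kernel. Tools that ship as JAR files or libraries (for example Picard or Hail) would not be found this way.

```python
import shutil


def locate_tools(executables):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in executables}


# Assumed executable names; actual names or install locations may differ.
for name, path in sorted(locate_tools(["samtools", "bwa", "plink"]).items()):
    print("{:<10} {}".format(name, path or "not found on PATH"))
```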