A. List of Jupyter Kernels:

The following kernels/languages are pre-installed in the researcher workspace:

Language    Versions
Python      2.7.x, 3.x.x
PySpark
R           3.3.2
SparkR
Julia
Scala
SQL
Bash
iTorch
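
To confirm which of these kernels are actually registered in a given workspace, the Jupyter command line can be queried from Python. The following is a minimal sketch, assuming the `jupyter` command is on the PATH and Python 3.5 or later; the helper names `parse_kernelspecs` and `list_kernels` are illustrative and not part of the workspace tooling.

```python
import json
import subprocess


def parse_kernelspecs(json_text):
    """Map kernel name -> (display name, language).

    Expects the JSON printed by `jupyter kernelspec list --json`.
    """
    specs = json.loads(json_text).get("kernelspecs", {})
    return {
        name: (info["spec"].get("display_name", name),
               info["spec"].get("language", ""))
        for name, info in specs.items()
    }


def list_kernels():
    """Run the Jupyter CLI and return the parsed kernel specs."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_kernelspecs(result.stdout)


if __name__ == "__main__":
    try:
        for name, (display, language) in sorted(list_kernels().items()):
            print("{:<12} {:<24} {}".format(name, display, language))
    except (OSError, subprocess.CalledProcessError) as exc:
        # Assumption: the jupyter CLI may be missing outside the workspace.
        print("Could not query Jupyter kernels:", exc)
```

The kernel names printed depend on how each kernel was registered, so they may not match the language names in the table above exactly.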

B. List of R and Python packages:

The following packages are available in the Production Workspace. We installed them using standard automated methods, so the installed packages are the latest versions available at the time the workspace was provisioned.

Python Packages
Machine Learning
  • scikit-learn - open source machine learning and visualization library
  • Theano - efficiently evaluate mathematical expressions involving multi-dimensional arrays
  • Keras - Deep Learning library for Theano and TensorFlow
  • Nilearn - machine learning for neuroimaging in Python
  • TensorFlow - An open-source software library for Machine Intelligence
Scientific Computing
  • numpy - fundamental package for scientific computing using N-dimensional arrays
  • scipy - open source library for scientific computing
  • numexpr - fast numerical array expression evaluator
Data Analysis
  • Pandas - data analysis library
Statistics
Generic
  • ez_setup - installation helper library
  • boto3 - AWS SDK for Python
Visualization
  • ggplot - plotting system for Python based on R's ggplot2
  • Matplotlib - 2D plotting library
  • autovizwidget - an automatic visualization library for pandas DataFrames
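
Because the installed versions depend on when the workspace was provisioned, it can be useful to check them directly. The following is a minimal sketch, assuming a Python 3.8+ kernel (for `importlib.metadata`) and that the packages are registered under their usual PyPI distribution names; the helper `installed_versions` is a hypothetical name used only for illustration.

```python
from importlib import metadata  # available in Python 3.8 and later


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed PyPI distribution names for the packages listed above.
    packages = [
        "scikit-learn", "Theano", "Keras", "nilearn", "tensorflow",
        "numpy", "scipy", "numexpr", "pandas", "boto3",
        "ggplot", "matplotlib", "autovizwidget",
    ]
    for name, version in installed_versions(packages).items():
        print("{:<16} {}".format(name, version or "not installed"))
```

On the Python 2.7 kernel this module is not available; running `pip list` from a terminal gives similar information there.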
R Packages
Generic
  • RJSONIO - serialize R objects to JSON
  • itertools - Tools for creating iterators, based on Python equivalents
  • digest - a function to create hash of R objects
  • Rcpp - provides seamless integration between R and C++
  • functional - a higher-order functions library
  • httr - tools for working with URLs and HTTP
  • stringr - wrapper for common string operations
  • rJava - simple R-to-Java interface
  • DBI - for communication between R and RDBMS systems
  • devtools - tools to make Developing R packages easier
  • R.methodsS3 - methods that simplify the setup of S3 generic functions and methods
    • S3 here does not mean AWS S3
  • memoise - a method to cache the results of functions
  • rjson - converts R objects to JSON and vice versa
  • curl - a modern and flexible web client for R
  • pbdZMQ - interface to the ZeroMQ messaging system
  • uuid - tools for generating and handling UUID
  • htmltools - tools for html generation and output
  • repr - string and binary representations of objects
  • IRdisplay - interface to rich display capabilities of Jupyter frontend
  • evaluate - Parsing and Evaluation Tools that Provide More Details than the Default
  • crayon - Colored terminal output
Data Handling
  • reshape2 - flexibly restructure and aggregate data
Statistical Tools
  • caTools - basic statistical utility functions
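  • fUnitRoots - unit root tests for time series; part of Rmetrics, an environment for teaching "financial engineering and computational finance"
  • vars - estimation and diagnostic functions for vector autoregressive (VAR) models
  • e1071 - miscellaneous functions of the Department of Statistics, Probability Theory Group (TU Wien)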
Visualisation
  • ggplot2 - create elegant and complex plots
  • shiny - web application framework for R
  • corrplot - graphical display of correlation matrix
  • plotly - graphing library makes interactive, publication-quality graphs online
  • ROCR - visualize the performance of scoring classifiers
  • shinydashboard - create web-based dashboards with Shiny
  • rattle - a GNOME-based GUI for data mining
  • rpart.plot - plot 'rpart' models
Data Analysis
  • Hmisc - contains many data analysis functions
  • aod - functions to analyze over dispersed data
  • tseries - time series analysis and computational finance
  • markdown - turns analysis into high quality documents, reports
  • plyr - a set of tools to break down large problems into small manageable pieces
  • dplyr - a set of tools to work with data-frame-like objects, both in memory and out of memory
  • FSelector - Functions for selecting attributes from dataset
  • party - A computational toolbox for recursive partitioning
  • RStudio Server - a web-based IDE for R. The port is currently not exposed.
Machine Learning
  • randomForest - classification and regression library
  • arm - helper functions for regression
  • C50 - C5.0 decision trees and rule-based models for pattern recognition
  • DT - R Interface to DataTables library
  • ipred - Improved predictive models
  • caret - ML functions for regression and classification

C. Genomics and Bioinformatics Tools

The following tools are available in the Production Workspace. We installed them using standard automated methods, so the installed versions are the latest available at the time the workspace was provisioned.

Genomics / Bioinformatics Tools   Category   Status
ADAM                              Genomics   Available on researcher workspace
SnpEff                            Genomics   Available
Picard                            Genomics   Available
SAMTools                          Genomics   Available
BWA                               Genomics   Available
Bioconductor                                 Coming soon
BioPerl                                      Coming soon
BioPython                                    Coming soon
BioRuby                                      Coming soon
BioJava                                      Coming soon
Galaxy                                       Coming soon
Hail                                         Coming soon
PLINK 2                                      Coming soon
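
To check which command-line tools are reachable from a notebook or terminal session, the PATH can be searched from Python. The following is a minimal sketch; the executable names are assumptions (`samtools` and `bwa` are the usual binary names, while `adam-submit`, `snpEff`, and `picard` are wrapper scripts that may or may not exist depending on how those Java tools were installed), and `find_tools` is a hypothetical helper.

```python
import shutil


def find_tools(executables):
    """Return {executable name: full path, or None if not found on PATH}."""
    return {name: shutil.which(name) for name in executables}


if __name__ == "__main__":
    # Assumed executable names; adjust to match the workspace installation.
    candidates = ["samtools", "bwa", "adam-submit", "snpEff", "picard"]
    for name, path in find_tools(candidates).items():
        print("{:<12} {}".format(name, path or "not found on PATH"))
```

A tool reported as not found may still be installed, for example as a JAR file that is launched with `java -jar` rather than through a wrapper script.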