#DATASCIENCE OMICS-2020

An Online Program on Data Science for Biomedical Discovery
Aug 03, 2020, 10 a.m. CST
ONLINE

About the Program:


The rapid growth of high-throughput data, including data from -omics technologies, has created significant demand for data science skills and experience with bioinformatics methods of analysis. To help introduce biologists, clinicians and students to cutting-edge bioinformatics methods and commonly used data science concepts, our team designed an online bioinformatics training program called OmicsLogic.

This online summer program is designed for students who are beginners in data science and are interested in data-driven research questions. The program will include aspects of data science such as data wrangling, visualization, statistical analysis and machine learning. The methods will be reviewed in the context of the biomedical and other scientific problems that students will study.

PROGRAM TOPICS

High-throughput Biomedical Data: Omics in Diagnostics, Drugs and Precision Medicine
Exploratory Data Analysis: Descriptive Statistics, Lognormal Distribution and Normalization
Data Mining and Clustering: PCA, Clustering and Finding Trends in Objects and Features
Feature Selection and Classification: Predictive Analysis, ML and Feature Significance

Program Overview

Program Schedule

Aug 03
Introduction to data types and properties
Overview of commonly used omics data
NGS, mass spectrometry, phenotypic data (genomics, transcriptomics, metagenomics)
Phenotypes: clinical, imaging, metadata (research, clinical, biotech, pharma)
The need to prepare raw data for analysis

Aug 04
Big Data Challenges and Opportunities
Availability and variability of data and of the technologies for generating such data
Unprecedented availability, variability, detail and volume of data
Data heterogeneity, complexity and noise
Need for structure and reproducibility (FAIR principles)

Aug 05
Cleaning, loading and processing data
Analysis logic: from raw reads to a table of expression (RNA-seq example)
Common sources of unwanted technical variation
Pre-processing steps: filtering and cleaning the table of expression
Loading processed data for analysis

Aug 06
Exploratory data analysis: summary and visualization
Summary statistics (histogram, boxplot, scatterplot of 2 samples compared to each other, Excel operations)
Visualization of practice data: compare gene expression on the regular and ln scales, and discuss the resulting distributions and the log-normal distribution
Missing data and data errors (remove 0s, filter anything below 2 on the ln scale in R)
Summary statistics in R
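The ln-scale comparison and low-value filtering listed for this session can be sketched roughly as follows. This is only an illustration in Python using the standard library (the session itself works in R and Excel). The gene names and expression values are invented, and the cutoff of 2 on the ln scale is taken from the session description above.

```python
import math
import statistics

# Hypothetical expression values for one sample (illustrative, not real data).
expression = {
    "GENE_A": 0.0,
    "GENE_B": 3.0,
    "GENE_C": 55.0,
    "GENE_D": 400.0,
    "GENE_E": 12000.0,
}

LN_CUTOFF = 2.0  # threshold on the ln scale, as in the session description


def ln_filter(values, cutoff=LN_CUTOFF):
    """Drop zeros, ln-transform, and keep genes at or above the cutoff."""
    kept = {}
    for gene, value in values.items():
        if value <= 0:  # ln is undefined for 0, so zeros are removed first
            continue
        ln_value = math.log(value)
        if ln_value >= cutoff:
            kept[gene] = ln_value
    return kept


def summarize(numbers):
    """Return basic summary statistics for a list of numbers."""
    return {
        "min": min(numbers),
        "median": statistics.median(numbers),
        "mean": statistics.mean(numbers),
        "max": max(numbers),
    }


filtered = ln_filter(expression)
raw_summary = summarize(list(expression.values()))
ln_summary = summarize(list(filtered.values()))
print("kept genes:", sorted(filtered))
print("raw scale:", raw_summary)
print("ln scale:", ln_summary)
```

On the raw scale the mean sits far above the median because a few highly expressed genes dominate. After the ln transform the two are much closer, which is the usual motivation for working on the log scale.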


Aug 07
Hands-on: handling large and complex datasets
Learn how to make statistical representations of the data and how to address missing data or data errors
How to compare the same gene across different samples in the same condition, and how to compare all genes between different conditions
How to find a sample that has poor-quality reads, many missing genes or low expression (outliers)
3 types of data: gene expression (continuous), clinical (categorical) and drug response (LD50: continuous, but with different variance and fewer features)
 

Aug 10
Machine Learning and Artificial Intelligence
Hypothesis testing 101: compare conditions and find a p-value
Data-driven discovery: discover groups or conditions
Process of inference for a machine versus a human (what and how machines learn)
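As a rough illustration of "compare conditions and find a p-value", the sketch below runs a two-sided permutation test on the difference in group means. The permutation test is chosen here only because it needs nothing beyond the Python standard library; the session may well use a different test (for example a t-test in R). The expression values are invented.

```python
import random
import statistics

# Hypothetical ln-scale expression of one gene in two conditions (invented values).
control = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
treated = [6.0, 6.4, 5.9, 6.2, 6.1, 6.3]


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: under the null hypothesis they are exchangeable.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)


p_value = permutation_p_value(control, treated)
print(f"p-value: {p_value:.4f}")
```

A small p-value means that a difference at least as large as the observed one rarely arises when the condition labels are assigned at random.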

Aug 11
Unsupervised Learning: dimensionality reduction and clustering
Identifying patterns and learning
Need for data mining
Overview and examples: PCA, k-means, hierarchical clustering (run an example on T-Bio, then open the script in R and modify it)
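To make the k-means step concrete, here is a minimal standard-library Python sketch of the usual assign-then-update loop (Lloyd's algorithm). It is not the T-Bio or R script used in the session. The points and the choice of initial centroids are invented for illustration.

```python
import math

# Hypothetical 2-D points, e.g. samples projected onto two components (invented values).
points = [
    (1.0, 1.1), (0.9, 0.8), (1.2, 1.0),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]


def mean_point(cluster):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def k_means(data, initial_centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in data
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(data, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return assignment, centroids


labels, centers = k_means(points, initial_centroids=[points[0], points[3]])
print("cluster labels:", labels)
print("cluster centers:", centers)
```

The initial centroids are fixed here so the result is reproducible; in practice they are usually chosen at random and the algorithm is run several times.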


Aug 12
Supervised Machine Learning: classification and feature selection
Conceptual introduction: known (labeled) sample data is used to train a model, which then applies the learned patterns to unknown data
Binary decision trees, random forest, then LDA, then swLDA
Classification and regression

Aug 13
Model accuracy and validation
Technical accuracy (ROC curve, AUC)
Logical or biological relevance (compare feature selection with PCA by subtype or clinical phenotype)
Trained model validation: learning how to check that a model used to analyze data is accurate and valid across multiple datasets
Hands-on example: cross-validation, leave-one-out analysis
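Leave-one-out analysis can be sketched as below: each sample is held out once, a model is trained on the remaining samples, and the held-out prediction is scored. The classifier here is a simple nearest-centroid rule, swapped in only to keep the example short and dependency-free; it is not one of the methods named in the program (decision trees, random forest, LDA). The samples and labels are invented.

```python
import math

# Hypothetical labeled samples: (feature vector, class label). Values are invented.
samples = [
    ((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.1, 0.9), "A"), ((0.9, 1.1), "A"),
    ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.8, 3.0), "B"), ((3.2, 3.1), "B"),
]


def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def nearest_centroid_predict(train, query):
    """Predict the label whose class centroid is closest to the query."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    centroids = {label: centroid(pts) for label, pts in by_label.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], query))


def leave_one_out_accuracy(data):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if nearest_centroid_predict(train, features) == label:
            correct += 1
    return correct / len(data)


accuracy = leave_one_out_accuracy(samples)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Because the held-out sample never contributes to the centroids it is tested against, the resulting accuracy is an estimate of performance on unseen data rather than on the training set.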

Aug 14
ML in production: getting results with ML
The interaction between artificial intelligence and humans
Differences between ML and AI
In what ways AI can support human research and decision making
Training and research extractions are applied in new ways

Sign Up To Learn More: