MaGIC Dimensionality Reduction Tool

Welcome to the Dimensionality Reduction Tool by the Molecular and Genomics Informatics Core (MaGIC).


About Dimensionality Reduction

Dimensionality reduction condenses datasets with large numbers of features into a smaller number of summary dimensions. This can be used to visualize similarities and dissimilarities between samples in your dataset. Ideally, samples that are expected to be similar should group together; for example, your control samples should cluster with each other, and your treatment samples with each other. This type of dimensionality reduction is usually performed using PCA, tSNE, or UMAP.


PCA vs tSNE vs UMAP

Each dimensionality reduction method has its own use. On many datasets they will tell a similar story, but it is best to decide which one fits your experimental design. In general, PCA is used for smaller datasets (such as RNA-seq with a few samples), and tSNE and UMAP become more appropriate as the number of samples increases, such as for scRNA-seq.

PCA

Principal component analysis (PCA) is a common linear method of dimensionality reduction. In essence, the variable features in your dataset are distilled into components (eigenvectors of the covariance matrix), ordered so that each one captures the maximum remaining amount of variance.
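As a rough illustration of the idea above, here is a minimal PCA sketch in Python using NumPy. It is not the code this tool runs; the function name `pca_scores` and the toy values are hypothetical, and it assumes samples are in rows and features in columns.

```python
import numpy as np

def pca_scores(counts, n_components=2):
    """Project samples onto the top principal components.

    counts: array-like of shape (n_samples, n_features).
    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(counts, dtype=float)
    # Center each feature so components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # SVD of the centered matrix: rows of Vt are the eigenvectors
    # of the covariance matrix, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    variances = S ** 2 / (X.shape[0] - 1)
    ratio = variances / variances.sum()
    return scores, ratio[:n_components]

# Hypothetical toy data: 4 samples (2 control-like, 2 treatment-like) x 3 features.
toy = np.array([
    [10.0, 200.0, 5.0],
    [12.0, 210.0, 6.0],
    [30.0, 90.0, 5.5],
    [28.0, 100.0, 4.5],
])
scores, ratio = pca_scores(toy)
```

Plotting the first column of `scores` against the second gives the familiar PC1 vs PC2 scatter plot; in this toy data the first two samples fall on one side of PC1 and the last two on the other.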

tSNE

t-distributed Stochastic Neighbor Embedding (tSNE) is a graph-based, non-linear dimensionality reduction method. Essentially, it positions each sample in the embedding based on its distances to neighboring samples (cells, in scRNA-seq), measured in PCA space.

UMAP

Uniform Manifold Approximation and Projection (UMAP) is a non-linear dimensionality reduction method. UMAP is similar to tSNE, but scales more effectively to large datasets and aims to preserve both local and global distances, which helps delineate groups.


Data Sources

The data for dimensionality reduction can come from any type of count data. A few examples are normalized RNA-seq hit counts, Luminex assay counts, and plant petal size or color. If your dataset has a small number of samples (N), start with the PCA plots. Both tSNE and UMAP are designed for larger datasets, and will require custom hyperparameter tweaks to run, even with the demo data.
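To make the small-N point concrete: some tSNE implementations (the R package Rtsne, for example) refuse to run unless 3 * perplexity < N - 1. Whether this tool applies that exact rule is an assumption. The hypothetical helper below sketches how a perplexity could be chosen for a small dataset under that rule; the preferred value of 30 is a common default, not a setting taken from this tool.

```python
def max_perplexity(n_samples):
    """Largest integer perplexity with 3 * perplexity < n_samples - 1.

    Assumes the Rtsne-style constraint; other implementations may differ.
    """
    return max((n_samples - 2) // 3, 0)

def choose_perplexity(n_samples, preferred=30):
    """Use the preferred perplexity if allowed, else the largest valid one."""
    limit = max_perplexity(n_samples)
    if limit < 1:
        raise ValueError("too few samples for tSNE under this rule")
    return min(preferred, limit)

small = choose_perplexity(12)    # a 12-sample dataset
large = choose_perplexity(100)   # a 100-sample dataset
```

With only a handful of samples the allowed perplexity becomes very small, which is one reason PCA is the better starting point for small datasets.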


Minimum requirements

To use this tool, at minimum you must have a TSV/CSV table with the first column containing your row identifiers (for example Gene IDs), followed by one column of count data per sample (for example VST normalized hit counts). Additionally, you need a metadata table. Its first column should contain the sample names, matching the column names in the count data. Each subsequent column can include any non-measured data, such as grouping variables.
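The layout described above can be checked before uploading. The following is a minimal sketch using only the Python standard library; the function names, sample names, and values are hypothetical, and it assumes the count table has one row per identifier and one column per sample.

```python
import csv
import io

def read_table(text, delimiter=","):
    """Parse delimited text into (header, rows), skipping blank lines."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    return rows[0], rows[1:]

def check_inputs(counts_text, metadata_text, delimiter=","):
    """Return a list of problems; an empty list means the tables line up."""
    problems = []
    count_header, count_rows = read_table(counts_text, delimiter)
    _, meta_rows = read_table(metadata_text, delimiter)

    count_samples = count_header[1:]               # first column: row identifiers
    meta_samples = [row[0] for row in meta_rows]   # first column: sample names

    missing = [s for s in count_samples if s not in meta_samples]
    extra = [s for s in meta_samples if s not in count_samples]
    if missing:
        problems.append(f"samples missing from metadata: {missing}")
    if extra:
        problems.append(f"metadata samples not in count table: {extra}")

    # Every cell after the identifier column should be numeric.
    for row in count_rows:
        for value in row[1:]:
            try:
                float(value)
            except ValueError:
                problems.append(f"non-numeric value {value!r} in row {row[0]!r}")
    return problems

# Hypothetical example tables.
counts_csv = (
    "gene_id,ctrl_1,ctrl_2,treat_1\n"
    "GeneA,10.5,11.2,30.1\n"
    "GeneB,200.0,210.3,90.7\n"
)
metadata_csv = (
    "sample,group\n"
    "ctrl_1,control\n"
    "ctrl_2,control\n"
    "treat_1,treatment\n"
)
problems = check_inputs(counts_csv, metadata_csv)
```

For TSV files, pass `delimiter="\t"`. Any mismatch between the count table's column names and the metadata's first column shows up as an entry in `problems`.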


Input Data


Upload your own data

Plot Options

Show point options
Show label options
Encircle the samples
Fill the encircle
Stat ellipses
Ellipse fill
Legend Options
Show label options
Show function options
Show label options
Resize Image

Plot Options

Show point options
Show point labels
Show legend options
Advanced tSNE options

Caution. These are some of the tSNE hyperparameters. We strongly recommend you do not tweak these unless you understand the changes being made. Some of these will drastically increase runtime and memory consumption or trigger crashes.

Center PCA data
Internally normalize PCA data
Scale PCA data
Resize Image

Plot Options

Show point options
Show point labels
Show legend options
Advanced UMAP options

Caution. These are some of the UMAP hyperparameters. We strongly recommend you do not tweak these unless you understand the changes being made. Some of these will drastically increase runtime and memory consumption or trigger crashes.

Resize Image
