MaGIC Dimensionality Reduction Tool

Welcome to the Dimensionality Reduction Tool by the Molecular and Genomics Informatics Core (MaGIC).

About Dimensionality Reduction

Dimensionality reduction summarizes datasets with many features (for example, thousands of genes) into a small number of dimensions. This makes it possible to visualize similarities and dissimilarities between the samples in your dataset. Ideally, samples that are expected to be similar should group together: for example, your control samples should cluster with each other, and your treatment samples with each other. This type of dimensionality reduction is usually performed with PCA, tSNE, or UMAP.

PCA vs tSNE vs UMAP

Each dimensionality reduction method has its own uses. In many datasets the three methods tell a similar story, but it is best to choose the one that fits your experimental design. In general, PCA is used for smaller datasets (such as bulk RNA-seq with a few samples). tSNE and UMAP become more useful as the number of samples increases, as in scRNA-seq.

PCA

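Principal component analysis (PCA) is a common linear method of dimensionality reduction. In essence, the variation across the features in your dataset is distilled into principal components. These are orthogonal directions (eigenvectors of the feature covariance matrix), ordered so that the first component captures the largest possible share of the variance, the second the next largest, and so on.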
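To illustrate the idea, here is a minimal Python sketch (using NumPy) that computes PCA from the singular value decomposition of the centered data. It is not the tool's own code. The function name `pca` and the toy data are hypothetical. The input is assumed to have features in rows and samples in columns, like a counts table.

```python
import numpy as np

def pca(counts, n_components=2):
    """Project samples onto their top principal components.

    counts: features x samples array (e.g. genes in rows, samples in columns).
    Returns (scores, explained_variance_ratio); scores is samples x n_components.
    """
    # Samples are the observations, so transpose to samples x features.
    X = np.asarray(counts, dtype=float).T
    # Center every feature so the components describe variation around the mean.
    X = X - X.mean(axis=0)
    # Rows of Vt are the principal axes (eigenvectors of the feature covariance
    # matrix); singular values come back sorted from largest to smallest.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # Squared singular values are proportional to the variance along each axis.
    var = S ** 2
    return scores, var[:n_components] / var.sum()

# Toy example (hypothetical data): 4 features x 6 samples, where the last
# three "treatment" samples are shifted upward in the first two features.
rng = np.random.default_rng(0)
counts = rng.normal(5.0, 0.1, size=(4, 6))
counts[:2, 3:] += 3.0
scores, ratio = pca(counts)
print(scores[:, 0])  # control and treatment samples fall on opposite sides of PC1
print(ratio)         # share of total variance captured by PC1 and PC2
```

Because the treatment shift is much larger than the noise, almost all of the variance lands on PC1, which is the pattern you would hope to see for well-separated groups.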

tSNE

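t-distributed Stochastic Neighbor Embedding (tSNE) is a non-linear dimensionality reduction method. Essentially, it converts the distances between each sample (or cell) and its neighbors, often measured in PCA space, into neighbor probabilities. It then places the points in two or three dimensions so that their neighbor probabilities in the embedding match the original ones as closely as possible.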
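The core of tSNE can be sketched in Python with NumPy. The sketch computes Gaussian neighbor probabilities in the input space and Student t probabilities in the embedding. It then measures the Kullback-Leibler (KL) divergence between the two, which tSNE lowers by moving the embedded points. This is a simplified illustration with hypothetical function names, not the tool's implementation. It uses one fixed Gaussian bandwidth instead of tuning one per point to a target perplexity, and it uses plain gradient descent without momentum or early exaggeration. In practice `X` would often be the top principal components rather than the raw features.

```python
import numpy as np

def sq_dists(X):
    """Squared Euclidean distances between all pairs of rows in X."""
    sq = (X ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def input_affinities(X, sigma=1.0):
    """Joint neighbor probabilities p_ij from a Gaussian kernel.

    Simplification: one fixed bandwidth sigma for every point, whereas tSNE
    tunes a separate bandwidth per point to reach a target perplexity.
    """
    P = np.exp(-sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)      # conditional p_{j|i}
    return (P + P.T) / (2.0 * X.shape[0])  # symmetric, sums to 1

def embedding_affinities(Y):
    """Joint probabilities q_ij from a Student t kernel (1 degree of freedom)."""
    W = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl_cost(P, Y):
    """KL(P || Q): the cost tSNE minimizes by moving the embedded points Y."""
    Q, _ = embedding_affinities(Y)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def kl_gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + |y_i - y_j|^2)."""
    Q, W = embedding_affinities(Y)
    M = (P - Q) * W
    return 4.0 * (np.diag(M.sum(axis=1)) - M) @ Y

def tsne(X, n_components=2, n_iter=500, learning_rate=1.0, sigma=1.0, seed=0):
    """Plain gradient descent on the KL cost from a small random start."""
    P = input_affinities(X, sigma)
    Y = np.random.default_rng(seed).normal(0.0, 1e-2, size=(X.shape[0], n_components))
    for _ in range(n_iter):
        Y -= learning_rate * kl_gradient(P, Y)
    return Y
```

The heavy-tailed Student t kernel in the embedding is what lets dissimilar points sit far apart without a large penalty. This is one reason tSNE plots show well-separated clusters.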

UMAP

Uniform Manifold Approximation and Projection (UMAP) is a non-linear, graph-based dimensionality reduction method. UMAP is similar to tSNE, but it scales better to large datasets. It also tends to preserve more of the global structure between groups while still keeping local neighborhoods intact.
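UMAP's graph-building step can be sketched in Python with NumPy, loosely following the construction described by UMAP's authors. Each point's distances to its k nearest neighbors are converted into membership strengths. A per-point bandwidth is chosen so that the strengths sum to log2(k), and k plays the role of the "Number of nearest neighbors" option. This is a simplified illustration, not the tool's implementation, and the function name is hypothetical. It uses brute-force exact neighbors and Euclidean distance, and it fixes local connectivity at 1. The later layout optimization, which uses settings such as minimum distance, spread, and negative sample rate, is not shown.

```python
import numpy as np

def fuzzy_knn_weights(X, n_neighbors=15, n_iter=64):
    """Directed UMAP-style membership strengths w[i, j] over each point's k nearest neighbors.

    Simplified sketch: exact neighbors by brute force, Euclidean distance,
    and a local connectivity of 1 (each point's nearest neighbor gets weight 1).
    """
    n = X.shape[0]
    k = min(n_neighbors, n - 1)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    target = np.log2(k)                  # desired sum of weights per point
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]      # indices of the k nearest neighbors
        d = D[i, nbrs]                   # sorted ascending
        rho = d[0]                       # distance to the nearest neighbor
        lo, hi, sigma = 0.0, np.inf, 1.0
        # Binary search for the bandwidth sigma_i whose weights sum to log2(k);
        # the sum grows with sigma, so shrink sigma when the sum is too large.
        for _ in range(n_iter):
            total = np.exp(-(d - rho) / sigma).sum()
            if total > target:
                hi = sigma
            else:
                lo = sigma
            sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
        W[i, nbrs] = np.exp(-(d - rho) / sigma)
    return W
```

Subtracting `rho` means every point is fully connected to at least its nearest neighbor, however sparse its region is. This is part of how UMAP adapts to varying local density.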

Data Sources

The data for dimensionality reduction can come from any type of quantitative data, such as counts or other measurements. A few examples are normalized RNA-seq hit counts, Luminex assay counts, and plant petal size or color. If your dataset has a small number of samples, start with the PCA plots. Both tSNE and UMAP are designed for larger datasets and will require custom hyperparameter tweaks to run, even with the demo data.

Minimum requirements

To use this tool, you must at minimum have a TSV/CSV counts table. Its first column contains your row identifiers (for example, gene IDs), followed by one column of count data per sample (for example, VST-normalized hit counts), and its header row holds the sample names. You also need a metadata table. Its first column should contain the sample names, matching the column names in the counts table. Each subsequent column can hold any non-measured data, such as grouping variables.
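You can check the layout above before uploading. The following Python sketch (standard library only) is a hypothetical helper, not part of the tool. It parses a counts table and a metadata table and raises an error if a row is ragged or the sample names do not match. It assumes that both tables must list exactly the same samples. Pass `delimiter="\t"` for TSV files.

```python
import csv
import io

def read_table(text, delimiter=","):
    """Split a CSV/TSV table into its header row and its data rows."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    return rows[0], rows[1:]

def load_inputs(counts_text, metadata_text, delimiter=","):
    """Parse the counts and metadata tables and check that their samples agree.

    Returns (counts, metadata): counts maps row ID -> {sample: value};
    metadata maps sample -> {column: value}.
    """
    header, rows = read_table(counts_text, delimiter)
    samples = header[1:]                 # first column holds row IDs, e.g. gene IDs
    counts = {}
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"row {row[0]!r} has {len(row)} fields, expected {len(header)}")
        counts[row[0]] = {s: float(v) for s, v in zip(samples, row[1:])}

    meta_header, meta_rows = read_table(metadata_text, delimiter)
    # First metadata column: sample names; remaining columns: grouping variables.
    metadata = {r[0]: dict(zip(meta_header[1:], r[1:])) for r in meta_rows}

    # Assumption: every counts column needs a metadata row, and vice versa.
    mismatched = set(samples) ^ set(metadata)
    if mismatched:
        raise ValueError(f"sample names do not match between tables: {sorted(mismatched)}")
    return counts, metadata

# Example with a tiny hypothetical dataset.
counts, metadata = load_inputs(
    "gene_id,ctrl_1,trt_1\nGeneA,5.1,8.2\n",
    "sample,group\nctrl_1,control\ntrt_1,treatment\n",
)
print(metadata["trt_1"]["group"])  # treatment
```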

Input Data

Upload your own data

Select your counts file


Select your metadata file


Data table
Metadata table

Plot Options

Show point options

Color By

Point size:

Show point labels

Label By

Label size:

Show legend options

Legend Label Size:

Legend Position

Top, Bottom, Right, Left

Axes Label Size:

Advanced UMAP options

Caution: these are some of the UMAP hyperparameters. We strongly recommend that you do not change them unless you understand the effect of each change. Some of them can drastically increase runtime and memory consumption or cause the tool to crash.

Number of nearest neighbors

Seed value

Number of epochs

Choose distance metric

Minimum distance

Op mix ratio

Local connectivity

Bandwidth

Alpha

Gamma

Negative sample rate

Spread
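The "Op mix ratio" option probably corresponds to UMAP's `set_op_mix_ratio` hyperparameter, as found in the R `umap` and Python `umap-learn` packages, though this is an assumption. If so, it controls how the two directed weights between a pair of points, one from each point's neighbor search, are merged into a single edge. A value of 1 gives the fuzzy union and 0 gives the fuzzy intersection. A minimal NumPy sketch of that step follows. `symmetrize` is a hypothetical name, not a function from either package.

```python
import numpy as np

def symmetrize(W, set_op_mix_ratio=1.0):
    """Merge directed membership strengths into one undirected UMAP-style graph.

    set_op_mix_ratio = 1.0 gives the fuzzy union, 0.0 the fuzzy intersection,
    and values in between interpolate linearly.
    """
    W = np.asarray(W, dtype=float)
    both = W * W.T                 # intersection: edge supported from both ends
    either = W + W.T - both        # union: edge supported from at least one end
    return set_op_mix_ratio * either + (1.0 - set_op_mix_ratio) * both

# Hypothetical directed weights for three points.
W = np.array([[0.0, 1.0, 0.5],
              [0.2, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
print(symmetrize(W, 1.0))  # union keeps one-sided edges such as (0, 2)
print(symmetrize(W, 0.0))  # intersection drops edges not supported by both points
```

Lower values therefore keep only mutually supported neighbors. This usually produces a sparser graph and more fragmented clusters.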

Resize Image

Plot Heights:

Plot Widths:

2D UMAP
3D UMAP

Choose download format

Download the UMAP plot