Circular UMAP with plot1cell

A Python port of plot_circlize from the R package plot1cell (Wu 2021). Each cluster becomes an arc sector on the unit circle (sector length ∝ log10(n_cells)); the UMAP / t-SNE scatter and its KDE contour sit *inside* the circle; and every adata.obs column you pass to tracks= becomes one outer concentric ring.
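To make the log10 sector sizing concrete, here is a minimal sketch of how arc widths could be derived from cluster sizes. It is not taken from the plot1cell or omicverse source: the helper name `sector_arcs` and the way the gaps are subtracted from the full circle are assumptions made for illustration.

```python
import math

def sector_arcs(counts, gap_between_deg=2.0, gap_start_deg=12.0):
    """Return {cluster: (start_deg, end_deg)} with arc width proportional to log10(n_cells).

    Hypothetical helper: the gap bookkeeping is an assumption, not the
    library's actual layout code. Every count must be >= 1; a cluster with
    exactly one cell gets a zero-width arc because log10(1) == 0.
    """
    weights = {name: math.log10(n) for name, n in counts.items()}
    total_weight = sum(weights.values())
    # Degrees left for sectors after one start gap and one gap between neighbours.
    usable = 360.0 - gap_start_deg - gap_between_deg * (len(counts) - 1)
    arcs, cursor = {}, gap_start_deg
    for name, weight in weights.items():
        width = usable * weight / total_weight
        arcs[name] = (cursor, cursor + width)
        cursor += width + gap_between_deg
    return arcs

arcs = sector_arcs({'T cell': 10_000, 'B cell': 1_000, 'Mast cell': 100})
for name, (start, end) in arcs.items():
    print(f'{name:10s} {start:7.2f} -> {end:7.2f} deg')
```

In this toy example a 100-fold difference in cell count (10 000 vs. 100) becomes only a 2-fold difference in arc width, which is why rare cell types stay visible on the ring.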

This notebook walks through four real datasets of increasing size, all pulled straight from CELLxGENE Discover:

Scale	Dataset	Cell types	Tracks shown
10k	Krasnow Lung Cell Atlas (Smart-seq2)	40	donor · compartment · sex · age
50k	Mature kidney	29	compartment · tissue · sex · cell_state
100k	Human distal airways (healthy + COPD)	44	disease · tissue · assay · sex · ethnicity
200k	Lung full cell & nuclei atlas	23	smoking · BMI · age · tissue · assay · sex

Note: above ~10 clusters the function's label_orient='auto' switches from tangent to radial labels so even 40- to 60-type rings stay readable. With 3-6 tracks you can see plot1cell's composition-within-sector behaviour on many variables at once.
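The 'auto' rule described in the note can be sketched as a small function. This is an illustration of the stated behaviour (tangent up to 10 clusters, radial above), not the library's code; the name `resolve_label_orient` and the `threshold` argument are hypothetical.

```python
def resolve_label_orient(n_clusters, label_orient='auto', threshold=10):
    """Resolve 'auto' to a concrete orientation; pass explicit choices through.

    Hypothetical helper mirroring the documented rule, not the real implementation.
    """
    if label_orient != 'auto':
        return label_orient
    return 'tangent' if n_clusters <= threshold else 'radial'

# Cluster counts of the four datasets in this notebook.
for name, n in [('lung10k', 40), ('kidney50k', 29), ('airway100k', 44), ('lung200k', 23)]:
    print(name, resolve_label_orient(n))
```

All four datasets have more than 10 clusters, so each of them resolves to radial labels under this rule.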

Setup

All four datasets are available as public .h5ad files from the CELLxGENE CDN. We cache each under data/ on first run.

import omicverse as ov
ov.style(font_path='arial')

%load_ext autoreload
%autoreload 2

import os

CDN = 'https://datasets.cellxgene.cziscience.com/'
DATA = {
    'lung10k':   'c88e0403-da93-40f4-99b5-f5fdeb81a82c.h5ad',  # Krasnow Smart-seq2
    'kidney50k': '7dafa492-6129-4dff-a794-17bdefde3575.h5ad',  # Mature kidney full
    'airway100k':'861b6b12-f9c9-4434-8d09-695a5156ce23.h5ad',  # distal airways
    'lung200k':  '769fff4f-099a-46e1-917b-06ce1fee858a.h5ad',  # all cells and nuclei
}

os.makedirs('data', exist_ok=True)

def fetch(key):
    """Return the file name for `key`, downloading it on first use."""
    local = f'{key}.h5ad'
    # Look for the cached copy under data/, which is where the cells below read from.
    if not os.path.exists(os.path.join('data', local)):
        print(f'downloading {key}...')
        ov.datasets.download_data(CDN + DATA[key], local)
    return local

Scale 1 — 10 k cells (Krasnow Lung Cell Atlas, Smart-seq2)

9 409 cells × 40 cell types. This dataset only carries a t-SNE (no UMAP) — ov.pl.plot1cell accepts any 2-D embedding via basis=, so we plot against X_tSNE directly. Four tracks: donor (3), compartment (4 — immune / epithelial / endothelial / stromal), sex (2), age (3).

a10k = ov.read('data/'+fetch('lung10k'))
# Shorten the long development_stage strings for the legend
a10k.obs['age'] = a10k.obs['development_stage'].astype(str).str.replace('-year-old stage', 'y', regex=False)
a10k

ov.pl.plot1cell(
    a10k, clusters='cell_type', basis='X_tSNE',
    tracks=['donor_id', 'compartment', 'sex', 'age'],
    point_size=6, point_alpha=0.5,
    figsize=(9, 9), label_fontsize=7,
)
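Because the embedding key differs between datasets (X_tSNE here, X_umap or a Harmony UMAP in the later sections), it can help to check which keys are present before choosing basis=. The sketch below works on any iterable of key names; `pick_basis` and its preference order are hypothetical and not part of omicverse.

```python
def pick_basis(obsm_keys, preferred=('X_umap', 'X_tSNE', 'X_tsne', 'X_pca')):
    """Return the first preferred embedding key that is actually available.

    Hypothetical convenience helper; the preference order is an assumption.
    """
    available = list(obsm_keys)
    for key in preferred:
        if key in available:
            return key
    raise KeyError(f'none of {preferred} found among {available}')

# A dataset that only ships PCA and t-SNE coordinates.
print(pick_basis(['X_pca', 'X_tSNE']))
```

With an AnnData object this would be called as pick_basis(a10k.obsm.keys()), and the result passed to basis=.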

Scale 2 — 50 k cells (Mature kidney)

40 268 cells × 29 cell types spanning 5 kidney sub-tissues and 12 donors, with a mix of pediatric, adult and tumour samples. Four tracks: compartment (proximal tubule / non-PT / lymphoid / myeloid), tissue (cortex / medulla / …), sex, and cell_state (a proliferating flag).

a50k = ov.read('data/'+fetch('kidney50k'))
a50k

ov.pl.plot1cell(
    a50k, clusters='cell_type', basis='X_umap',
    tracks=['compartment', 'tissue', 'sex', 'cell_state'],
    point_size=2, point_alpha=0.35,
    figsize=(10, 10), label_fontsize=7,
)

Scale 3 — 100 k cells (distal airways, healthy + COPD)

115 788 cells × 44 cell types across 17 donors and two disease states (normal vs. COPD). Five tracks: disease, tissue (distal / terminal / proximal airway), assay, sex, self_reported_ethnicity.

At this scale the scatter is dense, so we lower both point size and alpha to keep the KDE contour visible through the cloud. With 44 clusters, label_orient='auto' picks radial labels, so the cell-type names don't collide.

a100k = ov.read('data/'+fetch('airway100k'))
a100k

ov.pl.plot1cell(
    a100k, clusters='cell_type', basis='X_umap',
    tracks=['disease', 'tissue', 'assay', 'sex',
            'self_reported_ethnicity'],
    point_size=1, point_alpha=0.25,
    figsize=(11, 11), label_fontsize=6,
)
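The point_size / point_alpha values used across the four sections of this notebook can be collected into a small lookup. The (size, alpha) pairs below are the ones used in this notebook; the cell-count cut-offs between them and the helper name `scatter_style` are assumptions chosen for illustration, not omicverse defaults.

```python
def scatter_style(n_cells):
    """Suggest (point_size, point_alpha) for a scatter of n_cells points.

    Hypothetical helper. The style pairs are the settings used in this
    notebook; the upper bounds separating them are assumed.
    """
    presets = [  # (upper bound on n_cells, point_size, point_alpha)
        (10_000, 6, 0.5),
        (50_000, 2, 0.35),
        (120_000, 1, 0.25),
    ]
    for upper, size, alpha in presets:
        if n_cells <= upper:
            return size, alpha
    return 0.8, 0.2  # largest dataset in this notebook

for n in (9_409, 40_268, 115_788, 193_108):
    print(n, scatter_style(n))
```

The returned pair can be unpacked straight into the call, for example size, alpha = scatter_style(a100k.n_obs), and then tuned by eye.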

Scale 4 — 200 k cells (Lung all cells and nuclei)

193 108 cells × 60 fine cell types — we use the curator's mid-level collapse Celltypes_master_higher_immune (23 types) to keep the ring readable. Six tracks: smoking status, BMI range, age range, tissue sub-region, assay, sex.

The full file is ~2 GB. If RAM is tight, subsample right after loading: a200k = a200k[np.random.choice(a200k.n_obs, 50_000, replace=False)].copy() (this needs import numpy as np). The KDE contour is still smooth at 50k sampled points.
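A seeded variant of the subsampling step makes the figure reproducible between runs. The sketch below only builds the row indices, so it runs without the dataset; `subsample_indices` is a hypothetical helper name, and sorting the indices is a choice made here to preserve the original cell order.

```python
import numpy as np

def subsample_indices(n_obs, n_keep=50_000, seed=0):
    """Return sorted row indices for a reproducible subsample without replacement."""
    if n_keep >= n_obs:
        return np.arange(n_obs)  # nothing to drop
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_obs, size=n_keep, replace=False))

idx = subsample_indices(193_108)  # cell count of the 200k lung atlas
print(idx.shape)
```

The indices would then be applied as a200k = a200k[idx].copy() before plotting.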

a200k = ov.read('data/'+fetch('lung200k'))
# Keep only the first 200 genes to reduce memory; plot1cell only needs obsm + obs
a200k = a200k[:, :200].copy()
a200k

ov.pl.plot1cell(
    a200k, clusters='Celltypes_master_higher_immune',
    basis='X_umap_Harmony_scDonor_snBatch',
    tracks=['Smoking status', 'BMI range', 'Age range',
            'tissue', 'assay', 'sex'],
    point_size=0.8, point_alpha=0.2,
    figsize=(12, 12), label_fontsize=8,
)

Key parameters

Parameter	Purpose
`clusters`	obs column with the cluster label (required).
`basis`	obsm key, default `'X_umap'`. Any 2-D embedding works (`X_tSNE`, `X_pca`, harmony UMAPs, …).
`tracks`	list of obs columns → one concentric ring each, coloured by the run-length composition within each cluster sector. No hard limit; 6 rings render fine.
`coord_scale`	how much of the unit circle the scatter fills (0–1, default 0.8).
`contour_levels`	KDE levels to overlay; `None` disables.
`label_orient`	`'auto'` (default), `'tangent'` or `'radial'`. Auto uses tangent for ≤10 clusters (classic R look) and radial above that so labels never overlap.
`gap_between_deg`, `gap_start_deg`	angular gaps between sectors (2°) and at the start of the circle (12°), matching the R convention.
`cluster_palette`, `track_palette`	override the default `ov` palette. Accepts a colormap name or a list of colors.
`bg_color`	canvas colour (default the R parchment `'#F9F2E4'`). Use `'white'` for a plain look.
`point_size`, `point_alpha`	for dense scatters (> 50k points) drop both to keep the KDE contour visible through the cloud.
`return_data=True`	also return the per-cell dataframe used internally.
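To see what a track ring summarises, the per-cluster composition of an obs column can be computed directly with pandas. This is a sketch on toy data, not the function's internal code; how plot1cell orders or draws the segments within a sector is not shown here.

```python
import pandas as pd

# Toy stand-in for adata.obs: one row per cell.
obs = pd.DataFrame({
    'cell_type': ['AT2', 'AT2', 'AT2', 'AT2', 'Mast', 'Mast'],
    'sex':       ['female', 'female', 'female', 'male', 'male', 'male'],
})

# Rows are clusters, columns are track categories, each row sums to 1.
composition = pd.crosstab(obs['cell_type'], obs['sex'], normalize='index')
print(composition)
```

Here the AT2 sector would be three quarters female and one quarter male, while the Mast sector would be entirely male. On a real dataset the same call works with adata.obs in place of the toy frame.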