Analyze 10x Atera (WTA Preview) FFPE breast cancer data

Atera is 10x Genomics' next-generation in-situ whole-transcriptome assay, built on the Xenium platform. Compared with Xenium's gene-panel chemistry, Atera v1 expands coverage to ~18,000 human gene targets while keeping the cell segmentation, boundary polygons, and OME-TIFF morphology images that Xenium tutorials are written against.

An Atera outs/ bundle ships a Xenium-format core plus four additions (two Atera-only, two optional):

File / folder	Origin	Notes
`cell_feature_matrix.h5`	shared	gene × cell sparse counts (10x HDF5)
`cells.parquet`	shared	per-cell metadata + centroids (microns)
`cell_boundaries.parquet`	shared	per-cell polygon vertices
`experiment.xenium`	shared	pipeline metadata (JSON)
`nucleus_boundaries.parquet`	Atera-only	per-cell nucleus polygon vertices
`morphology_focus/ch####_<tag>.ome.tif`	Atera-only	named multi-stain images (DAPI, ATP1A1+CD45+E-Cad, 18S, αSMA+Vim)
`*_cell_groups.csv`	optional	vendor-shipped cell-type classifier (cell_id → group → display color)
`_he_image.ome.tif` + `_he_alignment.csv`	optional	registered H&E whole-slide image and 3×3 affine
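
A 3×3 affine such as the one in the H&E alignment CSV is applied to 2-D coordinates in homogeneous form. The sketch below shows the arithmetic with NumPy. The helper name `apply_affine` and the example matrix are illustrative assumptions, not values from a real alignment file, and the direction of the transform (H&E pixels to morphology pixels or the reverse) is not stated here and should be checked against the 10x documentation.

```python
import numpy as np

def apply_affine(points_xy, matrix):
    """Apply a 3x3 affine matrix to an (n, 2) array of x/y points."""
    pts = np.asarray(points_xy, dtype=float)
    # Append a column of ones so translation becomes part of the matrix product.
    homogeneous = np.column_stack([pts, np.ones(len(pts))])  # shape (n, 3)
    mapped = homogeneous @ np.asarray(matrix, dtype=float).T
    return mapped[:, :2]

# Illustrative matrix (hypothetical): scale by 2, then shift by (+10, -5).
example_affine = np.array([
    [2.0, 0.0, 10.0],
    [0.0, 2.0, -5.0],
    [0.0, 0.0, 1.0],
])
example_points = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])
mapped_points = apply_affine(example_points, example_affine)
print(mapped_points)
```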

OmicVerse's ov.io.spatial.read_atera mirrors read_xenium and adds loaders for the four additions listed above. This notebook walks through:

loading an Atera FFPE breast-cancer bundle into a single AnnData,
inspecting the four morphology stain channels,
plotting the vendor-supplied cell-type segmentation in spatial coords,
zooming into a small region with ov.pl.spatialseg to render cell polygons,
running standard preprocessing (filter, normalize, HVG, PCA),
plotting per-gene spatial expression for canonical breast-cancer markers.

Environment setup

import omicverse as ov

ov.style(font_path='Arial')



%load_ext autoreload

%autoreload 2

ov.settings.cpu_gpu_mixed_init()

Inspect the Atera bundle

Atera ships its primary outputs as a single outs.zip. For tutorials we extract the small files (matrices, parquet, JSON) plus the four morphology focus channels — the giant morphology.ome.tif (~15 GB, full multi-z stack) and transcripts.parquet (~10 GB) are not needed for cell-level downstream analysis.

Companion files (cell_groups CSV, H&E image + alignment CSV) sit *next to* outs/ in the public 10x dataset page rather than inside it.

from pathlib import Path



ATERA = Path("data/atera_breast_cancer")



# List top-level files and the contents of each subfolder (e.g. the morphology_focus channel TIFFs).

for p in sorted(ATERA.iterdir()):

    if p.is_dir():

        print(f"  {p.name}/")

        for k in sorted(p.iterdir()):

            print(f"      {k.name}  ({k.stat().st_size / 1024**2:,.1f} MB)")

    else:

        size_mb = p.stat().st_size / 1024**2

        # Files of 500 MB or more (the H&E OME-TIFF, outs.zip) are reported in GB and flagged.

        if size_mb < 500:

            print(f"  {p.name}  ({size_mb:,.1f} MB)")

        else:

            print(f"  {p.name}  ({size_mb / 1024:,.1f} GB)  # large, not loaded directly")
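
The selective extraction described above can be sketched with the standard-library zipfile module. The helper `extract_small_files` is hypothetical, and the internal layout of outs.zip (an `outs/` prefix in the demo archive) is an assumption; only the two skipped file names come from the text.

```python
import tempfile
import zipfile
from pathlib import Path

# The two large files named in the text; extend the set as needed.
SKIP_NAMES = {"morphology.ome.tif", "transcripts.parquet"}

def extract_small_files(zip_path, dest, skip_names=SKIP_NAMES):
    """Extract every file in zip_path into dest, except the skipped names."""
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Compare on the base name so the archive's folder prefix does not matter.
            if Path(info.filename).name in skip_names:
                continue
            zf.extract(info, dest)
            extracted.append(info.filename)
    return extracted

# Demo on a tiny stand-in archive (file contents are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    demo_zip = tmp / "outs.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("outs/cells.parquet", b"demo")
        zf.writestr("outs/morphology.ome.tif", b"demo")
        zf.writestr("outs/morphology_focus/ch0000_dapi.ome.tif", b"demo")
        zf.writestr("outs/transcripts.parquet", b"demo")
    demo_extracted = extract_small_files(demo_zip, tmp / "extracted")
print(demo_extracted)
```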

Load the dataset with `read_atera`

read_atera returns an AnnData with:

X: cells × genes sparse counts (control probes / codewords are dropped automatically; only Gene Expression features are kept).
obsm['spatial']: cell centroids in microns.
obs: cells.parquet metadata, plus geometry (cell polygon WKT), nucleus_geometry (nucleus polygon WKT), and — when cell_groups_csv is passed — cell_group and cell_group_color columns from the vendor's classifier.
uns['spatial'][library_id]: images['hires'] (the chosen morphology channel), scalefactors that map microns → image pixels, and the full experiment.xenium metadata dict.
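
Because the polygons are stored as WKT strings, they can be inspected without any plotting code. The sketch below parses a single-ring `POLYGON` string in pure Python and computes its area with the shoelace formula. Both helper names are hypothetical and the example polygon is made up; in practice a geometry library such as shapely (`shapely.wkt.loads`) is the more robust choice.

```python
import re

def parse_simple_polygon_wkt(wkt):
    """Return the vertices of a single-ring 'POLYGON ((x y, ...))' string."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(([^()]*)\)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("expected a single-ring POLYGON WKT string")
    return [tuple(float(v) for v in pair.split()) for pair in match.group(1).split(",")]

def shoelace_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    # Pair every vertex with its successor, wrapping around at the end.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

# Made-up 4 x 3 rectangle, coordinates in microns.
example_wkt = "POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))"
example_ring = parse_simple_polygon_wkt(example_wkt)
example_area = shoelace_area(example_ring)
print(example_area)
```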

We start by selecting the dapi channel for the morphology image. Atera multi-channel selection accepts a semantic tag ('dapi', 'boundary', 'rna', 'stroma'), a substring of the channel file name ('cd45', '18s'), or an integer-as-string index ('0'–'3').

adata = ov.io.spatial.read_atera(

    ATERA,

    image_key='dapi',

    image_max_dim=2048,

    cell_groups_csv=ATERA / 'WTA_Preview_FFPE_Breast_Cancer_cell_groups.csv',

    cache_file=ATERA / 'atera_dapi.h5ad',

)

adata
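
The three forms of image_key described above (semantic tag, substring, index string) could be resolved along the following lines. This is a sketch of the idea only: `resolve_channel` is a hypothetical helper, not OmicVerse's implementation, and the tag-to-index mapping is assumed from the channel order of the morphology_focus files.

```python
# Assumed mapping from semantic tag to channel index.
SEMANTIC_TAGS = {"dapi": 0, "boundary": 1, "rna": 2, "stroma": 3}

CHANNEL_FILES = [
    "ch0000_dapi.ome.tif",
    "ch0001_atp1a1_cd45_e-cadherin.ome.tif",
    "ch0002_18s.ome.tif",
    "ch0003_alphasma_vimentin.ome.tif",
]

def resolve_channel(key, files=CHANNEL_FILES):
    """Map a tag, index string, or unique substring to one channel file name."""
    key = str(key).lower()
    if key in SEMANTIC_TAGS:
        return files[SEMANTIC_TAGS[key]]
    if key.isdigit():
        index = int(key)
        if index >= len(files):
            raise ValueError(f"channel index {index} out of range")
        return files[index]
    # Fall back to substring matching; require exactly one hit.
    matches = [name for name in files if key in name.lower()]
    if len(matches) != 1:
        raise ValueError(f"{key!r} matched {len(matches)} channels")
    return matches[0]

for demo_key in ["dapi", "cd45", "18s", "3"]:
    print(demo_key, "->", resolve_channel(demo_key))
```

Checking tags before substrings keeps 'rna' and 'boundary' usable even though neither string appears in a file name.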

The experiment.xenium metadata is preserved verbatim under uns['spatial']['<library>']['metadata']. The pixel_size field (microns per full-resolution pixel) sets how micron-space centroids are converted into image-pixel coordinates downstream.

library_id = list(adata.uns['spatial'].keys())[0]

meta = adata.uns['spatial'][library_id]['metadata']

for k in ['run_name', 'region_name', 'preservation_method', 'panel_name',

          'panel_num_targets_predesigned', 'chemistry_version', 'pixel_size',

          'num_cells', 'transcripts_per_cell']:

    print(f"  {k:30s} {meta.get(k)}")
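
The role of pixel_size can be made concrete with a small calculation. The sketch below assumes that full-resolution pixel coordinates equal microns divided by pixel_size, and that a downsampled image adds one extra scale factor. The helper `microns_to_pixels`, the pixel size of 0.2125 µm, and the image width of 20,000 px are illustrative assumptions; how read_atera folds these into its stored scalefactors is not shown here.

```python
import numpy as np

def microns_to_pixels(coords_um, pixel_size_um, scale=1.0):
    """Convert micron coordinates to pixel coordinates.

    microns / pixel_size gives full-resolution pixels; `scale` then maps
    full-resolution pixels onto a downsampled image.
    """
    return np.asarray(coords_um, dtype=float) / pixel_size_um * scale

pixel_size_um = 0.2125            # illustrative value; read it from the metadata
full_width_px = 20_000            # hypothetical full-resolution image width
downsample = 2048 / full_width_px # image capped at 2048 px on its longest side

centroids_um = np.array([[0.0, 0.0], [21.25, 42.5]])
full_res_px = microns_to_pixels(centroids_um, pixel_size_um)
hires_px = microns_to_pixels(centroids_um, pixel_size_um, scale=downsample)
print(full_res_px)
print(hires_px)
```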

Quick QC: per-cell distributions

Atera's cells.parquet already carries per-cell transcript_counts, cell_area, nucleus_area, and nucleus_count. A first sanity check is to verify that the distributions look reasonable: a mean of ~2,000 transcripts/cell with cell areas in the tens of µm² is typical for Atera v1.

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd



fig, axes = plt.subplots(1, 4, figsize=(15, 3.2))

for ax, (col, log) in zip(axes, [

    ('transcript_counts', True),

    ('cell_area',        False),

    ('nucleus_area',     False),

    ('nucleus_count',    False),

]):

    vals = adata.obs[col].astype(float).to_numpy()

    if log:

        vals = np.log10(vals + 1)

        ax.set_xlabel(f'log10({col} + 1)')

    else:

        ax.set_xlabel(col)

    ax.hist(vals, bins=60, color='#4477aa', edgecolor='white', linewidth=0.3)

    ax.set_ylabel('cells')

    ax.spines[['right', 'top']].set_visible(False)

fig.suptitle(f'Per-cell QC ({adata.n_obs:,} cells)', y=1.02)

fig.tight_layout()

plt.show()
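
Histograms are easier to compare across runs when paired with a few numbers. A minimal sketch of a quantile summary with pandas follows; the helper `qc_summary` is hypothetical and the table is synthetic stand-in data, so the printed values say nothing about the real dataset. On the real object the same call would take adata.obs.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for adata.obs (values are random, not real measurements).
rng = np.random.default_rng(0)
demo_obs = pd.DataFrame({
    "transcript_counts": rng.lognormal(mean=7.0, sigma=0.8, size=1000).round(),
    "cell_area": rng.gamma(shape=4.0, scale=15.0, size=1000),
})

def qc_summary(obs, columns, quantiles=(0.01, 0.5, 0.99)):
    """Return one row per QC column with selected quantiles and the mean."""
    summary = obs[columns].quantile(list(quantiles)).T
    summary.columns = [f"q{int(round(q * 100)):02d}" for q in quantiles]
    summary["mean"] = obs[columns].mean()
    return summary

demo_summary = qc_summary(demo_obs, ["transcript_counts", "cell_area"])
print(demo_summary)
```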

Inspect the four morphology channels

Atera ships morphology imaging as four separate stain channels rather than a single composite. Each channel is a standalone OME-TIFF pyramid:

Channel	Stain	Purpose
`ch0000_dapi.ome.tif`	DAPI	nucleus
`ch0001_atp1a1_cd45_e-cadherin.ome.tif`	ATP1A1 + CD45 + E-Cadherin	cell boundary
`ch0002_18s.ome.tif`	18S rRNA	RNA / cell mass
`ch0003_alphasma_vimentin.ome.tif`	αSMA + Vimentin	stromal cells

We re-read the bundle once per channel. Only the morphology image is needed from each call, so polygon loading is switched off with load_boundaries=False and load_nucleus_boundaries=False. The cache_file written earlier is not reused when re-reading from the source path with a different image_key, but image loading is cheap (~15 s per channel at 2048 px max), so we simply call read_atera four times.

channel_keys = [('dapi',     'DAPI (nucleus)'),

                ('boundary', 'ATP1A1 / CD45 / E-Cadherin (cell boundary)'),

                ('rna',      '18S rRNA'),

                ('stroma',   'αSMA / Vimentin (stroma)')]



channel_imgs = {}

for key, _ in channel_keys:

    a = ov.io.spatial.read_atera(

        ATERA, image_key=key, image_max_dim=2048,

        load_boundaries=False, load_nucleus_boundaries=False,

    )

    channel_imgs[key] = a.uns['spatial'][library_id]['images']['hires']



# Register every channel under a semantic key so ov.pl.spatialseg can use any

# of them as background via `img_key=...`. ov.pl.to_rgb_grayscale converts each

# 2-D channel to a contrast-clipped RGB stack — without this matplotlib's

# default viridis colormap kicks in and washes the per-channel structure into

# a uniform purple background. All four channels share the same

# `tissue_hires_scalef` (loaded at the same `image_max_dim`).

spatial_block = adata.uns['spatial'][library_id]

scalef = spatial_block['scalefactors']['tissue_hires_scalef']

for key, _ in channel_keys:

    spatial_block['images'][key] = ov.pl.to_rgb_grayscale(channel_imgs[key])

    spatial_block['scalefactors'][f'tissue_{key}_scalef'] = scalef

spatial_block['images']['hires'] = spatial_block['images']['dapi']

print('available background channels:', list(spatial_block['images']))

fig, axes = plt.subplots(2, 2, figsize=(12, 12))

for ax, (key, title) in zip(axes.flat, channel_keys):

    img = channel_imgs[key]

    # Per-channel 99th-percentile contrast clip — Atera stains have a long tail

    # of bright outliers that flatten the rest of the image if not clipped.

    vmax = np.percentile(img[img > 0], 99) if (img > 0).any() else img.max()

    ax.imshow(img, cmap='gray', vmin=0, vmax=vmax)

    ax.set_title(f'{key}: {title}', fontsize=10)

    ax.set_xticks([]); ax.set_yticks([])

fig.suptitle(f'Atera morphology focus channels ({img.shape[1]}×{img.shape[0]} px)',

             y=0.94, fontsize=11)

fig.tight_layout()

plt.show()
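
The percentile clip used above, combined with a grayscale-to-RGB conversion, can be written in a few lines of NumPy. This is a sketch of the general technique under the assumption of a 2-D single-channel image; `clip_to_rgb` is a hypothetical helper and is not the implementation of ov.pl.to_rgb_grayscale.

```python
import numpy as np

def clip_to_rgb(img, percentile=99.0):
    """Clip a 2-D image at a percentile of its positive pixels and stack to RGB."""
    img = np.asarray(img, dtype=float)
    positive = img[img > 0]
    vmax = np.percentile(positive, percentile) if positive.size else 1.0
    if vmax <= 0:
        vmax = 1.0
    scaled = np.clip(img / vmax, 0.0, 1.0)
    # Repeat the single channel three times so imshow treats it as RGB.
    return np.repeat(scaled[..., None], 3, axis=-1)

# Synthetic image with one very bright outlier pixel.
rng = np.random.default_rng(0)
demo_img = rng.integers(1, 200, size=(64, 64)).astype(float)
demo_img[0, 0] = 65535.0
demo_rgb = clip_to_rgb(demo_img)
print(demo_rgb.shape, demo_rgb.min(), demo_rgb.max())
```

Without the clip, the single outlier would set the top of the intensity range and the rest of the image would be compressed into the darkest few percent.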

Vendor cell-group spatial map

Atera ships a CSV mapping every cell_id to a curated cell-type label and a display color (*_cell_groups.csv). The 10x team produces these labels with a downstream classifier on top of the segmentation, and they make a useful sanity reference for our own clustering later.

We plot every cell as a tiny dot using its centroid and the vendor color directly.

print(adata.obs['cell_group'].value_counts().head(10))

print()

print(f"{adata.obs['cell_group'].nunique()} groups in total")

# Pull vendor display colours into uns['cell_group_colors'] so ov.pl.spatial

# uses them as the categorical palette automatically.

ov.pl.sync_categorical_palette(adata, key='cell_group', color_obs='cell_group_color')



ov.pl.spatial(

    adata,

    color='cell_group',

    img_key=None,

    size=10,

    show=False,

)

plt.gcf().suptitle(f'Vendor cell-group classifier ({adata.n_obs:,} cells, '

                   f"{adata.obs['cell_group'].nunique()} groups)",

                   y=1.02, fontsize=11)

plt.show()
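
Scanpy-style plotting looks up categorical colours in uns['<key>_colors'], a list ordered like the categories of obs['<key>']. The sketch below builds such a list from a label column and a colour column with pandas. The helper `palette_from_obs` is hypothetical and the group names and colours are made up; it illustrates the idea and is not the code behind ov.pl.sync_categorical_palette.

```python
import pandas as pd

def palette_from_obs(obs, key, color_key):
    """Return one colour per category of obs[key], in category order."""
    pairs = obs[[key, color_key]].drop_duplicates()
    # Every group must map to exactly one colour.
    colours_per_group = pairs.groupby(key)[color_key].nunique()
    if (colours_per_group > 1).any():
        raise ValueError("a group is assigned more than one colour")
    mapping = dict(zip(pairs[key], pairs[color_key]))
    categories = obs[key].astype("category").cat.categories
    return [mapping[c] for c in categories]

# Made-up labels and colours for illustration only.
demo_obs = pd.DataFrame({
    "cell_group": ["Tumor", "T cell", "Tumor", "Stroma"],
    "cell_group_color": ["#e41a1c", "#377eb8", "#e41a1c", "#4daf4a"],
})
demo_palette = palette_from_obs(demo_obs, "cell_group", "cell_group_color")
print(demo_palette)
```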

Render cell polygons in a small region

ov.pl.spatialseg reads the WKT polygons stored in obs['geometry'] and renders each cell as its segmentation outline. With all 170k polygons plotted at once, individual outlines would be too small to see, so we subset to a 1 mm × 1 mm window first.

Background image selection: by registering all four morphology channels under semantic keys ('dapi', 'boundary', 'rna', 'stroma') in uns['spatial'][lib]['images'], ov.pl.spatialseg can render polygons over *any* channel via img_key=.... For the cell-segmentation view the boundary channel (ATP1A1+CD45+E-Cadherin membrane stain) is the most informative — the polygons trace exactly the structures the stain lights up.

# Pick a 1 mm × 1 mm window centred on the median cell centroid and reuse

# the same window for both the cell subset and the spatialseg crop_coord —

# this keeps the rendered background image aligned to the polygon extent.

x0, y0 = np.median(adata.obsm['spatial'], axis=0)

crop_window = (x0 - 500, x0 + 500, y0 - 500, y0 + 500)  # (xmin, xmax, ymin, ymax) in microns

bdata = ov.space.subset_window(adata,

                                xlim=(crop_window[0], crop_window[1]),

                                ylim=(crop_window[2], crop_window[3]))

print(f'Subset window: {bdata.n_obs:,} cells')

fig, axes = plt.subplots(2, 2, figsize=(14, 14))

for i, (ax, (key, label)) in enumerate(zip(axes.flat, channel_keys)):

    ov.pl.spatialseg(

        bdata,

        color='cell_group',

        img_key=key,

        crop_coord=crop_window,

        edges_color='white',

        edges_width=0.4,

        alpha=0.35,         # keep polygon fills faint so the morphology dominates

        alpha_img=1.0,

        ax=ax,

        legend=(i == len(channel_keys) - 1),

        show=False,

    )

    ax.set_title(f'{key}: {label}', fontsize=10)

fig.suptitle('Cell polygons (vendor cell_group) over each morphology channel',

             y=0.94, fontsize=12)

plt.tight_layout()

plt.show()
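
The window subset itself is a boolean mask on the centroid coordinates. A minimal NumPy sketch follows, assuming inclusive bounds (whether ov.space.subset_window treats its limits as inclusive is not confirmed here). The helper `window_mask` and the coordinates are illustrative.

```python
import numpy as np

def window_mask(coords, xlim, ylim):
    """Boolean mask of points inside the rectangle xlim x ylim (inclusive)."""
    coords = np.asarray(coords, dtype=float)
    x, y = coords[:, 0], coords[:, 1]
    return (x >= xlim[0]) & (x <= xlim[1]) & (y >= ylim[0]) & (y <= ylim[1])

# Made-up centroids in microns.
demo_coords = np.array([
    [0.0, 0.0],
    [1000.0, 1000.0],
    [1400.0, 900.0],
    [3000.0, 3000.0],
    [1200.0, 1600.0],
])
cx, cy = np.median(demo_coords, axis=0)   # window centre: (1200, 1000)
demo_mask = window_mask(demo_coords, (cx - 500, cx + 500), (cy - 500, cy + 500))
print(demo_mask)
```

On an AnnData object the same mask would index the rows, for example `adata[mask].copy()`.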

Standard preprocessing

We follow the Visium HD / Xenium recipe: filter very-low-count cells and rarely detected genes, normalize counts to a fixed total, log-transform, select highly variable genes, then scale and run PCA. The Atera matrix is dense enough (median ~2k counts/cell) that the min_counts=20 filter is gentle.

adata.layers['counts'] = adata.X.copy()

ov.pp.filter_cells(adata, min_counts=20)

ov.pp.filter_genes(adata, min_cells=10)

ov.pp.normalize_total(adata)

ov.pp.log1p(adata)

print(adata)

ov.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key=None)

adata.raw = adata

adata = adata[:, adata.var['highly_variable']].copy()

print(f'Kept {adata.n_vars} HVGs')

ov.pp.scale(adata)

ov.pp.pca(adata, layer='scaled', n_pcs=50)

adata
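
The normalize and log-transform steps reduce to simple arithmetic on the count matrix. The NumPy sketch below shows it on a tiny dense matrix. The helper `normalize_log1p` is hypothetical, and the target sum of 100 and the median fallback are assumptions for illustration; the default used by ov.pp.normalize_total may differ.

```python
import numpy as np

def normalize_log1p(counts, target_sum=None):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if target_sum is None:
        # Assumed fallback: use the median library size as the target.
        target_sum = np.median(totals)
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid dividing by zero
    return np.log1p(counts / safe_totals * target_sum)

# Three cells x two genes, made-up counts.
demo_counts = np.array([[1, 3], [10, 10], [0, 5]])
demo_norm = normalize_log1p(demo_counts, target_sum=100)
print(demo_norm)
```

Undoing the log with `np.expm1` recovers rows that each sum to the target, which is a quick way to confirm a layer really holds log-normalised values.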

Spatial maps for canonical breast-cancer markers

With the matrix log-normalised, any gene can be plotted against the cell centroids in obsm['spatial']. We pick four genes that should split cleanly between the cell types in the vendor classifier:

KRT8: luminal-epithelial keratin (DCIS / luminal-like cells).
PTPRC: CD45 — pan-immune.
COL1A1: collagen — CAF / stromal.
PECAM1: CD31 — endothelial.

We use .raw so we can plot any gene that wasn't selected as HVG.

marker_genes = ['KRT8', 'PTPRC', 'COL1A1', 'PECAM1']

available = [g for g in marker_genes if g in adata.raw.var_names]

print('Available markers:', available)



ov.pl.spatial(

    adata,

    color=available,

    use_raw=True,

    img_key=None,

    size=10,

    vmax='p99.2',     # scanpy parses 'p<N>' as the N-th percentile per panel

    cmap='magma',

    show=False,

)

plt.gcf().suptitle('Spatial expression (log-normalised) of canonical markers',

                   y=1.02, fontsize=11)

plt.show()
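
The 'p99.2' string passed as vmax stands for a percentile of the plotted values. Equivalent logic can be sketched as below; `resolve_vmax` is a hypothetical helper written for illustration and is not scanpy's code.

```python
import numpy as np

def resolve_vmax(spec, values):
    """Turn a number or a 'p<N>' percentile string into a float colour limit."""
    if isinstance(spec, str):
        if not spec.startswith("p"):
            raise ValueError(f"unsupported vmax spec: {spec!r}")
        return float(np.percentile(values, float(spec[1:])))
    return float(spec)

demo_values = np.arange(101)          # 0, 1, ..., 100
vmax_p99 = resolve_vmax("p99", demo_values)
vmax_fixed = resolve_vmax(2.5, demo_values)
print(vmax_p99, vmax_fixed)
```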

Marker spatialseg over each morphology channel

Re-creating the 1 mm × 1 mm subset on the post-preprocessing matrix lets us overlay marker expression on top of the cell polygons. Instead of fixing the background to DAPI, we pair each marker with a different morphology channel — every panel shows the same polygon set with the *same* zoom window but a *different* image behind it, so the multi-channel flexibility of ov.pl.spatialseg(..., img_key=...) is visible at a glance.

Channel pairing (rationale):

Marker	Background channel	Why
`KRT8` (luminal keratin)	`dapi` (nuclei)	epithelial nuclei light up where KRT8 is expressed
`PTPRC` / CD45 (immune)	`boundary` (ATP1A1+CD45+E-Cad)	CD45 lives in the boundary stain itself
`COL1A1` (collagen)	`stroma` (αSMA+Vim)	both highlight stromal compartment
`PECAM1` / CD31 (endothelial)	`rna` (18S)	endothelial cells stand out against generic RNA

ov.pl.spatialseg doesn't parse the 'p99.2' percentile string (only ov.pl.spatial does), so we compute the per-gene 99.2th percentile up front, over cells with nonzero expression in the window, and pass it through as a float vmax.

bdata = ov.space.subset_window(adata,

                                xlim=(crop_window[0], crop_window[1]),

                                ylim=(crop_window[2], crop_window[3]))

print(f'Subset for spatialseg: {bdata.n_obs:,} cells')



# Re-attach the multi-channel uns so spatialseg can find each `img_key`.

bdata.uns['spatial'] = adata.uns['spatial']



# Materialise per-gene log-normalised expression onto bdata.obs so spatialseg

# can colour polygons directly. p99.2 vmax per gene clips the bright tail.

raw = adata.raw.to_adata()

raw_b = raw[bdata.obs_names].copy()

for gene in available:

    bdata.obs[f'{gene}_expr'] = raw_b[:, gene].X.toarray().ravel()



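# Pair each marker with a fixed morphology channel (see the table above).

# A dict lookup keeps every pairing intact even if a marker is missing from

# `available`; zipping two lists would silently shift the channels.

channel_for = {'KRT8': 'dapi', 'PTPRC': 'boundary', 'COL1A1': 'stroma', 'PECAM1': 'rna'}

pairings = [(gene, channel_for[gene]) for gene in available]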



fig, axes = plt.subplots(2, 2, figsize=(14, 14))

for ax, (gene, ch) in zip(axes.flat, pairings):

    expr = bdata.obs[f'{gene}_expr'].to_numpy()

    nz = expr[expr > 0]

    vmax_val = float(np.percentile(nz, 99.2)) if nz.size else float(expr.max() or 1.0)

    ov.pl.spatialseg(

        bdata,

        color=f'{gene}_expr',

        img_key=ch,

        crop_coord=crop_window,

        edges_color='white',

        edges_width=0.4,

        alpha=0.55,

        alpha_img=1.0,

        cmap='magma',

        vmax=vmax_val,

        ax=ax,

        show=False,

    )

    ax.set_title(f'{gene} on {ch} (vmax≈{vmax_val:.2f})', fontsize=10)

fig.suptitle('Marker expression on cell polygons across morphology channels',

             y=0.94, fontsize=12)

plt.tight_layout()

plt.show()

Summary

In this notebook we used omicverse.io.spatial.read_atera to:

load a 170,057-cell × 18,028-gene Atera v1 FFPE breast-cancer dataset into a single AnnData,
inspect Atera's four-channel morphology focus stack (DAPI / boundary / 18S / stroma),
merge the vendor cell_groups.csv classifier directly into obs,
render cell polygons via ov.pl.spatialseg for region-of-interest views,
run a standard normalize → HVG → PCA preprocessing pipeline,
map canonical breast-cancer markers (KRT8, PTPRC, COL1A1, PECAM1) in physical space.

Atera's outs/ layout is a strict superset of Xenium's, so any downstream OmicVerse spatial workflow (ov.space.svg, ov.pl.spatial, neighborhood graphs, Leiden clustering, cell-cell communication) works without modification. read_atera is the only component that has to know about the extras (nucleus polygons, channel-named morphology, cell-group CSV, optional H&E + alignment).