lamindb.curators.DataFrameCurator

class lamindb.curators.DataFrameCurator(dataset, schema)

Bases: Curator

Curator for a DataFrame object.

See also Curator and Schema.

Parameters:
  • dataset (DataFrame) – The DataFrame-like object to validate and annotate.

  • schema (Schema) – A Schema object that defines the validation constraints.

Example:

import lamindb as ln
import bionty as bt

# define valid labels
cell_medium = ln.ULabel(name="CellMedium", is_type=True).save()
ln.ULabel(name="DMSO", type=cell_medium).save()
ln.ULabel(name="IFNG", type=cell_medium).save()
bt.CellType.from_source(name="B cell").save()
bt.CellType.from_source(name="T cell").save()

# define schema
schema = ln.Schema(
    name="small_dataset1_obs_level_metadata",
    otype="DataFrame",
    features=[
        ln.Feature(name="cell_medium", dtype="cat[ULabel[CellMedium]]").save(),
        ln.Feature(name="sample_note", dtype="str").save(),
        ln.Feature(name="cell_type_by_expert", dtype="cat[bionty.CellType]").save(),
        ln.Feature(name="cell_type_by_model", dtype="cat[bionty.CellType]").save(),
    ],
    coerce_dtype=True,
).save()

# curate a DataFrame
df = ln.core.datasets.small_dataset1(otype="DataFrame")
curator = ln.curators.DataFrameCurator(df, schema)
artifact = curator.save_artifact(key="example_datasets/dataset1.parquet")
assert artifact.schema == schema

Methods

save_artifact(*, key=None, description=None, revises=None, run=None)

Save an annotated artifact.

Parameters:
  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a revision.

  • run (Run | None, default: None) – The run that creates the artifact.

Returns:

A saved artifact record.

validate()

Validate the dataset against the schema.

Raises:

lamindb.errors.ValidationError – If validation fails.

Return type:

None
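
Because validate() returns None and signals failure by raising, a caller that wants to save only validated data can wrap the call in try/except. The following is a minimal sketch of that pattern, not lamindb's own implementation. So that it runs without a configured lamindb instance, it uses two hypothetical stand-ins: a local ValidationError class in place of lamindb.errors.ValidationError, and a StubCurator class that only mimics the two method signatures documented above. The helper name validate_then_save is also an assumption.

```python
class ValidationError(Exception):
    """Stand-in for lamindb.errors.ValidationError (assumption for this sketch)."""


class StubCurator:
    """Hypothetical stand-in that mimics the two documented curator methods."""

    def __init__(self, is_valid):
        self.is_valid = is_valid

    def validate(self):
        # Documented contract: return None on success, raise on failure.
        if not self.is_valid:
            raise ValidationError("dataset does not satisfy the schema")

    def save_artifact(self, *, key=None, description=None, revises=None, run=None):
        # A real curator returns a saved artifact record; a dict stands in here.
        return {"key": key, "description": description}


def validate_then_save(curator, key):
    """Return the saved artifact, or None if validation fails."""
    try:
        curator.validate()
    except ValidationError as error:
        print(f"validation failed: {error}")
        return None
    return curator.save_artifact(key=key)


saved = validate_then_save(StubCurator(is_valid=True), "example_datasets/dataset1.parquet")
skipped = validate_then_save(StubCurator(is_valid=False), "example_datasets/dataset1.parquet")
```

With a real DataFrameCurator, the same helper would presumably work by catching lamindb.errors.ValidationError instead of the local stand-in.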