xarray-schema

Schema validation for Xarray


installation

This package is in the early stages of development. Install it from source:

pip install git+https://github.com/carbonplan/xarray-schema

usage

Xarray-schema’s API is modeled after Pandera. The DataArraySchema and DatasetSchema objects both have .validate() methods.

The basic usage is as follows:

import numpy as np
import xarray as xr
from xarray_schema import DataArraySchema, DatasetSchema

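# a 1-D integer DataArray named 'foo' with a single dimension 'x'
da = xr.DataArray(np.ones(4, dtype='i4'), dims=['x'], name='foo')

# describe the expected dtype, name, shape, and dims
schema = DataArraySchema(dtype=np.integer, name='foo', shape=(4,), dims=['x'])

# raises a SchemaError if da does not match the schema
schema.validate(da)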
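To make the behavior of .validate() concrete, here is a minimal, hypothetical sketch of the kind of checks a DataArray schema performs. It is not xarray-schema's implementation: SketchDataArraySchema, _match_with_wildcards, and this SchemaError class are illustrative names. It assumes direct comparisons for name, np.issubdtype for dtype (so abstract types like np.integer match), and None as a wildcard inside shape and dims, as the roadmap below describes. It works on any object exposing .dtype, .name, .shape, and .dims, such as an xr.DataArray; a SimpleNamespace stands in here so the sketch does not need xarray.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional, Sequence, Tuple

import numpy as np


class SchemaError(Exception):
    """Hypothetical stand-in for the library's schema exception."""


def _match_with_wildcards(label: str, actual: Tuple, expected: Tuple) -> None:
    # Lengths must agree; a None entry in `expected` matches any value.
    if len(actual) != len(expected):
        raise SchemaError(f"{label}: expected {len(expected)} entries, got {len(actual)}")
    for got, want in zip(actual, expected):
        if want is not None and got != want:
            raise SchemaError(f"{label}: expected {expected}, got {actual}")


@dataclass
class SketchDataArraySchema:
    """Hypothetical schema; every field left as None is skipped."""

    dtype: Optional[Any] = None
    name: Optional[str] = None
    shape: Optional[Sequence[Optional[int]]] = None
    dims: Optional[Sequence[Optional[str]]] = None

    def validate(self, da: Any) -> None:
        # dtype: np.issubdtype accepts abstract types such as np.integer
        if self.dtype is not None and not np.issubdtype(da.dtype, self.dtype):
            raise SchemaError(f"dtype: {da.dtype} is not a subtype of {self.dtype}")
        # name: a direct comparison
        if self.name is not None and da.name != self.name:
            raise SchemaError(f"name: expected {self.name!r}, got {da.name!r}")
        # shape and dims: element-wise comparison with None as a wildcard
        if self.shape is not None:
            _match_with_wildcards("shape", tuple(da.shape), tuple(self.shape))
        if self.dims is not None:
            _match_with_wildcards("dims", tuple(da.dims), tuple(self.dims))


# Stand-in with the same attributes as xr.DataArray(np.ones(4, dtype='i4'), dims=['x'], name='foo')
da = SimpleNamespace(dtype=np.dtype("i4"), name="foo", shape=(4,), dims=("x",))

# shape=(None,) accepts any length along the single dimension
SketchDataArraySchema(dtype=np.integer, name="foo", shape=(None,), dims=["x"]).validate(da)
```

Like the library today, this sketch raises on the first mismatch it finds rather than collecting every problem.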

roadmap

This is a very early prototype of a library. Some key things are missing:

  1. Validation of coords, chunks, and attrs. None of these are implemented yet.
  2. Class-based schemas for parts of the Xarray data model. Most validations are currently made as direct comparisons (da.name == self.name), but a more robust approach is possible that leverages classes for each component of the data model. We’re already handling some special cases using None as a sentinel value to allow for wildcard-like behavior in places (e.g. dims and shape).
  3. Exceptions: Pandera can accumulate schema exceptions and report them all at once (its lazy validation mode). Currently, we eagerly raise a SchemaError as soon as one is found.
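The difference between eager and accumulated errors can be sketched in a few lines. This is a hypothetical illustration of the Pandera-style approach (Pandera exposes it via validate(..., lazy=True), which raises SchemaErrors), not a planned xarray-schema API: AccumulatedSchemaErrors and validate_lazily are made-up names, and each check is assumed to be a (label, predicate) pair.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Tuple


class AccumulatedSchemaErrors(Exception):
    """Hypothetical exception carrying every failed check from one pass."""

    def __init__(self, failures: List[str]) -> None:
        self.failures = failures
        super().__init__("\n".join(failures))


def validate_lazily(obj: Any, checks: List[Tuple[str, Callable[[Any], bool]]]) -> None:
    # Run every check instead of stopping at the first failure,
    # then raise once with the labels of all checks that failed.
    failures = [label for label, check in checks if not check(obj)]
    if failures:
        raise AccumulatedSchemaErrors(failures)


# A DataArray-like stand-in with two problems: wrong name and wrong dims
da = SimpleNamespace(name="bar", dims=("y",), shape=(3,))
checks = [
    ("name == 'foo'", lambda o: o.name == "foo"),
    ("dims == ('x',)", lambda o: tuple(o.dims) == ("x",)),
    ("shape == (3,)", lambda o: tuple(o.shape) == (3,)),
]
try:
    validate_lazily(da, checks)
except AccumulatedSchemaErrors as err:
    print(err.failures)  # ["name == 'foo'", "dims == ('x',)"]
```

Both problems are reported in one exception, whereas an eager validator would stop at the name mismatch and hide the dims mismatch until the first one was fixed.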

license

All the code in this repository is MIT licensed, but we request that you please provide attribution if reusing any of our digital content (graphics, logo, articles, etc.).

about us

CarbonPlan is a non-profit organization working on the science and data of carbon removal. We aim to improve the transparency and scientific integrity of carbon removal and climate solutions through open data and tools. Find out more at carbonplan.org or get in touch by opening an issue or sending us an email.
