The dataTap Python library is the primary interface for using dataTap's rich data management tools. Create datasets, stream annotations, and analyze model performance all with one library.


  • [x] ⚡ Begin training instantly
  • [x] 🔥 Works with all major ML frameworks (PyTorch, TensorFlow, etc.)
  • [x] 🛰️ Real-time streaming to avoid large dataset downloads
  • [x] 🌐 Universal data format for simple data exchange
  • [x] 🎨 Easily combine data from multiple sources into a single dataset
  • [x] 🧮 Rich ML utilities to compute PR curves, confusion matrices, and accuracy metrics
  • [x] 💽 Free access to a variety of open datasets

Getting Started (Platform)

To begin, select a dataset from the dataTap repository.

Then copy the starter code based on your library preference.

Paste the starter code and start training.

Getting Started (API)

Install the client library.

pip install datatap

Register for an account. Then go to Settings > Api Keys to find your personal API key.


Start using open datasets instantly.

from datatap import Api

api = Api()
coco = api.get_default_database().get_repository("_/coco")
dataset = coco.get_dataset("latest")
print("COCO: ", dataset)

Data Streaming Example

import itertools
from datatap import Api

api = Api()
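# Look up the latest version of the open COCO dataset,
# using the same call chain as the previous example.
dataset = (api
    .get_default_database()
    .get_repository("_/coco")
    .get_dataset("latest")
)

# Stream the training split lazily and print only the first five annotations.
training_stream = dataset.stream_split("training")
for annotation in itertools.islice(training_stream, 5):
    print("Received annotation:", annotation)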
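
Because a stream is consumed lazily, it can be grouped into fixed-size batches without first downloading the whole split. The following is a minimal sketch of that idea, not part of the dataTap library: `batched` and `fake_stream` are hypothetical helpers, and `fake_stream` stands in for a real streamed split so the example runs without an API key.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items, pulling lazily from stream.

    Hypothetical helper for illustration; not provided by dataTap.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        # islice pulls at most batch_size items and never reads ahead.
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_stream(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for a streamed split; yields placeholder records."""
    for i in range(n):
        yield {"id": i}


# Group five placeholder records into batches of two; the last batch is short.
batches = list(batched(fake_stream(5), 2))
for batch in batches:
    print("Batch of", len(batch), "records:", batch)
```

In principle, any iterator of annotations could be passed to `batched` in place of `fake_stream(5)`, since the helper relies only on the standard iterator protocol.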