Training Data Platform in One Application
Complete training data platform for machine learning delivered as a single application.
Who is Diffgram for?
Data Scientists, Project Admins, Software Engineers,
Data Annotators and Subject Matter Experts.
What problem does Diffgram solve?
Juggling multiple tools is a huge pain.
Integrating complex toolchains, context switching, training people, and
paying for multiple applications all add up.
Diffgram solves this by bringing all the functions of a complex
toolchain directly into one application, replacing as many as 9 tools
with a single integrated application.
What are Diffgram’s competitive advantages?
A single application with all the features you require.
Enterprise Questions? Please contact us.
Support & Community
Try Diffgram Online (Hosted Service, No Setup.)
Diffgram Dev Installer Quickstart
Requires Docker and Docker Compose
    git clone https://github.com/diffgram/diffgram.git
    cd diffgram
    pip install -r requirements.txt
    python install.py
    # Follow the installer instructions.
    # After install, view the Web UI at: http://localhost:8085
See also our Docker Compose commands cheat sheet.
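Once the installer finishes, a quick way to confirm that something is answering at the Web UI address is a small standard-library check. This is a minimal sketch, not part of Diffgram: the function name `web_ui_is_up` is made up for illustration, and the only value taken from this README is the default address `http://localhost:8085`.

```python
import urllib.error
import urllib.request


def web_ui_is_up(url: str = "http://localhost:8085", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server at `url` answers with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # A server answered, but with a 4xx/5xx status.
        return False
    except (urllib.error.URLError, OSError):
        # Nothing is listening yet, or the connection timed out.
        return False
```

Calling `web_ui_is_up()` right after install tells you whether the containers have finished starting; if it returns False, give Docker a little more time and try again before digging into logs.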
Bugs and Issues
If you find any missing features, bugs, or other problems, please report them
as soon as possible to diffgram/issues.
- Google GCP Install Guide Compute Engine
- Azure AKS Kubernetes Install Guide
- AWS Full Kubernetes Guide
- Helm Chart for Kubernetes Clusters
Other Getting Started Docs:
What is Diffgram a drop-in replacement for?
Diffgram is a drop-in replacement for the following systems:
Labelbox, CVAT, SuperAnnotate, Label Studio (Heartex),
V7 Labs (Darwin), BasicAI, SuperbAI, Kili-Technology, HastyAI, Dataloop, Keymakr.
Directionally, Diffgram will replace Aquarium Learning, Scale Nucleus, DVC, ActiveLoop and more.
How much does this cost?
First, the Full Platform is Open Source. There is no trick where it "sort of works"
but you need to pay for a SaaS service to really use it. This is the full core product.
What limits are there on the free edition?
If you have fewer than 20 people using it, there are few intentional limits you are likely to bump into.
If you have 20+ people using it, we encourage you to consider Enterprise Edition.
If you have 30+ people using it for an extended period of time without upgrading
to Enterprise, you may hit some limits.
We reserve the right to publish more assertive limits in future editions to encourage
good enterprise citizens who are happy and using the software to support it via Enterprise.
This is an ACTIVE project. We are very open to feedback and encourage you to create Issues and help us grow!
- NEW Streamlined Annotation UI suitable for "First Time" Subject Matter Experts, with powerful options for Professional Full Time Annotators
Ingest prediction data without a software engineer.
- NEW Import Wizard saves you hours having to map your data (pre-labels, QA, debug etc.).
- All-Cloud Integrated File Browser
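To picture the kind of field mapping the Import Wizard is meant to take off your hands, here is a minimal sketch of renaming the fields of one pre-label record by hand. It is not Diffgram code and does not reflect Diffgram's schema: `FIELD_MAP`, `map_record`, and every field name below are hypothetical.

```python
from typing import Any, Dict

# Hypothetical mapping: source field name -> target field name.
# These names are illustrative only, not Diffgram's real schema.
FIELD_MAP: Dict[str, str] = {
    "class": "label",
    "xmin": "x_min",
    "ymin": "y_min",
    "xmax": "x_max",
    "ymax": "y_max",
    "score": "confidence",
}


def map_record(record: Dict[str, Any], field_map: Dict[str, str]) -> Dict[str, Any]:
    """Rename the mapped fields of one record; unmapped fields are dropped."""
    missing = [src for src in field_map if src not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return {dst: record[src] for src, dst in field_map.items()}


prediction = {"class": "car", "xmin": 10, "ymin": 20, "xmax": 110, "ymax": 80, "score": 0.93}
mapped = map_record(prediction, FIELD_MAP)
```

Writing and maintaining a table like `FIELD_MAP` for every data source is the manual work a mapping wizard replaces.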
Collaboration across teams between machine learning, product, ops, managers, and more.
- Store virtually any scale of dataset and instantly access slices of the data to avoid having to download/unzip/load.
- Fast access to datasets from multiple machines. Have multiple Data Scientists working on the same data.
- Integrates with your tools and 3rd party workforces. See Integrations.
It's a database for your training data, covering both metadata and access to raw BLOB data (on top of your storage choice).
Manage Annotation Workflow, Tasks, Quality Assurance and more.
- Create human review Pipelines with one click.
- Webhooks with Actions
- Easily annotate a single dataset, or scale to hundreds of projects with
thousands of subdivided task sets. Includes easy search and filtering.
- Fully integrated customizable Annotation Reporting.
- Continually upgrade your data, including easily adding more depth
to existing partially annotated sets.
Fully featured data annotation tool for images and video to create, update, and maintain high quality training datasets.
- Quality Image and Video Annotation.
- Semantic Segmentation Focus: Autobordering, turbo mode, and more
- Video Annotation: High resolution, high frame rate, multiple sequences.
- Automation Examples
- Build your own interactions
- Play with model parameters, and see the results in real time (Coming Soon)
General purpose automation language, solve any annotation automation challenge.
Lower annotation and automation costs.
Stream to Training - Coming Soon
Easier and faster for data science. Less compute cost. More privacy controls.
Load streaming data from Diffgram directly into PyTorch and TensorFlow with one line (Coming Soon)
Skip downloading and unzipping massive datasets. Explore data instantly through the browser.
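The idea behind streaming, as opposed to downloading and unzipping everything first, can be sketched with a plain Python generator that fetches files only when a batch is requested. This is a conceptual illustration under stated assumptions, not the Diffgram SDK (the feature is listed as coming soon): `stream_batches`, `fetch`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever call would retrieve one file.

```python
from typing import Callable, Iterable, Iterator, List


def stream_batches(
    file_ids: Iterable[int],
    fetch: Callable[[int], bytes],
    batch_size: int,
) -> Iterator[List[bytes]]:
    """Yield lists of fetched items, fetching lazily as batches are consumed."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[bytes] = []
    for file_id in file_ids:
        batch.append(fetch(file_id))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final partial batch, if the id count is not a multiple of batch_size.
        yield batch


fetched_ids: List[int] = []


def fake_fetch(file_id: int) -> bytes:
    """Hypothetical stand-in for a remote read; records which ids were fetched."""
    fetched_ids.append(file_id)
    return f"file-{file_id}".encode()


batches = stream_batches(range(5), fake_fetch, batch_size=2)
first_batch = next(batches)  # only the first two files have been fetched so far
```

Because the generator pulls data on demand, a training loop can start on the first batch while the rest of the dataset stays in storage.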
- NEW Data Explorer:
Visualize multiple datasets (including video!) in seconds and compare models easily without extra computation. Try it now (click Dataset Explorer)
- Automatic Dataset Versioning and user definable datasets.
- Collaborate, share, and comment on specific instances with a Diffgram Permalink.
Use your models to debug the human. Visually see errors.
Diffgram lets you access, view, compare, and collaborate on datasets to
create the highest quality models. Because these features are fully integrated with the Annotation Tooling, it's seamless to go from spotting an issue to creating a labeling campaign, updating the schema, etc. to correct it.
- Uncover bad data and edge cases
- Curate data and send for labeling with one click
- Automatic error highlighting (Coming Soon)
Secure and Private
- Runs on your local system or cloud. Less lag, more secure, more control. See Security and Privacy.
- Enforce PII & RBAC automatically across the life cycle of
training data, from ingest to dataset to model predictions and back again (Coming Soon)
Tested and Stable Core
Fully integrated automatic test suite, with comprehensive End to End tests and many unit tests.
Flexible & Scalable
- Flexible deploy and many integrations - run Diffgram anywhere in the way you want.
- Scale every aspect - from volume of data, to number of supervisors, to ML speed up approaches.
- Fully featured - 'batteries included'.
- Application: Support all popular media types for raw data; all popular schema, label, and attribute needs; and all annotation assist speed up approaches
- Support all popular training data management and organizational needs
- Integrate with all popular 3rd party applications and related offerings
- Support modification of source code
- Run on any hardware, any cloud, and anywhere
Speed Ups & AI
Latest AI + More
- Diffgram Python SDK
- Diffgram API (any language)
- AWS - Amazon Storage
- GCP Google Storage
- Azure - Now available
- Scale AI
- Submit a pull request! We want your integration here too
Note: for the initial open core release, Actions Hooks are not yet available.
Please see Diffgram.com and use them there if needed.
We welcome contributions! Please see our contributing documentation.
Architecture & Design Docs
We plan to release more internal architecture docs over time. Please see the general docs in the meantime.
IMPORTANT Disclaimer: These are our opinions, based on how we define the above categories, and are subject to change. A vendor may offer something in one of these categories that doesn’t meet our definition of the category. Some Diffgram checkmarks include items coming soon.