An API to interact with the file system hosting the JupyterHub home directories.
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
A multilingual Latent Dirichlet Allocation (LDA) pipeline in Python, with stop-word removal, n-gram features, and inverse stemming.
A speaker diarization framework that integrates two complementary major diarization approaches.
HDBSCAN – Hierarchical Density-Based Spatial Clustering of Applications with Noise. Performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon.
dedupe is a Python library that uses machine learning to perform fuzzy matching, deduplication, and entity resolution quickly on structured data.
The milatools package provides the mila command, which is meant to help with connecting to and interacting with the Mila cluster.
Submitit is a lightweight tool for submitting Python functions for computation within a Slurm cluster.