50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels combo with new novel algorithms.
HyperLearn is written completely in PyTorch, NoGil Numba, Numpy, Pandas, Scipy & LAPACK, and mirrors (mostly) Scikit Learn. HyperLearn also has statistical inference measures embedded, and can be called just like Scikit Learn's syntax (model.confidence_interval_)
Comparison of Speed / Memory
|QDA (Quad Dis A)||1000000||100||54.2||22.25||2,700||1,200||Now parallelized|
|LinearRegression||1000000||100||5.81||0.381||700||10||Guaranteed stable & fast|
Time(s) is Fit + Predict. RAM(mb) = max( RAM(Fit), RAM(Predict) )
1. Embarrassingly Parallel For Loops
- Including Memory Sharing, Memory Management
- CUDA Parallelism through PyTorch & Numba
2. 50%+ Faster, 50%+ Leaner
- Matrix Multiplication Ordering: https://en.wikipedia.org/wiki/Matrix_chain_multiplication
- Element Wise Matrix Multiplication reducing complexity to O(n^2) from O(n^3): https://en.wikipedia.org/wiki/Hadamard_product_(matrices)
- Reducing Matrix Operations to Einstein Notation: https://en.wikipedia.org/wiki/Einstein_notation
- Evaluating one-time Matrix Operations in succession to reduce RAM overhead.
- If p>>n, maybe decomposing X.T is better than X.
- Applying QR Decomposition then SVD might be faster in some cases.
- Utilise the structure of the matrix to compute faster inverse (eg triangular matrices, Hermitian matrices).
- Computing SVD(X) then getting pinv(X) is sometimes faster than pure pinv(X)
3. Why is Statsmodels sometimes unbearably slow?
- Confidence, Prediction Intervals, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.
- Using Einstein Notation & Hadamard Products where possible.
- Computing only what is neccessary to compute (Diagonal of matrix and not entire matrix).
- Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.
4. Deep Learning Drop In Modules with PyTorch
- Using PyTorch to create Scikit-Learn like drop in replacements.
5. 20%+ Less Code, Cleaner Clearer Code
- Using Decorators & Functions where possible.
- Intuitive Middle Level Function names like (isTensor, isIterable).
- Handles Parallelism easily through hyperlearn.multiprocessing
6. Accessing Old and Exciting New Algorithms
- Matrix Completion algorithms - Non Negative Least Squares, NNMF
- Batch Similarity Latent Dirichelt Allocation (BS-LDA)
- Correlation Regression
- Feasible Generalized Least Squares FGLS
- Outlier Tolerant Regression
- Multidimensional Spline Regression
- Generalized MICE (any model drop in replacement)
- Using Uber's Pyro for Bayesian Deep Learning