A Python implementation of feature extraction with k-nearest neighbors (KNN).
- Python 3.x
```bash
git clone git@github.com:upura/knnFeat.git
cd knnFeat
pip install -r requirements.txt
```
A notebook version is available here.
Packages for visualization:

```python
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
```
```python
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
X = np.array(list(zip(x0, x1)))
y = np.array([1 if i0 * i1 > 0 else 0 for (i0, i1) in zip(x0, x1)])
```
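The labels follow an XOR-like pattern (class 1 in the first and third quadrants). A quick scatter plot makes the boundary visible; this is a minimal, self-contained sketch (the `Agg` backend and output filename are my choices, so it also runs outside a notebook):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside a notebook
import matplotlib.pyplot as plt

np.random.seed(0)
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
y = np.array([1 if a * b > 0 else 0 for a, b in zip(x0, x1)])

# Points are class 1 exactly when x0 and x1 share a sign
plt.scatter(x0, x1, c=y, cmap="bwr", s=10)
plt.xlabel("x0")
plt.ylabel("x1")
plt.savefig("xor_scatter.png")  # hypothetical output path
```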
Feature extraction with KNN
```python
from knnFeat import knnExtract
newX = knnExtract(X, y, k=1, holds=5)
```
Quoted from here:
It generates k * c new features, where c is the number of class labels. The new features are computed from the distances between the observations and their k nearest neighbors inside each class, as follows:
- The first test feature contains the distances between each test instance and its nearest neighbor inside the first class.
- The second test feature contains the sums of distances between each test instance and its 2 nearest neighbors inside the first class.
- The third test feature contains the sums of distances between each test instance and its 3 nearest neighbors inside the first class.
- And so on.
This procedure repeats for each class label, generating k * c new features. The new training features are then generated with an n-fold CV approach, so that a training point's neighbors are searched only among the other folds, in order to avoid overfitting.
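The steps above can be sketched in plain NumPy. This is a minimal illustration of the idea, not the library's actual implementation; the function name and parameters are mine:

```python
import numpy as np

def knn_features_sketch(X, y, k=1, folds=5, seed=0):
    """For each class c and each j in 1..k, compute the summed distance
    from a point to its j nearest neighbors of class c, searched only in
    the other CV folds (so training points never see themselves)."""
    n = len(X)
    classes = np.unique(y)
    out = np.zeros((n, k * len(classes)))
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    for val_idx in np.array_split(idx, folds):
        train_idx = np.setdiff1d(idx, val_idx)
        for ci, c in enumerate(classes):
            pool = X[train_idx[y[train_idx] == c]]
            # pairwise distances from validation points to the class-c pool
            d = np.linalg.norm(X[val_idx][:, None, :] - pool[None, :, :], axis=2)
            d.sort(axis=1)  # ascending: nearest neighbors first
            for j in range(k):
                # sum of distances to the (j + 1) nearest class-c neighbors
                out[val_idx, ci * k + j] = d[:, : j + 1].sum(axis=1)
    return out
```

With k = 1 and two classes this yields k * c = 2 new features per observation, matching the count the quote describes.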