# Python Implementation of Feature Extraction with K-Nearest Neighbor

## knnFeat

Python implementation of feature extraction with KNN.

## Requirements

- Python 3.x
- numpy
- scikit-learn
- scipy

## Install

```
git clone git@github.com:upura/knnFeat.git
cd knnFeat
pip install -r requirements.txt
```

## Demo

A Jupyter notebook version of this demo is also available.

### Packages for visualization

```
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
```

### Data generation

```
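# 500 points drawn uniformly from the square [-0.5, 0.5) x [-0.5, 0.5)
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
X = np.column_stack([x0, x1])  # shape (500, 2)
# Label 1 in the first and third quadrants (x0 * x1 > 0), 0 otherwise: an XOR-style pattern
y = (x0 * x1 > 0).astype(int)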
```

### Visualization
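A minimal sketch for plotting the raw data, assuming the `X` and `y` arrays from the data generation step. `plot_raw_data` is a hypothetical helper name, not part of knnFeat:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_raw_data(X, y):
    """Scatter the two input coordinates, one color per class label."""
    fig, ax = plt.subplots(figsize=(5, 5))
    for label in np.unique(y):
        mask = y == label  # rows belonging to this class
        ax.scatter(X[mask, 0], X[mask, 1], s=10, label=f"y = {label}")
    ax.set_xlabel("x0")
    ax.set_ylabel("x1")
    ax.legend()
    return fig
```

In the notebook, `plot_raw_data(X, y)` draws the points. Because the label depends on the sign of `x0 * x1`, each class occupies two opposite quadrants, so no single straight line separates the classes in the original feature space.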

### Feature extraction with KNN

```
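from knnFeat import knnExtract

# k: neighbors per class; holds: number of folds for the n-fold CV described under Algorithm
newX = knnExtract(X, y, k=1, holds=5)  # k * c = 2 new features for this two-class data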
```

### Visualization
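With `k = 1` and two classes, `newX` has k * c = 2 columns, so it can be drawn as a 2-D scatter plot. The sketch below assumes the first column holds distances to class 0 and the second to class 1 (the class-by-class order described under Algorithm); `plot_knn_features` is a hypothetical helper, not part of knnFeat:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_knn_features(new_X, y):
    """Scatter the two KNN features produced with k = 1 on two-class data."""
    if new_X.shape[1] != 2:
        raise ValueError("this plot expects exactly k * c = 2 feature columns")
    fig, ax = plt.subplots(figsize=(5, 5))
    for label in np.unique(y):
        mask = y == label
        ax.scatter(new_X[mask, 0], new_X[mask, 1], s=10, label=f"y = {label}")
    # Dashed diagonal: points equally far from their nearest class-0 and class-1 neighbors
    lim = float(new_X.max())
    ax.plot([0, lim], [0, lim], "k--", linewidth=1)
    # Assumption: column 0 = class 0, column 1 = class 1
    ax.set_xlabel("distance to nearest class-0 neighbor")
    ax.set_ylabel("distance to nearest class-1 neighbor")
    ax.legend()
    return fig
```

Call `plot_knn_features(newX, y)` after the extraction step. The dashed line gives a visual reference for how well the two new features separate the classes.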

## Algorithm

The description below is quoted from its original source.

It generates k * c new features, where c is the number of class labels. The new features are computed from the distances between the observations and their k nearest neighbors inside each class, as follows:

- The first test feature contains the distances between each test instance and its nearest neighbor inside the first class.
- The second test feature contains the sums of distances between each test instance and its 2 nearest neighbors inside the first class.
- The third test feature contains the sums of distances between each test instance and its 3 nearest neighbors inside the first class.
- And so on.
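The per-class features in the list above can be sketched in plain NumPy for a given training/test split. This is an illustrative reimplementation under stated assumptions (Euclidean distance, class labels taken in sorted order, brute-force distances), not knnFeat's code; `knn_distance_features` is a hypothetical name:

```python
import numpy as np


def knn_distance_features(X_train, y_train, X_test, k=1):
    """Return k * c features per test row, grouped class by class.

    Column (ci * k + j) holds the sum of Euclidean distances from a test row
    to its (j + 1) nearest training rows of the ci-th class (sorted labels).
    """
    classes = np.unique(y_train)  # sorted class labels
    blocks = []
    for c in classes:
        X_c = X_train[y_train == c]
        if len(X_c) < k:
            raise ValueError(f"class {c!r} has fewer than k={k} training rows")
        # Pairwise distances, shape (n_test, n_rows_in_class_c)
        dist = np.linalg.norm(X_test[:, None, :] - X_c[None, :, :], axis=2)
        # k smallest distances per test row, in ascending order
        nearest = np.sort(dist, axis=1)[:, :k]
        # Running sum: column j is the distance sum over the j + 1 nearest neighbors
        blocks.append(np.cumsum(nearest, axis=1))
    return np.hstack(blocks)


X_train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [7.0, 0.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[2.0, 0.0]])
print(knn_distance_features(X_train, y_train, X_test, k=2))  # [[1. 3. 3. 8.]]
```

The cumulative sum over sorted distances yields the 1-, 2-, ..., k-neighbor sums in one pass. A full distance matrix is fine for a few hundred rows; for larger data a dedicated neighbor search such as scikit-learn's `NearestNeighbors` is a better fit.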

This procedure repeats for each class label, generating k * c new features. Then, the new training features are generated using an n-fold CV approach, in order to avoid overfitting.
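The n-fold step can be sketched with scikit-learn's `KFold` and `NearestNeighbors`: each row's features are computed only from neighbors in the other folds. `knn_extract_oof` and its `n_folds` parameter are hypothetical; `n_folds` is assumed to play the role of `holds` in `knnExtract`, and knnFeat's own code may differ in details such as fold shuffling:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import NearestNeighbors


def knn_extract_oof(X, y, k=1, n_folds=5, random_state=0):
    """Out-of-fold k * c KNN distance features (hypothetical helper, not knnFeat's API)."""
    classes = np.unique(y)
    new_X = np.zeros((len(X), k * len(classes)))
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=random_state)
    for train_idx, test_idx in folds.split(X):
        X_tr, y_tr = X[train_idx], y[train_idx]
        for ci, c in enumerate(classes):
            # Neighbors come only from the other folds, so a row never sees itself
            nn = NearestNeighbors(n_neighbors=k).fit(X_tr[y_tr == c])
            dist, _ = nn.kneighbors(X[test_idx])  # ascending distances, shape (n_test, k)
            new_X[test_idx, ci * k:(ci + 1) * k] = np.cumsum(dist, axis=1)
    return new_X


# Same XOR-style data as the demo, with a seeded generator for reproducibility
rng = np.random.default_rng(0)
x0 = rng.random(500) - 0.5
x1 = rng.random(500) - 0.5
X = np.column_stack([x0, x1])
y = (x0 * x1 > 0).astype(int)
new_X = knn_extract_oof(X, y, k=1, n_folds=5)
print(new_X.shape)  # (500, 2)
```

Because a row is never in the training part of its own fold, it cannot be its own nearest neighbor. Without this, every training row would get a zero distance to its own class, leaking the label into the features.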