featureform

Embedding Store workflow PyPi Downloads Featureform Slack

Python supported PyPi Version featureform Website Twitter

What is Embeddinghub?

Embeddinghub is a database built for machine learning embeddings. It is built with four goals in mind.

  • Store embeddings durably and with high availability
  • Allow for approximate nearest neighbor operations
  • Enable other operations like partitioning, sub-indices, and averaging
  • Manage versioning, access control, and rollbacks painlessly

drawing

Features

  • Supported Operations: Run approximate nearest neighbor lookups, average multiple embeddings, partition tables (spaces), cache locally while training, and more.
  • Storage: Store and index billions vectors embeddings from our storage layer.
  • Versioning: Create, manage, and rollback different versions of your embeddings.
  • Access Control: Encode different business logic and user management directly into Embeddinghub.
  • Monitoring: Keep track of how embeddings are being used, latency, throughput, and feature drift over time.

What is an Embedding?

Embeddings are dense numerical representations of real-world objects and relationships, expressed as a vector. The vector space quantifies the semantic similarity between categories. Embedding vectors that are close to each other are considered similar. Sometimes, they are used directly for “Similar items to this” section in an e-commerce store. Other times, embeddings are passed to other models. In those cases, the model can share learnings across similar items rather than treating them as two completely unique categories, as is the case with one-hot encodings. For this reason, embeddings can be used to accurately represent sparse data like clickstreams, text, and e-commerce purchases as features to downstream models.

Further Reading

Getting Started

Step 1: Install Embeddinghub client

Install the Python SDK via pip

pip install embeddinghub

Step 2: Deploy Docker container ( optional )

The Embeddinghub client can be used without a server. This is useful when using embeddings in a research environment where a database server is not necessary. If that’s the case for you, skip ahead to the next step.

Otherwise, we can use this docker command to run Embeddinghub locally and to map the container’s main port to our host’s port.

docker run featureformcom/embeddinghub -p 7462:7462

Step 3: Initialize Python Client

If you deployed a docker container, you can initialize the python client.

import embeddinghub as eh

hub = eh.connect(eh.Config())

Otherwise, you can use a LocalConfig to store and index embeddings locally.

hub = eh.connect(eh.LocalConfig("data/"))

Step 4: Create a Space

Embeddings are written and retrieved from Spaces. When creating a Space we must also specify a version, otherwise a default version is used.

space = hub.create_space("quickstart", dims=3)

Step 5: Upload Embeddings

We will create a dictionary of three embeddings and upload them to our new quickstart space.

embeddings = {
    "apple": [1, 0, 0],
    "orange": [1, 1, 0],
    "potato": [0, 1, 0],
    "chicken": [-1, -1, 0],
}
space.multiset(embeddings)

Step 6: Get nearest neighbors

Now we can compare apples to oranges and get the nearest neighbors.

neighbors = space.nearest_neighbors(key="apple", num=2)
print(neighbors)

Contributing

Report Issues

Please help us by reporting any issues you may have while using Embeddinghub.

License

GitHub

GitHub - featureform/embeddinghub: A storage engine for vector machine learning embeddings.
A storage engine for vector machine learning embeddings. - GitHub - featureform/embeddinghub: A storage engine for vector machine learning embeddings.