blazingsql

A lightweight, GPU accelerated, SQL engine built on the RAPIDS.ai ecosystem.

BlazingSQL is a SQL interface for cuDF, with various features to support large scale data science workflows and enterprise datasets.

Query Data Stored Externally - a single line of code can register remote storage solutions, such as Amazon S3.
Simple SQL - incredibly easy to use: run a SQL query and the results are GPU DataFrames (GDFs).
Interoperable - GDFs are immediately accessible to any RAPIDS library for data science workloads.

Try our 5-min Welcome Notebook to start using BlazingSQL and RAPIDS AI.

Getting Started

Here are two copy-and-paste reproducible BlazingSQL snippets; keep scrolling to find example notebooks below.

Create and query a table from a cudf.DataFrame with progress bar:

import cudf

df = cudf.DataFrame()

df['key'] = ['a', 'b', 'c', 'd', 'e']
df['val'] = [7.6, 2.9, 7.1, 1.6, 2.2]

from blazingsql import BlazingContext
bc = BlazingContext(enable_progress_bar=True)

bc.create_table('game_1', df)

bc.sql('SELECT * FROM game_1 WHERE val > 4') # the query progress will be shown

	key	val
0	a	7.6
1	c	7.1
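As a rough CPU-only cross-check of the query above, the same five rows and the same SQL text can be run through Python's built-in sqlite3 module. This is only a sketch for sanity-checking the filter without a GPU; it does not use BlazingSQL or cuDF, and the quoting of the key column is a precaution specific to SQLite.

```python
import sqlite3

# Same five rows as the cudf.DataFrame in the snippet above.
rows = [("a", 7.6), ("b", 2.9), ("c", 7.1), ("d", 1.6), ("e", 2.2)]

conn = sqlite3.connect(":memory:")
# "key" is quoted only as a precaution, since KEY is an SQLite keyword.
conn.execute('CREATE TABLE game_1 ("key" TEXT, val REAL)')
conn.executemany("INSERT INTO game_1 VALUES (?, ?)", rows)

# Same query text as the bc.sql(...) call; sorted() makes the order deterministic.
result = sorted(conn.execute("SELECT * FROM game_1 WHERE val > 4").fetchall())
conn.close()

print(result)  # [('a', 7.6), ('c', 7.1)]
```

Only the rows whose val exceeds 4 survive the filter, which are keys a and c.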

Create and query a table from an AWS S3 bucket:

from blazingsql import BlazingContext
bc = BlazingContext()

bc.s3('blazingsql-colab', bucket_name='blazingsql-colab')

bc.create_table('taxi', 's3://blazingsql-colab/yellow_taxi/taxi_data.parquet')

bc.sql('SELECT passenger_count, trip_distance FROM taxi LIMIT 2')

	passenger_count	trip_distance
0	1.0	1.1
1	1.0	0.7

Examples

Notebook Title	Description
Welcome Notebook	An introduction to BlazingSQL Notebooks and the GPU Data Science Ecosystem.
The DataFrame	Learn how to use BlazingSQL and cuDF to create GPU DataFrames with SQL and Pandas-like APIs.
Data Visualization	Plug in your favorite Python visualization packages, or use GPU accelerated visualization tools to render millions of rows in a flash.
Machine Learning	Learn about cuML, which mirrors the Scikit-Learn API and offers GPU accelerated machine learning on GPU DataFrames.

Documentation

You can find our full documentation at docs.blazingdb.com.

Prerequisites

Anaconda or Miniconda installed
OS Support: Ubuntu 16.04/18.04 LTS, CentOS 7
GPU Support: Pascal or better (Compute Capability >= 6.0)
CUDA Support: 11.0, 11.2, 11.4
Python Support: 3.7, 3.8
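The prerequisites above can be expressed as a small pre-install check. The following is a minimal sketch, not part of BlazingSQL: the function name check_prerequisites and its arguments are hypothetical, and the caller is assumed to supply the CUDA version and GPU compute capability (for example, read from the NVIDIA driver tools).

```python
# Values copied from the Prerequisites list; everything else is illustrative.
SUPPORTED_PYTHON = {(3, 7), (3, 8)}
SUPPORTED_CUDA = {"11.0", "11.2", "11.4"}
MIN_COMPUTE_CAPABILITY = (6, 0)  # Pascal or better


def check_prerequisites(python_version, cuda_version, compute_capability):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: python_version and compute_capability are
    (major, minor) tuples, cuda_version is a string such as "11.2".
    """
    problems = []
    major, minor = python_version[0], python_version[1]
    if (major, minor) not in SUPPORTED_PYTHON:
        problems.append(f"Python {major}.{minor} is not listed (3.7 or 3.8)")
    if cuda_version not in SUPPORTED_CUDA:
        problems.append(f"CUDA {cuda_version} is not listed (11.0, 11.2 or 11.4)")
    if tuple(compute_capability) < MIN_COMPUTE_CAPABILITY:
        problems.append("GPU compute capability is below 6.0 (Pascal)")
    return problems


print(check_prerequisites((3, 8), "11.2", (7, 5)))  # []
```

Passing sys.version_info as python_version checks the running interpreter, since only its first two fields are read.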

Install Using Conda

BlazingSQL can be installed with conda (miniconda, or the full Anaconda distribution) from the blazingsql channel:

Stable Version

conda install -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=$PYTHON_VERSION cudatoolkit=$CUDA_VERSION

Where $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8.
For example, for CUDA 11.2 and Python 3.8:

conda install -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=3.8 cudatoolkit=11.2

Nightly Version

The nightly version supports only CUDA 11+; see https://github.com/rapidsai/cudf#cudagpu-requirements

conda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=$PYTHON_VERSION cudatoolkit=$CUDA_VERSION

Where $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8.
For example, for CUDA 11.2 and Python 3.8:

conda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=3.8 cudatoolkit=11.2
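The stable and nightly commands above differ only in their first two channels. The sketch below assembles either command from a Python and CUDA version; the helper name conda_install_command is hypothetical, and the channel lists are copied from the commands in this section. It only builds the string and does not validate the versions or run conda.

```python
# Channel order copied from the install commands in this section.
STABLE_CHANNELS = ["blazingsql", "rapidsai", "nvidia", "conda-forge", "defaults"]
NIGHTLY_CHANNELS = ["blazingsql-nightly", "rapidsai-nightly", "nvidia", "conda-forge", "defaults"]


def conda_install_command(python_version, cuda_version, nightly=False):
    """Build the conda install command line as a single string (hypothetical helper)."""
    channels = NIGHTLY_CHANNELS if nightly else STABLE_CHANNELS
    args = ["conda", "install"]
    for channel in channels:
        args += ["-c", channel]
    args += ["blazingsql", f"python={python_version}", f"cudatoolkit={cuda_version}"]
    return " ".join(args)


print(conda_install_command("3.8", "11.2"))
print(conda_install_command("3.8", "11.2", nightly=True))
```

The two printed lines reproduce the CUDA 11.2 and Python 3.8 examples for the stable and nightly versions respectively.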

Build/Install from Source (Conda Environment)

This is the recommended way of building all of the BlazingSQL components and dependencies from source. It ensures that all the dependencies are available to the build process.

Stable Version

Install build dependencies

conda create -n bsql python=$PYTHON_VERSION
conda activate bsql
./dependencies.sh 21.08 $CUDA_VERSION

Where $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8.
For example, for CUDA 11.2 and Python 3.7:

conda create -n bsql python=3.7
conda activate bsql
./dependencies.sh 21.08 11.2

Build

The build process will check out the BlazingSQL repository, then build and install it into the conda environment.

cd $CONDA_PREFIX
git clone https://github.com/BlazingDB/blazingsql.git
cd blazingsql
git checkout main
export CUDACXX=/usr/local/cuda/bin/nvcc
./build.sh

NOTE: You can run ./build.sh -h to see more build options.

$CONDA_PREFIX now has a folder for the blazingsql repository.

Nightly Version

Install build dependencies

The nightly version supports only CUDA 11+; see https://github.com/rapidsai/cudf#cudagpu-requirements

conda create -n bsql python=$PYTHON_VERSION
conda activate bsql
./dependencies.sh 21.10 $CUDA_VERSION nightly

Where $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8.
For example, for CUDA 11.2 and Python 3.8:

conda create -n bsql python=3.8
conda activate bsql
./dependencies.sh 21.10 11.2 nightly

Build

The build process will check out the BlazingSQL repository, then build and install it into the conda environment.

cd $CONDA_PREFIX
git clone https://github.com/BlazingDB/blazingsql.git
cd blazingsql
export CUDACXX=/usr/local/cuda/bin/nvcc
./build.sh

NOTE: You can run ./build.sh -h to see more build options.

NOTE: You can run static analysis with cppcheck using the command cppcheck --project=compile_commands.json in any of the C++ project build directories.

$CONDA_PREFIX now has a folder for the blazingsql repository.

Storage plugins

To build without the storage plugins (AWS S3, Google Cloud Storage), use the following arguments:

# Disable all storage plugins
./build.sh disable-aws-s3 disable-google-gs

# Disable AWS S3 storage plugin
./build.sh disable-aws-s3

# Disable Google Cloud Storage plugin
./build.sh disable-google-gs

NOTE: If you disable the storage plugins, you don't need to install the AWS SDK for C++ or Google Cloud Storage (or any of their dependencies) beforehand.

SQL providers

To build without the SQL providers (MySQL, PostgreSQL, SQLite), use the following arguments:

# Disable all SQL providers
./build.sh disable-mysql disable-sqlite disable-postgresql

# Disable MySQL provider
./build.sh disable-mysql

...

NOTES:

If you disable the SQL providers, you don't need to install mysql-connector-cpp=8.0.23, libpq=13, or sqlite=3 (or any of their dependencies).
Currently we support only MySQL, but PostgreSQL and SQLite will be ready for the next version!
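The disable flags from the Storage plugins and SQL providers sections can be collected into one build invocation. The sketch below is illustrative only: the helper name build_command and the short component names are hypothetical, and it assumes that flags from both groups can be passed to ./build.sh together, which the examples above show only within each group.

```python
# Flags copied from the Storage plugins and SQL providers sections.
# The dictionary keys are hypothetical short names chosen for this sketch.
DISABLE_FLAGS = {
    "aws-s3": "disable-aws-s3",
    "google-gs": "disable-google-gs",
    "mysql": "disable-mysql",
    "sqlite": "disable-sqlite",
    "postgresql": "disable-postgresql",
}


def build_command(disabled=()):
    """Return the ./build.sh command line for the given disabled components."""
    unknown = [name for name in disabled if name not in DISABLE_FLAGS]
    if unknown:
        raise ValueError(f"unknown components: {unknown}")
    wanted = set(disabled)
    # Iterate over DISABLE_FLAGS so the flag order is stable regardless of input order.
    flags = [flag for name, flag in DISABLE_FLAGS.items() if name in wanted]
    return " ".join(["./build.sh", *flags])


print(build_command(["mysql", "sqlite", "postgresql"]))
# ./build.sh disable-mysql disable-sqlite disable-postgresql
```

Calling build_command() with no arguments returns the plain ./build.sh command with every plugin and provider enabled.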

Documentation

User guides and public API documentation can be found here.

Documentation of our internal code architecture can be built using Sphinx.

conda install -c conda-forge doxygen
cd $CONDA_PREFIX
cd blazingsql/docsrc
pip install -r requirements.txt
make doxygen
make html

The generated documentation can be viewed in a browser at blazingsql/docsrc/build/html/index.html
