The Taichi programming language
```bash
# Python 3.6/3.7 needed

# CPU only. No GPU/CUDA needed. (Linux, OS X and Windows)
python3 -m pip install taichi-nightly

# With GPU (CUDA 10.0) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-0

# With GPU (CUDA 10.1) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-1
```
Related papers
- (SIGGRAPH Asia 2019) High-Performance Computation on Sparse Data Structures [Video] [BibTex]
  - by Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand
- (ICLR 2020) Differentiable Programming for Physical Simulation [Video] [BibTex] [Code]
  - by Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Frédo Durand
Short-term goals
- (Done) Fully implement the LLVM backend to replace the legacy source-to-source C++/CUDA backends (by Dec 2019)
  - The only missing features compared to the old source-to-source backends:
    - Vectorization on CPUs. Since most users who want performance use GPUs (CUDA), this has low priority.
    - Automatic shared memory utilization. Postponed until Feb/March 2020.
- (WIP) Tune the performance of the LLVM backend to match that of the legacy source-to-source backends (by the end of Jan 2020)
- (WIP) Redesign the memory allocator
Updates
- (Jan 3, 2020) v0.3.20 released.
  - Support for loops with `ti.static(ti.grouped(ti.ndrange(...)))`
- (Jan 2, 2020) v0.3.19 released.
  - Added `ti.atan2(y, x)`
  - Improved error message when using floating-point numbers as tensor indices
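
The release note gives only the signature. `ti.atan2(y, x)` takes the y coordinate first, which is the same order as Python's standard `math.atan2`. The sketch below assumes it follows the usual quadrant-aware definition; the note does not say so explicitly. It uses plain Python (`math`, not Taichi) to show why `atan2` is preferred over `atan(y / x)`. `naive_angle` and `angle` are hypothetical helpers.

```python
import math

def naive_angle(y, x):
    # atan(y / x) only sees the ratio, so it cannot tell opposite
    # quadrants apart (and fails outright when x == 0)
    return math.atan(y / x)

def angle(y, x):
    # atan2 uses the signs of both arguments and returns a value in (-pi, pi]
    return math.atan2(y, x)

# The point (x, y) = (-1, 1) lies in the second quadrant, at 135 degrees
print(math.degrees(naive_angle(1.0, -1.0)))  # roughly -45: wrong quadrant
print(math.degrees(angle(1.0, -1.0)))        # roughly 135
```
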
- (Jan 1, 2020) v0.3.18 released.
  - Added `ti.GUI` class
  - Improved the performance of `ti.Matrix.fill`
- (Dec 31, 2019) v0.3.17 released.
  - Fixed CUDA context conflict with PyTorch (thanks to @Xingzhe He for reporting)
  - Support `ti.Matrix.T()` for transposing a matrix
  - Iterable `ti.static(ti.ndrange)`
  - Fixed `ti.Matrix.identity()`
  - Added `ti.Matrix.one()` (create a matrix with 1 as all the entries)
  - Improved `ir_printer` on SNodes
  - Better support for `dynamic` SNodes.
    - Struct-fors on `dynamic` nodes supported
    - `ti.length` and `ti.append` to query and manipulate dynamic nodes
- (Dec 29, 2019) v0.3.16 released.
  - Fixed `ndrange`-fors with local variables (thanks to Xingzhe He for reporting this issue)
- (Dec 28, 2019) v0.3.15 released.
  - Multidimensional parallel range-for using `ti.ndrange`:

```python
import taichi as ti

# x is assumed to be a 3D global tensor declared elsewhere
@ti.kernel
def fill_3d():
    # Parallelized for all 3 <= i < 8, 1 <= j < 6, 0 <= k < 9
    for i, j, k in ti.ndrange((3, 8), (1, 6), 9):
        x[i, j, k] = i + j + k
```
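
The comment in `fill_3d` describes its iteration space. The plain-Python sketch below (no Taichi required) enumerates the same index set serially with `itertools.product`. It assumes each `ti.ndrange` argument is either an integer `n`, meaning `0 <= k < n`, or a pair `(a, b)`, meaning `a <= i < b`, as the kernel's comment suggests. `ndrange_indices` is a hypothetical helper, not part of Taichi, and unlike the kernel it runs sequentially.

```python
import itertools

def ndrange_indices(*dims):
    """Hypothetical serial stand-in for the index space of ti.ndrange."""
    ranges = []
    for d in dims:
        if isinstance(d, int):
            ranges.append(range(d))        # n      -> 0 <= k < n
        else:
            lo, hi = d
            ranges.append(range(lo, hi))   # (a, b) -> a <= i < b
    return itertools.product(*ranges)

# Mirror fill_3d, with a dict standing in for the tensor x
x = {}
for i, j, k in ndrange_indices((3, 8), (1, 6), 9):
    x[i, j, k] = i + j + k

print(len(x))  # 5 * 5 * 9 = 225 entries
```
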
- (Dec 28, 2019) v0.3.14 released.
  - GPU random number generator support for more than 1024x1024 threads
  - Parallelized element list generation on GPUs. Struct-fors significantly sped up.
  - `ti` and `tid` (debug mode) CLI commands
- (Dec 26, 2019) v0.3.13 released.
  - `ti.append` now returns the list length before appending
  - Fixed for loops with 0 iterations
  - Set `ti.get_runtime().set_verbose_kernel_launch(True)` to log kernel launches
  - Distinguish `/` and `//` following the Python convention
  - Allow using local variables as kernel argument type annotations
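
Returning the length *before* the append means the return value is the index at which the new element was stored. The plain-Python sketch below illustrates that contract with a hypothetical `DynamicList` class. It is not how Taichi implements `dynamic` SNodes, which are appended to in parallel inside kernels. It only models the return value.

```python
class DynamicList:
    """Hypothetical model of the append/length contract of a dynamic node."""

    def __init__(self):
        self._data = []

    def append(self, value):
        # Return the length before appending, i.e. the new element's index
        old_len = len(self._data)
        self._data.append(value)
        return old_len

    def length(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

lst = DynamicList()
slot = lst.append(42)   # stored at index 0
slot2 = lst.append(7)   # stored at index 1
print(lst.length())     # 2
```
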
- (Dec 25, 2019) v0.3.11 released.
  - Support multiple kernels with the same name, especially in the OOP cases where multiple member kernels share the same name
  - Basic `dynamic` node support (`ti.append`, `ti.length`) in the new LLVM backend
  - Fixed struct-for loops on 0-D tensors
- (Dec 24, 2019) v0.3.10 released.
  - `assert <condition>` statement supported in Taichi kernels.
  - Comparison operator chaining (e.g. `1 < x < 3`) supported in Taichi kernels.
- (Dec 24, 2019) v0.3.9 released.
  - `ti.classfunc` decorator for functions within a `data_oriented` class
  - `[Expr/Vector/Matrix].to_torch` now has an extra argument `device`, which specifies the device placement for the returned torch tensor and should have type `torch.device`. Default: `None`.
  - Cross-device (CPU/GPU) Taichi/PyTorch interaction support when using `to_torch`/`from_torch`.
  - Number of kernels compiled during external array IO significantly reduced (from `matrix size` to `1`)
- (Dec 23, 2019) v0.3.8 released.
  - Breaking change: `ti.data_oriented` decorator introduced. Please decorate all your Taichi data-oriented objects using this decorator. To invoke the gradient versions of `classmethod`, for example `A.forward`, simply use `A.forward.grad()` instead of `A.forward(__gradient=True)` (obsolete).
- (Dec 22, 2019) v0.3.5 released.
  - Maximum tensor dimensionality is now 8 (it used to be 4). That is, you can now allocate tensors with up to 8 dimensions.
- (Dec 22, 2019) v0.3.4 released.
  - 2D and 3D polar decomposition (`R, S = ti.polar_decompose(A, ti.f32)`) and SVD (`U, sigma, V = ti.svd(A, ti.f32)`) support. Note that `sigma` is a `3x3` diagonal matrix.
  - Fixed documentation versioning
  - Allow `expr_init` with `ti.core.DataType` as inputs, so that `ti.core.DataType` can be used as a `ti.func` parameter
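
The 2D case of polar decomposition has a short closed form, shown here for reference. The plain-Python sketch below is not Taichi's `ti.polar_decompose`, whose internals may differ. It writes a 2x2 matrix `A` as `R @ S`, with `R` a rotation and `S` symmetric. The rotation angle is `theta = atan2(a21 - a12, a11 + a22)`, which is exactly the condition that makes `R^T A` symmetric. `polar_decompose_2d` and `matmul` are hypothetical helpers.

```python
import math

def matmul(P, Q):
    # 2x2 matrix product on nested lists
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def polar_decompose_2d(A):
    """Hypothetical helper: return (R, S) with A = R @ S, R a rotation, S symmetric."""
    (a11, a12), (a21, a22) = A
    # This angle makes the off-diagonal entries of R^T A equal
    theta = math.atan2(a21 - a12, a11 + a22)
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s], [s, c]]
    R_t = [[c, s], [-s, c]]
    S = matmul(R_t, A)  # S = R^T A
    return R, S

A = [[2.0, 1.0], [0.5, 3.0]]
R, S = polar_decompose_2d(A)
print(R)
print(S)
```

With this choice of angle, `trace(S)` equals `hypot(a11 + a22, a21 - a12) >= 0`. So whenever `det(A) > 0`, the symmetric factor `S` is also positive definite.
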
- (Dec 20, 2019) v0.3.3 released.
  - Loud failure message when calling nested kernels. Closed #310
  - `DiffTaichi` examples moved to a standalone repo
  - Fixed documentation versioning
  - Correctly differentiating kernels with multiple offloaded statements
- (Dec 18, 2019) v0.3.2 released.
  - `Vector.norm` now comes with a parameter `eps` (`0` by default), and returns `sqrt(\sum_i(x_i ^ 2) + eps)`. A non-zero `eps` safeguards the operator's gradient on zero vectors during differentiable programming.
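
Here is why `eps` helps. The gradient of the norm is `x_i / sqrt(\sum_j x_j^2 + eps)`, and with `eps = 0` that divides by zero on a zero vector. The plain-Python sketch below implements the formula from the entry and its analytic gradient. `norm` and `norm_grad` are hypothetical helpers, not Taichi code.

```python
import math

def norm(x, eps=0.0):
    # sqrt(sum_i x_i^2 + eps), as in the release note
    return math.sqrt(sum(xi * xi for xi in x) + eps)

def norm_grad(x, eps=0.0):
    # d/dx_i sqrt(sum_j x_j^2 + eps) = x_i / sqrt(sum_j x_j^2 + eps)
    n = norm(x, eps)
    return [xi / n for xi in x]

print(norm([3.0, 4.0]))                 # 5.0
print(norm_grad([0.0, 0.0], eps=1e-8))  # [0.0, 0.0]
try:
    norm_grad([0.0, 0.0])               # eps = 0: 0.0 / 0.0
except ZeroDivisionError:
    print("gradient undefined at the zero vector")
```

Plain Python raises `ZeroDivisionError` here. Floating-point code on a CPU or GPU would typically produce NaN instead, and that NaN then propagates through backpropagation.
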
- (Dec 17, 2019) v0.3.1 released.
  - Removed dependency on `glibc 2.27`
- (Dec 17, 2019) v0.3.0 released.
  - Documentation significantly improved
  - `break` statements supported in while loops
  - CPU multithreading enabled by default
- (Dec 16, 2019) v0.2.6 released.
  - `ti.GUI.set_image(np.ndarray/Taichi tensor)`
  - In-place adds are atomic by default. E.g., `x[i] += j` is equivalent to `ti.atomic_add(x[i], j)`
  - `ti.func` arguments are forced to pass by value
  - `min/max` can now take more than two arguments, e.g. `max(a, b, c, d)`
  - Matrix operators `transposed`, `trace`, `polar_decompose` and `determinant` promoted to `ti` scope. That is, users can now use `ti.transposed(M)` instead of `ti.Matrix.transposed(M)`
  - `ti.get_runtime().set_verbose(False)` to eliminate verbose outputs
  - LLVM backend now supports multithreading on CPUs
  - LLVM backend now supports random number generators (`ti.random(ti.i32/i64/f32/f64)`)
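
Atomicity matters because many parallel threads may update the same `x[i]`, and a plain read-modify-write can then lose updates. The plain-Python sketch below models the guarantee with `threading` and a lock. `atomic_add` here is a hypothetical helper that returns the previous value, a common convention for atomic adds that is assumed here and not stated in the release note. It is not Taichi's implementation, which relies on hardware atomics.

```python
import threading

x = [0] * 4
_lock = threading.Lock()

def atomic_add(arr, i, delta):
    """Hypothetical helper: lock-protected read-modify-write; returns the old value."""
    with _lock:
        old = arr[i]
        arr[i] = old + delta
        return old

def worker(n_iters):
    # Every thread adds 1 to every slot n_iters times
    for _ in range(n_iters):
        for i in range(len(x)):
            atomic_add(x, i, 1)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(x)  # [8000, 8000, 8000, 8000]: no updates lost
```
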
- (Dec 5, 2019) v0.2.3 released.
  - Simplified interaction between `Taichi`, `numpy` and `PyTorch`
    - `taichi_scalar_tensor.to_numpy()/from_numpy(numpy_array)`
    - `taichi_scalar_tensor.to_torch()/from_torch(torch_array)`
- (Dec 4, 2019) v0.2.2 released.
  - Argument type `ti.ext_arr()` now takes PyTorch tensors
- (Dec 3, 2019) v0.2.1 released.
  - Improved type mismatch error message
  - Native `min`/`max` support
  - Tensor access index dimensionality checking
  - `Matrix.to_numpy`, `Matrix.zero`, `Matrix.identity`, `Matrix.fill`
  - Warning instead of error on lossy stores
  - Added some initial support for cross-referencing local variables in different offloaded blocks.
- (Nov 28, 2019) v0.2.0 released.
  - More friendly syntax error when passing non-compile-time-constant values to `ti.static`
  - Systematically resolved the variable name resolution issue
  - Better interaction with numpy:
    - `numpy` arrays passed as a `ti.ext_arr()` [examples]
      - `i32/f32/i64/f64` data type support for numpy
      - Multidimensional numpy arrays now supported in Taichi kernels
    - `Tensor.to_numpy()` and `Tensor.from_numpy(numpy.ndarray)` supported [examples]
    - Corresponding PyTorch tensor interaction will be supported very soon. For now, only 1D f32 PyTorch tensors are supported when using `ti.ext_arr()`. Please use numpy arrays as intermediate buffers for now.
  - Indexing arrays with an incorrect number of indices now results in a syntax error
  - Tensor shape reflection: [examples]
    - `Tensor.dim()` to retrieve the dimensionality of a global tensor
    - `Tensor.shape()` to retrieve the shape of a global tensor
    - Note that the above queries will cause data structures to be materialized
  - `struct-for` (e.g. `for i, j in x`) now supports iterating over tensors with non-power-of-two dimensions
  - Handy tensor filling: [examples]
    - `Tensor.fill(x)` to set all entries to `x`
    - `Matrix.fill(x)` to set all entries to `x`, where `x` can be a scalar or a `ti.Matrix` of the same size
  - Reduced Python package size
  - `struct-for` with grouped indices for better metaprogramming, especially in writing dimensionality-independent code, e.g. in physical simulation: [examples]
```python
# Inside a Taichi kernel; x and y are assumed to be global tensors
for I in ti.grouped(x):  # I is a vector of size x.dim() and data type i32
    x[I] = 0

# If tensor x is 2D
for I in ti.grouped(x):  # I is a vector of size x.dim() and data type i32
    y[I + ti.Vector([0, 1])] = I[0] + I[1]
# is equivalent to
for i, j in x:
    y[i, j + 1] = i + j
```
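
The equivalence above can be checked outside Taichi. In this plain-Python sketch, dicts keyed by index tuples stand in for the tensors. The hypothetical `grouped` helper yields each multi-index as a tuple, which plays the role of the vector `I`. The element-wise `I + ti.Vector([0, 1])` becomes explicit tuple addition. The sketch only models the final result, not Taichi's parallel execution, and the shape `(3, 4)` is an arbitrary assumption.

```python
import itertools

def grouped(shape):
    """Hypothetical stand-in for ti.grouped: yield each multi-index as a tuple."""
    return itertools.product(*(range(n) for n in shape))

shape = (3, 4)  # assumed shape of the 2D tensor x

# Grouped version: I is a tuple with len(shape) entries
y_grouped = {}
for I in grouped(shape):
    J = tuple(a + b for a, b in zip(I, (0, 1)))  # I + [0, 1]
    y_grouped[J] = I[0] + I[1]

# Explicit 2D version
y_explicit = {}
for i, j in itertools.product(range(shape[0]), range(shape[1])):
    y_explicit[i, j + 1] = i + j

print(y_grouped == y_explicit)  # True
```

Because `grouped` never unpacks the index, the first loop body works unchanged for any `len(shape)`. That is the dimensionality independence the release note refers to.
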

- (Nov 27, 2019) v0.1.5 released.
  - Better modular programming support
  - Disallow the use of `ti.static` outside Taichi kernels
  - Documentation improvements (WIP)
  - Codegen bug fixes
  - Special thanks to Andrew Spielberg and KLozes for bug reports and feedback.

- (Nov 22, 2019) v0.1.3 released.
  - Object-oriented programming. [Example]
  - Native Python function translation in Taichi kernels:
    - Use `print` instead of `ti.print`
    - Use `int()` instead of `ti.cast(x, ti.i32)` (or `ti.cast(x, ti.i64)` if your default integer precision is 64 bit)
    - Use `float()` instead of `ti.cast(x, ti.f32)` (or `ti.cast(x, ti.f64)` if your default floating-point precision is 64 bit)
    - Use `abs` instead of `ti.abs`
    - Use `ti.static_print` for compile-time printing

- (Nov 16, 2019) v0.1.0 released. Fixed PyTorch interface.

- (Nov 12, 2019) v0.0.87 released.
  - Added experimental Windows support with a [known issue] regarding virtual memory allocation, which will potentially limit the scalability of Taichi programs (if you are a Windows expert, please let me know how to solve this. Thanks!). Most examples work on Windows now.
  - CUDA march autodetection
  - Complex kernel to override autodiff.

- (Nov 4, 2019) v0.0.85 released.
  - `ti.stop_grad` for stopping gradients during backpropagation. [Example]
  - Compatibility improvements on Linux and OS X
  - Minor bug fixes.

- (Nov 1, 2019) v0.0.77 released.
  - Python wheels now support OS X 10.14+
  - LLVM is now the default backend. No need to install `gcc-7` or `clang-7` anymore. To use legacy backends, `export TI_LLVM=0`
  - LLVM compilation speed is improved by 2x
  - More friendly syntax error messages.

- (Oct 30, 2019) v0.0.72 released.
  - LLVM GPU backend now as fast as the legacy (yet optimized) CUDA backend. To enable, `export TI_LLVM=1`
  - Bug fixes: LLVM `struct for` list generation.

- (Oct 29, 2019) v0.0.71 released. LLVM GPU backend performance greatly improved. Frontend compiler now emits readable syntax error messages.

- (Oct 28, 2019) v0.0.70 released. This version comes with experimental LLVM backends for x86_64 and CUDA (via NVVM/PTX). GPU kernel compilation speed is improved by 10x. To enable, update the taichi package and `export TI_LLVM=1`.
- (Oct 24, 2019) Python wheels (v0.0.61) released for Python 3.6/3.7 and CUDA 10.0/10.1 on Ubuntu 16.04+. Contributors of this release include Yuanming Hu, robbertvc, Zhoutong Zhang, Tao Du, Srinivas Kaza, and Kenneth Lozes.

- (Oct 22, 2019) Added support for kernel templates. Kernel templates allow users to pass in Taichi tensors and compile-time constants as kernel parameters.

- (Oct 9, 2019) Compatibility improvements. Added a basic PyTorch interface. [Example].
Notes:
- You still need to clone this repo for demo scripts under `examples`. You do not need to execute `install.py` or `dev_setup.py`. After installation using `pip`, you can simply go to `examples` and execute, e.g., `python3 mpm_fluid.py`.
- Make sure you clear your legacy Taichi installation (if applicable) by cleaning the environment variables (delete `TAICHI_REPO_DIR`, and remove legacy taichi from `PYTHONPATH`) in your `.bashrc` or `.zshrc`. Or you can simply do this in your shell to temporarily clear them:

```bash
export PYTHONPATH=
export TAICHI_REPO_DIR=
```
The Taichi Library [Legacy branch]
Taichi is an open-source computer graphics library that aims to provide easy-to-use infrastructure for computer graphics R&D. It is written in C++14 and wrapped with a friendly Python interface.
News
- May 17, 2019: Giga-Voxel SPGrid Topology Optimization Solver is released!
- March 4, 2019: MLS-MPM/CPIC solver is now MIT-licensed!
- August 14, 2018: MLS-MPM/CPIC solver reloaded! It delivers a 4-14x performance boost over the previous state of the art on CPUs.