Synthetic Dataset Generator


This tool generates a dataset of synthetic buildings of different typologies.

The generated data includes:

  • Mesh files of generated buildings, .obj format
  • Rendered images of the mesh, .png format
  • Rendered segmentation masks, .png format
  • Depth annotation, .png and .exr format
  • Surface normals annotation, .png format
  • Point cloud files, .ply format (the number of points by default is 2048, can be changed in
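
To check that a generated point cloud contains the expected number of points, the vertex count can be read from the PLY header. The sketch below assumes the generated files use the standard PLY header layout (a `ply` magic line, an `element vertex N` line, and `end_header`). The function name `ply_vertex_count` is hypothetical and not part of this package.

```python
def ply_vertex_count(path):
    """Return the vertex count declared in the header of a PLY file."""
    # The PLY header is plain text even when the payload is binary,
    # so the file is read in binary mode and parsed line by line.
    with open(path, "rb") as handle:
        if handle.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in handle:
            parts = raw.split()
            if parts[:2] == [b"element", b"vertex"]:
                return int(parts[2])
            if parts == [b"end_header"]:
                break
    raise ValueError("no vertex element found in the PLY header")
```

With the default settings, calling this on a generated point cloud file should return 2048.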

How To Use

  • Install Blender >= 2.90. After installation, make sure to add Blender to your system's PATH environment variable.
  • Download the package as a .zip file or:
git clone

  • Navigate to the Building-Dataset-Generator folder and install the requirements:

pip install -r requirements.txt
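
Before generating data, it can help to confirm that Blender is reachable from the command line and meets the minimum version. The following is a minimal sketch, assuming the executable is named `blender` and that `blender --version` prints a line starting with `Blender <major>.<minor>`. The helper names are hypothetical and not part of this package.

```python
import re
import shutil
import subprocess

MIN_VERSION = (2, 90)  # minimum Blender version required by this tool


def parse_blender_version(text):
    """Extract (major, minor) from text such as 'Blender 2.93.4'."""
    match = re.search(r"Blender\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_blender(executable="blender"):
    """Return the version of the Blender found on PATH, or raise."""
    path = shutil.which(executable)
    if path is None:
        raise RuntimeError("blender was not found on PATH")
    # `blender --version` prints version information and exits.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    version = parse_blender_version(result.stdout)
    if version is None or version < MIN_VERSION:
        raise RuntimeError(f"Blender >= 2.90 is required, found: {version}")
    return version
```

Calling `check_blender()` raises a `RuntimeError` if Blender is missing from PATH or older than 2.90.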

To create completely synthetic buildings use:



blender setup.blend --python

Unfortunately, it is not possible to use Blender in background mode as it will not render the image masks correctly.

All parameters related to the dataset, including any parameters specific to your buildings (e.g. minimum and maximum height, width, and length), are to be provided in

Default values adhere to international standards (minimum) and the most common European values (maximum):

  • minimum height: 3 m
  • minimum length and width: 6 m
  • maximum length, width, and height: 30 m

Other values to set:

  • number of dataset samples
  • building types
  • component materials
  • rendered image dimensions
  • number of points in the point clouds
  • paths to store the generated data
  • option to save the .exr files
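
As an illustration of how the dimension limits above constrain a building, the sketch below stores the default values in a small dataclass and draws random dimensions within them. The class `BuildingLimits`, the function `sample_dimensions`, and the uniform sampling are assumptions made for this example. They do not describe how the generator itself stores or samples its parameters.

```python
import random
from dataclasses import dataclass

AXES = ("height", "length", "width")


@dataclass
class BuildingLimits:
    """Dimension limits in metres, using the default values listed above."""
    min_height: float = 3.0
    min_length: float = 6.0
    min_width: float = 6.0
    max_height: float = 30.0
    max_length: float = 30.0
    max_width: float = 30.0

    def validate(self):
        """Raise ValueError if any minimum is non-positive or exceeds its maximum."""
        for axis in AXES:
            low = getattr(self, f"min_{axis}")
            high = getattr(self, f"max_{axis}")
            if not 0 < low <= high:
                raise ValueError(f"invalid {axis} range: {low} to {high}")
        return self


def sample_dimensions(limits, rng=None):
    """Draw one (hypothetical) building size uniformly within the limits."""
    rng = rng or random.Random()
    limits.validate()
    return {
        axis: rng.uniform(getattr(limits, f"min_{axis}"),
                          getattr(limits, f"max_{axis}"))
        for axis in AXES
    }
```

For example, `sample_dimensions(BuildingLimits(max_height=12.0))` returns a dictionary with a height between 3 m and 12 m and a length and width between 6 m and 30 m.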

Annotation structure

{'img': 'images/0.png',
'category': 'building',
'img_size': (256, 256),
'2d_keypoints': [],
'mask': 'masks/0.png',
'img_source': 'synthetic',
'model': 'models/0.obj',
'point_cloud': 'PointCloud/0.ply',
'model_source': 'synthetic',
'trans_mat': 0,
'focal_length': 35.0,
'cam_position': (0.0, 0.0, 0.0),
'inplane_rotation': 0,
'truncated': False,
'occluded': False,
'slightly_occluded': False,
'bbox': [0.0, 0.0, 0.0, 0.0],
'material': ['concrete', 'brick']}
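
A record with this structure can be turned into file paths for loading the image, mask, mesh, and point cloud. The following is a minimal sketch, assuming the annotation has already been loaded into a Python dictionary and that its paths are relative to a dataset root folder. The function `resolve_paths` and the root folder name `dataset` are hypothetical.

```python
from pathlib import Path

# Keys of the annotation record that point to files on disk.
FILE_KEYS = ("img", "mask", "model", "point_cloud")


def resolve_paths(record, root="."):
    """Map each file key of an annotation record to a path under root."""
    missing = [key for key in FILE_KEYS if key not in record]
    if missing:
        raise KeyError(f"annotation is missing keys: {missing}")
    return {key: Path(root) / record[key] for key in FILE_KEYS}


# Subset of the example record shown above.
example = {
    "img": "images/0.png",
    "mask": "masks/0.png",
    "model": "models/0.obj",
    "point_cloud": "PointCloud/0.ply",
    "img_size": (256, 256),
    "material": ["concrete", "brick"],
}

paths = resolve_paths(example, root="dataset")
```

Here `paths["model"]` points to `dataset/models/0.obj`, and the other entries follow the same pattern.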


We ran the dataset generation algorithm for 100 model samples with different input parameters on Windows 10, on both CPU and GPU, using an AMD Ryzen 7 3800-X 8-Core Processor and a GeForce GTX 1080.
Here we report the results for multiview generation (3 views per model):

GPU Multiview Time (h)
✓ 2.7
✓ 0.34
✓ ✓ 0.8


Bibtex format

      title={Synthetic 3D Data Generation Pipeline for Geometric Deep Learning in Architecture}, 
      author={Stanislava Fedorova and Alberto Tono and Meher Shashwat Nigam and Jiayao Zhang and Amirhossein Ahmadnia and Cecilia Bolognesi and Dominik L. Michels},

Generated Image Samples



Stanislava Fedorova
Alberto Tono
Meher Shashwat Nigam
Jiayao Zhang
Amirhossein Ahmadnia
Cecilia Bolognesi
Dominik L. Michels