lightning-transformers
Flexible interface for high-performance research using SOTA Transformers, built on PyTorch Lightning, Transformers, and Hydra.
Installation
Option 1: from PyPI
pip install lightning-transformers
# instead of: `python train.py ...`, run with:
pl-transformers-train ...
Option 2: from source
git clone https://github.com/PyTorchLightning/lightning-transformers.git
cd lightning-transformers
pip install .
python train.py ...
# the `pl-transformers-train` entry point is also available!
Quick recipes
Train bert-base-cased on the CARER emotion dataset using the Text Classification task.
python train.py \
task=nlp/text_classification \
dataset=nlp/text_classification/emotion
See the composed Hydra config used under the hood
optimizer:
  _target_: torch.optim.AdamW
  lr: ${training.lr}
  weight_decay: 0.001
scheduler:
  _target_: transformers.get_linear_schedule_with_warmup
  num_training_steps: -1
  num_warmup_steps: 0.1
training:
  run_test_after_fit: true
  lr: 5.0e-05
  output_dir: .
  batch_size: 16
  num_workers: 16
trainer:
  _target_: pytorch_lightning.Trainer
  logger: true
  checkpoint_callback: true
  callbacks: null
  default_root_dir: null
  gradient_clip_val: 0.0
  process_position: 0
  num_nodes: 1
  num_processes: 1
  gpus: null
  auto_select_gpus: false
  tpu_cores: null
  log_gpu_memory: null
  progress_bar_refresh_rate: 1
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: 1
  max_steps: null
  min_steps: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  val_check_interval: 1.0
  flush_logs_every_n_steps: 100
  log_every_n_steps: 50
  accelerator: null
  sync_batchnorm: false
  precision: 32
  weights_summary: top
  weights_save_path: null
  num_sanity_val_steps: 2
  truncated_bptt_steps: null
  resume_from_checkpoint: null
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_epoch: false
  auto_lr_find: false
  replace_sampler_ddp: true
  terminate_on_nan: false
  auto_scale_batch_size: false
  prepare_data_per_node: true
  plugins: null
  amp_backend: native
  amp_level: O2
  move_metrics_to_cpu: false
task:
  _recursive_: false
  backbone: ${backbone}
  optimizer: ${optimizer}
  scheduler: ${scheduler}
  _target_: lightning_transformers.task.nlp.text_classification.TextClassificationTransformer
  downstream_model_type: transformers.AutoModelForSequenceClassification
dataset:
  cfg:
    batch_size: ${training.batch_size}
    num_workers: ${training.num_workers}
    dataset_name: emotion
    dataset_config_name: null
    train_file: null
    validation_file: null
    test_file: null
    train_val_split: null
    max_samples: null
    cache_dir: null
    padding: max_length
    truncation: only_first
    preprocessing_num_workers: 1
    load_from_cache_file: true
    max_length: 128
    limit_train_samples: null
    limit_val_samples: null
    limit_test_samples: null
  _target_: lightning_transformers.task.nlp.text_classification.TextClassificationDataModule
experiment_name: ${now:%Y-%m-%d}_${now:%H-%M-%S}
log: false
ignore_warnings: true
tokenizer:
  _target_: transformers.AutoTokenizer.from_pretrained
  pretrained_model_name_or_path: ${backbone.pretrained_model_name_or_path}
  use_fast: true
backbone:
  pretrained_model_name_or_path: bert-base-cased
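Several values in the composed config, such as `lr: ${training.lr}`, are references to other keys rather than literal values. The sketch below illustrates how such a reference can be resolved on a plain nested dict. It is a simplified stand-in written with the standard library only, not OmegaConf's actual resolver; the helper names `lookup`, `resolve_node` and `resolve` are hypothetical, and resolver-style values such as `${now:%Y-%m-%d}` are deliberately left untouched.

```python
import re

# Matches a value that is exactly one key reference, e.g. "${training.lr}".
# Resolver calls such as "${now:%Y-%m-%d}" contain ":" and do not match.
_REFERENCE = re.compile(r"^\$\{([A-Za-z0-9_.]+)\}$")


def lookup(cfg, dotted_key):
    """Walk a nested dict following a dotted key like 'training.lr'."""
    node = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve_node(cfg, node):
    """Return a copy of `node` with every ${a.b} reference replaced by its target."""
    if isinstance(node, dict):
        return {key: resolve_node(cfg, value) for key, value in node.items()}
    if isinstance(node, str):
        match = _REFERENCE.match(node)
        if match:
            # Resolve the target too, so a reference may point at a subtree
            # or at another reference.
            return resolve_node(cfg, lookup(cfg, match.group(1)))
    return node


def resolve(cfg):
    """Resolve all references in a config dict without mutating it."""
    return resolve_node(cfg, cfg)


# A small excerpt of the composed config shown above.
config = {
    "training": {"lr": 5.0e-05, "batch_size": 16},
    "optimizer": {
        "_target_": "torch.optim.AdamW",
        "lr": "${training.lr}",
        "weight_decay": 0.001,
    },
    "dataset": {"cfg": {"batch_size": "${training.batch_size}"}},
}

resolved = resolve(config)
print(resolved["optimizer"]["lr"])             # 5e-05
print(resolved["dataset"]["cfg"]["batch_size"])  # 16
```

Because the optimizer and the data module both point at `training.*`, changing a single value under `training` is enough to update every place that uses it.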
Swap the backbone to RoBERTa and the optimizer to RMSprop:
python train.py \
task=nlp/text_classification \
dataset=nlp/text_classification/emotion \
backbone.pretrained_model_name_or_path=roberta-base \
optimizer=rmsprop
See the changed Hydra config under the hood
 optimizer:
-  _target_: torch.optim.AdamW
+  _target_: torch.optim.RMSprop
   lr: ${training.lr}
-  weight_decay: 0.001
 scheduler:
   _target_: transformers.get_linear_schedule_with_warmup
   num_training_steps: -1
 ...
 tokenizer:
   pretrained_model_name_or_path: ${backbone.pretrained_model_name_or_path}
   use_fast: true
 backbone:
-  pretrained_model_name_or_path: bert-base-cased
+  pretrained_model_name_or_path: roberta-base
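The command above mixes two kinds of Hydra overrides: `optimizer=rmsprop` selects a different file from a config group, while `backbone.pretrained_model_name_or_path=roberta-base` sets a single value inside the composed config. The sketch below illustrates only the second kind, applied to a plain nested dict. It is a standard-library approximation, not Hydra's override parser, and the helper names `parse_value` and `apply_overrides` are hypothetical.

```python
import copy


def parse_value(text):
    """Tiny value parser: null, booleans, ints, floats, otherwise the raw string."""
    lowered = text.lower()
    if lowered == "null":
        return None
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(cfg, overrides):
    """Return a copy of `cfg` with each 'a.b.c=value' override applied."""
    result = copy.deepcopy(cfg)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw_value)
    return result


base = {
    "backbone": {"pretrained_model_name_or_path": "bert-base-cased"},
    "trainer": {"gpus": None, "max_epochs": 1},
}

updated = apply_overrides(
    base,
    ["backbone.pretrained_model_name_or_path=roberta-base", "trainer.gpus=2"],
)
print(updated["backbone"]["pretrained_model_name_or_path"])  # roberta-base
print(updated["trainer"]["gpus"])                            # 2
```

The tokenizer needs no override of its own, because its `pretrained_model_name_or_path` is a reference to the backbone's value.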
Enable Sharded Training.
python train.py \
task=nlp/text_classification \
dataset=nlp/text_classification/emotion \
trainer=ddp \
trainer/plugins=sharded
See the changed Hydra config under the hood
No code changes are needed; the config is updated automatically for sharded training:
 optimizer:
   _target_: torch.optim.AdamW
   lr: ${training.lr}
 trainer:
   process_position: 0
   num_nodes: 1
   num_processes: 1
-  gpus: null
+  gpus: 1
   auto_select_gpus: false
   tpu_cores: null
   log_gpu_memory: null
   ...
   val_check_interval: 1.0
   flush_logs_every_n_steps: 100
   log_every_n_steps: 50
-  accelerator: null
+  accelerator: ddp
   sync_batchnorm: false
-  precision: 32
+  precision: 16
   weights_summary: top
   weights_save_path: null
   num_sanity_val_steps: 2
   ...
   terminate_on_nan: false
   auto_scale_batch_size: false
   prepare_data_per_node: true
-  plugins: null
+  plugins:
+    _target_: pytorch_lightning.plugins.DDPShardedPlugin
   amp_backend: native
   amp_level: O2
   move_metrics_to_cpu: false
 tokenizer:
   pretrained_model_name_or_path: ${backbone.pretrained_model_name_or_path}
   use_fast: true
 backbone:
   pretrained_model_name_or_path: bert-base-cased
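The `_target_` keys in these configs name the Python object to build, and the sibling keys become its keyword arguments; this is why adding a `plugins._target_` entry is enough to switch on sharded training. The sketch below shows the general idea with the standard library only. It is a simplified illustration, not Hydra's `hydra.utils.instantiate`, which also handles nested configs and options such as `_recursive_`. The helper names `locate` and `instantiate` are hypothetical, and `datetime.timedelta` is used as the target so the example runs without PyTorch Lightning installed.

```python
import importlib


def locate(dotted_path):
    """Resolve 'pkg.module.attr' (or 'pkg.Class.method') to the object it names."""
    parts = dotted_path.split(".")
    # Import the longest importable module prefix, then walk the remaining
    # names as attributes.
    for split in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:split]))
        except ImportError:
            continue
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"Cannot locate {dotted_path!r}")


def instantiate(node, **extra_kwargs):
    """Call the object named by node['_target_'] with the other keys as kwargs."""
    kwargs = {key: value for key, value in node.items() if key != "_target_"}
    kwargs.update(extra_kwargs)
    return locate(node["_target_"])(**kwargs)


# Stand-in config node; in the README the target would be a class such as
# pytorch_lightning.plugins.DDPShardedPlugin.
node = {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30}
delta = instantiate(node)
print(delta.total_seconds())  # 5400.0
```

Walking attributes after the module import is what lets a target such as `transformers.AutoTokenizer.from_pretrained` point at a class method rather than a class.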
Enable DeepSpeed ZeRO Training.
python train.py \
task=nlp/text_classification \
dataset=nlp/text_classification/emotion \
trainer=ddp \
trainer/plugins=deepspeed
See the changed Hydra config under the hood
No code changes are needed; the config is updated automatically for DeepSpeed:
 optimizer:
   _target_: torch.optim.AdamW
   lr: ${training.lr}
 trainer:
   process_position: 0
   num_nodes: 1
   num_processes: 1
-  gpus: null
+  gpus: 1
   auto_select_gpus: false
   tpu_cores: null
   log_gpu_memory: null
   ...
   val_check_interval: 1.0
   flush_logs_every_n_steps: 100
   log_every_n_steps: 50
-  accelerator: null
+  accelerator: ddp
   sync_batchnorm: false
-  precision: 32
+  precision: 16
   ...
-  plugins: null
+  plugins:
+    _target_: pytorch_lightning.plugins.DeepSpeedPlugin
+    stage: 2
+    cpu_offload: true
   amp_backend: native
   amp_level: O2
   move_metrics_to_cpu: false
 ...
Train a pre-trained t5-base backbone on the XSUM dataset using the Summarization task.
python train.py \
task=nlp/summarization \
dataset=nlp/summarization/xsum \
backbone.pretrained_model_name_or_path=t5-base
Train a pre-trained mt5-base backbone on the WMT16 dataset using the Translation task with 2 GPUs.
python train.py \
task=nlp/translation \
dataset=nlp/translation/wmt16 \
backbone.pretrained_model_name_or_path=google/mt5-base \
trainer.gpus=2
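When scaling to several GPUs it helps to know how many samples contribute to each optimizer step. The sketch below does that arithmetic under one stated assumption: training is data-parallel with one process per GPU, each loading its own batches of `training.batch_size` samples. Whether that holds depends on the accelerator in use, so treat the result as an estimate. The function name `effective_batch_size` is hypothetical; the parameter names mirror the config keys shown earlier.

```python
def effective_batch_size(batch_size, gpus=1, num_nodes=1, accumulate_grad_batches=1):
    """Samples per optimizer step, assuming one data-parallel process per GPU."""
    for name, value in (
        ("batch_size", batch_size),
        ("gpus", gpus),
        ("num_nodes", num_nodes),
        ("accumulate_grad_batches", accumulate_grad_batches),
    ):
        if value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
    return batch_size * gpus * num_nodes * accumulate_grad_batches


# Defaults from the composed config (batch_size: 16, num_nodes: 1,
# accumulate_grad_batches: 1) combined with trainer.gpus=2.
print(effective_batch_size(16, gpus=2))  # 32
```

If the larger effective batch changes convergence, the learning rate can be adjusted from the command line with an override on `training.lr`.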