FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Created by Jinglin Xu*, Yongming Rao*, Xumin Yu, Guangyi Chen, Jie Zhou, Jiwen Lu
This repository contains the FineDiving dataset and PyTorch implementation for Temporal Segmentation Attention. (CVPR 2022)
[Project Page] [arXiv] (coming soon) [Dataset]
Dataset
Lexicon
We construct a fine-grained video dataset organized by both semantic and temporal structures, where each structure contains two-level annotations.
- For the semantic structure, the action-level labels describe the action types of athletes, and the step-level labels depict the sub-action types of consecutive steps in the procedure, where adjacent steps in each action procedure belong to different sub-action types. A combination of sub-action types produces an action type.
- For the temporal structure, the action-level labels locate the temporal boundary of a complete action instance performed by an athlete. During this annotation process, we discard all incomplete action instances and filter out slow-motion playbacks. The step-level labels are the starting frames of consecutive steps in the action procedure.
Annotation
Given a raw diving video, the annotator uses our defined lexicon to label each action and its procedure. The annotation proceeds in two stages, from coarse- to fine-grained: the coarse stage labels the action type of each action instance and its temporal boundary, together with the official score; the fine-grained stage labels the sub-action type of each step in the action procedure and records the starting frame of each step, using an effective Annotation Toolbox.
The annotation information is saved in `FineDiving_coarse_annotation.pkl` and `FineDiving_fine-grained_annotation.pkl`.
| Field Name | Type | Description |
|---|---|---|
| `action_type` | string | Description of the action type. |
| `sub-action_types` | dict | Descriptions of the sub-action types. |
| `(x, y)` | string | Instance ID. |
| `judge_scores` | list | Judge scores. |
| `dive_score` | float | Diving score of the action instance. |
| `frames_labels` | array | Step-level labels of the frames. |
| `difficulty` | float | Difficulty of the action type. |
| `steps_transit_frames` | array | Frame indices of step transitions. |
| `start_frame` | int | Start frame of the action instance. |
| `end_frame` | int | End frame of the action instance. |
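As a sketch of how these annotation files might be read, assuming they are standard `pickle` dumps mapping instance IDs to per-instance records with the field names in the table above (the exact container layout is an assumption):

```python
import pickle

def load_annotations(path):
    """Load a FineDiving annotation .pkl file into a Python object
    (assumed: a dict mapping (x, y) instance IDs to per-instance records)."""
    with open(path, "rb") as f:
        return pickle.load(f)

def summarize_instance(record):
    """Extract a few scalar fields documented in the annotation table.
    Field names follow the table above; missing keys are skipped."""
    keys = ("action_type", "dive_score", "difficulty", "start_frame", "end_frame")
    return {k: record[k] for k in keys if k in record}
```

For example, `summarize_instance` applied to one record would return just the scalar metadata, leaving out array-valued fields such as `frames_labels`.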
Statistics
The FineDiving dataset consists of 3000 video samples covering 52 action types, 29 sub-action types, and 23 difficulty degrees.
Download
- Untrimmed_Videos: Diving competition videos from the Olympics, World Cup, World Championships, and European Aquatics Championships. [Baidu Drive]
- Trimmed_Video_Frames: Trimmed video frames extracted from the untrimmed videos using `data_process.py`; they can be used directly for AQA. [Baidu Drive] or [Google Drive]
- Annotations: Coarse- and fine-grained annotations, including the action-level labels for untrimmed videos and the step-level labels for action instances:
  - `FineDiving_coarse_annotation.pkl`
  - `FineDiving_fine-grained_annotation.pkl`
- Train/Test Split: 75 percent of the samples are used for training and 25 percent for testing: `train_split.pkl`, `test_split.pkl`
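A minimal sketch of a sanity check on the split files, assuming each `.pkl` holds a list of instance IDs (the file names come from the list above; the list-of-IDs format is an assumption):

```python
import pickle

def load_split(path):
    """Load a split .pkl file; assumed to contain a list of instance IDs."""
    with open(path, "rb") as f:
        return pickle.load(f)

def check_split(train_ids, test_ids):
    """Verify the splits are disjoint and return (train_ratio, test_ratio)."""
    train, test = set(map(tuple, train_ids)), set(map(tuple, test_ids))
    assert not (train & test), "train/test overlap detected"
    total = len(train) + len(test)
    return len(train) / total, len(test) / total
```

With the documented 75/25 split, `check_split` should return ratios close to `(0.75, 0.25)`.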
We have made the full dataset available on [Baidu Drive] (extraction code: 0624).
Code
Requirement
- Python 3.7.9
- PyTorch 1.7.1
- torchvision 0.8.2
- timm 0.3.4
- torch_videovision

```
pip install git+https://github.com/hassony2/torch_videovision
```
The FineDiving Dataset for AQA
- The data structure should be:

```
$DATASET_ROOT
├── FineDiving
|  ├── FINADivingWorldCup2021_Men3m_final_r1
|  |  ├── 0
|  |  |  ├── 00489.jpg
|  |  |  ...
|  |  |  └── 00592.jpg
|  |  ...
|  |  └── 11
|  |     ├── 14425.jpg
|  |     ...
|  |     └── 14542.jpg
|  ...
|  └── FullMenSynchronised10mPlatform_Tokyo2020Replays_2
|     ├── 0
|     ...
|     └── 16
```
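The layout above (root → video → instance → frame `.jpg` files) can be indexed with a short sketch like the following; the function name and return structure are illustrative, not part of the released code:

```python
import os

def index_frames(root):
    """Map (video_name, instance_id) -> sorted list of frame paths,
    following the <root>/<video>/<instance>/<frame>.jpg layout."""
    index = {}
    for video in sorted(os.listdir(root)):
        video_dir = os.path.join(root, video)
        if not os.path.isdir(video_dir):
            continue
        # instance directories are numeric ("0", "1", ..., "16"), so sort numerically
        for inst in sorted(os.listdir(video_dir),
                           key=lambda d: int(d) if d.isdigit() else -1):
            inst_dir = os.path.join(video_dir, inst)
            if not os.path.isdir(inst_dir):
                continue
            frames = sorted(f for f in os.listdir(inst_dir) if f.endswith(".jpg"))
            index[(video, inst)] = [os.path.join(inst_dir, f) for f in frames]
    return index
```

Calling `index_frames` on the `FineDiving` directory would yield one entry per action instance, with its frames in temporal order (frame file names are zero-padded, so lexical sort matches frame order).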
Pretrained Model
The Kinetics-pretrained I3D model `model_rgb.pth` can be downloaded from the repository kinetics_i3d_pytorch.
Experimental Setting
The key hyperparameters in `FineDiving_TSA.yaml`:

```yaml
frame_length: 96              # the number of frames for each video
voter_number: 10              # the number of exemplars used in inference
fix_size: 5                   # the number of frames in each step
step_num: 3                   # the number of step transitions in each action
prob_tas_threshold: 0.25      # the ratio of iterations
random_choosing: False        # whether to select exemplars randomly
action_number_choosing: True  # whether to select exemplars based on action types
```
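Flat `key: value` settings like the ones above can be read back even without a third-party YAML library; the following is a minimal sketch (the parser is illustrative and only handles the scalar types shown, not general YAML):

```python
def parse_simple_yaml(text):
    """Minimal parser for flat 'key: value' config lines.
    Handles only the scalar types used above: int, float, bool, str.
    Inline '#' comments and blank lines are ignored."""
    def convert(value):
        if value in ("True", "true"):
            return True
        if value in ("False", "false"):
            return False
        try:
            return int(value)
        except ValueError:
            pass
        try:
            return float(value)
        except ValueError:
            return value  # fall back to a plain string

    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip inline comments
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = convert(value.strip())
    return config
```

For a full config, the project presumably loads the YAML file directly; this sketch is only meant to make the option types above concrete.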
Training and Evaluation
```bash
# train a model on FineDiving
bash train.sh TSA FineDiving 0,1

# resume the training process
bash train.sh TSA FineDiving 0,1 --resume

# test a model on FineDiving
bash test.sh TSA FineDiving 0,1 --test
```
Contact: [email protected]