FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment

Created by Jinglin Xu*, Yongming Rao*, Xumin Yu, Guangyi Chen, Jie Zhou, Jiwen Lu

This repository contains the FineDiving dataset and PyTorch implementation for Temporal Segmentation Attention. (CVPR 2022)

[Project Page] [arXiv] (coming soon) [Dataset]

Dataset

Lexicon

We construct a fine-grained video dataset organized by both semantic and temporal structures, where each structure contains two-level annotations.

For the semantic structure, the action-level labels describe the action types of athletes, and the step-level labels describe the sub-action types of consecutive steps in the procedure, where adjacent steps in each action procedure belong to different sub-action types. A combination of sub-action types produces an action type.
For the temporal structure, the action-level labels locate the temporal boundary of a complete action instance performed by an athlete. During this annotation process, we discard all incomplete action instances and filter out slow-motion replays. The step-level labels are the starting frames of consecutive steps in the action procedure.
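To make the relationship between the two temporal levels concrete, the sketch below expands step-level labels (the starting frame of each step plus its sub-action type) into one label per frame of an action instance. It is only an illustration under stated assumptions: the function name `frame_labels_from_steps`, the integer encoding of sub-action types, and the choice to treat `end_frame` as inclusive are hypothetical and are not taken from the repository.

```python
from typing import List, Sequence


def frame_labels_from_steps(
    start_frame: int,
    end_frame: int,
    step_starts: Sequence[int],
    step_types: Sequence[int],
) -> List[int]:
    """Expand step-level annotations into one sub-action label per frame.

    Assumptions (not confirmed by the dataset documentation):
    - step_starts[i] is the first frame of step i,
    - step i ends right before step i + 1 begins,
    - the last step runs through end_frame, inclusive.
    """
    if len(step_starts) != len(step_types):
        raise ValueError("each step needs exactly one sub-action type")
    if not step_starts or step_starts[0] != start_frame:
        raise ValueError("the first step must begin at the action's start frame")
    if any(a >= b for a, b in zip(step_starts, step_starts[1:])):
        raise ValueError("step starting frames must be strictly increasing")
    if step_starts[-1] > end_frame:
        raise ValueError("the last step starts after the action ends")
    # The lexicon states that adjacent steps have different sub-action types.
    if any(a == b for a, b in zip(step_types, step_types[1:])):
        raise ValueError("adjacent steps must have different sub-action types")

    # Append a sentinel boundary so every step has a well-defined end.
    boundaries = list(step_starts) + [end_frame + 1]
    labels: List[int] = []
    for i, sub_action in enumerate(step_types):
        labels.extend([sub_action] * (boundaries[i + 1] - boundaries[i]))
    return labels
```

For example, an instance spanning frames 10 to 15 with steps starting at frames 10, 12, and 14 would receive two frame labels per step under these assumptions.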

Annotation

Given a raw diving video, the annotator uses our defined lexicon to label each action and its procedure. Annotation proceeds in two stages, from coarse to fine-grained. The coarse stage labels the action type and temporal boundary of each action instance, along with its official score. The fine-grained stage labels the sub-action type of each step in the action procedure and records the starting frame of each step, using an effective Annotation Toolbox.

The annotation information is saved in FineDiving_coarse_annotation.pkl and FineDiving_fine-grained_annotation.pkl.

Field Name	Type	Description
`action_type`	string	Description of the action type.
`(x, y)`	string	Instance ID.
`dive_score`	float	Diving score of the action instance.
`difficulty`	float	Difficulty of the action type.
`start_frame`	int	Start frame of the action instance.
`end_frame`	int	End frame of the action instance.
`sub-action_types`	dict	Description of the sub-action type.
`judge_scores`	list	Judge scores.
`frames_labels`	array	Step-level labels of the frames.
`steps_transit_frames`	array	Frame index of step transitions.
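As a rough illustration of how these fields might be read, the sketch below loads an annotation file with `pickle` and summarizes one instance. It assumes the file unpickles to a dictionary keyed by instance ID, with each value a dictionary using the field names in the table; that layout is a guess from the table, not something the documentation confirms. The helper names `load_annotations` and `summarize_instance` are hypothetical.

```python
import pickle
from typing import Any, Dict, Hashable


def load_annotations(path: str) -> Dict[Hashable, Dict[str, Any]]:
    """Load an annotation file such as FineDiving_coarse_annotation.pkl.

    Only unpickle files from a source you trust: pickle can execute
    arbitrary code while loading.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize_instance(
    annotations: Dict[Hashable, Dict[str, Any]], instance_id: Hashable
) -> Dict[str, Any]:
    """Collect the action-level fields of one instance.

    Assumes each record is a dict with the field names from the table;
    a KeyError here means the real file is organized differently.
    """
    record = annotations[instance_id]
    return {
        "action_type": record["action_type"],
        "frame_span": (record["start_frame"], record["end_frame"]),
        "dive_score": record["dive_score"],
        "difficulty": record["difficulty"],
    }
```

Printing `type(annotations)` and one sample record after loading is a quick way to check whether these assumptions hold for the released files.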

Statistics

The FineDiving dataset consists of 3000 video samples, covering 52 action types, 29 sub-action types, and 23 difficulty degree types.

Download

Untrimmed_Videos: Diving competition videos from the Olympics, World Cup, World Championships, and European Aquatics Championships. [Baidu Drive]
Trimmed_Video_Frames: Trimmed video frames are extracted from untrimmed videos using data_process.py and can be directly used for AQA. [Baidu Drive] or [Google Drive]
Annotations: Coarse- and fine-grained annotations including the action-level labels for untrimmed videos and the step-level labels for action instances. FineDiving_coarse_annotation.pkl FineDiving_fine-grained_annotation.pkl
Train/Test Split: 75 percent of samples are for training and 25 percent are for testing. train_split.pkl test_split.pkl
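After loading the two split files, a quick sanity check can confirm that they are disjoint and roughly match the stated 75/25 ratio. The sketch below is a minimal example that assumes each split is a collection of hashable instance IDs; the contents of `train_split.pkl` and `test_split.pkl` are not described here, so that is an assumption, and `check_split` is a hypothetical helper.

```python
from typing import Hashable, Iterable


def check_split(train_ids: Iterable[Hashable], test_ids: Iterable[Hashable]) -> float:
    """Verify that the splits do not overlap and return the training fraction."""
    train, test = set(train_ids), set(test_ids)
    overlap = train & test
    if overlap:
        raise ValueError(f"{len(overlap)} instance(s) appear in both splits")
    total = len(train) + len(test)
    if total == 0:
        raise ValueError("both splits are empty")
    return len(train) / total
```

With the full dataset, the returned fraction should be close to 0.75 if the assumption about the file contents holds.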

We have made the full dataset available on [Baidu Drive] (extraction code: 0624).

Code

Requirement

Python 3.7.9
PyTorch 1.7.1
torchvision 0.8.2
timm 0.3.4
torch_videovision

pip install git+https://github.com/hassony2/torch_videovision

The FineDiving Dataset for AQA

The data structure should be:

$DATASET_ROOT
├── FineDiving
|  ├── FINADivingWorldCup2021_Men3m_final_r1
|     ├── 0
|        ├── 00489.jpg
|        ...
|        └── 00592.jpg
|     ...
|     └── 11
|        ├── 14425.jpg
|        ...
|        └── 14542.jpg
|  ...
|  └── FullMenSynchronised10mPlatform_Tokyo2020Replays_2
|     ├── 0
|     ...
|     └── 16 
└──
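Given the layout above, the frames of one action instance can be collected with a few lines of `pathlib`. This is a sketch rather than the repository's own loader: the function name `list_instance_frames` is hypothetical, and it assumes every file in an instance folder is named with its frame number followed by `.jpg`, as in the tree.

```python
from pathlib import Path
from typing import List, Union


def list_instance_frames(
    dataset_root: Union[str, Path], video_name: str, instance_index: int
) -> List[Path]:
    """Return the frame paths of one action instance in frame order.

    Expects $DATASET_ROOT/FineDiving/<video_name>/<instance_index>/<frame>.jpg.
    """
    instance_dir = Path(dataset_root) / "FineDiving" / video_name / str(instance_index)
    if not instance_dir.is_dir():
        raise FileNotFoundError(f"missing instance folder: {instance_dir}")
    # Sort numerically by frame number rather than by raw string.
    return sorted(instance_dir.glob("*.jpg"), key=lambda p: int(p.stem))
```

For instance, `list_instance_frames(root, "FINADivingWorldCup2021_Men3m_final_r1", 0)` would be expected to return the paths from `00489.jpg` through `00592.jpg` if the folder matches the tree shown.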

Pretrain Model

The Kinetics-pretrained I3D model is downloaded from the repository kinetics_i3d_pytorch:

model_rgb.pth

Experimental Setting

FineDiving_TSA.yaml
# frame_length : 96             # the number of frames for each video
# voter_number : 10             # the number of exemplars in inference
# fix_size : 5                  # the number of frames in each step
# step_num : 3                  # the number of step transitions in each action
# prob_tas_threshold : 0.25     # the ratio of iteration
# random_choosing : False       # whether selecting exemplars randomly or not
# action_number_choosing: True  # whether selecting exemplars based on action types or not
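Because action instances differ in length, a setting such as `frame_length : 96` implies that each instance is mapped to a fixed number of frames. How the repository does this is not described here; the sketch below shows one common option, evenly spaced index sampling, purely as an assumption. The function name `sample_frame_indices` is hypothetical.

```python
from typing import List


def sample_frame_indices(num_available: int, frame_length: int = 96) -> List[int]:
    """Pick frame_length indices spread evenly over range(num_available).

    The first and last frames are always included. When the instance has
    fewer frames than frame_length, some indices repeat.
    """
    if num_available <= 0 or frame_length <= 0:
        raise ValueError("both arguments must be positive")
    if frame_length == 1:
        return [0]
    span = num_available - 1
    denom = frame_length - 1
    # Integer arithmetic with round-half-up avoids floating-point drift.
    return [(i * span + denom // 2) // denom for i in range(frame_length)]
```

The returned indices are offsets into the sorted frame list of one instance, so they can be used directly to select frame paths before loading images.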

Training and Evaluation

# train a model on FineDiving
bash train.sh TSA FineDiving 0,1

# resume the training process
bash train.sh TSA FineDiving 0,1 --resume

# test a model on FineDiving
bash test.sh TSA FineDiving 0,1 --test

Contact: [email protected]
