3D-MSNet: A point cloud based deep learning model for untargeted feature detection and quantification in profile LC-HRMS data
If you run into any problems running or training 3D-MSNet, feel free to contact [email protected]
Highlights
- Novelty: 3D-MSNet achieves lossless, high-dimensional feature detection and quantification by treating LC-MS feature detection as a 3D point-cloud instance-segmentation problem.
- Accuracy: 3D-MSNet achieved the best feature-detection and quantification accuracy on the test datasets.
- Applicability: 3D-MSNet can be widely applied to various MS systems and acquisition methods.
- Efficiency: With GPU acceleration, 3D-MSNet's analysis time is comparable to traditional methods and about 5 times faster than other deep-learning methods.
- Potentiality: 3D-MSNet can achieve better accuracy when trained on larger annotated datasets.
Environment
Recommended
Intel(R)_Core(TM)_i9-10900K CPU, 128GB memory, GeForce RTX 3090 GPU
Ubuntu 16.04 + CUDA 11.1 + cuDNN 8.0.5
Anaconda 4.9.2 + Python 3.6.13 + PyTorch 1.9
Setup
- Prepare the deep-learning environment for your system and hardware, including the GPU driver, CUDA, cuDNN, Anaconda, Python, and PyTorch.
- Install the dependencies. Here we use ROOT_PATH to represent the root path of 3D-MSNet.
cd ROOT_PATH
pip install -r requirements.txt
- Compile the CUDA code. This takes a few minutes.
cd cuda
python setup.py install
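After setup, a quick sanity check can confirm that PyTorch and CUDA are visible to Python. This is a minimal sketch, not part of 3D-MSNet itself; it does not verify the compiled CUDA extensions from the step above, only the base GPU stack.

```python
# Sanity check of the deep-learning environment (illustrative only;
# 3D-MSNet additionally needs the CUDA extensions compiled above).
import importlib.util


def check_env():
    """Report which parts of the GPU stack are importable/usable."""
    status = {"torch": False, "cuda": False}
    if importlib.util.find_spec("torch") is not None:
        import torch
        status["torch"] = True
        # True only if a CUDA-capable GPU and a matching driver are present
        status["cuda"] = torch.cuda.is_available()
    return status


if __name__ == "__main__":
    print(check_env())
```

If `cuda` reports False on a GPU machine, recheck that the installed PyTorch build matches your CUDA version (e.g. CUDA 11.1 for the recommended setup).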
Datasets
The 3DMS dataset and all the benchmark datasets (mzML format) can be freely downloaded at Zenodo.
Raw MS files of the metabolomics datasets can be downloaded at Google Drive.
Raw MS files of the proteomics datasets can be downloaded at ProteomeXchange (dataset PXD001091).
Targeted annotation results, evaluation results and evaluation methods can be downloaded at Zenodo.
Run 3D-MSNet
Demos
Our demos can help you reproduce the evaluation results.
Place the benchmark datasets as follows.
3D-MSNet-master
├── dataset
│ ├── TripleTOF_6600
│ │ ├── mzml
│ │ │ ├── *.mzML
│ ├── QE_HF
│ │ ├── mzml
│ │ │ ├── *.mzML
│ ├── Orbitrap_XL
│ │ ├── mzml
│ │ │ ├── *.mzML
Then run the scripts in the DEMO folder. For example:
cd ROOT_PATH
Prepare point clouds: python DEMO/TripleTOF_6600_untarget/0_pc_extraction.py
Extract features: python DEMO/TripleTOF_6600_untarget/1_peak_detection.py
The result files are saved in the dataset folder.
Customized running
Refer to the DEMO folder for parameter settings for different LC-MS platforms.
cd ROOT_PATH
Prepare point clouds:
python workflow/predict/point_cloud_extractor.py --data_dir=PATH_TO_MZML --output_dir=POINT_CLOUD_PATH --window_mz_width=0.8 --window_rt_width=6 --min_intensity=128 --from_mz=0 --to_mz=2000 --from_rt=0 --to_rt=300 --expansion_mz_width=0.1 --expansion_rt_width=1
Extract features:
python workflow/predict/main_eval.py --data_dir=POINT_CLOUD_PATH --mass_analyzer=orbitrap --mz_resolution=60000 --resolution_mz=400 --rt_fwhm=0.1 --target_id=None
Run python workflow/predict/point_cloud_extractor.py -h and python workflow/predict/main_eval.py -h for parameter details.
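The two-step workflow above can also be driven from Python, which is convenient when processing several runs or parameter presets. The script paths and flag names below are taken from this README; the directory values are placeholders you must replace.

```python
# Sketch: batch the point-cloud extraction and feature detection steps
# by invoking the workflow scripts via subprocess.
import subprocess
import sys


def build_cmd(script, **params):
    """Assemble `python <script> --key=value ...` as a subprocess arg list."""
    return [sys.executable, script] + [f"--{k}={v}" for k, v in params.items()]


extract = build_cmd(
    "workflow/predict/point_cloud_extractor.py",
    data_dir="PATH_TO_MZML",        # placeholder: your mzML directory
    output_dir="POINT_CLOUD_PATH",  # placeholder: point-cloud output dir
    window_mz_width=0.8,
    window_rt_width=6,
)
detect = build_cmd(
    "workflow/predict/main_eval.py",
    data_dir="POINT_CLOUD_PATH",
    mass_analyzer="orbitrap",
)

# Uncomment to actually run both steps in order:
# subprocess.run(extract, check=True)
# subprocess.run(detect, check=True)
print(extract)
```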
Train
We provide a pretrained model in the experiment folder.
If you want to train the model on your self-annotated data, prepare your .csv files following the format of the 3DMS dataset. Each MS signal should be annotated with an instance label.
Place the training dataset as follows.
3D-MSNet-master
├── dataset
│ ├── your_training_dataset
│ │ ├── dataset_anno
│ │ │ ├── [id_mz_rt].csv
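For illustration, an annotated file could be generated as sketched below. The column names used here (mz, rt, intensity, label) are assumptions for illustration only; follow the actual 3DMS dataset files for the exact schema.

```python
# Hypothetical writer for an annotated point-cloud CSV: one row per MS
# signal, each carrying an instance label. Column names are illustrative
# assumptions -- match the real 3DMS dataset schema for training data.
import csv
import io


def make_annotated_csv(rows):
    """Serialize annotated signals to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["mz", "rt", "intensity", "label"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


signals = [
    {"mz": 500.2501, "rt": 120.3, "intensity": 1.8e5, "label": 0},
    {"mz": 500.2503, "rt": 120.9, "intensity": 3.1e5, "label": 0},
    {"mz": 501.2534, "rt": 121.1, "intensity": 9.7e4, "label": 1},
]
print(make_annotated_csv(signals))
```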
Then adjust the training parameters in config/msnet_default.yaml
cd ROOT_PATH
Split training set and validation set:
python workflow/train/dataset_generator.py
Start training:
python workflow/train/main_train.py
Trained models are saved in the experiment folder.
Citation
Please cite our paper:
@article{}
License
3D-MSNet is an open-source tool released under the Mulan Permissive Software License, Version 2 (Mulan PSL v2).