3D-MSNet: A point cloud based deep learning model for untargeted feature detection and quantification in profile LC-HRMS data
If you encounter any problems running or training 3D-MSNet, feel free to contact [email protected]
- Novelty: 3D-MSNet achieves lossless high-dimensional feature detection and quantification by treating LC-MS feature detection as a 3D point cloud instance segmentation problem.
- Accuracy: 3D-MSNet achieved the best accuracy in terms of feature detection and quantification on the test datasets.
- Applicability: 3D-MSNet can be widely applied to various MS systems and acquisition methods.
- Efficiency: With GPU acceleration, 3D-MSNet's analysis time is comparable to that of traditional methods and about 5 times faster than other deep learning methods.
- Potentiality: 3D-MSNet can achieve better accuracy when trained on larger annotated datasets.
Intel(R) Core(TM) i9-10900K CPU, 128 GB memory, GeForce RTX 3090 GPU
Ubuntu 16.04 + CUDA 11.1 + cuDNN 8.0.5
Anaconda 4.9.2 + Python 3.6.13 + PyTorch 1.9
Prepare the deep-learning environment based on your system and hardware, including GPU driver, CUDA, cuDNN, Anaconda, Python, and PyTorch.
Install the dependencies. Here we use ROOT_PATH to represent the root path of 3D-MSNet.
pip install -r requirements.txt
Compile CUDA code. This will take a few minutes.
python setup.py install
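Before or after compiling, it can help to confirm which packages the active interpreter actually sees. The following is a minimal sketch and not part of 3D-MSNet: the function name `describe_environment` is hypothetical, and it only reports whether a package is importable, without validating versions.

```python
import importlib.util
import platform


def describe_environment(packages=("torch",)):
    """Report the Python version and whether each package is importable."""
    report = {"python": platform.python_version()}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    report = describe_environment()
    for key, value in report.items():
        print(f"{key}: {value}")
    if report["torch"]:
        import torch

        # True only when PyTorch can reach a CUDA-capable GPU
        print("CUDA available:", torch.cuda.is_available())
```

If `torch` is reported as missing, or CUDA is reported as unavailable, revisit the environment preparation step before compiling the CUDA code.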
The 3DMS dataset and all the benchmark datasets (mzML format) can be freely downloaded from Zenodo.
Raw MS files of the metabolomics datasets can be downloaded from Google Drive.
Raw MS files of the proteomics datasets can be downloaded from ProteomeXchange (dataset PXD001091).
Targeted annotation results, evaluation results and evaluation methods can be downloaded from Zenodo.
Our demos can help you reproduce the evaluation results.
Place the benchmark datasets as follows.
3D-MSNet-master
├── dataset
│   ├── TripleTOF_6600
│   │   ├── mzml
│   │   │   ├── *.mzML
│   ├── QE_HF
│   │   ├── mzml
│   │   │   ├── *.mzML
│   ├── Orbitrap_XL
│   │   ├── mzml
│   │   │   ├── *.mzML
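A quick way to confirm that the files landed where the demos expect them is to list the mzML files per platform folder. This is a hedged sketch rather than a script shipped with 3D-MSNet: the helper name `list_mzml_files` is hypothetical, and the folder names are taken from the layout above.

```python
from pathlib import Path


def list_mzml_files(root, platforms=("TripleTOF_6600", "QE_HF", "Orbitrap_XL")):
    """Map each platform to the sorted .mzML file names in dataset/<platform>/mzml."""
    found = {}
    for platform in platforms:
        mzml_dir = Path(root) / "dataset" / platform / "mzml"
        if mzml_dir.is_dir():
            found[platform] = sorted(p.name for p in mzml_dir.glob("*.mzML"))
        else:
            # A missing folder is reported as an empty list
            found[platform] = []
    return found


if __name__ == "__main__":
    # Run from the 3D-MSNet root folder
    for platform, files in list_mzml_files(".").items():
        print(f"{platform}: {len(files)} mzML file(s)")
```

A platform that reports zero files usually means the `mzml` subfolder is missing or the files were placed one level too high.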
Then run scripts in folder DEMO. For example:
Prepare point clouds:
The result files are saved in the dataset folder.
Refer to DEMO for the parameter settings of different LC-MS platforms.
Prepare point clouds:
python workflow/predict/point_cloud_extractor.py --data_dir=PATH_TO_MZML --output_dir=POINT_CLOUD_PATH --window_mz_width=0.8 --window_rt_width=6 --min_intensity=128 --from_mz=0 --to_mz=2000 --from_rt=0 --to_rt=300 --expansion_mz_width=0.1 --expansion_rt_width=1
python workflow/predict/main_eval.py --data_dir=POINT_CLOUD_PATH --mass_analyzer=orbitrap --mz_resolution=60000 --resolution_mz=400 --rt_fwhm=0.1 --target_id=None
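The pair `--mz_resolution=60000 --resolution_mz=400` reads as a resolving power quoted at a reference m/z. As a worked illustration of what that implies for peak width, the sketch below uses FWHM = (m/z) / R together with the common model that Orbitrap resolving power falls off with the square root of m/z. This is an assumption for illustration and is not taken from the 3D-MSNet source; the function name `orbitrap_fwhm` is hypothetical.

```python
import math


def orbitrap_fwhm(mz, ref_resolution=60000.0, ref_mz=400.0):
    """Approximate peak FWHM (in m/z units) for an Orbitrap-type analyzer.

    Assumes resolving power R scales as sqrt(ref_mz / mz), a common model
    for Orbitrap instruments; not taken from the 3D-MSNet code.
    """
    resolution = ref_resolution * math.sqrt(ref_mz / mz)
    return mz / resolution


if __name__ == "__main__":
    for mz in (400.0, 800.0, 1600.0):
        print(f"m/z {mz:.0f}: FWHM ~ {orbitrap_fwhm(mz):.4f}")
```

Under this model the peak width at m/z 400 is 400 / 60000, roughly 0.0067, and it grows at higher m/z as the resolving power drops.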
Run python workflow/predict/point_cloud_extractor.py -h and python workflow/predict/main_eval.py -h to learn parameter details.
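When the same extraction has to be run over several datasets with slightly different settings, assembling the command line in Python avoids retyping the flags. The sketch below is only a convenience wrapper under stated assumptions: the helper name `build_extractor_command` is hypothetical, and the flag names and default values are copied from the example command in this README.

```python
import shlex


def build_extractor_command(data_dir, output_dir, **overrides):
    """Assemble the point_cloud_extractor.py command line as an argument list."""
    # Defaults copied from the example command in this README
    params = {
        "window_mz_width": 0.8,
        "window_rt_width": 6,
        "min_intensity": 128,
        "from_mz": 0,
        "to_mz": 2000,
        "from_rt": 0,
        "to_rt": 300,
        "expansion_mz_width": 0.1,
        "expansion_rt_width": 1,
    }
    unknown = set(overrides) - set(params)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params.update(overrides)
    command = [
        "python",
        "workflow/predict/point_cloud_extractor.py",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
    ]
    command += [f"--{name}={value}" for name, value in params.items()]
    return command


if __name__ == "__main__":
    cmd = build_extractor_command("PATH_TO_MZML", "POINT_CLOUD_PATH", min_intensity=256)
    # Print a shell-ready string; use subprocess.run(cmd, check=True) to execute it
    print(" ".join(shlex.quote(part) for part in cmd))
```

Rejecting unknown keyword arguments catches misspelled flag names before a long extraction run starts.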
We provided a pretrained model in
To train the model on your own annotated data, prepare your .csv files following the format of the 3DMS dataset. Each MS signal should be annotated with an instance label.
Place the training dataset as follows.
3D-MSNet-master
├── dataset
│   ├── your_training_dataset
│   │   ├── dataset_anno
│   │   │   ├── [id_mz_rt].csv
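Since every MS signal needs an instance label, a short check over each annotation file can catch unlabeled rows before training. The sketch below makes two loud assumptions that should be verified against the 3DMS dataset files: that each .csv has a header row, and that the label column is called `instance_label` (a hypothetical name). The function `count_points_per_instance` is likewise illustrative and not part of 3D-MSNet.

```python
import csv
from collections import Counter


def count_points_per_instance(csv_path, label_column="instance_label"):
    """Count annotated signals per instance label in one annotation file.

    `label_column` is a hypothetical column name; adjust it to match the
    header actually used in the 3DMS dataset files.
    """
    counts = Counter()
    with open(csv_path, newline="") as handle:
        # Data rows start on line 2 because line 1 is assumed to be the header
        for row_number, row in enumerate(csv.DictReader(handle), start=2):
            label = (row.get(label_column) or "").strip()
            if not label:
                # Every signal is expected to carry a label
                raise ValueError(f"row {row_number} has no instance label")
            counts[label] += 1
    return counts
```

Calling it on a file returns a mapping from label to number of signals, which also makes very small or suspiciously large instances easy to spot.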
Then change the training parameters at
Split training set and validation set:
Trained models are saved in
Cite our paper at:
3D-MSNet is an open-source tool released under the Mulan Permissive Software License, Version 2 (Mulan PSL v2).