ParallelFold

Author: Bozitao Zhong

This is a modified version of DeepMind's AlphaFold2 that separates the CPU part (MSA and template search) from the GPU part (the prediction model) of a local AlphaFold2 installation.

How to install

First, install AlphaFold2. You can choose one of the following methods to install AlphaFold locally:

  • Use the official version from DeepMind with Docker.
  • There are other versions that install AlphaFold without Docker.
  • You can also use my guide, which is based on the non-docker version and can be adapted to different CUDA versions (CUDA driver >= 10.1).

Then put these 4 files in your AlphaFold folder (the one that already contains the original run_alphafold.py). I use a run_alphafold.sh script to run AlphaFold easily (learned from the non-docker version); a copy sketch follows the list below.

4 files:

  • run_alphafold.py: a modified version of the original run_alphafold.py; it skips the featuring step when feature.pkl already exists in the output folder
  • run_alphafold.sh: a bash script to run run_alphafold.py
  • run_feature.py: a modified version of the original run_alphafold.py; it exits the Python process after it finishes writing feature.pkl
  • run_feature.sh: a bash script to run run_feature.py
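
For example, a minimal sketch of the copy step (I'm assuming the four files sit at the top level of this repo; /path/to/alphafold is a placeholder for your installation):

# Clone this repo and copy the four files into your AlphaFold folder.
# Paths are placeholders; adjust them to your setup.
git clone https://github.com/Zuricho/ParallelFold.git
cd ParallelFold
cp run_alphafold.py run_alphafold.sh run_feature.py run_feature.sh /path/to/alphafold/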

How to run

First, run run_feature.sh; this step needs only CPUs:

./run_feature.sh -d data -o output -m model_1 -f input/test3.fasta -t 2021-07-27
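
The flags follow the non-docker run script this repo is based on; my reading of them (an assumption, so check run_feature.sh for the authoritative list):

# -d  path to the directory holding the AlphaFold databases
# -o  output directory
# -m  model name(s) to run
# -f  input FASTA file
# -t  max template date (YYYY-MM-DD)
./run_feature.sh -d data -o output -m model_1 -f input/test3.fasta -t 2021-07-27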

8 CPUs are enough; in my tests, adding more CPUs did not improve speed.

A GPU can accelerate the hhblits step (but I assume you chose this repo because GPU time is expensive).

The featuring step writes feature.pkl and the MSA folder into your output folder: ./output/JOBNAME/
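
If you want to sanity-check the result, feature.pkl is a pickled dictionary of feature arrays (as in stock AlphaFold); a minimal inspection sketch, with the path as a placeholder:

import pickle

# Load the pickled feature dictionary written by the featuring step.
with open("output/JOBNAME/feature.pkl", "rb") as f:
    feature_dict = pickle.load(f)

# Each value is typically a NumPy array; print feature names and shapes.
for name, value in feature_dict.items():
    print(name, getattr(value, "shape", type(value)))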

PS: I put my input files in an input folder to keep my files organized; you can skip this.

Second, run run_alphafold.sh using a GPU:

./run_alphafold.sh -d data -o output -m model_1,model_2,model_3,model_4,model_5 -f input/test.fasta -t 2021-07-27

If feature.pkl was output successfully, the featuring step will now be very fast.
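
Roughly, the added logic in run_alphafold.py looks like this (a sketch of the behavior described above, not the exact code; the variable names follow the original script):

import os
import pickle

# Reuse feature.pkl if the featuring step already produced it,
# otherwise run the CPU-bound data pipeline as usual.
features_path = os.path.join(output_dir, "feature.pkl")
if os.path.exists(features_path):
    with open(features_path, "rb") as f:
        feature_dict = pickle.load(f)
else:
    feature_dict = data_pipeline.process(
        input_fasta_path=fasta_path,
        msa_output_dir=msa_output_dir)
    with open(features_path, "wb") as f:
        pickle.dump(feature_dict, f, protocol=4)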

I have also uploaded the scripts I use on SJTU HPC (which uses Slurm): sub_alphafold.slurm and sub_feature.slurm.
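
For reference, a minimal sketch of what such a submission script can look like (job name and resource lines are examples; partitions and module setup are site-specific, so check the actual sub_feature.slurm):

#!/bin/bash
#SBATCH --job-name=af2_feature
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8   # 8 CPUs are enough for the featuring step

./run_feature.sh -d data -o output -m model_1 -f input/test3.fasta -t 2021-07-27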

Other Files

In the ./alphafold folder, I modified some Python files (hhblits.py, hmmsearch.py, jackhmmer.py) to give these steps more CPUs for acceleration. In my tests, however, these processes did not get faster with more CPUs, probably because DeepMind runs them as wrapped subprocesses. I'm trying to improve this (work in progress).
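
For context, the hook these edits use is the n_cpu argument of the tool wrappers: for example, the HHBlits wrapper in alphafold/data/tools/hhblits.py accepts n_cpu (upstream default 4) and forwards it to the hhblits binary as its -cpu flag. A minimal sketch, with binary and database paths as placeholders:

from alphafold.data.tools import hhblits

# The wrapper forwards n_cpu to the hhblits binary via -cpu.
# binary_path and databases below are placeholders; use your real paths.
hhblits_runner = hhblits.HHBlits(
    binary_path="/usr/bin/hhblits",
    databases=["/data/bfd/bfd", "/data/uniclust30/uniclust30"],
    n_cpu=8)  # the "more CPUs" tweak (though, as noted above, it did not help in practice)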

If you have any questions, please report them in the issues.

GitHub

https://github.com/Zuricho/ParallelFold