This page briefly describes a local installation of AlphaFold 2.1.1 (no Docker or Singularity involved). For details about the dockerized version see alphafold 2.1.1 - docker.
Running alphafold 2.1.1 (no container)
- Create a batch script; a sample is pasted below. Customize it to use the proper partitions and limits.
- Use /software/alphafold/2.1.1L/alphafold.sh or customize it according to your needs.
- For multimer: use AF_preset=multimer ..., the default is monomer.
- For multimer: each monomer has to be a separate entry with full sequence in the fasta-file, even if all monomers are identical
- Almost all parameters can be customized; see the table below for details.
- Submit the job: sbatch <your-alphafold-script>
Sample batch script
#!/bin/bash
#SBATCH --partition=allgpu
#SBATCH --constraint='A100|V100'
#SBATCH --time=0-12:00
#SBATCH --job-name=T1050-dimer
#SBATCH --output=slurm.T1050-dimer.out
unset LD_PRELOAD

export AF_preset=multimer
export AF_outdir=/beegfs/desy/user/$USER/ALPHAFOLD2.1/local

/software/alphafold/2.1.1L/alphafold.sh --fasta_paths=/software/alphafold/2.1.1L/T1050-2.fasta
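Because AF_preset has to match the content of the fasta-file, a small check before submitting can help. The following is only a sketch: suggest_preset is a hypothetical helper, not part of alphafold.sh, and it simply counts the header lines of the file.

```shell
#!/bin/bash
# Hypothetical helper (not part of alphafold.sh): suggest a value for
# AF_preset based on the number of FASTA entries in a file.
suggest_preset() {
    local fasta=$1 n
    # every FASTA entry starts with a '>' header line
    n=$(grep -c '^>' "$fasta")
    if (( n > 1 )); then
        echo multimer
    else
        echo monomer
    fi
}
```

In a batch script this could be used as export AF_preset=$(suggest_preset my.fasta), where my.fasta is a placeholder for your own input file.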
Sample run script
#!/bin/bash
# basic setup
unset LD_PRELOAD
source /etc/profile.d/modules.sh
module purge
module load maxwell cuda/11.3

# alphafold basics
export PATH=/software/alphafold/2.1.1L/envs/af2.1/bin:$PATH
export TF_FORCE_UNIFIED_MEMORY=1
export AF_datadir=${AF_datadir:-/beegfs/desy/group/it/ReferenceData/alphafold}

# databases
AF_uniref90=${AF_uniref90:-$AF_datadir/uniref90/uniref90.fasta}
AF_bfd=${AF_bfd:-$AF_datadir/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt}
AF_mmcif=${AF_mmcif:-$AF_datadir/pdb_mmcif/mmcif_files}
AF_obsolete=${AF_obsolete:-$AF_datadir/pdb_mmcif/obsolete.dat}
AF_pdb70=${AF_pdb70:-$AF_datadir/pdb70/pdb70}
AF_mgnify=${AF_mgnify:-$AF_datadir/mgnify/mgy_clusters.fa}
AF_uniclust30=${AF_uniclust30:-$AF_datadir/uniclust30/uniclust30_2018_08/uniclust30_2018_08}
AF_uniprot=${AF_uniprot:-$AF_datadir/uniprot/uniprot.fasta}
AF_pdbseqres=${AF_pdbseqres:-$AF_datadir/pdb_seqres/pdb_seqres.txt}
AF_template_date=${AF_template_date:-$(date +%Y-%m-%d)}

# make sure they all exist
for e in $( /usr/bin/env | grep "$AF_datadir" | cut -d= -f2 ) ; do
    if [[ ! -e $e ]]; then
        echo "missing $e -- check your environment "
        # non-zero exit status so that the batch job is marked as failed
        exit 1
    fi
done

# select the databases matching the preset
export AF_preset="${AF_preset:-monomer}"
if [[ $AF_preset =~ monomer ]]; then
    export AF_dbs="--uniref90_database_path=$AF_uniref90 --bfd_database_path=$AF_bfd --template_mmcif_dir=$AF_mmcif"
    export AF_dbs="$AF_dbs --obsolete_pdbs_path=$AF_obsolete --pdb70_database_path=$AF_pdb70 --mgnify_database_path=$AF_mgnify"
    export AF_dbs="$AF_dbs --uniclust30_database_path=$AF_uniclust30"
else
    export AF_dbs="--uniref90_database_path=$AF_uniref90 --bfd_database_path=$AF_bfd --template_mmcif_dir=$AF_mmcif"
    export AF_dbs="$AF_dbs --obsolete_pdbs_path=$AF_obsolete --mgnify_database_path=$AF_mgnify"
    export AF_dbs="$AF_dbs --uniclust30_database_path=$AF_uniclust30 --uniprot_database_path=$AF_uniprot --pdb_seqres_database_path=$AF_pdbseqres"
fi

# user customizable setup
export AF_outdir="${AF_outdir:-/tmp/alphafold}"

cat <<EOF
AlphaFold Setup
----------------------------------------------------------------------------------------------------
AF_datadir.: $AF_datadir
AF_outdir..: $AF_outdir
AF_preset..: $AF_preset

Hardware Setup
----------------------------------------------------------------------------------------------------
Host.......: $(hostname)
CPU........: $(grep "model name" /proc/cpuinfo | head -1 | cut -d: -f2 | grep -o '[A-Za-z].*')
GPU........: $(nvidia-smi -L |cut -d'(' -f1 | tr '\n' ' ')
Cores......: $(nproc)
Memory.....: $(free -g | grep Mem | awk '{print $2}')
Time.......: $(date)

Execute:
----------------------------------------------------------------------------------------------------
python3 /software/alphafold/2.1.1L/alphafold/run_alphafold.py \
    --output_dir=$AF_outdir \
    --data_dir=$AF_datadir \
    --model_preset=$AF_preset \
    --max_template_date=$AF_template_date \
    $AF_dbs \
    "$@"
EOF

python3 /software/alphafold/2.1.1L/alphafold/run_alphafold.py \
    --output_dir=$AF_outdir \
    --data_dir=$AF_datadir \
    --model_preset=$AF_preset \
    --max_template_date=$AF_template_date \
    $AF_dbs \
    "$@"
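The existence check in the run script walks the output of env, which contains only exported variables, so the unexported AF_* database settings are not covered by it. A stricter check could look roughly like the sketch below. missing_paths is a hypothetical helper, not part of alphafold.sh. It also accepts file-name prefixes, because HH-suite databases such as pdb70 are given as a prefix of several files (pdb70_a3m.ffdata and so on) rather than as a single file.

```shell
#!/bin/bash
# Hypothetical helper: print every argument that is neither an existing
# path nor a prefix of existing files (HH-suite databases are passed as
# a prefix, e.g. .../pdb70/pdb70 for pdb70_a3m.ffdata, pdb70_hhm.ffdata, ...).
missing_paths() {
    local p
    for p in "$@"; do
        [[ -e $p ]] && continue
        # compgen -G succeeds if the glob matches at least one file
        compgen -G "${p}_*" > /dev/null && continue
        echo "$p"
    done
}
```

Inside the run script it could be called as missing_paths "$AF_uniref90" "$AF_bfd" "$AF_pdb70" "$AF_uniclust30", aborting when the output is not empty.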
Databases
Databases can be found in /beegfs/desy/group/it/ReferenceData/alphafold/, but feel free to use your own set of DBs. small_bfd is not defined in the sample script, but can be found at /beegfs/desy/group/it/ReferenceData/alphafold/small_bfd/bfd-first_non_consensus_sequences.fasta. Last update: mid November 2021.
The databases to be used differ for monomers and multimers. The sample script (/software/alphafold/2.1.1L/alphafold.sh) takes that into account.
Multimers
Note: the fasta-file has to contain each chain as a separate entry, even if all sequences are identical. For the 1WUF sample it looks like this:
>1WUF_1|Chains A|hypothetical protein lin2664|Listeria innocua (272626)
GHHHHHHHHHHGLVPRGSHMYFQKARLIHAELPLLAPFKTSYGELKSKDFYIIELINEEGIHGYGELEAFPLPDYTEETLSSAILIIKEQLLPLLAQRKIRKPEEIQELFSWIQGNEMAKAAVELAVWDAFAKMEKRSLAKMIGATKESIKVGVSIGLQQNVETLLQLVNQYVDQGYERVKLKIAPNKDIQFVEAVRKSFPKLSLMADANSAYNREDFLLLKELDQYDLEMIEQPFGTKDFVDHAWLQKQLKTRICLDENIRSVKDVEQAHSIGSCRAINLKLARVGGMSSALKIAEYCALNEILVWCGGMLEAGVGRAHNIALAARNEFVFPGDISASNRFFAEDIVTPAFELNQGRLKVPTNEGIGVTLDLKVLKKYTKSTEEILLNKGWS
>1WUF_2|Chains B|hypothetical protein lin2664|Listeria innocua (272626)
GHHHHHHHHHHGLVPRGSHMYFQKARLIHAELPLLAPFKTSYGELKSKDFYIIELINEEGIHGYGELEAFPLPDYTEETLSSAILIIKEQLLPLLAQRKIRKPEEIQELFSWIQGNEMAKAAVELAVWDAFAKMEKRSLAKMIGATKESIKVGVSIGLQQNVETLLQLVNQYVDQGYERVKLKIAPNKDIQFVEAVRKSFPKLSLMADANSAYNREDFLLLKELDQYDLEMIEQPFGTKDFVDHAWLQKQLKTRICLDENIRSVKDVEQAHSIGSCRAINLKLARVGGMSSALKIAEYCALNEILVWCGGMLEAGVGRAHNIALAARNEFVFPGDISASNRFFAEDIVTPAFELNQGRLKVPTNEGIGVTLDLKVLKKYTKSTEEILLNKGWS
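For homo-oligomers the repeated entries do not have to be typed by hand. The sketch below assumes a hypothetical helper make_homomer_fasta (not part of the installation) that writes the same sequence as several separate entries; the header naming scheme NAME_1, NAME_2, ... is an arbitrary choice for illustration.

```shell
#!/bin/bash
# Hypothetical helper: write COPIES identical chains as separate FASTA entries.
# Usage: make_homomer_fasta NAME SEQUENCE COPIES
make_homomer_fasta() {
    local name=$1 seq=$2 copies=$3 i
    for (( i = 1; i <= copies; i++ )); do
        # one header line and one sequence line per chain
        printf '>%s_%d\n%s\n' "$name" "$i" "$seq"
    done
}
```

For a dimer one would call make_homomer_fasta myprotein "$sequence" 2 > myprotein-dimer.fasta and pass the resulting file via --fasta_paths.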
Some simple tests were made using 1WUF.fasta as input:
Original dimer structure 1WUF.pdb | alphafold predicted dimer |
Another test run, splitting T1050 into 2 domains and treating them as a multimer, looks quite impressive:
alphafold prediction for T1050 | prediction for 2 independent domains |
alphafold parameters:
parameter | description | default | environment variable in alphafold.sh | default value of environment variable |
---|---|---|---|---|
--[no]benchmark | Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins | false | none | none |
--data_dir | Path to directory of supporting data | none | AF_datadir | /beegfs/desy/group/it/ReferenceData/alphafold |
--bfd_database_path | Path to the BFD database for use by HHblits | none | AF_bfd | $AF_datadir/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt |
--db_preset | <full_dbs\|reduced_dbs>: Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) | full_dbs | none | none |
--fasta_paths | Paths to FASTA files, each containing a prediction target that will be folded one after another. If a FASTA file contains multiple sequences, then it will be folded as a multimer. Paths should be separated by commas. All FASTA paths must have a unique basename as the basename is used to name the output directories for each prediction (a comma separated list) | none | none | none |
--hhblits_binary_path | Path to the HHblits executable. | hhblits | none | none |
--hhsearch_binary_path | Path to the HHsearch executable. | hhsearch | none | none |
--hmmbuild_binary_path | Path to the hmmbuild executable. | hmmbuild | none | none |
--hmmsearch_binary_path | Path to the hmmsearch executable. | hmmsearch | none | none |
--is_prokaryote_list | Optional for multimer system, not used by the single chain system. This list should contain a boolean for each fasta specifying true where the target complex is from a prokaryote, and false where it is not, or where the origin is unknown. These values determine the pairing method for the MSA (a comma separated list) | none | none | none |
--jackhmmer_binary_path | Path to the JackHMMER executable. | jackhmmer | none | none |
--kalign_binary_path | Path to the Kalign executable. | kalign | none | none |
--max_template_date | Maximum template release date to consider. Important if folding historical test sets. | none | AF_template_date | today (date +%Y-%m-%d) |
--mgnify_database_path | Path to the MGnify database for use by JackHMMER | none | AF_mgnify | $AF_datadir/mgnify/mgy_clusters.fa |
--model_preset | <monomer\|monomer_casp14\|monomer_ptm\|multimer>: Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model | monomer | AF_preset | monomer |
--obsolete_pdbs_path | Path to file containing a mapping from obsolete PDB IDs to the PDB IDs of their replacements. | none | AF_obsolete | $AF_datadir/pdb_mmcif/obsolete.dat |
--output_dir | Path to a directory that will store the results. | none | AF_outdir | /tmp/alphafold |
--pdb70_database_path | Path to the PDB70 database for use by HHsearch. | none | AF_pdb70 | $AF_datadir/pdb70/pdb70 |
--pdb_seqres_database_path | Path to the PDB seqres database for use by hmmsearch. | none | AF_pdbseqres | $AF_datadir/pdb_seqres/pdb_seqres.txt |
--random_seed | The random seed for the data pipeline. By default, this is randomly generated. Note that even if this is set, Alphafold may still not be deterministic, because processes like GPU inference are nondeterministic (an integer) | none | none | none |
--small_bfd_database_path | Path to the small version of BFD used with the "reduced_dbs" preset. | none | none | none |
--template_mmcif_dir | Path to a directory with template mmCIF structures, each named <pdb_id>.cif | none | AF_mmcif | $AF_datadir/pdb_mmcif/mmcif_files |
--uniclust30_database_path | Path to the Uniclust30 database for use by HHblits. | none | AF_uniclust30 | $AF_datadir/uniclust30/uniclust30_2018_08/uniclust30_2018_08 |
--uniprot_database_path | Path to the Uniprot database for use by JackHMMer. | none | AF_uniprot | $AF_datadir/uniprot/uniprot.fasta |
--uniref90_database_path | Path to the Uniref90 database for use by JackHMMER. | none | AF_uniref90 | $AF_datadir/uniref90/uniref90.fasta |
--[no]use_precomputed_msas | Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed. | false | none | none |
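Parameters without an environment variable can be appended to the alphafold.sh call, since the script forwards its arguments to run_alphafold.py. When several targets are passed to --fasta_paths, the names have to be unique. The following sketch uses a hypothetical helper join_fasta_paths that builds the comma-separated list and refuses duplicates; it compares the names without the file extension, which is an assumption made to be on the safe side.

```shell
#!/bin/bash
# Hypothetical helper: join FASTA paths into a comma-separated list for
# --fasta_paths and fail if two files share the same name.
join_fasta_paths() {
    local p b dup
    # collect names without directory and without the last extension
    dup=$(for p in "$@"; do
              b=$(basename "$p")
              echo "${b%.*}"
          done | sort | uniq -d)
    if [[ -n $dup ]]; then
        echo "duplicate name(s): $dup" >&2
        return 1
    fi
    # "$*" joins the arguments with the first character of IFS
    local IFS=,
    echo "$*"
}
```

A call could then look like /software/alphafold/2.1.1L/alphafold.sh --fasta_paths="$(join_fasta_paths a.fasta b.fasta)", with a.fasta and b.fasta standing in for your own files.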
/software/alphafold/2.1.1L/alphafold.sh pre-defines a few more custom variables which can be redefined on the command line of the batch script:
environment variable | description | default value |
---|---|---|
TF_FORCE_UNIFIED_MEMORY | GPU memory handling | TF_FORCE_UNIFIED_MEMORY=1 |
AF_models | Naming template | model_1,model_2,model_3,model_4,model_5 |
AF_template_date | max_template_date | today |
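The override mechanism relies on the shell's ${VAR:-default} expansion: a value exported in the batch script wins, otherwise the default is used. The sketch below illustrates this with a hypothetical helper af_setting, which is not part of alphafold.sh.

```shell
#!/bin/bash
# Sketch of the ${VAR:-default} pattern used in alphafold.sh.
# af_setting is a hypothetical helper for illustration only.
af_setting() {
    local name=$1 default=$2
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    printf '%s\n' "${!name:-$default}"
}
```

With AF_preset unset, af_setting AF_preset monomer prints monomer; after export AF_preset=multimer it prints multimer. This is why the export lines in the sample batch script have to come before the call to alphafold.sh.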
Installation
# see /software/alphafold/2.1.1L/alphafold/docker/Dockerfile for the basic setup
# some parts adopted from https://pythonrepo.com/repo/kuixu-alphafold
tmpdir=/scratch/$USER
inst_dir=/software/alphafold/2.1.1L

#
# install miniconda
#
pushd $tmpdir
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $inst_dir
export PATH=$inst_dir/bin:$PATH
# don't want to pollute my environment, used /software/alphafold/2.1.1L/bin/conda-init instead. has to be created
. conda-init

#
# setup alphafold conda environment
#
conda create -n af2.1 python=3.8
conda activate af2.1
conda install -y -c nvidia cudnn==8.0.4
conda install -y -c bioconda hmmer hhsuite==3.3.0 kalign2
conda install -y -c conda-forge openmm=7.5.1 pdbfixer pip

#
# alphafold itself
#
wget https://github.com/deepmind/alphafold/archive/refs/tags/v2.1.1.tar.gz -O alphafold-2.1.1.tar.gz
tar xf alphafold-2.1.1.tar.gz
mv alphafold-2.1.1 $inst_dir/alphafold
rm alphafold-2.1.1.tar.gz
pip3 install -r $inst_dir/alphafold/requirements.txt
# both jax and jaxlib versions have to be explicit, will cause problems otherwise
pip3 install --upgrade jax==0.2.21 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
popd

#
# patch
#
pushd $inst_dir/lib/python3.8/site-packages
patch -p0 < $inst_dir/alphafold/docker/openmm.patch
popd

#
# get stereo_chemical_props.txt
#
wget -P $inst_dir/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt --no-check-certificate
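After the installation it can be worth checking that the external tools are actually found. The following is a minimal sketch, assuming the conda environment's bin directory has already been prepended to PATH as in the run script; missing_tools is a hypothetical helper and the tool names are the defaults from the parameter table.

```shell
#!/bin/bash
# Hypothetical post-install check: print the names of tools not found on PATH.
missing_tools() {
    local t
    for t in "$@"; do
        # command -v prints the resolved path and fails if the tool is unknown
        command -v "$t" > /dev/null 2>&1 || echo "$t"
    done
}
```

A call such as missing_tools jackhmmer hhblits hhsearch hmmsearch hmmbuild kalign python3 should print nothing if the environment is complete.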