This page briefly describes a local installation of AlphaFold 2.1.1 (no Docker or Singularity involved). For details about the dockerized version, see alphafold 2.1.1 - docker.

Running alphafold 2.1.1 (no container)

  • Create a batch script; a sample is pasted below. Customize it to use the proper partitions and limits.
  • Use /software/alphafold/2.1.1L/alphafold.sh as is, or customize it according to your needs.
  • For multimers: use AF_preset=multimer ...; the default is monomer.
  • For multimers: each monomer has to be a separate entry with its full sequence in the FASTA file, even if all monomers are identical.
  • Almost all parameters can be customized; see the table below for details.
  • Submit the job: sbatch <your-alphafold-script>
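
Since the preset has to match the content of the FASTA file, it can help to derive it from the number of entries. The following is a minimal sketch, not part of the installed scripts: suggest_preset is a hypothetical helper that assumes one ">" header line per chain and simply returns multimer when it finds more than one.

```shell
#!/bin/bash
# suggest_preset: print "multimer" if the FASTA file has more than one entry,
# otherwise "monomer". Hypothetical helper, not part of alphafold.sh.
suggest_preset() {
    local n
    # grep -c prints the number of header lines (0 if there are none)
    n=$(grep -c '^>' "$1")
    if [ "$n" -gt 1 ]; then
        echo multimer
    else
        echo monomer
    fi
}
```

In a batch script this could be used as export AF_preset=$(suggest_preset my.fasta) before calling alphafold.sh; my.fasta is a placeholder name. The other monomer presets (monomer_casp14, monomer_ptm) still have to be chosen by hand.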

Sample batch script

/software/alphafold/2.1.1L/sbatch-alphafold.sh
#!/bin/bash
#SBATCH --partition=allgpu
#SBATCH --constraint='A100|V100'
#SBATCH --time=0-12:00
#SBATCH --job-name=T1050-dimer
#SBATCH --output=slurm.T1050-dimer.out
unset LD_PRELOAD
export AF_preset=multimer 
export AF_outdir=/beegfs/desy/user/$USER/ALPHAFOLD2.1/local 

/software/alphafold/2.1.1L/alphafold.sh --fasta_paths=/software/alphafold/2.1.1L/T1050-2.fasta

Sample run script

/software/alphafold/2.1.1L/alphafold.sh
#!/bin/bash
# basic setup
unset LD_PRELOAD

source /etc/profile.d/modules.sh
module purge
module load maxwell cuda/11.3

# alphafold basics 
export PATH=/software/alphafold/2.1.1L/envs/af2.1/bin:$PATH
export TF_FORCE_UNIFIED_MEMORY=1

export AF_datadir=${AF_datadir:-/beegfs/desy/group/it/ReferenceData/alphafold}

# databases
AF_uniref90=${AF_uniref90:-$AF_datadir/uniref90/uniref90.fasta}
AF_bfd=${AF_bfd:-$AF_datadir/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt}
AF_mmcif=${AF_mmcif:-$AF_datadir/pdb_mmcif/mmcif_files}
AF_obsolete=${AF_obsolete:-$AF_datadir/pdb_mmcif/obsolete.dat}
AF_pdb70=${AF_pdb70:-$AF_datadir/pdb70/pdb70}
AF_mgnify=${AF_mgnify:-$AF_datadir/mgnify/mgy_clusters.fa}
AF_uniclust30=${AF_uniclust30:-$AF_datadir/uniclust30/uniclust30_2018_08/uniclust30_2018_08}
AF_uniprot=${AF_uniprot:-$AF_datadir/uniprot/uniprot.fasta}
AF_pdbseqres=${AF_pdbseqres:-$AF_datadir/pdb_seqres/pdb_seqres.txt}
AF_template_date=${AF_template_date:-$(date +%Y-%m-%d)}

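# make sure they all exist
# (the database variables are not exported, so they are checked by name;
#  bfd, pdb70 and uniclust30 are file-name prefixes, hence the trailing glob)
for v in AF_datadir AF_uniref90 AF_bfd AF_mmcif AF_obsolete AF_pdb70 AF_mgnify AF_uniclust30 AF_uniprot AF_pdbseqres; do
    if ! compgen -G "${!v}*" > /dev/null; then
        echo "missing $v=${!v} -- check your environment" >&2
        exit 1
    fi
done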

export AF_preset="${AF_preset:-monomer}"

if [[ $AF_preset =~ monomer ]]; then
    export AF_dbs="--uniref90_database_path=$AF_uniref90 --bfd_database_path=$AF_bfd --template_mmcif_dir=$AF_mmcif"
    export AF_dbs="$AF_dbs --obsolete_pdbs_path=$AF_obsolete --pdb70_database_path=$AF_pdb70 --mgnify_database_path=$AF_mgnify"
    export AF_dbs="$AF_dbs --uniclust30_database_path=$AF_uniclust30"
else
    export AF_dbs="--uniref90_database_path=$AF_uniref90 --bfd_database_path=$AF_bfd --template_mmcif_dir=$AF_mmcif"
    export AF_dbs="$AF_dbs --obsolete_pdbs_path=$AF_obsolete --mgnify_database_path=$AF_mgnify"
    export AF_dbs="$AF_dbs --uniclust30_database_path=$AF_uniclust30 --uniprot_database_path=$AF_uniprot --pdb_seqres_database_path=$AF_pdbseqres"
fi
# user customizable setup
export AF_outdir="${AF_outdir:-/tmp/alphafold}"

cat <<EOF
AlphaFold Setup
----------------------------------------------------------------------------------------------------
AF_datadir.:  $AF_datadir
AF_outdir..:  $AF_outdir
AF_preset..:  $AF_preset

Hardware Setup
----------------------------------------------------------------------------------------------------
Host.......:  $(hostname)
CPU........:  $(grep "model name" /proc/cpuinfo  | head -1 | cut -d: -f2 | grep -o '[A-Za-z].*')
GPU........:  $(nvidia-smi -L |cut -d'(' -f1 | tr '\n' ' ')
Cores......:  $(nproc)
Memory.....:  $(free -g | grep Mem | awk '{print $2}')

Time.......:  $(date)

Execute:
----------------------------------------------------------------------------------------------------
python3 /software/alphafold/2.1.1L/alphafold/run_alphafold.py \
        --output_dir=$AF_outdir \
        --data_dir=$AF_datadir \
        --model_preset=$AF_preset \
        --max_template_date=$AF_template_date \
        $AF_dbs \
        "$@"

EOF

python3 /software/alphafold/2.1.1L/alphafold/run_alphafold.py --output_dir=$AF_outdir --data_dir=$AF_datadir --model_preset=$AF_preset --max_template_date=$AF_template_date $AF_dbs "$@"

Databases

Databases can be found in /beegfs/desy/group/it/ReferenceData/alphafold/, but feel free to use your own set of DBs. small_bfd is not defined in the sample script, but can be found at /beegfs/desy/group/it/ReferenceData/alphafold/small_bfd/bfd-first_non_consensus_sequences.fasta. Last update: mid November 2021.

The databases to be used differ for monomers and multimers. The sample script (/software/alphafold/2.1.1L/alphafold.sh) takes that into account.
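
To use small_bfd, the database flags have to be assembled differently, because the sample script always passes --bfd_database_path and --uniclust30_database_path. The sketch below shows one way this could look for a monomer run. It rests on two assumptions: the helper name reduced_db_flags is made up, and the flag combination reflects my understanding of the reduced_dbs preset (small BFD instead of BFD and Uniclust30). Please verify it against run_alphafold.py before relying on it.

```shell
#!/bin/bash
# reduced_db_flags: print database flags for a monomer run with
# --db_preset=reduced_dbs. Hypothetical helper; alphafold.sh as shown
# on this page does not contain it.
reduced_db_flags() {
    local datadir=${1:-/beegfs/desy/group/it/ReferenceData/alphafold}
    # small_bfd replaces bfd and uniclust30 (assumption, see text above)
    echo "--db_preset=reduced_dbs" \
         "--small_bfd_database_path=$datadir/small_bfd/bfd-first_non_consensus_sequences.fasta" \
         "--uniref90_database_path=$datadir/uniref90/uniref90.fasta" \
         "--mgnify_database_path=$datadir/mgnify/mgy_clusters.fa" \
         "--pdb70_database_path=$datadir/pdb70/pdb70" \
         "--template_mmcif_dir=$datadir/pdb_mmcif/mmcif_files" \
         "--obsolete_pdbs_path=$datadir/pdb_mmcif/obsolete.dat"
}
```

In a customized copy of alphafold.sh, the monomer branch could then set AF_dbs="$(reduced_db_flags "$AF_datadir")" instead of the three export lines.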

Multimers

Note: the FASTA file has to contain each chain as a separate entry, even if all sequences are identical. For the 1WUF sample it looks like this:

>1WUF_1|Chains A|hypothetical protein lin2664|Listeria innocua (272626)
GHHHHHHHHHHGLVPRGSHMYFQKARLIHAELPLLAPFKTSYGELKSKDFYIIELINEEGIHGYGELEAFPLPDYTEETLSSAILIIKEQLLPLLAQRKIRKPEEIQELFSWIQGNEMAKAAVELAVWDAFAKMEKRSLAKMIGATKESIKVGVSIGLQQNVETLLQLVNQYVDQGYERVKLKIAPNKDIQFVEAVRKSFPKLSLMADANSAYNREDFLLLKELDQYDLEMIEQPFGTKDFVDHAWLQKQLKTRICLDENIRSVKDVEQAHSIGSCRAINLKLARVGGMSSALKIAEYCALNEILVWCGGMLEAGVGRAHNIALAARNEFVFPGDISASNRFFAEDIVTPAFELNQGRLKVPTNEGIGVTLDLKVLKKYTKSTEEILLNKGWS
>1WUF_2|Chains B|hypothetical protein lin2664|Listeria innocua (272626)
GHHHHHHHHHHGLVPRGSHMYFQKARLIHAELPLLAPFKTSYGELKSKDFYIIELINEEGIHGYGELEAFPLPDYTEETLSSAILIIKEQLLPLLAQRKIRKPEEIQELFSWIQGNEMAKAAVELAVWDAFAKMEKRSLAKMIGATKESIKVGVSIGLQQNVETLLQLVNQYVDQGYERVKLKIAPNKDIQFVEAVRKSFPKLSLMADANSAYNREDFLLLKELDQYDLEMIEQPFGTKDFVDHAWLQKQLKTRICLDENIRSVKDVEQAHSIGSCRAINLKLARVGGMSSALKIAEYCALNEILVWCGGMLEAGVGRAHNIALAARNEFVFPGDISASNRFFAEDIVTPAFELNQGRLKVPTNEGIGVTLDLKVLKKYTKSTEEILLNKGWS
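
For homo-oligomers, such a file can be generated from a single-sequence FASTA file. The following is a minimal sketch under stated assumptions: the input contains exactly one entry, and the helper make_homomer_fasta as well as the chain_N header prefix are invented for illustration.

```shell
#!/bin/bash
# make_homomer_fasta: print N copies of a single-sequence FASTA file,
# one entry per chain. Hypothetical helper; assumes exactly one entry
# in the input file.
make_homomer_fasta() {
    local fasta=$1 copies=$2
    awk -v n="$copies" '
        /^>/ { header = substr($0, 2); next }   # keep the header text without ">"
        { seq = seq $0 }                         # join wrapped sequence lines
        END {
            for (i = 1; i <= n; i++) {
                printf(">chain_%d %s\n%s\n", i, header, seq)
            }
        }' "$fasta"
}
```

For example, make_homomer_fasta monomer.fasta 2 > dimer.fasta would write a two-chain file; both file names are placeholders.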

A few simple test runs using 1WUF.fasta as input:

Original dimer structure 1WUF.pdb

alphafold predicted dimer

Another test run, splitting T1050 into 2 domains and treating them as a multimer, looks quite impressive:

alphafold prediction for T1050

prediction for 2 independent domains

alphafold parameters:

| parameter | description | default | environment variable in alphafold.sh | default value of environment variable |
|---|---|---|---|---|
| --[no]benchmark | Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins | false | none | none |
| --data_dir | Path to directory of supporting data | none | AF_datadir | /beegfs/desy/group/it/ReferenceData/alphafold |
| --bfd_database_path | Path to the BFD database for use by HHblits | none | AF_bfd | $AF_datadir/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt |
| --db_preset | One of full_dbs, reduced_dbs. Choose preset MSA database configuration: smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) | full_dbs | none | none |
| --fasta_paths | Paths to FASTA files, each containing a prediction target that will be folded one after another. If a FASTA file contains multiple sequences, then it will be folded as a multimer. Paths should be separated by commas. All FASTA paths must have a unique basename as the basename is used to name the output directories for each prediction (a comma separated list) | none | none | none |
| --hhblits_binary_path | Path to the HHblits executable. | hhblits | none | none |
| --hhsearch_binary_path | Path to the HHsearch executable. | hhsearch | none | none |
| --hmmbuild_binary_path | Path to the hmmbuild executable. | hmmbuild | none | none |
| --hmmsearch_binary_path | Path to the hmmsearch executable. | hmmsearch | none | none |
| --is_prokaryote_list | Optional for multimer system, not used by the single chain system. This list should contain a boolean for each fasta specifying true where the target complex is from a prokaryote, and false where it is not, or where the origin is unknown. These values determine the pairing method for the MSA (a comma separated list) | none | none | none |
| --jackhmmer_binary_path | Path to the JackHMMER executable. | jackhmmer | none | none |
| --kalign_binary_path | Path to the Kalign executable. | kalign | none | none |
| --max_template_date | Maximum template release date to consider. Important if folding historical test sets. | none | none | none |
| --mgnify_database_path | Path to the MGnify database for use by JackHMMER | none | AF_mgnify | $AF_datadir/mgnify/mgy_clusters.fa |
| --model_preset | One of monomer, monomer_casp14, monomer_ptm, multimer. Choose preset model configuration: the monomer model, the monomer model with extra ensembling, the monomer model with pTM head, or the multimer model | monomer | AF_preset | monomer |
| --obsolete_pdbs_path | Path to file containing a mapping from obsolete PDB IDs to the PDB IDs of their replacements. | none | AF_obsolete | $AF_datadir/pdb_mmcif/obsolete.dat |
| --output_dir | Path to a directory that will store the results. | none | AF_outdir | /tmp/alphafold |
| --pdb70_database_path | Path to the PDB70 database for use by HHsearch. | none | AF_pdb70 | $AF_datadir/pdb70/pdb70 |
| --pdb_seqres_database_path | Path to the PDB seqres database for use by hmmsearch. | none | AF_pdbseqres | $AF_datadir/pdb_seqres/pdb_seqres.txt |
| --random_seed | The random seed for the data pipeline. By default, this is randomly generated. Note that even if this is set, Alphafold may still not be deterministic, because processes like GPU inference are nondeterministic (an integer) | none | none | none |
| --small_bfd_database_path | Path to the small version of BFD used with the "reduced_dbs" preset. | none | none | none |
| --template_mmcif_dir | Path to a directory with template mmCIF structures, each named <pdb_id>.cif | none | AF_mmcif | $AF_datadir/pdb_mmcif/mmcif_files |
| --uniclust30_database_path | Path to the Uniclust30 database for use by HHblits. | none | AF_uniclust30 | $AF_datadir/uniclust30/uniclust30_2018_08/uniclust30_2018_08 |
| --uniprot_database_path | Path to the Uniprot database for use by JackHMMer. | none | AF_uniprot | $AF_datadir/uniprot/uniprot.fasta |
| --uniref90_database_path | Path to the Uniref90 database for use by JackHMMER. | none | AF_uniref90 | $AF_datadir/uniref90/uniref90.fasta |
| --[no]use_precomputed_msas | Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed. | false | none | none |
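
The description of --fasta_paths requires unique basenames, because each prediction gets its own output directory named after the file. A quick pre-submission check could look like the sketch below. duplicate_basenames is a hypothetical helper; it compares names without their final extension, which is an assumption about how the output directories are named.

```shell
#!/bin/bash
# duplicate_basenames: print every base name (final extension stripped)
# that occurs more than once in a comma-separated list of FASTA paths.
# Hypothetical helper for illustration only.
duplicate_basenames() {
    local IFS=','          # split the argument on commas
    local p b
    for p in $1; do
        b=$(basename "$p")
        echo "${b%.*}"     # drop the final extension, e.g. x.fasta -> x
    done | sort | uniq -d
}
```

An empty result means the list is fine; for example, duplicate_basenames "/a/x.fasta,/b/x.fasta" reports the clashing name.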

/software/alphafold/2.1.1L/alphafold.sh pre-defines a few more custom variables, which can be overridden in the batch script:

| environment variable | description | default value |
|---|---|---|
| TF_FORCE_UNIFIED_MEMORY | GPU memory handling | TF_FORCE_UNIFIED_MEMORY=1 |
| AF_models | Naming template | model_1,model_2,model_3,model_4,model_5 |
| AF_template_date | max_template_date | today |
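
Most AF_* defaults in alphafold.sh are written as ${VAR:-default}, so a value exported in the batch script takes precedence over the built-in default. The sketch below illustrates that mechanism in isolation; resolve_outdir is a made-up function name that mirrors the AF_outdir line of the run script, and the override path is a placeholder.

```shell
#!/bin/bash
# resolve_outdir mirrors the pattern used in alphafold.sh:
#   export AF_outdir="${AF_outdir:-/tmp/alphafold}"
# (hypothetical function name, for illustration only)
resolve_outdir() {
    echo "${AF_outdir:-/tmp/alphafold}"
}

unset AF_outdir
resolve_outdir                 # nothing set by the caller: prints the default

AF_outdir="$PWD/af-results"    # placeholder path
resolve_outdir                 # value set by the caller is kept
```

Note that the ":-" form also falls back to the default when the variable is set but empty.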

Installation

# see /software/alphafold/2.1.1L/alphafold/docker/Dockerfile for the basic setup
# some parts adopted from https://pythonrepo.com/repo/kuixu-alphafold

tmpdir=/scratch/$USER
inst_dir=/software/alphafold/2.1.1L

#
# install miniconda
#
pushd $tmpdir
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $inst_dir
export PATH=$inst_dir/bin:$PATH
# don't want to pollute my environment; used /software/alphafold/2.1.1L/bin/conda-init instead (has to be created first)
. conda-init

#
#  setup alphafold conda environment
#
conda create -n af2.1 python=3.8
conda activate af2.1
conda install -y -c nvidia          cudnn==8.0.4
conda install -y -c bioconda        hmmer hhsuite==3.3.0 kalign2
conda install -y -c conda-forge     openmm=7.5.1 pdbfixer pip

#
#  alphafold itself
#
wget https://github.com/deepmind/alphafold/archive/refs/tags/v2.1.1.tar.gz -O alphafold-2.1.1.tar.gz
tar xf alphafold-2.1.1.tar.gz
mv alphafold-2.1.1 $inst_dir/alphafold
rm alphafold-2.1.1.tar.gz 
pip3 install -r $inst_dir/alphafold/requirements.txt 
# both jax and jaxlib versions have to be explicit, will cause problems otherwise
pip3 install --upgrade jax==0.2.21 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
popd

#
#  patch
#
# openmm lives in the af2.1 environment (cf. PATH in alphafold.sh), so patch it there
pushd $inst_dir/envs/af2.1/lib/python3.8/site-packages
patch -p0 < $inst_dir/alphafold/docker/openmm.patch
popd

#
#  get stereo_chemical_props.txt
#
wget -P $inst_dir/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt --no-check-certificate
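
Because the binary path parameters default to bare command names (see the parameter table), the external tools have to be on PATH once the environment is active. A simple post-install check might look like the following sketch; missing_tools is a hypothetical helper, not part of the installation.

```shell
#!/bin/bash
# missing_tools: print each given command name that cannot be found on PATH.
# Hypothetical helper for a post-install sanity check.
missing_tools() {
    local t
    for t in "$@"; do
        command -v "$t" > /dev/null 2>&1 || echo "$t"
    done
}

# Suggested call after "conda activate af2.1" (tool names taken from the
# defaults in the parameter table):
#   missing_tools jackhmmer hhblits hhsearch hmmsearch hmmbuild kalign
```

No output means all listed tools were found.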
