ATM#

The Alchemical Transfer Method (ATM), unlike traditional FEP methods, does not require setting up any kind of hybrid topology, or even alchemically transforming the ligand. Instead, ligands are 'translated' along a displacement vector from the binding site to the bulk solvent and vice-versa. Not only does this greatly simplify the setup, but also easily allows for computing the binding free energy between two ligands that do not share a common core.

It is highly recommended to read the original ABFE and RBFE ATM publications for a more detailed description of the method, as well as the fantastic documentation being put together by the Gallicchio lab

Here the general approach and python API for running ATM calculations with femto is described. See the overview for instructions on running using the CLI.

Tip

This guide will walk through setting up all calculations serially, however femto supports parallelisation using MPI. See femto.fe.atm.run_workflow and its source for an MPI ready runner / example implementation.

Procedure#

The implementation of ATM in femto generally proceeds by setting up the complex in solvation from the pre-prepared receptor and ligand structures, running equilibration simulation at each lambda state, running Hamiltonian replica exchange (HREMD) between each window, and finally computing the final free energy using MBAR or UWHAM.

The full procedure is configured using the femto.fe.atm.ATMConfig class:

import femto.fe.atm

config = femto.fe.atm.ATMConfig()

Setup#

The setup procedure is responsible for combining the ligand(s) and receptor structures into a single complex, solvating it, selecting any reference atoms if not provided, applying restraints, and finally creating the main OpenMM system.

It begins with the already prepared (correct binding pose, protonation, etc.) structures of the ligand(s) and receptor:

import pathlib

import femto.md.prepare

eralpha_dir = pathlib.Path("eralpha")

ligand_1, ligand_2 = femto.md.prepare.load_ligands(
    eralpha_dir / "forcefield/2d/vacuum.mol2",
    eralpha_dir / "forcefield/2e/vacuum.mol2",
)
receptor = femto.md.prepare.load_receptor(
    eralpha_dir / "proteins/eralpha/protein.pdb"
)

The ligands must both be in reasonable binding poses. The setup code will handle translating the ligands along the displacement vector as required.

If the ligands / receptor has already been parameterized, the OpenMM XML or AMBER prmtop files can additionally be specified:

extra_parameters = [
    eralpha_dir / "forcefield/2d/vacuum.parm7",
    eralpha_dir / "forcefield/2e/vacuum.parm7",
]

If unspecified, an OpenFF force field will be used to parameterize the ligands. The exact force field can be specified in the configuration.

Note

In previous versions of femto, the parameters were required. However in the current version, the parameters are optional and will be automatically generated if not provided.

The vector along which the ligands will be translated must also be defined. femto provides an experimental utility to automatically compute this:

displacement = femto.fe.atm.select_displacement(
    receptor, ligand_1, ligand_2, config.setup.displacement
)

At present, it will attempt to place the ligands in each corner of the simulation box at a pre-specified distance from the binding site, and select the vector that maximizes the distance between the ligand and receptor atoms.

Warning

This method is still experimental and may not always select the optimal vector.

When running RBFE calculations, the ligand 'reference' atoms (i.e. those that will be used for the alignment restraint) can be optionally specified:

ligand_1_ref_query = ["idx. 7",  "idx. 11", "idx. 23"]  # OR None
ligand_2_ref_query = ["idx. 12", "idx. 7",  "idx. 21"]  # OR None

If selecting by index like above, the indices will always be relative to the ligand. See the MDTop documentation for a fuller guide on the atom selection syntax which extends beyond PyMol.

If not specified, these will be automatically selected. By default (ligand_method='chen'), the distance between each atom in the first ligand and each atom in the second ligand is calculated. Pairs of atoms with the smallest distance will be greedily selected, ignoring pairs that would lead to all atoms being co-linear.

Warning

This method is still experimental and may not always select the optimal reference atoms.

Similarly, the receptor atoms that define the binding site can be optionally specified:

receptor_ref_query = [
    "resi 36+39+40+42+43+77+80+81+84+95+97+111+113+114+117+118+214+215+217+218 and name CA"
]  # OR None

If not specified, these will be automatically selected. By default, these will include all alpha carbons within a defined cutoff from the ligand.

The full complex topology and OpenMM system can then be created:

complex_topology, complex_system = femto.fe.atm.setup_system(
    config.setup,
    receptor,
    ligand_1,
    ligand_2,
    [],
    displacement,
    receptor_ref_query,
    ligand_1_ref_query,
    ligand_2_ref_query,
)

At this point, the system object will contain the fully parameterized system, a center-of-mass restraint on each ligand, an 'alignment' restraint between the two ligands, and a light position restraint on (by default) the alpha carbons of the receptor. When solvating, a cavity will be created where the ligands will be translated to.

If HMR or REST2 were enabled in the config, the system will also contain the appropriate modifications enabling these.

Tip

The solvated system can be saved for easier inspection and checkpointing:

complex_topology.to_file("system.pdb")

Equilibration#

The equilibration procedure is reasonably flexible, and is almost fully specified by the configuration. It comprises the list of 'stages' to run sequentially for each state, including minimization, temperature annealing, and either NVT or NPT MD simulation.

The default procedure is to:

energy minimize the system
anneal the temperature from 50 K to 300 K over 100 ps
run 300 ps of NPT simulation at 300 K
set the box vectors to the maximum across all states as HREMD will be run at constant volume

where at each stage light flat bottom position restraints are applied to any protein and ligand atoms.

The equilibration across all states can be run using femto.fe.atm.equilibrate_states:

import femto.md.constants

coords = femto.fe.atm.equilibrate_states(
    complex_system,
    complex_topology,
    config.states,
    config.equilibrate,
    displacement,
    femto.md.constants.OpenMMPlatform.CUDA,
    reporter=None
)

The results are returned as a list of OpenMM State objects.

HREMD#

The HREMD procedure is also reasonably flexible, and is almost fully specified by the configuration. By default, each step is propagated for 600 ps as a warmup phase. All statistics collected during the warmup are discarded.

Following this, 10ns of replica exchange is performed with a timestep of 4 fs, with exchanges attempted every 4 ps.

import femto.md.reporting

output_dir = pathlib.Path("eralpha/outputs-atm")

reporter = femto.md.reporting.TensorboardReporter(output_dir)  # OR None
config.sample.analysis_interval = 10

femto.fe.atm.run_hremd(
    complex_system,
    complex_topology,
    coords,
    config.states,
    config.sample,
    displacement,
    femto.md.constants.OpenMMPlatform.CUDA,
    output_dir,
    reporter
)

We have passed an optional reporter to the run_hremd function. As the name suggests, this reporter will log statistics to a TensorBoard run file in the output directory. At present, these include online estimates of the total free energy and the free energy of each 'leg'.

The statistics collected such as reduced potentials and acceptance rates are stored in a specified output directory as a convenient Arrow parquet file ("eralpha/outputs-atm/samples.arrow").

Analysis#

The final step is to compute the free energy using MBAR or UWHAM. This can be easily done using the femto.fe.atm.compute_ddg convience function:

import femto.fe.ddg

u_kn, n_k = femto.fe.ddg.load_u_kn(output_dir/ "samples.arrow")

ddg_df = femto.fe.atm.compute_ddg(config.sample, config.states, u_kn, n_k)
print(ddg_df)

femto also offers lower level methods for computing the free energy, which can be useful for debugging or if you want to compute things like overlap matrices:

import femto.fe.ddg

n_states_leg_1 = config.states.direction.index(-1)
n_states_leg_2 = len(config.states.lambda_1) - n_states_leg_1

state_groups = [(n_states_leg_1, 1.0), (n_states_leg_2, 1.0)]

estimated, overlap = femto.fe.ddg.estimate_ddg(
    u_kn, n_k, config.sample.temperature, state_groups
)

Edges#

The ATM CLI provides an option to specify a YAML file containing edge data. This file not only includes the edges that need to be computed, but can also optionally specify the 'reference' atoms for each ligand and the receptor atoms that form the binding cavity:

receptor: eralpha
receptor_ref_query: ":36,39,40,42,43,77,80,81,84,95,97,111,113,114,117,118,214,215,217,218 & @CA"

edges:
  - {ligand_1: 2d, ligand_2: 2e, ligand_1_ref_atoms: ["@7", "@11", "@23"], ligand_2_ref_atoms: ["@12", "@7", "@21"]}
  - {ligand_1: 2d, ligand_2: 3a, ligand_1_ref_atoms: ["@7", "@11", "@23"], ligand_2_ref_atoms: ["@15", "@10", "@5"]}
  - ...

The receptor field optionally specifies the name of the receptor, and should match the name of a subdirectory in your proteins directory. The optional receptor_ref_query field defines the AMBER style selection mask to use to manually select the receptor atoms that define the binding site. If unspecified, such atoms will be selected automatically as described above.

At minimum, the edges field must define pairs of ligands to compute the binding free energy between:

edges:
  - {ligand_1: 2d, ligand_2: 2e}
  - {ligand_1: 2d, ligand_2: 3a}
  - ...

where each name should match the name of a subdirectory in your forcefield directory. In this case where no reference atoms are specified, the reference atoms will be selected based on the configuration as defined above.