SepTop#
The Seperated Topology (SepTop) method, unlike traditional FEP methods and similar to ATM, does not require setting up any kind of hybrid topology which can be tricky to do rigorously. It proceeds instead by applying light restraints in the complex and solution phases to correctly orientate the ligands. Unlike ATM which can be run in a single shot, it currently requires computing the free energy of the ligand in the bound state, and the free energy of the ligand in the solution.
It is highly recommended to read the original and updated SepTop publications for a more detailed description of the method. The implementation here was also inspired by the SepTop implementation implemented by Baumann et al.
Here the general approach and python API for running SepTop calculations with femto
is described. See the overview
for instructions on running using the CLI.
Tip
This guide will walk through setting up all calculations serially, however femto
supports parallelisation using
MPI. See femto.fe.septop.run_complex_phase, femto.fe.septop.run_solution_phase and their respective sources for MPI ready
runners / example implementations.
Procedure#
The implementation of SepTop in femto
generally splits the setup and running of calculations into those required by
the complex phase, and those required by the solution phase.
Both phases however follow roughly the same steps, which include building and solvating the systems from the pre-prepared receptor and ligand structures, running equilibration simulation at each lambda state, running Hamiltonian replica exchange (HREMD) between each window, and finally computing the final free energy using MBAR or UWHAM.
The full procedure is configured using the femto.fe.septop.SepTopConfig class:
and partitions the options into those complex and those for the solution phase.
Setup Complex#
The setup procedure is responsible for combining the ligand(s) and receptor structures into a single complex, solvating it, selecting any reference atoms if not provided, applying restraints, and finally creating the main OpenMM system.
It begins with the already prepared (correct binding pose, protonation, etc.) structures of the ligand(s) and receptor:
import pathlib
import femto.md.prepare
eralpha_dir = pathlib.Path("eralpha")
ligand_1, ligand_2 = femto.md.prepare.load_ligands(
eralpha_dir / "forcefield/2d/vacuum.mol2",
eralpha_dir / "forcefield/2e/vacuum.mol2",
)
receptor = femto.md.prepare.load_receptor(
eralpha_dir / "proteins/eralpha/protein.pdb"
)
If the ligands / receptor has already been parameterized, the OpenMM XML or AMBER prmtop files can additionally be specified:
extra_parameters = [
eralpha_dir / "forcefield/2d/vacuum.parm7",
eralpha_dir / "forcefield/2e/vacuum.parm7",
]
If unspecified, an OpenFF force field will be used to parameterize the ligands. The exact force field can be specified in the configuration.
Note
In previous versions of femto
, the parameters were required. However in the current version, the parameters are
optional and will be automatically generated if not provided.
The ligand and receptor 'reference' atoms (i.e. those that will be used for the Boresch style restraints used to align the ligands) can be optionally specified by defining PyMol like atom selection queries:
ligand_1_ref_query = ["name C1", "name C2", "name O4"] # OR None
ligand_2_ref_query = ["name C1", "name C3", "name O5"] # OR None
receiver_ref_query = ["...", "...", "..."] # OR None
See the MDTop documentation for a fuller guide on the atom selection syntax which extends beyond PyMol.
These should match atoms in the respective ligands in isolation. If not specified, these will be automatically
selected. By default (ligand_method='baumann'
), this follows the procedure described in the
Baumann et al. publication.
Note
Currently the selection procedure only considers a single configuration of the complex rather than a trajectory, and so does not include the variance criteria.
The full complex topology and OpenMM system can then be created:
import femto.fe.septop
complex_topology, complex_system = femto.fe.septop.setup_complex(
config.complex.setup,
receptor,
ligand_1,
ligand_2,
[],
receptor_ref_query,
ligand_1_ref_query,
ligand_2_ref_query,
extra_parameters
)
At this point, the system object will contain the fully parameterized system, and a Boresch style restraint on each ligand. If HMR or REST2 were enabled in the config, the system will also contain the appropriate modifications enabling these.
Tip
The solvated system can be saved for easier inspection and checkpointing:
Setup Solution#
The solution phase calculation follows the 'two ligands separated by a distance restraint' approach outlined by Baumann et al, rather than performing two absolute hydration free energy (HFE) calculations. This is mostly as it makes it easier to horizontally scale edge calculations independantly, but the framework does very much contain the machinery to perform absolute HFE calculations if desired.
The setup procedure is responsible for combining the ligand(s) and at a fixed distance, solvating, selecting any reference atoms if not provided, applying restraints, and finally creating the main OpenMM system.
import femto.fe.septop
solution_topology, solution_system = femto.fe.septop.setup_solution(
config.solution.setup,
ligand_1,
ligand_2,
ligand_1_ref_query,
ligand_2_ref_query,
)
The distance restraint between the two ligands is applied between the first reference atom of each ligand.
At this point, the system object will contain the fully parameterized system, and a distance restraint between ligands. If HMR or REST2 were enabled in the config, the system will also contain the appropriate modifications enabling these.
Equilibration#
The equilibration procedure is reasonably flexible, and is almost fully specified by the configuration. It comprises the list of 'stages' to run sequentially for each state, including minimization, temperature annealing, and either NVT or NPT MD simulation.
The default procedure is to:
- energy minimize the system
- anneal the temperature from 50 K to 300 K over 100 ps
- run 300 ps of NPT simulation at 300 K
where at each stage light flat bottom position restraints are applied to any protein and ligand atoms.
The results are returned as a list of OpenMM State
objects.
HREMD#
The HREMD procedure is also reasonably flexible, and is almost fully specified by the configuration. By default, each step is propagated for 600 ps as a warmup phase. All statistics collected during the warmup are discarded.
Following this, 10ns of replica exchange is performed with a timestep of 4 fs, with exchanges attempted every 4 ps.
import femto.md.reporting
reporter = femto.md.reporting.TensorboardReporter(output_dir) # OR None
config.sample.analysis_interval = 10
femto.fe.septop.run_hremd(
complex_system,
complex_topology,
coords,
config.complex.states,
config.complex.sample,
femto.md.constants.OpenMMPlatform.CUDA,
output_dir / "complex",
reporter
)
import femto.md.reporting
reporter = femto.md.reporting.TensorboardReporter(output_dir) # OR None
config.sample.analysis_interval = 10
femto.fe.septop.run_hremd(
solution_system,
solution_topology,
coords,
config.solution.states,
config.solution.sample,
femto.md.constants.OpenMMPlatform.CUDA,
output_dir / "solution",
reporter
)
We have passed an optional reporter to the run_hremd
function. As the name
suggests, this reporter will log statistics to a TensorBoard run file in the output directory. At present, these
include online estimates of the total free energy and the free energy of each 'leg'.
The statistics collected such as reduced potentials and acceptance rates are stored in a specified output directory
as a convenient Arrow parquet file ("eralpha/outputs-septop/<phase>/samples.arrow"
).
Analysis#
The final step is to compute the free energy using MBAR or UWHAM. This can be easily done using the femto.fe.septop.compute_ddg convience function:
import femto.fe.ddg
import femto.fe.septop
u_kn_complex, n_k_complex = femto.fe.ddg.load_u_kn(
output_dir / "complex/samples.arrow"
)
u_kn_solution, n_k_solution = femto.fe.ddg.load_u_kn(
output_dir / "solution/samples.arrow"
)
ddg_df = femto.fe.septop.compute_ddg(
config,
u_kn_complex,
n_k_complex,
complex_system,
u_kn_solution,
n_k_solution,
solution_system
)
print(ddg_df)
Edges#
The SepTop CLI provides an option to specify a YAML file containing edge data. At present, this file can only be used to specify the edges that need to be computed:
where each name should match the name of a subdirectory in your forcefield
directory.