Skip to content

ddg #

Estimate ddG values from sampled data.

Modules:

  • femto

    A comprehensive toolkit for predicting free energies

Functions:

  • load_u_kn

    Loads the reduced potentials from a replica exchange sampler output file and

  • estimate_ddg

    Estimate the free energy change for each group of states as well as the total

load_u_kn #

load_u_kn(results_path: Path) -> tuple[ndarray, ndarray]

Loads the reduced potentials from a replica exchange sampler output file and re-orders / de-correlates them into a form acceptable by MBAR.

Note

Samples will be de-correlated using the pymbar.timeseries module.

Parameters:

  • results_path (Path) –

    The path to the arrow replica exchange output file.

Returns:

  • tuple[ndarray, ndarray]

    The loaded u_kn and n_k arrays.

Source code in femto/fe/ddg.py
def load_u_kn(results_path: pathlib.Path) -> tuple[numpy.ndarray, numpy.ndarray]:
    """Loads the reduced potentials from a replica exchange sampler output file and
    re-orders / de-correlates them into a form acceptable by MBAR.

    Note:
        Samples will be de-correlated using the ``pymbar.timeseries`` module.

    Args:
        results_path: The path to the arrow replica exchange output file.

    Returns:
        The loaded ``u_kn`` and ``n_k`` arrays.
    """
    import pymbar.timeseries

    with pyarrow.OSFile(str(results_path), "rb") as file:
        with pyarrow.RecordBatchStreamReader(file) as reader:
            output_table = reader.read_all()

    replica_to_state_idx = numpy.hstack(
        [numpy.array(x) for x in output_table["replica_to_state_idx"].to_pylist()]
    )

    # group the data along axis 1 so that data sampled in the same state is grouped.
    # this will let us more easily de-correlate the data.
    u_kn = numpy.hstack([numpy.array(x) for x in output_table["u_kn"].to_pylist()])
    u_kn_per_k = [u_kn[:, replica_to_state_idx == i] for i in range(len(u_kn))]

    n_uncorrelated = u_kn.shape[1] // u_kn.shape[0]

    g = pymbar.timeseries.statistical_inefficiency_multiple(
        [
            u_kn_per_k[i][i, i * n_uncorrelated : (i + 1) * n_uncorrelated]
            for i in range(len(u_kn))
        ]
    )
    uncorrelated_frames = _uncorrelated_frames(n_uncorrelated, g)

    for state_idx, state_u_kn in enumerate(u_kn_per_k):
        u_kn_per_k[state_idx] = state_u_kn[:, uncorrelated_frames]

    u_kn = numpy.hstack(u_kn_per_k)
    n_k = numpy.array([len(uncorrelated_frames)] * u_kn.shape[0])

    return u_kn, n_k

estimate_ddg #

estimate_ddg(
    u_kn: ndarray,
    n_k: ndarray,
    temperature: Quantity,
    state_groups: list[tuple[int, float]] | None = None,
) -> tuple[dict[str, float], dict[str, ndarray]]

Estimate the free energy change for each group of states as well as the total free energy change.

Parameters:

  • u_kn (ndarray) –

    The reduced potentials.

  • n_k (ndarray) –

    The number of samples per state.

  • temperature (Quantity) –

    The temperature at which the samples were collected.

  • state_groups (list[tuple[int, float]] | None, default: None ) –

    If certain states should be grouped together (e.g. leg 1 and leg 2 in ATM), this should be a list of tuples containing the number of states belonging to each group as well as the multiplier to use when summing the total free energy.

Returns:

  • tuple[dict[str, float], dict[str, ndarray]]

    The free energy estimates and the overlap matrices.

Source code in femto/fe/ddg.py
def estimate_ddg(
    u_kn: numpy.ndarray,
    n_k: numpy.ndarray,
    temperature: openmm.unit.Quantity,
    state_groups: list[tuple[int, float]] | None = None,
) -> tuple[dict[str, float], dict[str, numpy.ndarray]]:
    """Estimate the free energy change for each group of states as well as the total
    free energy change.

    Args:
        u_kn: The reduced potentials.
        n_k: The number of samples per state.
        temperature: The temperature at which the samples were collected.
        state_groups: If certain states should be grouped together (e.g. leg 1 and
            leg 2 in ATM), this should be a list of tuples containing the number of
            states belonging to each group as well as the multiplier to use when
            summing the total free energy.

    Returns:
        The free energy estimates and the overlap matrices.
    """
    beta = 1.0 / (openmm.unit.MOLAR_GAS_CONSTANT_R * temperature)

    state_idx = 0
    sample_idx = 0

    total_ddg = 0.0
    total_ddg_error = 0.0

    estimates = {}
    overlaps = {}

    state_groups = state_groups if state_groups is not None else [(len(n_k), 1.0)]

    for group_idx, (n_states, factor) in enumerate(state_groups):
        group_n_k = n_k[state_idx : state_idx + n_states]
        group_u_kn = u_kn[
            state_idx : state_idx + n_states, sample_idx : sample_idx + group_n_k.sum()
        ]

        f_i, f_i_variance, weights = _estimate_f_i(group_u_kn, group_n_k)
        overlaps[f"overlap_{group_idx}"] = group_n_k * (weights.T @ weights)

        ddg = (f_i[-1] - f_i[0]) / beta
        ddg = float(ddg.value_in_unit(openmm.unit.kilocalorie_per_mole))

        ddg_error = numpy.sqrt(f_i_variance[-1] + f_i_variance[0]) / beta
        ddg_error = float(ddg_error.value_in_unit(openmm.unit.kilocalorie_per_mole))

        total_ddg += factor * ddg
        total_ddg_error += ddg_error**2

        state_idx += n_states
        sample_idx += group_n_k.sum()

        estimates[f"ddG_{group_idx}_kcal_mol"] = ddg
        estimates[f"ddG_{group_idx}_error_kcal_mol"] = ddg_error

    total_ddg_error = numpy.sqrt(total_ddg_error)

    estimates["ddG_kcal_mol"] = total_ddg
    estimates["ddG_error_kcal_mol"] = total_ddg_error

    return estimates, overlaps