Date of Award

December 2023

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Physics

First Advisor

Dist. Prof. Abbas Ourmazd

Committee Members

Prof. Peter Schwander, Asst. Prof. Ahmad Hosseinzadeh, Prof. Marius Schmidt, Assoc. Prof. Ionel Popa

Abstract

Biological molecules can assume a continuous range of conformations during function. Near equilibrium, the Boltzmann relation connects the free energy of a particular conformation to its occupation probability, thus giving rise to one or more energy landscapes. Biomolecular function proceeds along minimum-energy pathways on such landscapes. Consequently, a comprehensive understanding of biomolecular function often involves determining the free-energy landscapes and identifying the functionally relevant minimum-energy conformational paths on them. Specific techniques are needed to determine continuous conformational spectra and to identify functionally relevant conformational trajectories from a collection of raw single-particle snapshots obtained by, e.g., cryogenic electron microscopy (cryo-EM) or X-ray diffraction. To assess the capability of different algorithms to recover conformational landscapes, we:
• Measure, compare, and benchmark the performance of four leading data-analytical approaches to determine the accuracy with which energy landscapes are recovered from simulated cryo-EM data. Our simulated data are generated from a known energy landscape, with projection directions along a great circle.
• Demonstrate the ability to recover the energy landscapes and functional pathways of biomolecules from collections of cryo-EM snapshots.
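The Boltzmann relation mentioned above can be illustrated with a minimal sketch. It is not taken from the dissertation or from the ManifoldEM code: the function name `free_energy_from_counts`, the temperature of 300 K, and the choice of kcal/mol units are assumptions made for illustration. Given the number of snapshots observed in each conformational bin, the relative free energy follows from ΔG = −k_B T ln(p / p_max), so the most populated bin sits at zero energy.

```python
import numpy as np

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
K_B = 0.0019872041


def free_energy_from_counts(counts, temperature=300.0):
    """Convert per-bin snapshot counts to relative free energies.

    Hypothetical helper: applies dG = -k_B * T * ln(p / p_max), so the
    most populated bin has dG = 0 and unoccupied bins are set to +inf.
    """
    counts = np.asarray(counts, dtype=float)
    prob = counts / counts.sum()              # occupation probabilities
    energy = np.full(prob.shape, np.inf)      # empty bins: undefined/infinite
    occupied = prob > 0
    energy[occupied] = -K_B * temperature * np.log(prob[occupied] / prob.max())
    return energy


# Example: three conformational bins, one of them never observed.
landscape = free_energy_from_counts([100, 50, 0])
```

In this sketch, a bin with half the occupancy of the most populated one lies k_B T ln 2 above the minimum, which is why sparsely sampled regions of a landscape carry the largest statistical uncertainty.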

Applications of structural biology in drug discovery and molecular medicine make the free-energy landscapes of biomolecules more important than ever. Recently, several data-driven machine learning algorithms have emerged to extract energy landscapes and functionally relevant continuous conformational pathways from single-particle data (Dashti et al., 2014; Dashti et al., 2020; Mashayekhi et al., 2022). In a benchmarking study, the performance of several advanced data-analytical algorithms was critically assessed (Dsouza et al., 2023). In this dissertation, we have benchmarked the performance of four leading algorithms in extracting energy landscapes and functional pathways from single-particle cryo-EM snapshots. In addition, we have significantly improved the performance of the ManifoldEM algorithm, which demonstrated the highest performance. Our contributions can be summarized as follows:
• One of the main steps of the ManifoldEM framework, in which the algorithm propagates conformational information across all of angular space, has required expert user supervision. We have introduced an automated approach that eliminates the need for user involvement.
• The quality of the energy landscapes extracted by ManifoldEM from cryo-EM data has been improved, as demonstrated by the accuracy scores.

These measures have substantially enhanced ManifoldEM's ability to recover the conformational motions of biomolecules by extracting the energy landscape from cryo-EM data.

In line with the primary goal of our research, we aimed to extend the automated method from a great circle to the entire angular sphere. During this endeavor, we encountered challenges, particularly with some projection directions that did not follow the proposed model. Through methodological adjustments and sampling optimization, we improved the conformity of the projection directions to the model. However, a small subset of projection directions (5%) remained challenging. We also recommended the use of specific methodologies, namely feature extraction and edge detection algorithms, to quantify differences between images more precisely, a crucial component of our automated model. We further suggested that integrating different techniques might resolve the challenges associated with certain projection directions.

We also applied ManifoldEM to experimental cryo-EM images of the SARS-CoV-2 spike protein in complex with the ACE2 receptor. By introducing several improvements, such as an adaptive mask and cosine curve fitting, we enhanced the quality of the framework's output. This enhancement is evident in the removal of the artifact from the energy landscape, particularly where the post-enhancement landscape differs from the artifact-affected one. These modifications, which specifically address challenges arising from Nonlinear Laplacian Spectral Analysis (NLSA) (Giannakis et al., 2012), are intended for use in future cryo-EM studies that employ ManifoldEM.

The closing sections of this dissertation provide a summary and an outlook on future research directions. While initial automated methods have been explored, there remains room for refinement. We have offered numerous methodological suggestions for addressing the challenge of propagating conformational information. Key methodologies discussed include Manifold Alignment, Canonical Correlation Analysis, and Multi-View Diffusion Maps. These recommendations aim to inform and guide subsequent developments in the ManifoldEM suite.
