Researchers develop AI model that makes large-scale molecular screening practical for the first time
by Ruth Ntumba
An Imperial College London researcher is among a team working on a new AI model that makes screening up to one million molecules against a protein target practical for the first time.
The Genesis Research Team, with contributions from Imperial's Department of Computing and Carnegie Mellon University, has developed DeCAF-Pearl, the first flow map model for all-atom cofolding.
Cofolding means generating the precise three-dimensional shape of a protein and a small binding molecule at the same time. Existing state-of-the-art models, including AlphaFold 3, build a structure by refining it through many small incremental steps. This produces accurate results but takes significant time and computing power, and that slowness is a roadblock for practical applications in drug discovery and molecular design.
DeCAF-Pearl is built on a different mathematical framework called flow maps. Rather than taking many tiny steps along the generation process, it learns to jump directly from one point on the trajectory to another, traversing the entire generation process in just a handful of steps.
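To make the difference between many small steps and a few large jumps concrete, here is a toy illustration only; it is not DeCAF-Pearl's code or architecture. It assumes a one-dimensional velocity field, dx/dt = -x, chosen because its exact flow map is known in closed form, so the formula stands in for what a real model would have to learn with a neural network. All function names are hypothetical.

```python
import math


def velocity(x: float, t: float) -> float:
    """Toy velocity field dx/dt = -x (a stand-in for a learned network; t is unused here)."""
    return -x


def euler_sample(x0: float, n_steps: int, t0: float = 0.0, t1: float = 1.0) -> float:
    """Many-step generation: integrate the ODE with n_steps small Euler steps."""
    x, dt = x0, (t1 - t0) / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, t0 + i * dt)
    return x


def flow_map(x: float, s: float, t: float) -> float:
    """Exact flow map X_{s->t} for dx/dt = -x: jumps straight from time s to time t.

    A flow map model would learn an approximation of this function instead.
    """
    return x * math.exp(-(t - s))


def flow_map_sample(x0: float, n_steps: int, t0: float = 0.0, t1: float = 1.0) -> float:
    """Few-step generation: hop between evenly spaced time points using the flow map."""
    times = [t0 + (t1 - t0) * k / n_steps for k in range(n_steps + 1)]
    x = x0
    for s, t in zip(times[:-1], times[1:]):
        x = flow_map(x, s, t)
    return x


exact = math.exp(-1.0)  # true endpoint of the trajectory starting at x0 = 1
for steps in (2, 1000):
    err = abs(euler_sample(1.0, steps) - exact)
    print(f"Euler, {steps:4d} steps: error = {err:.2e}")
print(f"Flow map, 2 jumps:  error = {abs(flow_map_sample(1.0, 2) - exact):.2e}")
```

With only two steps, plain Euler integration lands far from the true endpoint and needs on the order of a thousand steps to get close, whereas two flow map jumps reach it directly. In a real cofolding model the learned flow map is only approximate, so accuracy after a few jumps has to be checked empirically, as the benchmark described later in this article does.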
Few-step generation brings two major advantages in drug discovery and molecular design. First, virtual screening becomes practical. Cofolding an entire molecule library against a target of interest with full diffusion-based models is computationally expensive, whereas DeCAF-Pearl makes screening up to one million molecules for hit identification practical in around 18 hours on 64 graphics processing units (GPUs). Second, it unlocks scalable synthetic data generation. High-quality protein and molecule structures are the bottleneck for training downstream AI models such as scoring functions and affinity predictors. A fivefold speedup in synthetic data generation translates directly into more training data per unit of compute, without losing the structural accuracy that downstream models depend on.
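The screening figures can be turned into a rough per-GPU cost with simple arithmetic: 18 hours on 64 GPUs is 1,152 GPU-hours, or roughly 868 molecules per GPU-hour (about 4.1 GPU-seconds per molecule). The sketch below assumes perfect parallelism across GPUs and one cofolded structure per molecule, which real screening pipelines may not satisfy. The function names and the 20 GPU-seconds-per-structure baseline in the usage example are hypothetical placeholders, not figures from the study.

```python
def screening_cost(molecules: int, hours: float, gpus: int) -> dict[str, float]:
    """Derive per-GPU throughput from a wall-clock screening figure.

    Assumes work is spread evenly across GPUs (perfect parallelism).
    """
    gpu_hours = hours * gpus
    return {
        "gpu_hours": gpu_hours,
        "molecules_per_gpu_hour": molecules / gpu_hours,
        "gpu_seconds_per_molecule": gpu_hours * 3600 / molecules,
    }


def structures_per_budget(budget_gpu_hours: float,
                          seconds_per_structure: float,
                          speedup: float = 1.0) -> float:
    """Structures a fixed compute budget buys when sampling runs `speedup` times faster."""
    return budget_gpu_hours * 3600 * speedup / seconds_per_structure


# Figures quoted in the article: one million molecules, ~18 hours, 64 GPUs.
cost = screening_cost(molecules=1_000_000, hours=18, gpus=64)
print(cost)

# Hypothetical budget and baseline cost, used only to show the effect of a 5x speedup.
baseline = structures_per_budget(1000, seconds_per_structure=20)
faster = structures_per_budget(1000, seconds_per_structure=20, speedup=5)
print(baseline, faster)
```

Because the budget is held fixed, the number of synthetic structures scales linearly with the speedup, which is what "more training data per unit of compute" amounts to.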

When tested against 196 protein and molecule structures the model had never seen during training, DeCAF-Pearl matched the accuracy of Pearl, the full model it was derived from, and outperformed other leading tools including AlphaFold 3 and Boltz-2, despite using fewer computational steps. Pearl itself remains the most accurate model in the comparison, but DeCAF-Pearl offers a compelling alternative for throughput-critical applications where trading a little accuracy for a large saving in compute is acceptable.
Commenting on the study, Dr Joey Bose, Assistant Professor at Imperial's Department of Computing and senior author of the study, said: “Increasingly, AI-based drug discovery is moving from the era of training foundation generative models to the era of scaling inference to generate samples that have to be reward-optimized. Our paper shows this inference cost can be dramatically reduced for state-of-the-art cofolding models like Pearl without a trade-off in performance, unlocking much faster virtual screening capabilities, which are critical in AI-based drug programs.”
The researchers note that all results are based on computational benchmarks, and how the tool performs in real-world applications remains to be seen.
Article text (excluding photos or graphics) © Imperial College London.
Photos and graphics subject to third party copyright used with permission or © Imperial College London.