A team of researchers from the California Institute of Technology (Caltech), Mila (the Quebec AI Institute), and several other leading academic institutions has introduced a new AI system capable of designing entirely novel enzymes for chemical reactions that do not occur in nature. The development is being viewed as a potential turning point for fields such as drug discovery, industrial chemistry, and synthetic biology, where progress has historically been constrained by the limits of natural evolution.
The system, named DISCO (short for DIffusion for Sequence-structure CO-design), is designed to generate both the amino acid sequence and the three-dimensional structure of a protein simultaneously. Unlike conventional methods, it does not require predefined assumptions about catalytic mechanisms or active site configurations. Instead, given only a target molecule, it constructs a protein model capable of interacting with that molecule.
The research effort spans multiple institutions, including Caltech, Mila, Université de Montréal, McGill University, the University of Cambridge, the University of Oxford, and Imperial College London. Nobel laureate Frances Arnold is among its corresponding authors, reflecting the project’s strong connection to established enzyme engineering research.
Enzyme design has traditionally been limited by the constraints of both natural evolution and computational methodology. While biological evolution has produced highly efficient catalysts, it has only explored a relatively narrow subset of possible chemical transformations. Many reactions that are highly valuable for industrial or pharmaceutical applications remain absent from biology simply because they were never selected for in natural environments.
Conventional computational approaches have also faced structural limitations. One major constraint is the requirement to define catalytic residue arrangements in advance, which presupposes detailed mechanistic knowledge that is often unavailable for novel reactions. Another limitation is the separation of protein design into sequential steps, where sequence and structure are handled independently. This separation can lead to information loss, since enzymatic function depends on the integrated relationship between both.
DISCO is designed to overcome these constraints by jointly modeling sequence and structure within a unified framework. The system generates amino acid sequences and atomic coordinates together in a single process, allowing structural and functional relationships to emerge during generation rather than being imposed beforehand. This approach enables the system to propose enzymes for specific chemical targets without relying on pre-engineered catalytic blueprints or human-defined active sites.
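To make the idea of joint generation concrete, here is a toy sketch in Python. It is not DISCO's algorithm, which this article does not describe in detail. Every name in it (`toy_denoiser`, `joint_sample`, the masking schedule, the update rule) is a hypothetical stand-in, and the "predictions" follow arbitrary rules rather than chemistry. The only point it illustrates is that one model call sees the partial sequence and the noisy coordinates together, and both are refined inside the same loop instead of in separate stages.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
MASK = "_"                            # placeholder for a not-yet-decided residue


def toy_denoiser(seq, coords):
    """Hypothetical stand-in for a trained network (arbitrary rules, no chemistry).

    It receives the partial sequence AND the noisy coordinates together
    and returns a guess for both.
    """
    # Structure guess depends on the sequence: decided residues lie on the
    # x-axis, undecided ones are lifted by 1.0 in z.
    pred_coords = [
        (3.8 * i, 0.0, 0.0 if aa != MASK else 1.0) for i, aa in enumerate(seq)
    ]
    # Sequence guess depends on the structure: pick a residue from the x value.
    pred_seq = [AMINO_ACIDS[int(abs(x)) % len(AMINO_ACIDS)] for x, _, _ in coords]
    return pred_seq, pred_coords


def joint_sample(length, steps=10, seed=0):
    """Refine a masked sequence and noisy 3D points together, step by step."""
    rng = random.Random(seed)
    seq = [MASK] * length  # start fully masked
    coords = [
        tuple(rng.gauss(0, 10) for _ in range(3)) for _ in range(length)
    ]  # start from pure noise
    for step in range(steps, 0, -1):
        # One call updates the guess for both modalities.
        pred_seq, pred_coords = toy_denoiser(seq, coords)
        # Continuous update: move each point part of the way to its prediction.
        w = 1.0 / step  # equals 1.0 on the final step
        coords = [
            tuple(c + w * (p - c) for c, p in zip(cs, ps))
            for cs, ps in zip(coords, pred_coords)
        ]
        # Discrete update: unmask a share of the still-masked positions.
        masked = [i for i, aa in enumerate(seq) if aa == MASK]
        k = -(-len(masked) // step)  # ceiling division; all are unmasked by step 1
        for i in rng.sample(masked, k):
            seq[i] = pred_seq[i]
    return "".join(seq), coords


sequence, coordinates = joint_sample(length=8)
print(sequence, len(coordinates))
```

In a sequential pipeline, by contrast, the structure would be finalized first and a sequence fitted to it afterwards (or the reverse), so neither stage could adjust to the other while it was still changing.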
Experimental validation of DISCO focused on carbene-transfer chemistry, a class of reactions that does not occur in known biological systems but is highly relevant for modern synthetic chemistry, particularly in pharmaceutical synthesis.
From approximately 20,000 computationally generated enzyme candidates, 90 (roughly 0.45 percent) were selected for laboratory testing across four reaction types. The results indicated strong performance relative to both naturally evolved enzymes and previously engineered artificial systems.
In a benchmark cyclopropanation reaction, the highest-performing DISCO-designed enzyme achieved 4,050 total turnovers with a 72 percent yield, exceeding both early engineered cytochrome P450 variants and previously published computational enzyme designs that relied on structured catalytic templates. In a carbon–boron bond formation reaction, a single unoptimized DISCO design surpassed performance levels that had previously required multiple rounds of directed evolution, achieving a substantial increase over baseline activity. In a carbon–hydrogen insertion reaction, the system matched outcomes that had previously taken many cycles of laboratory evolution to reach, but achieved them in a single computational step.
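For readers unfamiliar with the two cyclopropanation figures, the short Python sketch below shows how they relate. It assumes the usual definitions, which the article does not spell out: yield is moles of product per mole of limiting substrate, and total turnovers (TTN) is moles of product per mole of catalyst. Under those assumptions the ratio of the two gives the implied catalyst loading. The function names are illustrative, and the resulting loading is a back-of-envelope inference, not a figure reported by the researchers.

```python
def implied_catalyst_loading(yield_fraction, ttn):
    """Moles of catalyst per mole of limiting substrate.

    Assumes yield = product / substrate and TTN = product / catalyst,
    so catalyst / substrate = yield / TTN.
    """
    if ttn <= 0:
        raise ValueError("ttn must be positive")
    return yield_fraction / ttn


def substrate_per_catalyst(yield_fraction, ttn):
    """Substrate molecules supplied per catalyst molecule (inverse of loading)."""
    return 1.0 / implied_catalyst_loading(yield_fraction, ttn)


# Figures quoted in the article for the best cyclopropanation design.
loading = implied_catalyst_loading(0.72, 4050)
print(f"{loading * 100:.4f} mol%")                   # 0.0178 mol%
print(f"{substrate_per_catalyst(0.72, 4050):.0f}")   # 5625
```

Read this way, 4,050 turnovers at 72 percent yield would correspond to roughly one enzyme molecule for every 5,625 substrate molecules, which is why a high turnover count matters: it means very little catalyst is needed per unit of product.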
Beyond catalytic performance, the designs also demonstrated structural novelty. When compared against large-scale protein structure databases, many of the generated motifs showed little or no similarity to known natural proteins. One of the most effective designs appeared to be derived from a non-catalytic DNA-binding protein found in an extremophile organism, despite having only limited sequence similarity and no known enzymatic function. The resulting active site geometry diverged significantly from known biological templates, suggesting that the system is capable of repurposing existing protein folds for entirely new chemical purposes.
The engineered enzymes also exhibited adaptability under mutation. In follow-up experiments, random mutagenesis produced multiple improved variants, and in some cases altered stereochemical outcomes, indicating that the generated structures retain evolutionary flexibility. This characteristic is often considered essential for long-term practical application, as it allows further optimization through traditional laboratory methods.
The findings suggest a shift in how enzyme design may be approached, moving away from manually constructed catalytic hypotheses toward generative systems capable of producing functional starting points for further evolution. While the broader implications remain to be fully validated, the work highlights a growing possibility that previously unexplored regions of chemical space may now be computationally accessible.
The post DISCO Breaks Enzyme Design Barrier, Creating Proteins With No Equivalent In Nature appeared first on Metaverse Post.


