Stanford Engineers Crack Single-Molecule Protein Sequencing by Reverse-Translating Peptides into DNA
Stanford bioengineers publish a reverse-translation chemistry in Nature Biotechnology that converts peptides into DNA barcodes, achieving single-molecule resolution up to 1,000 times more sensitive than mass spectrometry.
A team at Stanford University has developed a method that converts protein sequences into DNA, allowing standard high-throughput DNA sequencers to read peptides at single-molecule resolution. The technique, published March 18 in Nature Biotechnology, overcomes longstanding sensitivity limits of mass spectrometry and opens a path toward routine, large-scale protein analysis from minimal biological samples.
The method, which the researchers call “reverse translation,” borrows from the central dogma of molecular biology but runs it in the opposite direction. Instead of DNA encoding proteins, proteins are chemically re-encoded as DNA, a transformation that hands the sequencing problem to an ecosystem of DNA analysis tools already optimized for speed, cost, and throughput.
How It Works
The process begins with a modified version of Edman degradation, the classical chemistry for sequentially cleaving amino acids from the end of a peptide chain. In the Stanford approach, each iterative cycle releases the N-terminal amino acid and tags it with a DNA barcode unique to the originating peptide molecule. Cycle-specific synthetic DNA sequences mark the position from which each amino acid was removed, preserving the order of the original protein sequence.
Once labeled, the barcoded amino acids are identified through antibody-mediated proximity extension assays. In this step, pairs of antibodies recognizing specific amino acid types bind to their targets, and because each antibody carries a DNA oligonucleotide, proximity between bound antibody pairs triggers a ligation or extension reaction. The result is a PCR-amplifiable DNA reporter that encodes three pieces of information: the identity of the amino acid, its position in the peptide chain, and which specific peptide molecule it came from, according to the paper’s abstract.
The accumulated reporters form a DNA library that is fed directly into a conventional high-throughput sequencer. Each sequencing read corresponds to a single amino acid event, and computational assembly reconstructs full peptide sequences from millions of parallel reads.
Sensitivity Gains Over Mass Spectrometry
The practical advantage is sensitivity. Mass spectrometry, the dominant tool for protein identification, requires large sample inputs and loses the vast majority of molecules during ionization and detection. Liwei Zheng, a research engineer in the Department of Radiology at Stanford and the study’s first author, described the gap in concrete terms: “With mass spectrometry, you’re shooting 1 billion to 10 billion protein molecules and see, typically, a million molecules out of it. With our method, you can potentially see 1,000 times that amount,” Zheng told Stanford Report.
That sensitivity enables analysis at the single-molecule level from samples too small for mass spectrometry to characterize. The researchers demonstrated full sequence coverage across millions of reads and showed that the method accurately distinguishes not only native peptide sequences but also peptides carrying post-translational modifications, including phosphorylation and glycosylation variants. These chemical decorations alter protein function in disease-relevant ways but are notoriously difficult for existing tools to resolve at single-molecule scale.
Applications in Cancer and Cell Biology
The ability to sequence proteins from tiny samples has immediate implications for cancer immunotherapy research. One open question in the field is why CAR-T cell therapy succeeds in some cancers but fails in others. Understanding the proteomic differences between individual immune cells could reveal which surface proteins and signaling molecules distinguish effective from ineffective T-cell responses, but current methods lack the sensitivity to profile single cells at the protein level without extensive amplification that introduces bias.
More broadly, the technique addresses a fundamental puzzle in cell biology: why genetically identical cells in the same tissue respond differently to disease or treatment. Genomics and transcriptomics capture DNA and RNA variation, but proteins are the functional molecules that carry out cellular work. A method that reads proteins with the throughput and sensitivity of DNA sequencing closes that measurement gap.
Toward a Push-Button Instrument
The work has already been licensed for commercial development, with the goal of packaging the chemistry into a self-contained instrument. H. Tom Soh, the W. M. Keck Foundation Professor of Electrical Engineering at Stanford and the study’s senior author, described the vision as a device where researchers could “put in a sample, press a button, and have it go,” analogous to the workflow that DNA sequencing instruments offer today, as reported by Stanford Report.
Three of the five authors, Zheng, co-author Yujia Sun, and Soh, are listed as coinventors on a pending U.S. patent application covering the reverse-translation method. The remaining authors, Linus A. Hein and Michael Eisenstein, both of Stanford’s Department of Electrical Engineering, declared no competing interests.
What Remains to Be Proven
The study demonstrates the chemistry on defined peptide mixtures under controlled conditions. Translating it to complex biological samples, such as the full proteome of a tumor biopsy or a single immune cell, will require additional engineering to handle the diversity of amino acid chemistries, the dynamic range of protein abundances, and potential interference from non-peptide biomolecules.
The antibody panel used for amino acid recognition must also scale. The current proximity extension assay identifies amino acids by type, and expanding coverage to all 20 standard amino acids with high specificity will be critical for comprehensive proteome analysis. Whether the method can maintain its sensitivity advantage at that scale is an open question.
Nonetheless, the core principle, converting a protein sequencing problem into a DNA sequencing problem, leverages decades of infrastructure investment in next-generation sequencing. If the chemistry proves robust in clinical and research settings, it could do for proteomics what Illumina-era sequencing did for genomics: make comprehensive molecular profiling routine, fast, and affordable.