A machine-learning compression method shrinks the gut-phage PhIP-Seq library by 78% while tripling the share of antibody-reactive peptides.
Nature Communications · February 21, 2024 · DOI: 10.1038/s41467-024-45601-8
Liebhoff A-M, Venkataraman T, Morgenlander WR, et al. Efficient encoding of large antigenic spaces by epitope prioritization with Dolphyn. Nat Commun. 2024;15(1):1577. doi:10.1038/s41467-024-45601-8
Liebhoff, A.-M., Venkataraman, T., Morgenlander, W. R., Na, M., Kula, T., Waugh, K., Morrison, C., Rewers, M., Longman, R., Round, J., Elledge, S., Ruczinski, I., Langmead, B., & Larman, H. B. (2024). Efficient encoding of large antigenic spaces by epitope prioritization with Dolphyn. Nature Communications, 15(1), 1577. https://doi.org/10.1038/s41467-024-45601-8
@article{liebhoff2024dolphyn,
title = {Efficient encoding of large antigenic spaces by epitope prioritization with {Dolphyn}},
author = {Liebhoff, Anna-Maria and Venkataraman, Thiagarajan and Morgenlander, William R. and Na, Miso and Kula, Tomasz and Waugh, Kathleen and Morrison, Charles and Rewers, Marian and Longman, Randy and Round, June and Elledge, Stephen and Ruczinski, Ingo and Langmead, Ben and Larman, H. Benjamin},
journal = {Nature Communications},
volume = {15},
number = {1},
pages = {1577},
year = {2024},
doi = {10.1038/s41467-024-45601-8}
}
Dolphyn is a peptide-library design method that compresses very large antigenic spaces into PhIP-Seq libraries small enough to run cost-effectively while raising the fraction of antibody-reactive peptides three-fold. The method uses a random-forest classifier to rank short peptides by their probability of carrying a B-cell epitope, then "stitches" the highest-probability fragments together with flexible linkers so that one synthetic 45–50 amino acid peptide can carry epitopes from three native sequences. Applied to the human gut phageome, Dolphyn shrank the library from 484,761 to 106,762 peptides — a 78% reduction — and the team used the resulting library to characterize the antibody response to E. coli-infecting bacteriophages in human serum, placing gut phages firmly inside the gut–immune axis.
In this publication:
The human gut hosts trillions of bacteriophages — viruses that infect bacteria — whose antigens make up an enormous "antigenic space." Profiling antibody responses across that space with phage immunoprecipitation sequencing (PhIP-Seq) is in principle straightforward, but in practice the peptide library needed to tile every gut-phage protein is so large that the assay becomes too expensive to deploy at cohort scale.
Dolphyn is the authors' answer to this scaling problem. It does two things at once. First, a random-forest classifier scores every 15-amino-acid window of every input protein on its likelihood of containing a linear B-cell epitope, based on amino acid composition; only the highest-probability windows are kept. Second, an "epitope-stitching" step concatenates three high-probability 15-mers from one protein into a single synthetic peptide carrying 45 amino acids of epitope sequence, joining them with flexible Gly–Gly–Gly–Gly–Ser linkers so that the immunological behavior of each fragment is preserved. As a result, the compressed library represents three times as many epitope candidates per oligonucleotide synthesized.
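The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_score` is a hypothetical stand-in for the random-forest epitope probability, and the greedy, non-overlapping choice of windows is an assumption made here for simplicity. Only the window length, the linker sequence, and the three-fragments-per-peptide rule come from the description above.

```python
from typing import Callable, List, Tuple

WINDOW = 15        # window length, from the text
LINKER = "GGGGS"   # Gly-Gly-Gly-Gly-Ser linker, from the text
N_EPITOPES = 3     # fragments per stitched peptide, from the text


def toy_score(window: str) -> float:
    """Hypothetical stand-in for the classifier's epitope probability.

    Returns the fraction of charged or polar residues (D, E, K, R, N, Q).
    This is NOT the published model, only a composition-based placeholder.
    """
    return sum(window.count(aa) for aa in "DEKRNQ") / len(window)


def top_windows(protein: str,
                score: Callable[[str], float] = toy_score,
                n: int = N_EPITOPES) -> List[Tuple[int, str]]:
    """Greedily pick up to n highest-scoring, non-overlapping windows.

    Non-overlap is an assumption of this sketch.
    """
    scored = [(score(protein[i:i + WINDOW]), i)
              for i in range(len(protein) - WINDOW + 1)]
    # Best score first; ties are broken by earliest start position.
    scored.sort(key=lambda t: (-t[0], t[1]))
    chosen: List[int] = []
    for _, start in scored:
        if all(abs(start - c) >= WINDOW for c in chosen):
            chosen.append(start)
        if len(chosen) == n:
            break
    # Return the fragments in their native order along the protein.
    return [(s, protein[s:s + WINDOW]) for s in sorted(chosen)]


def stitch(protein: str,
           score: Callable[[str], float] = toy_score) -> str:
    """Join the selected windows with the flexible linker."""
    return LINKER.join(w for _, w in top_windows(protein, score))


if __name__ == "__main__":
    demo = "A" * 15 + "K" * 15 + "A" * 15 + "E" * 15
    print(stitch(demo))
```

With five-residue linkers between three 15-mers, this sketch produces 55 residues in total, of which 45 are epitope sequence; the exact construct length in the published library may differ. Swapping `toy_score` for a trained classifier's probability output would leave the rest of the sketch unchanged.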
The team validated the design on the human virome (using VirScan reactivity data for EBV, rhinovirus, and CMV peptides) and on enterovirus reactivities from the DAISY pediatric cohort, and showed that Dolphyn captured the same immune signal as a much larger uniformly tiled library while consuming roughly three-fold fewer peptides. They then applied Dolphyn to a database of 142,810 gut bacteriophage genomes, built a 106,762-peptide library, and used it to characterize the gut-phage antibody response in a cohort of healthy adults. The headline finding: people make antibodies against their gut phages, and the dominant signal comes from E. coli-infecting Myoviridae, which places gut phages firmly inside the gut–immune axis as a recognized antigenic compartment.
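The 78% figure for the gut-phageome library can be checked from the two library sizes quoted in this summary (484,761 peptides before compression, 106,762 after). The short calculation below is only an arithmetic check; the variable names are illustrative.

```python
tiled = 484_761       # peptides before compression, from the text
compressed = 106_762  # peptides after compression, from the text

# Fraction of the original library that is no longer needed.
reduction = 1 - compressed / tiled

# The same comparison expressed as a fold change in library size.
fold = tiled / compressed

print(f"reduction: {reduction:.1%}")
print(f"fold smaller: {fold:.2f}x")
```

The reduction rounds to 78%, which corresponds to a library roughly 4.5 times smaller than the original.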
