MIPSA (Molecular Indexing of Proteins by Self Assembly) is an antibody-reactome profiling platform developed in the Larman Lab at Johns Hopkins and commercialised by Infinity Bio. It links DNA barcodes covalently to translated antigens via a Halo-tag / Halo-ligand chemistry, enabling massively multiplexed measurement of which antigens a sample's antibodies bind to — effectively converting an antibody profiling question into a DNA synthesis and sequencing question.

Can MIPSA detect both peptide and full-length antigen reactivities?

Yes. Unlike phage or yeast display platforms which are typically limited to short peptides, MIPSA can present both peptide libraries (encoded by oligonucleotide libraries) and open reading frame (full-length protein) libraries simultaneously. This is important because auto-antibodies in particular tend to recognise highly conformational epitopes that are only captured by full-length antigens.

How does MIPSA improve diagnostic yield in viral encephalitis?

In a study of patients with suspected infectious encephalitis, applying unbiased MIPSA viral antibody profiling on top of standard clinical testing improved the diagnostic yield by approximately 44%. The approach compares peptide reactivity at admission versus 14 days later, identifies new (off-diagonal) reactivities, deconvolutes cross-reactive responses using a network and statistical framework, and reports per-virus evidence.

What standards does the antibody reactome field need, according to Larman?

Larman calls for three field-wide standards: (1) standardised analytes — collections of monoclonal antibodies covering linear, conformational and post-translationally-modified epitopes that can be run blinded across platforms; (2) a queryable repository for antibody reactome data with minimum metadata requirements analogous to the AIRR data standard for repertoire sequencing; and (3) transparent published pipelines for converting raw to processed data, so platforms can be benchmarked and findings reproduced.

Does Infinity Bio require IP rights to samples submitted for MIPSA service?

No. Infinity Bio operates as a service business — researchers send samples and receive antibody reactome data back, with no IP restraints attached to the service relationship.

Antibody Reactome Profiling via self-assembling libraries of DNA barcoded antigens — AIRRC7

How DNA-barcoded antigen libraries make antibody reactome measurement reproducible at any cohort scale.

Watch on YouTube Hosted on the AIRR Community channel — no separate audio feed available.

In this AIRRC7 (AIRR Community Meeting VII, Porto, June 2024) talk, Dr. Ben Larman — Associate Professor at Johns Hopkins and founder of Infinity Bio — introduces antibody reactome profiling: massively multiplexed measurement of what circulating antibodies bind to. He walks through MIPSA (Molecular Indexing of Proteins by Self Assembly), the DNA-barcoded antigen platform Infinity Bio commercializes, shows clinical proof-of-concept results in COVID-19 and infectious encephalitis, and makes a direct call to the AIRR Community to help establish field-wide standards for antibody reactivity data.

In this episode:

DNA → Antibody: MIPSA converts an antibody profiling question into a DNA synthesis and sequencing question via Halo-tag self-assembly.
+44% Yield: unbiased viral profiling layered on top of standard clinical testing raised diagnostic yield ~44% in encephalitis.
Anti-IFN signal: full-length human-proteome MIPSA screening replicated type-1 interferon neutralising auto-antibodies in severe COVID-19.
Field Standards: Larman asks the AIRR Community for spike-in monoclonals, a queryable reactome repository, and transparent pipelines.

Full transcript

Note on this transcript: Generated from the YouTube auto-caption track and lightly cleaned (filler words removed, stutters collapsed). All scientific terms preserved verbatim where audible; minor caption-recognition artifacts (e.g. "carnita" for "Encarnita," "airc" for "AIRR-C," "Eliza" for "ELISA," "calent" for "covalent," "Fage" for "phage," "eoli" for "E. coli") may remain. For exact quoting, please verify against the YouTube audio.

[0:02] Dr. Ben Larman: Well, I want to express my gratitude to Encarnita and Victor for the invitation to be here — first time at the AIRR-C meeting, and really impressed with the level of attention to detail and rigor. I have to admit, I have a little bit of rigor envy, because in the field that I work in — which is sort of adjacent to repertoire sequencing — there's really a lack of standards. So my goal today is to hopefully convince you that the adjacent field of antibody reactivity profiling is worthwhile, and also to recruit your help in establishing the rigor in that field that you've done in yours. I think we're all here because we appreciate the breadth and central importance of immune responses in many health and disease states.

[1:17] Dr. Ben Larman: Immune responses are shaped by your inherited genetics along with your lifetime of environmental exposures, the random stochastic mechanisms that the immune system uses to generate diversity — and all of these contribute to our health and, either indirectly or directly, can be associated with disease states. So I like to think about antibodies as a way that the immune system can tell us its story. We have a highly complex repertoire of these protein molecules in our blood, and the abundance is staggering: about 800 trillion antibody molecules per 20-microlitre drop of blood. If you imagine an antibody molecule being a word, you can ask how many novels could be written in a drop of blood — there's one for every human on earth, about 8 billion novels.

[2:32] Dr. Ben Larman: Written in the antibody molecules. This repertoire is sculpted over our life by all of our exposures — microbiome, infections, genetics, vaccinations — and also by our demographics, age, nutritional status etc., and even the presence of neoplasms that the immune system is seeking to recognize. The repertoire, in addition to providing humoral immunity, is really a deep source of information that we can mine in the form of biomarkers. It maintains memory of prior responses, so even after the antigens are gone they bind their targets with high affinity and are really stable to freeze-thaw and storage. The constant region of the antibody — the isotype — can reflect the context in which the immune response happened. And it's really easily accessible and in high abundance.

[3:46] Dr. Ben Larman: I'm going to entertain you with an idea about where the medical field is going. Imagine me visiting my doctor 10 years from now for an annual checkup — of course this doctor is going to be an AI chatbot avatar that accesses all known medical knowledge, and because of other science advances I now have a full head of hair. He wants to talk about my antibody profile — my antibody reactome — and how it's changed over the last year. In all of these examples, we actually have data from my lab and others to support the feasibility of these types of reports. He's letting me know that he found a cancer reactivity but don't worry — it was apparently a productive response and so most likely cleared the tumour, but out of an abundance of caution we're going to do some body scanning to confirm there's no tumour there.

[5:01] Dr. Ben Larman: He did a profile of antiviral antibodies and found that my titres were good, so I can skip the vaccine this year — that's great. They know I filled out a questionnaire and mentioned I'm going to move for a new job, and they did a profile of my IgE antibodies, and now he's offering to provide a geographic map of compatibility with my allergy exposure profile. Then he mentions that they detected a new lupus auto-antibody with a 5-to-10-year prediction horizon — and this is really true, these antibodies come up many years before disease expresses itself. Based on existing evidence it's going to propose an immune reboot: some diet modifications, a short course of oral steroids, another profile, and then if it's still looking like an issue maybe B-cell depletion therapy as a preventative measure for an autoimmune disease. A lot has happened in my antibodies over a year.

[6:16] Dr. Ben Larman: All this information is there in the circulating antibody repertoire — about cancer, autoimmunity, infections, susceptibility to infections, and our allergies. We think a lot in this community about sequencing — can we really read these types of stories from cells that may or may not be abundantly available in the periphery? Maybe mass spec direct sequencing of antibodies will really develop over the next 10 years, but even if so, can we predict their targets if we know their sequence? What's more likely, and what's already been developing, is an indirect measurement of what they bind to. We're all familiar with ELISAs measuring one antigen specificity at a time — what we need are technologies that can provide the equivalent of hundreds of thousands of ELISAs simultaneously. This is what we refer to as antibody reactome profiling.

[7:32] Dr. Ben Larman: You may be familiar with some of these platforms developed over the years — protein microarrays or peptide microarrays, really just a highly multiplexed ELISA. They typically use full-length antigens, which is a great benefit, but have been challenging to standardise based on the quality of the antigens, the fact that they're dried on a surface, throughput on a slide is pretty low, and they tend to have very high costs limiting their use. An alternative that's lower cost and provides better antigen quality control is bead arrays like Luminex — but these are pretty limited in the level of multiplexing. With the advances of high-throughput DNA synthesis and sequencing, we and others have adopted molecular display platforms like phage display or yeast display to leverage the massively multiplex nature of DNA synthesis and sequencing. These can be highly multiplexed, in the hundreds of thousands.

[8:47] Dr. Ben Larman: But they tend to only display short peptides or protein fragments on their surface. So over the past couple of years we've developed a technique to overcome some of these limitations called Molecular Indexing of Proteins by Self Assembly, or MIPSA. The goal is just to put a simple DNA barcode directly covalently attached to antigens in a high-throughput way. This has the advantage that both peptide libraries encoded by oligonucleotide libraries and open reading frame libraries can be expressed and presented as an antigen library at the same time. The limitation at this stage is that these libraries are created in E. coli S30 lysates, but we're working on some eukaryotic alternatives.

[10:00] Dr. Ben Larman: The chemistry we use to link the DNA barcodes to the antigens involves a Halo tag system. The Halo tag is a 30-kilodalton protein that can be fused to a protein or peptide of interest, co-expressed, and forms a covalent bond with a Halo ligand — a small molecule that can be covalently attached to DNA barcodes. As drawn, you'd have to attach the DNA barcodes to the antigens one at a time, which isn't scalable. What we developed with MIPSA is a self-assembly process that allows the protein to add its own DNA barcode during translation of the antigen library. Joel Credle — a postdoc in the lab who worked on this and is now Director of R&D at Infinity Bio — led that. With this technology, we're able to convert an antibody profiling question into a DNA synthesis and sequencing question.

[11:18] Dr. Ben Larman: We can take databases of antigens and use the MIPSA technique to convert those into barcoded antigen libraries. If we're trying to cover protein sequences that are not previously cloned, we can design peptides that tile across each of those proteins — for example, a 60-amino-acid peptide tile spanning across a protein. We stitch together the DNA encoding each of these peptides with a Halo-tag-encoding gene and a DNA barcode. This is a self-assembly process used to maximise uniformity and minimise dropout. Then we make RNA from that DNA construct and reverse-transcribe it with a Halo-ligand-associated primer, so that we now have a cDNA barcode, and then translate this library of molecules.

[12:33] Dr. Ben Larman: Once the library is made, the antigens — full-length or peptide — and their associated barcodes are very simple to use. You mix a blood sample (for serum antibody profiling — we can also use this for monoclonal antibody analysis), capture those antibodies and the antigens they're bound to on magnetic beads, wash away all the unbound antigens, amplify the barcodes, use sample indexing and deconvolution to read out the antibody reactome of each sample. This is a relatively high-throughput process that can happen in 96-well plates and involve liquid-handling automation. Our key proof of concept was looking at COVID-19 severe disease, on the heels of nice work out of Jean-Laurent Casanova's lab where they identified type-1 neutralising interferon auto-antibodies in a subset of individuals with severe, life-threatening COVID-19.

[13:46] Dr. Ben Larman: We looked at a Johns Hopkins cohort and performed an unbiased full-length human-proteome-wide MIPSA screening campaign and identified those same type-1 interferon neutralising auto-antibodies in a subset of these patients. We also saw a couple of individuals with type-3 interferon neutralising auto-antibodies, and we confirmed that they were actually functional in a cell-based assay with interferon lambda added to the cell culture after pre-incubation with the patient antibodies. In this study we found it was really important to be looking at full-length proteins, which is not uncommon for auto-antibodies because they tend to recognise more highly conformational epitopes. So it really underscores the complementarity of antigen libraries that are peptides — which can cover everything — along with full-length proteins to capture as much conformational epitope space as possible.

[15:02] Dr. Ben Larman: Some of the benefits of MIPSA: the ability to simultaneously detect antibodies targeting full-length and peptide epitopes; relatively low non-specific binding because of the very small tag; with our barcode representation system we have much more robustness in being able to look at many barcodes associated with each antigen — these are all GC-balanced barcodes to allow undistorted amplification of those sequences; and we have a system for ensuring the data we look at is actually only associated with wild-type or intended sequences that we encode with the oligonucleotides, because oligonucleotide synthesis does have errors — so we're only looking at the barcodes associated with those perfect coding sequences.

[16:17] Dr. Ben Larman: Because of these advantages we started a company to enable the research community to access the technology — Infinity Bio, based in Baltimore. This is a 9,000 square-foot lab space, before we filled it up with automation equipment, and the team. We have a booth — please come visit. Infinity Bio is a service business, so we just receive samples and send you back the data. There's no IP restraints or anything like that. We're excited about the catalogue of libraries we currently have available and are also able to work with customers to develop custom content. Please contact us if you'd like to see some performance qualification data.

[17:31] Dr. Ben Larman: In the setting of a longitudinal study, you can see the emergence of new reactivities in a totally unbiased way between two time points — reactivities showing up more strongly at the second time-point than the first; decline in reactivity strength due to waning, indicating a recent response that didn't happen during the interval; and most of the responses we detect are very stable over time, reflecting historic immune responses. In cross-sectional studies looking at cohorts with and without a disease phenotype, we can identify exposures that may link to disease — like Epstein-Barr virus and multiple sclerosis — identify different responses to the same exposure (e.g. different responses to vaccination leading to different levels of protection), auto-antibodies and their mechanistic basis (myasthenia gravis and the targets there), and of course identify new biomarkers that can provide clinical utility in diagnosis and risk stratification.

[18:45] Dr. Ben Larman: Thinking about mechanisms — antibody reactome profiling provides an important new piece of the puzzle in understanding disease mechanisms: how genetic associations may exert their effect through immune responses, linking antigen specificity to public antibody sequences for instance. As I mentioned, the field of antibody reactome profiling really does need standardisation. The question is not how well does any specific technology work but how well it's going to solve a specific problem. To compare across platforms, we're really unable to do that right now — even unable to make data available in a way that allows people to compare platforms and reproduce studies. So I'm going to make a pitch for these standards that I imagine need to be implemented.

[20:01] Dr. Ben Larman: But I don't have the experience of this group in actually seeing that to fruition, so we'd love to hear about the feasibility and steps that might be taken to achieve these goals. One thing very clear is that we need standardised analytes that can be run on the different platforms — a collection of monoclonal antibodies, library-specific (anti-human antibodies, antiviral antibodies), to be tested on the different platforms. They should target different types of epitopes — linear, conformational, post-translational modifications — and could be premixed in different concentrations, low to high, preferably blinded to users, with the data uploaded and analysed by a third party. There's currently no repository for antibody reactome data — when my group publishes a paper, the editor will often ask us to deposit the data in the appropriate database, and we have to reply that there isn't one.

[21:15] Dr. Ben Larman: So I really think there needs to be one. It needs to be queryable, and there needs to be a minimum amount of data associated with it — like AIRR data, for instance. It would need to include information about the platform, sample metadata, antigen library member metadata (including the sequence and any quality-control information associated with that particular antigen member). We should be able to deposit our raw data, and then processed data, and the code that converted the raw data into processed data — at a minimum. There are a lot of different ways to use antibody reactome data — I described longitudinal analysis and group-wise comparisons — so these pipelines really need to be transparent and reproducible so that others can use them and reproduce findings.

[22:30] Dr. Ben Larman: Looking at epitope profiling allows you to look at the clonality of an antibody response and even aggregate over peptides tiling across an entire species of virus. Antibodies, as we heard about, can be very cross-reactive, especially with homologous sequences, so there's a real need to be able to deconvolute antiviral antibody responses in the setting of viral antibody profiling. Many researchers in this field could benefit from guidance on study design — how to establish power to detect signal of an expected effect size given prior studies, controlling type-1 error to avoid publishing false positives, and a lot of work that can be done in how to design these antigen libraries to make them most efficient. If all this was successful, we could really do a good job comparing platforms and determining what's the right platform for the right question.

[23:46] Dr. Ben Larman: I imagine there could be a third party that would generate a report card associated with that set of spike-in antibodies and report on how comprehensive the library is, how well it detected the antibodies that had targets in the library, could tell the difference between isotypes, how many false positives are generated per set of unrelated antigens, how reproducible is the assay, how concordant are the signals, and do the signals correlate with the level of the analyte. That's the missing piece for being able to evaluate all these platforms and select the appropriate technology. I'll just leave you with a study we did in a clinical setting — to convince you, hopefully, that it can be useful in a clinical setting for diagnostics. This is unbiased viral antibody profiling.

[25:03] Dr. Ben Larman: From individuals with suspected infectious encephalitis. For every library member there are about 100,000 peptides; the reactivity level at day one versus day 14 when they came into the hospital — anything off-diagonal is a new reactivity. We're able to first identify what are the changed peptide reactivities over that two-week period in an unbiased way, identify which peptides were related to each other to determine the response clonality using a network approach, and then use sequence alignment to all the viruses to figure out which were cross-reactive and not, and developed a statistical framework for deconvoluting those cross-reactive responses. We could report the evidence for each of these infections and, layering this on top of the standard clinical testing, we're able to improve the diagnostic yield by about 44%.

[26:16] Dr. Ben Larman: So in summary: circulating antibodies provide a wealth of health-relevant information about cancer, autoimmunity, infections, vaccines and allergies. Advances in DNA synthesis and sequencing have enabled massively multiplexed antibody reactome profiling technologies, but the field is in dire need of defined analytes like monoclonal antibodies, and standards for data storage, data sharing and analytical pipelines, to enable this imagined future of antibody profiling with your doctor. I want to thank all the contributors to these studies over the years and the funding agencies that have supported the work. Thank you all for your attention. [Applause]

[27:00] Audience question: One side of the protein seems to be covered by Halo tag — so are these epitopes lost for the screening?

[27:31] Dr. Ben Larman: Great question. Right now we fuse the Halo tag to the N-terminus, and so yes — if the epitope is close to the N-terminus of the antigen then it could be sterically hindered. In the future I think we'll have both N-terminal and C-terminal fusions to overcome that limitation.

[27:50] Audience question: What is the sensitivity of the assay?

[28:00] Dr. Ben Larman: It's a complicated question because I can't just give a number. We do need to do some spike-ins with known-affinity antibodies, and then for each of those we can say here's our limit of detection in terms of concentration. We'll be more sensitive to higher-affinity antibodies. You can also ask the question with someone — or a group of individuals — with a known infection like HIV, what's your sensitivity and specificity for detecting the infection. That's very high for most infections, especially if they're recent or chronic, but it does have lower sensitivity for historic vaccinations like measles in adults who were vaccinated in childhood — like any serology, it's highly dependent on what you have in the serum.

[28:45] Audience question: Have you thought about applying this platform to screening of menstrual blood for signs of infection or signatures of endometriosis?

[29:05] Dr. Ben Larman: No, but if someone is interested in that, please come visit us at the booth today.

[29:15] Audience question (Joe): Can the antigen library include glycosylation, or is that a current limitation due to the use of E. coli?

[29:25] Dr. Ben Larman: Currently it's a limitation. Other PTMs we have put on — like citrullination, which is an important modification for rheumatoid-arthritis antibody detection — but we have not done glycosylation, and I do think that using a eukaryotic translation system would definitely help.

[29:45] Audience question: How sensitive is IgE compared to currently-used clinical assays?

[30:02] Dr. Ben Larman: We're sort of in the early days of comparing clinical panels to our AllerSite assay, and there's certainly a Venn diagram. We're picking up a lot of things that the clinical assays don't pick up — partly because they don't test for them. We're using a very comprehensive antigen library — about 2,000 proteins — and there are some things we're not picking up potentially because they're post-translational modifications, non-protein components of an extract or something like that. We'll know more as we do more of this. Or in some cases, really low levels of IgE — IgE is a very low-abundance molecule compared to IgG.

[30:40] Audience question: Do you see differences between peptide and full-length anti-interferon antibody responses in COVID-19 patients with different levels of disease severity? Can you distinguish neutralising versus non-neutralising antibody responses?

[31:00] Dr. Ben Larman: Our N is too small to answer that now, but it's certainly something to think about. On distinguishing neutralising versus non-neutralising — yes, with the caveat that you have to do a prior study to make the association. With epitope libraries, peptide libraries, we have done a study in COVID-19 where we see which regions of spike or RBD correlate with neutralising titres. If you build a model like that and then ask "does this person have neutralising antibodies based on their profile?" you can make a prediction. Otherwise you have to use correlation — you can't tell just from the data unless you do some sort of modelling, which I think this community would be very good at.

[31:50] Audience question: With respect to self-addition of barcodes on the proteins — how do you handle barcode collision problems?

[32:05] Dr. Ben Larman: If "collision" has to do with difficulties in distinguishing related sequences — we're using sequences that are sampled from a huge space, so it's really not an issue. But if you need clarification, please come to the booth.

[32:20] Audience question: How expensive is the assay? Is the cost practical for clinical use?

[32:31] Dr. Ben Larman: Certainly for clinical use. The question is going to be more about large numbers of research samples from academics with the budget — again, I'd urge you to come to the booth and chat with JP.

[32:50] Audience question: Do you know the affinity range captured in these assays? How do you exclude background noise?

[33:00] Dr. Ben Larman: Two different questions. The ability to detect an antibody is a combination of its abundance and its affinity, so it's hard to say with just one of those. In terms of background — on each plate of samples that we run, we include mock IPs that include all reaction components except the antibodies, and that's how we determine our background when we do our statistical comparison of each sample to the set of backgrounds.

[33:45] Session chair: Perfect. Okay, I think we'll stop it there and go for coffee. We're going to coffee 5 minutes early — please be back here at exactly 11:00, or preferably a little bit before. Thank you.

Antibody Reactome Profiling via self-assembling libraries of DNA barcoded antigens — AIRRC7

Related resources

Antibody Reactome Profiling for Human Health

How measuring the reactivities of circulating antibodies through reactome profiling can provide novel insights into human health

Infinity Bio MIPSA Explainer

Unveiling the Antibody Reactome: A New Frontier in Precision Immunology

Antibody Reactomics: The Responsive Layer of Immune Memory

Management Blueprint Podcast: Simplify Complexity with Ben Larman

Efficient encoding of large antigenic spaces by epitope prioritization with Dolphyn

Schistosoma mansoni vaccine candidates identified by unbiased phage display screening in self-cured rhesus macaques

Allergenic food protein consumption is associated with systemic IgG antibody responses in non-allergic individuals

Detecting antibody reactivities in Phage ImmunoPrecipitation Sequencing data

Unbiased discovery of autoantibodies associated with severe COVID-19 via genome-scale self-assembled DNA-barcoded protein libraries

Deconvoluting virome-wide antibody epitope reactivity profiles