Abstracts in alphabetical order

Maša Babovic
University of Southern Denmark

Multimodal TD-MS of therapeutic proteins

Proteins can be used as therapeutics to treat a wide range of diseases. Different proteoforms of the same therapeutic protein can differ in their safety and efficacy. Top-down mass spectrometry has great potential for therapeutic protein analysis, and fine-tuning MS/MS fragmentation conditions is important for maximizing amino acid sequence and PTM coverage. I will present the results we obtained using “topdownr”, an automated multimodal (HCD, CID, ETD, ETciD, EThcD, and UVPD) MS/MS approach for the systematic assessment of fragmentation parameters on a Tribrid Orbitrap platform, applied to the characterization of five therapeutic proteins.
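As a hypothetical illustration of the bookkeeping behind choosing fragmentation conditions (not the topdownr R package or its API), the Python sketch below scores each fragmentation mode by the fraction of inter-residue bonds it cleaves and finds the pair of modes with the largest combined coverage. The cleavage positions are invented; a real assessment would derive them from matched fragment ions.

```python
# Hypothetical sketch, not the topdownr package: score fragmentation modes by
# the fraction of inter-residue bonds they cleave, alone and in combination.
from itertools import combinations


def cleavage_coverage(seq_len: int, cleavages: set[int]) -> float:
    """Fraction of the seq_len - 1 backbone bonds observed as cleaved.

    Bond i lies between residues i and i + 1 (1-based numbering).
    """
    n_bonds = seq_len - 1
    valid = {c for c in cleavages if 1 <= c <= n_bonds}
    return len(valid) / n_bonds


def best_combination(seq_len: int, by_mode: dict[str, set[int]], k: int):
    """Return the k modes whose union covers the most bonds, and that coverage."""
    def union_coverage(modes):
        covered = set().union(*(by_mode[m] for m in modes))
        return cleavage_coverage(seq_len, covered)

    best = max(combinations(sorted(by_mode), k), key=union_coverage)
    return best, union_coverage(best)


if __name__ == "__main__":
    # Invented cleavage sites for a 10-residue sequence (9 bonds).
    observed = {"HCD": {1, 2, 3, 9}, "ETD": {3, 4, 5, 6}, "UVPD": {2, 7, 8}}
    for mode, sites in observed.items():
        print(mode, round(cleavage_coverage(10, sites), 2))
    print(best_combination(10, observed, 2))
```

Ties are broken by the alphabetical order of the mode names; with these made-up data, ETD plus HCD covers 7 of the 9 bonds.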

Sandra M. Blois
Department of Obstetrics and Fetal Medicine, University Medical Center Hamburg-Eppendorf

Impact of galectin-glycan networks in reproductive diseases 

Galectins are a family of mammalian carbohydrate-binding proteins expressed by many cell types at the fetal-maternal interface. There are at least 15 members in the galectin family; however, only galectin‑1, ‑2, ‑3, ‑4, ‑7, ‑8, ‑9, ‑10, ‑12, and ‑13 are found in humans. Galectins can mediate interactions between cells, and they also facilitate binding between cells and extracellular matrix components. These cell‑cell and cell‑matrix interactions, as well as galectin signaling at the cell surface, can modulate signaling pathways and thereby influence cellular functions and behaviors. Current research indicates that galectins have important roles during gestation: they contribute to placental development and to maternal adaptation to pregnancy, including immune tolerance and angiogenesis. Dysregulation of galectins is associated with poor pregnancy outcomes. This lecture will focus on how different members of the galectin-glycan networks contribute to diverse aspects of a healthy gestation.

Julia Chamot-Rooke
Institut Pasteur, CNRS, Paris, France

Unlocking the potential of proteoforms in the field of infectious disease

In past years, various examples have showcased the added value of addressing proteoforms in the field of infectious disease. For instance, for N. meningitidis, the etiologic agent of cerebrospinal meningitis, a specific proteoform of the PilE protein, which plays an important role in bacterial virulence, was found to be tightly associated with crossing of the epithelial barrier and access to the bloodstream.1 For the same protein, highly glycosylated proteoforms were obtained from patients with evidence of meningitis and linked to immune escape.2 More recently, O-mycoloylation of membrane proteins (porins) of Corynebacterium glutamicum was found to be a bona fide signal directing the modified protein to the mycomembrane.3 In clinical microbiology, proteoforms can also be used to differentiate bacterial strains that cannot be discriminated with MALDI-TOF MS, the technique used routinely in many hospitals for the rapid identification of bacterial pathogens.4 This talk will review a few of these examples, describe the top-down proteomics workflows optimized for these applications, and discuss perspectives.

1.   Chamot-Rooke J. et al. Posttranslational modification of pili upon cell contact triggers N. meningitidis dissemination, Science, 331, 778-82 (2011).

2.   Gault J. et al. Neisseria meningitidis Type IV Pili Composed of Sequence Invariable Pilins Are Masked by Multisite Glycosylation, PLoS Pathog, 11, e1005162 (2015).

3.   Carel C. et al. Identification of specific posttranslational O-mycoloylations mediating protein targeting to the mycomembrane, Proc Natl Acad Sci U S A, 114, 4231-4236 (2017).

4.   Dupré M. et al. Optimization of a Top-Down Proteomics Platform for Closely Related Pathogenic Bacterial Discrimination, J Proteome Res, 20(1), 202-211 (2021).

Christoph Gstöttner, Manfred Wuhrer, Elena Dominguez-Vega
Center for Proteomics and Metabolomics, Leiden University Medical Center

Expanding functional characterization to proteoforms

Proteoforms are very diverse not only in structure but also in functionality. While the structural mass spectrometric characterization of proteoforms has advanced tremendously in recent decades, proteoform-selective functional characterization is still largely neglected. Common approaches, such as SPR, provide an overall affinity response for all the different proteoforms present in the sample, and assessing their individual binding requires tedious production or enrichment of specific proteoforms. In our lab, we have exploited for the first time the capabilities of capillary electrophoresis hyphenated with mass spectrometry (CE-MS) to assess functional characteristics of monoclonal antibodies (mAbs) in a proteoform-resolved fashion. Antibodies consist of antigen-binding fragments (Fab) and a crystallizable fragment (Fc). The Fc domain has several key functionalities, such as recruitment of immune components via different Fcγ receptors (FcRs), activation of the complement system, and recycling of the antibody via binding to the neonatal Fc receptor (FcRn), which determines the half-life of antibodies. These interactions are strongly influenced by structural features of the Fc domain; therefore, small variations in the Fc region (e.g. glycosylation, oxidation) can severely impact binding.

We have developed different methods based on mobility-shift affinity CE-MS to study the binding of mAbs to various FcRs, namely FcRn, FcγRIIa and FcγRIIb. To this end, the receptors were added to the background electrolyte, and the mixture of antibody proteoforms was injected into the CE system. As a first case, we studied the interaction with FcRn, which determines antibody half-life. We will show that, by adding different amounts of FcRn to the background electrolyte, we can determine the relative affinity of different proteoforms based on the shifts in their mobility. We observed differences in mobility between singly or doubly oxidized mAbs and unmodified antibodies, indicating lower binding affinity of the oxidized forms. For the FcγRIIa (activating) and FcγRIIb (inhibitory) receptors, glycosylation of the antibody was key for binding. Hemiglycosylated antibodies showed a strong decrease in binding to both FcγRIIs. Differences were also observed among glycoforms, with high-mannose forms showing lower binding than complex-type glycoforms.
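How mobility shifts translate into relative affinities can be sketched with the standard 1:1 affinity CE model, assuming fast exchange between free and receptor-bound antibody: Δμ = Δμ_max·K[L]/(1 + K[L]), where [L] is the receptor concentration in the background electrolyte. The Python sketch below fits K with the double-reciprocal linearisation; it is an assumed textbook model, not the authors' published evaluation, and the concentrations and shifts would be synthetic in any example run.

```python
# Sketch only: textbook 1:1 affinity CE model (an assumption, not necessarily
# the authors' evaluation). With fast exchange the observed mobility shift is
#     dmu = dmu_max * K * L / (1 + K * L)
# where L is the receptor concentration in the background electrolyte.


def fit_double_reciprocal(conc: list[float], shift: list[float]) -> tuple[float, float]:
    """Estimate (K, dmu_max) from 1/dmu = 1/(dmu_max*K) * 1/L + 1/dmu_max."""
    xs = [1.0 / c for c in conc]
    ys = [1.0 / s for s in shift]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals 1 / (dmu_max * K)
    intercept = mean_y - slope * mean_x   # equals 1 / dmu_max
    return intercept / slope, 1.0 / intercept


def relative_affinity(k_variant: float, k_reference: float) -> float:
    """Binding constant of one proteoform relative to a reference proteoform."""
    return k_variant / k_reference
```

The double-reciprocal form weights low-concentration points heavily, so a direct nonlinear fit is usually preferable for noisy measurements; the linear form is used here only to keep the sketch dependency-free. Comparing the fitted K of, say, an oxidized and an unmodified proteoform with relative_affinity gives the kind of relative ranking described above.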

The developed approach offers unique possibilities to study the in-solution binding of individual proteoforms while simultaneously addressing their heterogeneity. We believe that this approach will have a tremendous influence on the study of interactions between mAb proteoforms and different FcRs. Understanding these interactions is essential for developing new drugs as well as for defining (and redefining) critical quality attributes of biopharmaceuticals.

Irene Fernandez-Cuesta
Universität Hamburg

Micro- and nanofluidic devices for single molecule analysis.

Lab-on-a-chip devices are miniaturized chips on which multiple tasks can be integrated. These devices allow working with minute amounts of liquid and performing reactions faster. In our group, we use plastic, single-use fluidic chips with micro- and nanochannels to analyze single biomolecules. In particular, we can read the length and genomic information of single DNA molecules. Further integration with cell capture and selective lysis will allow us to study single cells and to expand the applications to other biomolecules, such as proteins.

Manasi Gaikwad
University Medical Center Hamburg-Eppendorf

Quantification of proteoforms in commercial protein sample using intact protein mass spectrometry

Analysis of proteoforms as intact molecules is one of the hot topics in mass spectrometric research. Systematic yet fast discrimination and quantification of proteoforms is, however, extremely challenging, mainly because heterogeneous proteoforms share highly similar physicochemical properties. Understanding of which MS analytical methods are fit for a specific purpose, and of data analysis strategies for intact proteoforms, is also limited. Deconvolution of isotopically unresolved mass spectra to extract and validate quantifiable mass features is a particularly tricky task.

Given the growing demand for therapeutic proteins (TPs) and the significance of analyzing the associated proteoforms, my talk highlights an approach towards fast quantification of proteoforms in TPs. State-of-the-art high-resolution intact protein mass spectrometry (MS) was employed to achieve proteoform quantification; the work included optimization of sample prefractionation and MS parameters as well as evaluation of the suitability of deconvolution algorithms. Examining data processing strategies and various deconvolution algorithms, and validating deconvoluted masses with appropriate scoring, led us to conclude that the Bayesian open-source deconvolution algorithm UniDec works best for quantification of complex proteoforms in commercial protein samples.
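Once a deconvolution tool has produced a list of neutral masses and intensities, relative quantification is largely bookkeeping. The hypothetical Python sketch below (not UniDec's Bayesian algorithm) assigns deconvolved masses to expected proteoform masses within a ppm tolerance and normalises the summed intensities. The proteoform names, masses and the 20 ppm tolerance are assumptions for illustration.

```python
# Hypothetical sketch: relative proteoform quantification from a deconvolved
# (neutral mass, intensity) list. Not UniDec; only the post-deconvolution step.


def relative_abundance(
    deconvolved: list[tuple[float, float]],
    expected: dict[str, float],
    tol_ppm: float = 20.0,
) -> dict[str, float]:
    """Relative abundance of each expected proteoform.

    A deconvolved mass is assigned to the first expected proteoform within
    tol_ppm; masses matching no expected proteoform are ignored.
    """
    summed = {name: 0.0 for name in expected}
    for mass, intensity in deconvolved:
        for name, target in expected.items():
            if abs(mass - target) / target * 1e6 <= tol_ppm:
                summed[name] += intensity
                break  # each deconvolved mass counts for one proteoform only
    total = sum(summed.values())
    if total == 0:
        return summed  # nothing matched: all zeros
    return {name: value / total for name, value in summed.items()}
```

Normalising intensities in this way assumes that the proteoforms ionise with similar efficiency, a common working assumption for closely related proteoforms but not a guarantee.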

Christoph Gstöttner
Leiden University Medical Center

Unravelling the structural and functional heterogeneity of recombinant SARS-CoV-2 RBD domains

Recombinant SARS-CoV-2 proteins – including the receptor binding domain (RBD) from the spike (S) protein – are essential instruments in the fight against COVID-19. RBD structural features, such as glycosylation, are critical for binding to the ACE2 receptor, yet they are not completely understood. We have performed an unprecedented structural and functional characterization of SARS-CoV-2 RBD domains produced in two different mammalian systems. We combined state-of-the-art mass spectrometric approaches at different protein levels (released glycans, glycopeptides and intact protein), permitting us to unravel the vast heterogeneity of RBDs in great detail. Our results showed distinct glycosylation and posttranslational modifications in CHO- and HEK293-RBDs. These features also depend on the length of the expressed protein (RBD, S1 subunit or S protein). We also demonstrated the presence of a single, fully occupied O-glycosylation site in the RBDs and localized this previously unknown site to T323. We studied the RBDs functionally by determining their binding to SARS-CoV-2 antibodies from COVID-19-positive patients as well as to the ACE2 receptor. Our data indicate that the previously suggested involvement of RBD glycosylation in ACE2 receptor binding originates from conformational stabilization of the spike protein rather than from direct involvement in binding. In summary, our work offers novel insights into RBD structural and functional features and provides a workflow for the integrated structural and functional characterization of RBDs and RBD-based vaccines.

Peter Horvatovich
University of Groningen

Protein sequence variability in humans

Genomic and, more directly, protein sequence variability have an important impact on personal susceptibility to disease, longevity, stress tolerance and healthy ageing. Protein sequence variability has hereditary components, influenced mainly by germline mutations, and non-hereditary ones caused by somatic mutations, such as in cancer development1. It is therefore important to obtain protein sequence variant profiles of samples at the bulk tissue and single-cell levels. The mainstream shotgun proteomics workflow using database search engines requires prediction of the protein sequences expected to be present in the analysed samples. The best prediction can be made from genomics/transcriptomics data with proteogenomics data integration2, which is challenging because of incomplete annotation of translated proteins (the dark genome and proteome), differences in timing between transcript expression/degradation and protein translation/degradation, and often undefined start and stop sites of transcript translation. This talk will present a proteogenomics data integration pipeline that predicts translated protein sequences from polyadenylated transcriptomics data using different genome annotations, de novo assembly and variant calling tools, and subsequently identifies protein variants in proteomics data obtained from the same sample using database search tools and the predicted protein sequences. A prototype of this pipeline will be demonstrated by identifying new variants involved in severe COPD using transcriptomics and proteomics profiling of human lung tissue samples3.
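One step of such a pipeline, turning a called missense variant into search-database entries, can be sketched as follows. This hypothetical Python example (the sequence and variant are invented, and it is not the presented pipeline) applies a single amino acid substitution and lists the tryptic peptides that exist only in the variant protein, using the simplified rule that trypsin cleaves after K or R unless the next residue is P.

```python
# Hypothetical proteogenomics step: apply a missense variant and list the
# variant-specific tryptic peptides that a search database would need.
import re


def apply_missense(protein: str, position: int, ref: str, alt: str) -> str:
    """Substitute one residue at a 1-based position after checking the reference."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at {position}, found {protein[position - 1]}")
    return protein[: position - 1] + alt + protein[position:]


def tryptic_peptides(protein: str, min_len: int = 5) -> set[str]:
    """Cleave after K or R unless followed by P (simplified trypsin rule)."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}


def variant_specific_peptides(protein: str, position: int, ref: str, alt: str) -> set[str]:
    """Peptides present in the variant protein but absent from the reference."""
    variant = apply_missense(protein, position, ref, alt)
    return tryptic_peptides(variant) - tryptic_peptides(protein)
```

Variants that create or destroy a cleavage site change the peptide boundaries, which the set difference captures; real pipelines also have to handle indels, frameshifts and missed cleavages.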

References

(1)       Bischoff, R.; Permentier, H.; Guryev, V.; Horvatovich, P. Genomic Variability and Protein Species – Improving Sequence Coverage for Proteogenomics. J. Proteomics 2016, 134. https://doi.org/10.1016/j.jprot.2015.09.021.

(2)       Barbieri, R.; Guryev, V.; Brandsma, C.-A.; Suits, F.; Bischoff, R.; Horvatovich, P. Proteogenomics: Key Driver for Clinical Discovery and Personalized Medicine; 2016; Vol. 926. https://doi.org/10.1007/978-3-319-42316-6_3.

(3)       Brandsma, C. A.; Guryev, V.; Timens, W.; Ciconelle, A.; Postma, D. S.; Bischoff, R.; Johansson, M.; Ovchinnikova, E. S.; Malm, J.; Marko-Varga, G.; Fehniger, T. E.; Van Den Berge, M.; Horvatovich, P. Integrated Proteogenomic Approach Identifying a Protein Signature of COPD and a New Splice Variant of SORBS1. Thorax 2020. https://doi.org/10.1136/thoraxjnl-2019-213200.

Aymelt Itzen
Universitätsklinikum Hamburg-Eppendorf, Zentrum für Experimentelle Medizin, Institut für Biochemie und Signaltransduktion, Hamburg, Germany

A biochemical approach to the understanding of post-translational modifications during bacterial infections

Post-translational modifications (PTMs) are covalent alterations of proteins and are involved in the modulation of diverse cellular processes. Many pathogenic bacteria use a number of enzyme-mediated PTMs to ensure their survival, proliferation and virulence during infection. In this context, AMPylation is a PTM that is apparently used by many bacteria to modify host cell proteins. Adenosine monophosphate (AMP)-transferring enzymes catalyze the attachment of an AMP moiety from the co-substrate adenosine triphosphate (ATP) to threonine, tyrosine, or serine side chains. For example, the causative agent of Legionnaires’ disease – the bacterium Legionella pneumophila – appears to manipulate vesicular trafficking by AMPylating the small GTPase Rab1.

However, understanding the substrate profiles of AMP-transferases and their biological impact has been hampered by the lack of general tools for enriching AMPylated target proteins. Furthermore, because AMP-transferases have low affinity for their respective substrates, obtaining mechanistic insights using structural biology methods is challenging. In my presentation, I will discuss our efforts to establish tools to study AMP-transferases. Additionally, I will critically evaluate the significance of AMPylation in health and disease.

1. Du J, Wrisberg MV, Gulen B, Stahl M, Pett C, Hedberg C, et al. Rab1-AMPylation by Legionella DrrA is allosterically activated by Rab1. Nat Commun 2021, 12(1): 460.

2. Fauser J, Gulen B, Pogenberg V, Pett C, Pourjafar-Dehkordi D, Krisp C, et al. Specificity of AMPylation of the human chaperone BiP is mediated by TPR motifs of FICD. Nat Commun 2021, 12(1): 2426.

3. Barthelmes K, Ramcke E, Kang HS, Sattler M, Itzen A. Conformational control of small GTPases by AMPylation. Proc Natl Acad Sci USA 2020, 117(11): 5772-5781.

4. Ernst S, Ecker F, Kaspers MS, Ochtrop P, Hedberg C, Groll M, et al. Legionella effector AnkX displaces the switch II region for Rab1b phosphocholination. Sci Adv 2020, 6(20): eaaz8041.

5. Gulen B, Rosselin M, Fauser J, Albers MF, Pett C, Krisp C, et al. Identification of targets of AMPylating Fic enzymes by co-substrate-mediated covalent capture. Nat Chem 2020, 12(8): 732-739.

6. Hopfner D, Fauser J, Kaspers MS, Pett C, Hedberg C, Itzen A. Monoclonal Anti-AMP Antibodies Are Sensitive and Valuable Tools for Detecting Patterns of AMPylation. iScience 2020, 23(12): 101800.

Ole N. Jensen
University of Southern Denmark

Studying proteoform diversity in cells, tissues and tumors by mass spectrometry

Post-translational modification (PTM) of proteins leads to a plethora of proteoforms, including proteoforms with co-occurring PTMs within a single protein molecule. Distinct proteoforms have distinct functional and structural features that are important for protein interactions, activity, localization and turnover. We develop and apply middle-down and top-down strategies for protein analysis to define proteomes and distinct proteoforms in health and disease. Quantitative mass spectrometry, ion mobility spectrometry and computational data analysis promise to reveal details of proteoform diversity and proteoform functions.

Alois Jungbauer
Institute of Bioprocess Science and Engineering, University of Natural Resources and Life Sciences, Vienna, Austria, and Austrian Centre of Industrial Biotechnology

Impact of proteoforms for manufacturing of biopharmaceuticals 

An increased understanding of the structural and molecular basis of the efficacy of protein therapeutics is of interest to the scientific, medical and bioprocess engineering communities, and will result in new approaches to develop more potent therapeutic products. Relevant information can be gained through the analysis of proteoforms, which can show quite profound differences in potency, as well as potential side effects, resulting from small structural modifications. In the absence of comprehensive information about the effects of product characteristics at the molecular level, there is an increased need to monitor the production process of protein therapeutics to ensure consistent product quality. In the past, the rule “the product is defined by the process” was employed, which froze the process and made tech transfer, scale-up and process improvements very difficult and time-consuming. To escape this locked condition in biopharmaceutical manufacturing, PAT (Process Analytical Technology) and QbD (Quality by Design) strategies have been established as the current regulatory framework. The impact of varying process conditions on proteoform composition will be exemplified by the production of recombinant antibodies in mammalian cells and of FGF in E. coli. The implementation of real-time monitoring and automation will be discussed in the context of improving and controlling proteoform composition in biopharmaceuticals.

Neil Kelleher
Northwestern University

Building a Community of Top-Down Proteomics to Advance Proteoform Measurement & Biology

While top-down mass spectrometry has become synonymous with the direct analysis of intact proteins and their complexes, the term more generally denotes an approach to measurement that recognizes the value of retaining as much information as possible about a system prior to analysis. By avoiding proteolytic digestion, proteoform-specific identifications can be made directly. The Nov 12, 2021 paper in Science Advances on the Human Proteoform Project (https://pubmed.ncbi.nlm.nih.gov/34767442/) will be the main focal point for conversation. Time allowing, selected vignettes will cover both denatured and native modes of top-down proteomics. I will also describe a few recent advances in top-down MS, such as the latest breakthroughs in individual ion mass spectrometry (i2MS). By more faithfully preserving post-translational modifications and non-covalent interactions throughout the measurement process, top-down mass spectrometry is positioned to make basic and translational proteomics more efficient and valuable, particularly in detecting proteoforms and their PTMs underlying human wellness and disease and in assigning their functions.

Jihyung Kim
Universität Tübingen

A fast and accurate open-source quantification algorithm for top-down proteomics

Top-down proteomics (TDP) is becoming the method of choice for detailed analysis of proteoforms. Because TDP data differ in nature from bottom-up proteomics (BUP) data, label-free quantification algorithms established for BUP are not applicable to TDP data, and there are few software options for the accurate quantification of TDP features. Existing software tools often show low sensitivity and long runtimes. We address these issues with our quantification algorithm, FLASHDeconvQ. It uses fast deconvolution algorithms to preprocess the TDP data, followed by a feature-finding algorithm. It demonstrates improved sensitivity and robustness compared with other methods.
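A generic version of the feature-finding step that follows deconvolution can be sketched as linking deconvolved masses across consecutive scans into mass traces and integrating each trace over retention time. The Python sketch below is a hypothetical illustration, not FLASHDeconvQ's algorithm; the 10 ppm tolerance, the allowed gap of one missing scan and any example data are assumptions.

```python
# Hypothetical sketch of label-free feature quantification after deconvolution:
# link masses across scans into traces, then integrate each trace over time.
from dataclasses import dataclass, field


@dataclass
class Trace:
    mass: float  # mass of the most recently added point
    points: list[tuple[float, float]] = field(default_factory=list)  # (rt, intensity)

    def area(self) -> float:
        """Trapezoidal integral of intensity over retention time (0 for one point)."""
        return sum(
            (t2 - t1) * (i1 + i2) / 2
            for (t1, i1), (t2, i2) in zip(self.points, self.points[1:])
        )


def build_traces(scans, tol_ppm: float = 10.0, max_gap: int = 1) -> list[Trace]:
    """Greedily link deconvolved masses across scans.

    scans: list of (rt, [(mass, intensity), ...]) sorted by retention time.
    max_gap: number of consecutive scans a trace may miss and still be extended.
    """
    traces: list[Trace] = []
    last_seen: list[int] = []  # scan index at which each trace was last extended
    for idx, (rt, masses) in enumerate(scans):
        for mass, intensity in masses:
            match = None
            for t, trace in enumerate(traces):
                recent = idx - last_seen[t] <= max_gap + 1
                close = abs(mass - trace.mass) / trace.mass * 1e6 <= tol_ppm
                # last_seen != idx: at most one point per scan in each trace
                if recent and close and last_seen[t] != idx:
                    match = t
                    break
            if match is None:
                traces.append(Trace(mass, [(rt, intensity)]))
                last_seen.append(idx)
            else:
                traces[match].mass = mass
                traces[match].points.append((rt, intensity))
                last_seen[match] = idx
    return traces
```

The quadratic matching loop keeps the example short; a production tool would index traces by mass instead of comparing every pair.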

Oliver Kohlbacher
Universität Tübingen

The FLASH* Suite for the Analysis of Top-Down Proteomics Data

Data analysis of top-down mass spectrometry (TD-MS) requires dedicated tools and algorithms because TD-MS signals differ in nature from bottom-up mass spectrometry data. We will give an overview of the FLASH* Suite of software tools, developed in recent years, that permit more efficient acquisition, deconvolution, and quantification of TD-MS data. FLASHDeconv (Jeong et al., 2020) is a rapid mass and feature deconvolution algorithm that achieves run times on the order of milliseconds per spectrum through efficient pre-processing of the spectra. Leveraging the short runtime of FLASHDeconv, we recently developed an intelligent data acquisition method called FLASHIda that boosts proteoform identification sensitivity about twofold compared with typical data-dependent acquisition methods (Jeong et al., under revision). We will also present our ongoing work, including FLASHDeconvQ for label-free quantification. All tools are open-source software available at www.OpenMS.org.
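The FLASHDeconv publication describes transforming m/z values into a logarithmic space, where the charge-state series of one neutral mass M becomes a fixed pattern: after subtracting the proton mass, the peak at charge z lies at log(M) - log(z). The toy Python sketch below only illustrates that idea and is not FLASHDeconv's implementation, scoring or API: every (peak, charge) pair votes for a candidate log mass, candidates need a run of consecutive charges, and a crude subset rule removes sub-harmonic artefacts such as M/2. The tolerance, charge range and run length are assumed values.

```python
# Toy illustration of charge deconvolution in log space (not FLASHDeconv itself).
import math
from collections import defaultdict

PROTON = 1.007276  # proton mass in Da


def longest_run(charges: set[int]) -> int:
    """Length of the longest run of consecutive integers in charges."""
    best = 0
    for z in charges:
        if z - 1 not in charges:  # z starts a run
            length = 1
            while z + length in charges:
                length += 1
            best = max(best, length)
    return best


def log_space_deconvolve(mz_peaks, charges=range(1, 41), tol_ppm=10.0, min_run=3):
    """Return candidate neutral masses supported by >= min_run consecutive charges.

    A mass M at charge z gives log(m/z - proton) = log(M) - log(z), so each
    (peak, z) pair votes for log(M) = log(m/z - proton) + log(z).
    """
    width = tol_ppm * 1e-6  # a relative (ppm) tolerance is ~ an additive log width
    votes = defaultdict(list)  # bin index -> [(peak index, charge, log mass)]
    for i, mz in enumerate(mz_peaks):
        if mz <= PROTON:
            continue
        base = math.log(mz - PROTON)
        for z in charges:
            log_mass = base + math.log(z)
            votes[round(log_mass / width)].append((i, z, log_mass))

    candidates = []  # (supporting peak indices, mass)
    for hits in votes.values():
        if longest_run({z for _, z, _ in hits}) >= min_run:
            peaks = frozenset(i for i, _, _ in hits)
            mass = math.exp(sum(lm for _, _, lm in hits) / len(hits))
            candidates.append((peaks, mass))

    # Crude harmonic filter: drop candidates (e.g. M/2) whose supporting peaks
    # are a strict subset of another candidate's supporting peaks.
    kept = [m for p, m in candidates if not any(p < other for other, _ in candidates)]
    return sorted(kept)
```

Binning with fixed edges can split one mass across two bins; a real implementation would also merge neighbouring bins, use isotope patterns and score candidates.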

Marcel Kwiatkowski
University of Innsbruck

CoMetChem – A strategy to reveal site-specific histone reaction rates of acetylation and deacetylation

Histone acetylation is an important, reversible post-translational protein modification and a hallmark of epigenetic regulation. However, little is known about the dynamics of this process, due to the lack of analytical methods that can capture site-specific acetylation and deacetylation reactions. We present a new approach that combines metabolic and chemical labeling (CoMetChem) using uniformly 13C-labeled glucose and stable isotope labeled acetic anhydride. CoMetChem enables site-specific quantification of the incorporation or loss of lysine acetylation over time by LC-MS/MS, allowing the determination of reaction rates for site-specific acetylation and deacetylation. Thus, the CoMetChem methodology provides a comprehensive description of site-specific acetylation dynamics.
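The step from labelled fractions to reaction rates can be illustrated under a simple assumption: if the metabolically labelled share f of the acetylation at one site rises as f(t) = 1 - exp(-kt), then -ln(1 - f) = kt, and k follows from a least-squares line through the origin. The Python sketch below encodes that assumed first-order model; it is not the published CoMetChem data evaluation, and the helper names and any intensities or time points are hypothetical.

```python
# Sketch under an assumed first-order labelling model (not the CoMetChem code).
import math


def labeled_fraction(new_label_intensity: float, old_label_intensity: float) -> float:
    """Share of a site's acetylation carrying the metabolic label (hypothetical helper)."""
    return new_label_intensity / (new_label_intensity + old_label_intensity)


def first_order_rate(times: list[float], fractions: list[float]) -> float:
    """Fit f(t) = 1 - exp(-k t) by least squares of -ln(1 - f) = k t through the origin."""
    pairs = [(t, -math.log(1.0 - f)) for t, f in zip(times, fractions) if 0.0 <= f < 1.0]
    den = sum(t * t for t, _ in pairs)
    if den == 0:
        raise ValueError("need at least one time point with t > 0 and f < 1")
    return sum(t * y for t, y in pairs) / den


def half_life(k: float) -> float:
    """Time for half of the acetyl groups at a site to be replaced."""
    return math.log(2.0) / k
```

If the acetyl-CoA pool never becomes fully labelled, f plateaus below 1 and should first be divided by the plateau value; otherwise k is underestimated.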

Maria Martin
EMBL-EBI

Capturing proteoform and function in the UniProt Protein Knowledgebase.

UniProtKB is a widely used protein knowledgebase, recognized for the quality of its protein sequences and annotations; it is used in proteomics experiments for the identification of protein sequences and for post-identification annotation. UniProtKB is composed of the manually reviewed UniProtKB/Swiss-Prot and the computationally annotated UniProtKB/TrEMBL. High-throughput mass spectrometry-based proteomics datasets provide valuable information to UniProt regarding the occurrence of specific protein isoforms, processing products, and post-translational modifications. In recent years, we have been developing pipelines that use publicly available proteomics datasets to confirm the existence of predicted protein sequences and to annotate post-translational modifications in UniProtKB. For UniProt, it is critical to capture the correct expressed protein sequences (proteoforms), including PTMs and/or variants when present. In addition, these proteoforms should be accessioned and represented in community-developed standard formats for accessibility and interoperability. However, currently only proteoforms representing protein isoforms encoded by alternatively spliced transcripts have their own unique protein identifiers. As of January 2022, UniProt manually annotates 20,286 human canonical sequences and 22,050 additional isoforms.

Sandra Orchard
EMBL – EBI

Proteoforms in UniProt – the need for stable identifiers and data standards.

Protein function data in UniProtKB are increasingly captured at the level of the isoform or peptide chain, and the database already contains examples of isoforms of the same gene with very different physiological effects. UniProt imports data from collaborating resources such as the IMEx Consortium of molecular interaction databases, which also use the UniProt isoform and peptide identifiers, thus enabling easy data integration. As technologies improve, UniProt anticipates the need to capture data at a more granular level, i.e. by linking function to proteoform. To do this, we need to generate, or adopt, identifiers that allow biological data to be attached to a single, non-redundant, proteoform-specific identifier; these identifiers can also be used by partner resources to manage data capture, curation and integration with UniProtKB. UniProtKB will then make the data publicly available using community-agreed data formats and standards, including the HUPO Proteomics Standards Initiative PEFF and also ProForma once formally adopted.
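To make the idea of a proteoform-specific identifier concrete, the Python sketch below writes a proteoform as a ProForma-style string (a subset only: named modifications in square brackets after the modified residue, plus an optional N-terminal modification written as [name]- before the sequence) and derives a deterministic key from the accession and that string. The key format and hashing scheme are hypothetical and are not a UniProt or HUPO-PSI identifier scheme; the accession P00000 used in examples is a placeholder.

```python
# Hypothetical sketch: ProForma-style serialisation (subset) plus an invented
# deterministic key. Not a UniProt or HUPO-PSI identifier scheme.
import hashlib
from typing import Optional


def to_proforma(sequence: str, mods: dict[int, str], n_term: Optional[str] = None) -> str:
    """Write a sequence with named modifications in ProForma-style brackets.

    mods maps 1-based residue positions to modification names, e.g. {2: "Oxidation"}.
    Only residue and N-terminal modifications are handled in this sketch.
    """
    for pos in mods:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"modification position {pos} outside sequence")
    parts = [f"[{n_term}]-"] if n_term else []
    for pos, residue in enumerate(sequence, start=1):
        parts.append(residue)
        if pos in mods:
            parts.append(f"[{mods[pos]}]")
    return "".join(parts)


def proteoform_key(accession: str, proforma: str) -> str:
    """Hypothetical key: identical accession and string always give the same key."""
    digest = hashlib.sha256(f"{accession}|{proforma}".encode("utf-8")).hexdigest()
    return f"{accession}-PF{digest[:12]}"
```

Hashing only yields a stable identifier if every resource writes the same proteoform the same way, for example by always using one controlled vocabulary for modification names, so canonicalisation would have to be agreed before such a key could be shared.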

Hartmut Schlüter
Universitätsklinikum Hamburg-Eppendorf

Proteoforms (Smith 2021), formerly termed isoforms or protein species (Jungblut 2008, Schlüter 2009), are the smallest units of the proteome. A single gene can give rise to many proteoforms, which can be very similar, differing by only a few atoms, or very different with respect to their composition and resulting chemical properties. Neil Kelleher (2012) estimated the number of individual proteoforms in the human organism to be in the range of 1 billion.

Two examples demonstrate why differentiating proteoforms is very important: 1. Deamidation of a therapeutic antibody (molecular weight 150,000 Da), which converts asparagine into aspartic acid, is associated with only a minimal change (an increase in molecular weight of about 1 Da). However, this event can significantly decrease the efficacy of the drug, which can result in reduced therapeutic success. 2. Only knowledge of the ratio between the number of enzyme molecules activated by phosphorylation and the number of unphosphorylated molecules of the same enzyme allows the activity of that enzyme in a cell to be estimated.
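The arithmetic behind the first example shows how small the change is. Deamidation (Asn to Asp) shifts the monoisotopic mass by about +0.984 Da, which the text rounds to 1 Da; relative to a 150,000 Da antibody, and using the simple definition of resolving power R = M/Δm:

```latex
\begin{aligned}
\Delta m &= m_{\mathrm{res}}(\mathrm{Asp}) - m_{\mathrm{res}}(\mathrm{Asn}) \approx 115.027 - 114.043 \approx 0.984~\mathrm{Da} \\
\frac{\Delta m}{M} &\approx \frac{0.984~\mathrm{Da}}{150\,000~\mathrm{Da}} \approx 6.6 \times 10^{-6} \quad (\approx 6.6~\mathrm{ppm}) \\
R &= \frac{M}{\Delta m} \approx \frac{150\,000}{0.984} \approx 1.5 \times 10^{5}
\end{aligned}
```

In practice the broad isotope distribution and glycoform heterogeneity of an intact antibody make a shift of this size harder to resolve than this simple estimate suggests.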

These two examples already demonstrate the critical relationship between the composition and structure of a proteoform and its activity, and thereby the importance of being able to identify and quantify proteoforms within the proteome. However, current bioanalytical techniques and research strategies largely ignore the presence of proteoforms and still follow the 1-gene-1-protein-1-function hypothesis. Widely applied antibody-based analytical methods (e.g. western blots) usually cannot distinguish proteoforms. The quantity obtained for a protein by conventional quantitative proteomics is the sum over all proteoforms that were present before digestion (based on the tryptic peptides they have in common). Thus, an exact overview of the functional status and activity of the products of a defined gene is no longer possible. To guarantee the safety and efficacy of therapeutic proteins, to gain a better and deeper understanding of molecular physiology in health and disease, and to understand molecular mechanisms in many other areas of biotechnology and the life sciences, it is mandatory to improve the bioanalytical tools and strategies for exploring the universe of proteoforms. Research that focuses more consistently on proteoforms will significantly increase the number of new, specific and reliable diagnostic markers and provide much deeper knowledge about the molecular mechanisms of life and the development of diseases, thus yielding new and better drugs with fewer side effects.

References

Aebersold R, Agar JN, Amster IJ, Baker MS, Bertozzi CR, Boja ES, Costello CE, Cravatt BF, Fenselau C, Garcia BA, Ge Y, Gunawardena J, Hendrickson RC, Hergenrother PJ, Huber CG, Ivanov AR, Jensen ON, Jewett MC, Kelleher NL, Kiessling LL, Krogan NJ, Larsen MR, Loo JA, Ogorzalek Loo RR, Lundberg E, MacCoss MJ, Mallick P, Mootha VK, Mrksich M, Muir TW, Patrie SM, Pesavento JJ, Pitteri SJ, Rodriguez H, Saghatelian A, Sandoval W, Schlüter H, Sechi S, Slavoff SA, Smith LM, Snyder MP, Thomas PM, Uhlén M, Van Eyk JE, Vidal M, Walt DR, White FM, Williams ER, Wohlschlager T, Wysocki VH, Yates NA, Young NL, Zhang B. How many human proteoforms are there? Nature Chem Biol. 2018; 14: 206-214.

Jungblut PR, Holzhütter HG, Apweiler R, Schlüter H. The speciation of the proteome. Chem Cent J. 2008; 2: 16.

Kelleher NL. A cell-based approach to the human proteome project. J Am Soc Mass Spectrom. 2012; 23: 1617-24.

Schlüter H, Apweiler R, Holzhütter HG, Jungblut PR. Finding one’s way in proteomics: a protein species nomenclature. Chem Cent J. 2009; 3: 11.

Smith LM, Agar JN, Chamot-Rooke J, Danis PO, Ge Y, Loo JA, Paša-Tolić L, Tsybin YO, Kelleher NL; Consortium for Top-Down Proteomics. The Human Proteoform Project: Defining the human proteome. Sci Adv. 2021; 7: eabk0734.

Veit Schwämmle
University of Southern Denmark

PTM crosstalk and histone proteoforms

Due to their strong association with DNA and gene regulation, histones are among the most studied proteins. The majority of histones are multiply modified, which gave rise to the idea of a histone code, in which the post-translational state of a histone defines its biological function, influencing chromatin formation and protein recruitment. Only top-down and middle-down MS approaches are capable of deciphering these states, as they allow extensive proteoform characterization, including multiple co-occurring modifications. This talk will review our recent work on deciphering the histone code, including computational and visual approaches to determine how PTM crosstalk affects proteoform abundance. Given that histones and their PTMs are very well studied, they could play a particular role in the proteoform atlas, for example as a test case for advanced statistical and visual approaches with deep insight into proteoform interplay, genomics data and links to relevant literature and reported biological functions.
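One simple way to express how strongly two marks co-occur on the same proteoform is the log2 ratio of the observed joint frequency to the frequency expected if the marks were independent. The Python sketch below computes such a score from proteoform abundances; it is an illustrative measure, not necessarily the scoring used in the presented work, and any abundances passed to it in examples are invented.

```python
# Illustrative PTM co-occurrence (crosstalk) score from proteoform abundances.
import math


def crosstalk_score(abundance: dict[frozenset[str], float], a: str, b: str) -> float:
    """log2( P(a and b) / (P(a) * P(b)) ) over proteoform abundances.

    0 means the marks co-occur as often as expected under independence,
    > 0 more often, < 0 less often, and -inf if they never co-occur.
    """
    total = sum(abundance.values())
    p_a = sum(v for mods, v in abundance.items() if a in mods) / total
    p_b = sum(v for mods, v in abundance.items() if b in mods) / total
    p_ab = sum(v for mods, v in abundance.items() if a in mods and b in mods) / total
    if p_a == 0 or p_b == 0:
        raise ValueError("both marks must occur on at least one proteoform")
    if p_ab == 0:
        return float("-inf")
    return math.log2(p_ab / (p_a * p_b))
```

For example, if two marks each sit on half of all molecules and always appear together, the score is log2(0.5 / 0.25) = 1.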

Lloyd M. Smith
University of Wisconsin

New Frontiers in Proteomics – Proteoforms, Proteoform Families, and the Human Proteoform Project

Proteins are the primary effectors of function in biology, and thus complete knowledge of their structure and behavior is needed to decipher function. However, the richness of protein structure and function goes far beyond the linear amino acid sequence dictated by the genetic code. Multigene families, alternative splicing, coding polymorphisms, and post-translational modifications work together to create a rich variety of proteoforms, whose chemical diversity is the foundation of the biological complexes and networks that control biology. The term “proteoforms” refers to the specific molecular forms in which proteins are present in biological systems; only direct analysis of the proteoforms themselves can reveal their structures, dynamics, and localizations in biological systems.

Remarkably, the dominant paradigm of proteomics research, “bottom-up” proteomics, is unable to identify proteoforms; rather, proteins are enzymatically digested into peptides, which are then identified, serving as surrogates for the likely presence of their parent proteins in the sample. This strategy destroys the information about which form of the protein a peptide represents, and thus the critical information needed to identify proteoforms is lost. The entire field of biology is thus attempting to understand life without the ability to understand the molecules that define life. This limitation of today's technology poses a “grand challenge” to the scientific community: to devise new strategies and approaches that can comprehensively and quantitatively reveal the full breadth of the proteome at the proteoform level.

In this presentation I will provide an overview of this interesting problem, along with a variety of new tools and approaches that we and others are developing to address it. A key integrating concept is the “proteoform family”, the set of all proteoforms that derive from a given gene. The description of the proteome of a given sample of interest may thus be considered as a set of proteoform families, one for each gene in the genome. Identifying and quantifying the members of each proteoform family comprises a new way of conceptualizing proteome analysis in complex systems. Developing the technology to accomplish this, building a comprehensive atlas of proteoforms present in human systems, and eventually deciphering the functional roles they play in normal and disease biology, comprise central elements in the quest to understand human biology.
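The proteoform-family concept (one family per gene, holding every proteoform derived from that gene) maps naturally onto a grouping operation. The Python sketch below is illustrative only: the `Proteoform` record, its fields, the function name and the gene names are hypothetical and not taken from any tool mentioned in this talk. It also skips the hard part of real family construction, which is assigning observed proteoforms to genes in the first place.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Proteoform:
    """Hypothetical minimal record for one proteoform."""
    gene: str          # gene of origin, used as the family key
    sequence: str      # amino acid sequence of this molecular form
    mods: tuple = ()   # (position, modification name) pairs


def proteoform_families(proteoforms):
    """Group proteoforms into families: one set of proteoforms per gene."""
    families = defaultdict(set)
    for pf in proteoforms:
        families[pf.gene].add(pf)
    return dict(families)


# Hypothetical example: two forms derived from GENE_A, one from GENE_B.
sample = [
    Proteoform("GENE_A", "MKTAYIAK"),
    Proteoform("GENE_A", "MKTAYIAK", ((2, "Acetyl"),)),
    Proteoform("GENE_B", "MSDNEL"),
]
families = proteoform_families(sample)
for gene, family in sorted(families.items()):
    print(gene, len(family))
```

Because the record is frozen (hashable), each family is a set, so identical proteoforms reported twice collapse into a single member, while forms that differ only in their modifications stay distinct.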

Bernd Thiede
Department of Biosciences, University of Oslo

High resolution quantitative proteomics using SILAC-2-DE-LC/MS

Over the last decade, the proteomics field has shifted from two-dimensional gel electrophoresis (2-DE)-based approaches to SDS-PAGE or gel-free workflows, owing to the tremendous developments in liquid chromatography (LC), mass spectrometry (MS), and quantitative proteomics techniques. However, 2-DE still offers the highest resolution for protein separation, and individual proteins can be detected as many different protein species. Bottom-up proteomics using 2-DE in combination with isotopic protein labelling and LC/MS can reveal subtle changes that cannot be found with any other proteomics workflow. Nevertheless, it is challenging to resolve the chemical differences between protein species derived from 2-DE gels, because 100% sequence coverage is required for an unambiguous identification of protein species.

Philipp T. Kaulich, Liam Cassidy, Konrad Winkels, and Andreas Tholey
Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Germany

Improved identification of proteoforms in top-down proteomics using FAIMS with internal CV stepping

For high-confidence identification of proteoforms and increased proteome coverage, fractionation prior to mass spectrometric (MS) analysis is a crucial step in top-down proteomics. Gas-phase fractionation strategies such as field asymmetric ion mobility spectrometry (FAIMS) have been shown to be highly beneficial for that purpose. However, the need for multiple injections using different compensation voltages (CVs) greatly increases both measurement time and the amount of sample required. Here, we investigated the use of internal CV stepping for single-shot TD analysis, i.e., the application of multiple CVs per acquisition [1]. In addition, MS parameters were optimized for the individual CVs, since different CVs target different mass ranges.

Lysates of Caco-2 cells were analyzed after SPE-based enrichment [2], MWCO treatment, or GELFrEE fractionation of the low-molecular-weight fraction (i.e., below 30 kDa) of the proteome, using LC-MS (Fusion Lumos) either with or without FAIMS. Single-CV FAIMS data were compared with data from different combinations of multiple CVs.

In accordance with earlier results [3,4], we found that lower (more negative) CVs favored the identification of lower-mass proteoforms, while more positive CVs showed a bias toward higher-molecular-weight proteoforms.

To obtain both a uniform mass distribution and a high number of proteoforms (~1,700), the combination of four CVs (-60, -50, -40, -20 V or -60, -50, -40, 0 V) was found to be the most appropriate, as significantly more proteoforms were identified in all mass ranges compared with measurements without FAIMS.

Owing to the correlation between CV values and mass range, we were able to adjust the measurement conditions, optimizing the resolution and number of microscans for each CV. We investigated the optimal combination and number of CVs for different gradient lengths, and validated the optimized settings with the low-molecular-weight proteome of Caco‑2 cells obtained using a range of different sample preparation techniques.

Compared with measurements without FAIMS, the numbers of identified protein groups (+60-94%) and proteoforms (+46-127%), as well as identification confidence, increased significantly, while the measurement time remained identical. In total, we identified 684 protein groups and 2,675 proteoforms from Caco-2 cells in less than 24 hours using the optimized multi-CV method.

Our data show that measurement with internal CV stepping leads to a significantly higher number of identified proteoforms compared with measurements without FAIMS.

  1. Kaulich PT, Cassidy L, Winkels K, Tholey A (2022). Anal Chem, in press.
  2. Cassidy L, Helbig AO, Kaulich PT, Weidenbach K, Schmitz RA, Tholey A (2021). J Proteomics, 230:103988.
  3. Fulcher JM, Makaju A, Moore RJ, Zhou M, Bennet DA, DeJager PL, Qian WJ, Pasa-Tolic L, Petyuk VA (2021). J Proteome Res, 20:2780-2795.
  4. Gerbasi VR, Melani RD, Abbatiello SE, Belford MW, Huguet R, McGee JP, Dayhoff D, Thomas PM, Kelleher NL (2021). Anal Chem, 93:6323-6328.

Charlotte Uetrecht
Heinrich-Pette-Institut Hamburg, Universität Siegen

Flying viruses – understanding corona- and norovirus lifecycles

Viruses affect virtually all organisms on Earth. Some are detrimental to human development, as we are experiencing during the ongoing pandemic, whereas those targeting pathogenic bacteria or crop pathogens can be beneficial to us. Proteoforms, which can be strain-specific, are crucial to viral lifecycles in many ways.

An integral part of icosahedral viruses is the capsid, a protein shell protecting the genome. Many copies of the capsid protein often self-assemble into shells of defined size. The low binding affinity of individual subunits allows efficient assembly and gives rise to highly stable particles. These capsids can be studied by structural mass spectrometry (MS) in terms of stoichiometry, dynamics, assembly pathways, and stability, revealing coexisting states. A focus will be on variant-specific differences between noroviruses, the main cause of viral gastroenteritis.

Moreover, the highly dynamic replication machinery of coronaviruses (CoV) has long been of interest to us. Using native MS, the processing of the polyproteins into individual subunits and the subsequent complex assembly can be monitored simultaneously, revealing striking differences between CoV species.

Juan Vizcaino
EMBL-EBI

Abstract

In my presentation, I will first introduce my view of the data workflow that is required for the Human Proteoform Atlas. Then, I will highlight the need for fundamental resources and bioinformatics infrastructure to make proteoform-centric data FAIR (Findable, Accessible, Interoperable and Re-usable). In my view, two fundamental components are required: a standard representation of proteoforms (ProForma 2.0, ongoing work that is nearly finished) and the availability of Universal Proteoform Identifiers, together with a means of generating them. I will finish by highlighting the importance of clearly defining, from the outset, the requirements of the use cases that we want to support in the first iteration of the project.
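As an illustration of what a standard string representation offers, the Python sketch below reads only the most basic ProForma pattern: a residue followed by a bracketed modification name or mass shift, as in `EM[Oxidation]EVEES[Phospho]PEK`. The function name is hypothetical, and the sketch deliberately rejects everything else the ProForma 2.0 specification defines (terminal, labile and unlocalized modifications, ambiguous localization, and more). It is not a ProForma implementation.

```python
def parse_basic_proforma(proforma):
    """Parse the residue[mod] subset of ProForma (hypothetical helper).

    Returns (bare_sequence, [(1-based position, modification), ...]).
    Raises ValueError on any syntax outside this small subset.
    """
    residues = []
    mods = []
    i = 0
    while i < len(proforma):
        ch = proforma[i]
        if ch.isascii() and ch.isupper():
            # Amino acid residue letter.
            residues.append(ch)
            i += 1
        elif ch == "[":
            if not residues:
                # e.g. "[Acetyl]-PEPTIDE": N-terminal mods are out of scope here.
                raise ValueError("modification before first residue not supported")
            end = proforma.index("]", i)  # ValueError if the bracket is unclosed
            # The modification attaches to the residue just read.
            mods.append((len(residues), proforma[i + 1:end]))
            i = end + 1
        else:
            raise ValueError(f"unsupported character {ch!r} at index {i}")
    return "".join(residues), mods


sequence, mods = parse_basic_proforma("EM[Oxidation]EVEES[Phospho]PEK")
print(sequence, mods)
```

Keeping positions 1-based and separating the bare sequence from its modifications mirrors how a proteoform is usually described (sequence plus localized PTMs), which is the kind of information a Universal Proteoform Identifier would need to capture unambiguously.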