How protein analysis technologies are driving the discovery of new biomarkers in the postgenomic era
Over the past decade, protein analysis has evolved from laborious gel-based separations to advanced mass spectrometry technologies. Today, multiple approaches and methodologies are available to study proteins at the cell and organism levels. Scientific and technical progress in this area continues to drive the discovery of new protein biomarkers, a crucial field of research for the detection of debilitating and life-threatening diseases.
- The relevance of protein analysis studies in the postgenomics era
- What techniques can we use to identify a protein and its corresponding post-translational modifications?
- Summary of the main techniques used for protein analysis
- Approaches to protein analysis
- Proteomics in cancer research
- Concluding remarks
The relevance of protein analysis studies in the postgenomics era
Next-generation sequencing has forever changed functional studies. For many organisms, it is now possible to obtain both the genomic and transcriptomic profiles cost-effectively.
However, the presence of a gene or transcript in an organism at a given time is often insufficient to infer the actual abundance of specific proteins. Genomic and transcriptomic data also do not reveal how proteins mature and act in the organism.
For this reason, despite the low costs of DNA and RNA sequencing, proteomic analysis is becoming increasingly popular for the functional characterization of a given protein, organism or environment.
Functional studies are important for many areas of research. For instance, post-translational modifications, which can strongly affect a protein’s conformation, cannot be predicted from genetic studies alone.
These modifications are often vital in the context of cell signaling and disease. Thus, they have been the focus of current pharmaceutical research in an attempt to identify new and promising drug targets or disease biomarkers for diagnosis.
The analysis of proteins and their modifications has only become possible thanks to recent advances in mass spectrometry (MS) techniques and bioinformatics analysis tools.
The most common and widely studied post-translational modifications associated with disease in humans are:
- Phosphorylation – changes the protein conformation leading to activation or inactivation of reactive sites
- Glycosylation – N- or O-linked glycans are common modifications in health and disease, and they are known to direct a protein to specific cell components
- Acetylation and methylation – these common modifications change the conformation of native proteins and affect their affinity towards other proteins
- Ubiquitination – ubiquitin is a small protein (76 residues) that can covalently modify other proteins; its presence is known to target a protein for degradation
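In MS data, each of these modifications shows up as a characteristic mass shift relative to the unmodified peptide. The sketch below is a minimal illustration of that idea, not a published workflow: the function name `match_ptm`, the tolerance, and the choice of modifications are assumptions made for this example. Glycosylation is left out because glycans are heterogeneous and do not map to a single mass shift, and ubiquitination is represented by the Gly-Gly remnant that stays on a lysine after trypsin digestion.

```python
# Monoisotopic mass shifts (in Da) added by common post-translational modifications.
# The dictionary and function names are illustrative, not part of any library.
PTM_MASS_SHIFT = {
    "phosphorylation": 79.96633,      # addition of HPO3
    "acetylation": 42.01057,          # addition of C2H2O
    "methylation": 14.01565,          # addition of CH2
    "ubiquitination_GG": 114.04293,   # Gly-Gly remnant left on lysine after trypsin digestion
}


def match_ptm(observed_mass, unmodified_mass, tolerance_da=0.01):
    """Return the PTMs whose mass shift explains the observed mass difference."""
    delta = observed_mass - unmodified_mass
    return [
        name
        for name, shift in PTM_MASS_SHIFT.items()
        if abs(delta - shift) <= tolerance_da
    ]


if __name__ == "__main__":
    # A hypothetical 1000.000 Da peptide observed about 79.966 Da heavier.
    print(match_ptm(1079.96633, 1000.0))
```

A single mass shift only suggests a modification. Locating the modified residue requires the fragmentation data described later in this article.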
What techniques can we use to identify a protein and its corresponding post-translational modifications?
Modern methods for accurate protein analysis proceed through three main steps: separation, ionization and mass analysis.
Over the past decades, many techniques have evolved to allow the separation of proteins and peptides in complex samples. Standard techniques rely on gel-based separations (e.g. SDS-PAGE, 2D, and 3D gel electrophoresis), but these techniques are time-consuming, labor-intensive and often unsuitable for visualizing membrane-bound proteins or resolving highly complex proteomes.
For this reason, liquid chromatography (LC) techniques have been increasingly adopted for modern protein analysis.
Separation is the first step in protein analysis. Following this initial stage, the proteins in a sample need to be detected and measured.
This type of analysis has only been made possible by the advent of mass spectrometry (MS) techniques suited to biomolecules. Although MS itself dates back to the early twentieth century, the soft ionization methods that opened it up to protein studies were developed in the 1980s, initially as tools for structural studies. Due to its sensitivity and accuracy in determining mass-to-charge ratios (m/z), MS has been adapted and coupled with many separation techniques, which now serve as core techniques in proteomic investigations.
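The relationship between a molecule's mass and the m/z value recorded by the instrument can be sketched in a few lines. This example assumes positive-ion mode, where each charge comes from one added proton; the function names are hypothetical and chosen for illustration only.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz_value, charge):
    """Recover the neutral mass from an observed m/z and a known charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz_value - PROTON_MASS)


if __name__ == "__main__":
    # A hypothetical 10,000 Da protein carrying 10 protons appears near m/z 1001.
    print(round(mz(10000.0, 10), 3))
```

Because a large protein carrying many charges appears at a much lower m/z than its mass, instruments with a limited m/z range can still measure it.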
Modern MS techniques for proteomics use soft ionization processes due to the inherent instability of proteins and peptides. These techniques include electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI).
Traditionally, methods relying on gel separation followed by MALDI ionization have been the gold standard of proteomics.
However, in recent years, more researchers have adopted other separation methods to avoid the pitfalls of gel-based techniques. Thus, LC-based methods coupled with ESI systems are now routinely used in functional profiling studies.
Although these systems are usually sufficient to determine the accurate mass of intact or digested proteins, most approaches include an additional stage of analysis: tandem mass analysis (MS/MS).
This analysis consists of fragmenting the protein or peptide (parent ion) to obtain a product ion profile. If conditions are kept constant, the fragmentation pattern is reproducible and can be compared to in silico and experimental databases for protein identification.
Much progress has been made in mass analysis instrumentation. Currently, the most popular mass analyzers are the time-of-flight (TOF) and the quadrupole (Q), which are often used in combination (i.e. Q-TOF systems).
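An in silico product ion profile of the kind stored in such databases can be sketched as follows. The example assumes the common case where a peptide breaks along its backbone into singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments). The article does not specify a fragmentation method, so this convention, the function name `fragment_ions`, and the test peptide are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i covers the first i residues; y_i covers the last i residues
    plus one water molecule. Both carry a single proton.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("GASK")  # hypothetical test peptide
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

Matching an observed spectrum against such theoretical ion lists is the basic idea behind database searching. Real search engines also score peak intensities, multiple charge states and modified residues.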
Summary of the main techniques used for protein analysis
All of these techniques can be used alone or in tandem:
- Protein and peptide separation
  - Gel-based – SDS-PAGE, 2D and 3D gel electrophoresis
  - Chromatography-based – liquid chromatography (LC)
- Protein and peptide ionization sources for mass spectrometry
  - Electrospray ionization (ESI)
  - Matrix-assisted laser desorption ionization (MALDI)
- Protein and peptide mass analysis
  - Time-of-flight (TOF)
  - Quadrupole (Q)
  - Fourier transform ion cyclotron resonance (FT-ICR)
  - Linear ion trap (LIT)
  - Orbitrap
Approaches to protein analysis
There are two main approaches to proteomics:
- Top-down approach – direct analysis of complex samples containing intact proteins to study the diversity at a cellular level and the relative abundance of post-translational modifications. These studies can analyze label-free or stable-isotope labeled proteins.
- Bottom-up approach – analysis at the peptide level, which requires prior chemical or enzymatic digestion of a protein or mixture of proteins. The approach is more suitable for single proteins or simple mixtures of proteins, and it is the basis for their identification or de novo sequencing. Different techniques can be used for bottom-up proteomics:
  - Peptide mass fingerprinting (PMF) – typically performed in MALDI-TOF systems, it is a fast and accurate technique. However, it requires prior purification and comparison to a highly reliable in silico or experimental database, so it has a low throughput.
  - Tandem MS – these approaches are complex and time-consuming and are typically performed in LC-MS/MS systems. They can be used for de novo sequencing of peptides (very laborious and complex) or database comparison.
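The database comparison behind peptide mass fingerprinting can be sketched as follows. The example assumes trypsin as the digestion enzyme, with the usual rule of cleaving after lysine (K) or arginine (R) unless the next residue is proline (P). It also assumes that the observed peaks have already been converted to neutral monoisotopic masses. The protein names, sequences, tolerance and function names are hypothetical and serve only to illustrate the idea.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # Da


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance_da=0.02):
    """Count observed masses explained by the theoretical digest of a sequence."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance_da for theo in theoretical)
        for obs in observed_masses
    )


def best_candidate(observed_masses, database):
    """Return the database entry whose digest explains the most observed masses."""
    return max(database, key=lambda name: count_matches(observed_masses, database[name]))


if __name__ == "__main__":
    # Hypothetical two-entry database and a simulated fingerprint of "protA".
    database = {"protA": "GASKVTR", "protB": "LLLKFFFR"}
    observed = [peptide_mass("GASK"), peptide_mass("VTR")]
    print(best_candidate(observed, database))
```

Counting matches is the simplest possible score. It also shows why PMF depends on sample purity and database quality: peptides from a contaminating protein, or a protein missing from the database, weaken or mislead the match.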
Bottom-up approaches are the most widely used and mature technologies for protein identification, so both the techniques themselves and the bioinformatic tools for data analysis are well developed and robust. However, these approaches find limited application in protein identification from complex mixtures of peptides and proteins. Moreover, due to low sequence coverage, these methodologies often only give information about a fraction of all peptides present in a sample.
In contrast, top-down approaches face many technical challenges. For instance, it is much harder and more time-consuming to separate intact proteins than it is to separate a simple mixture of peptides. However, top-down methodologies do not require an initial protein digestion step, which is not always efficient.
For these reasons, top-down methods are often able to provide a wealth of information regarding relative protein abundance and characterization of post-translational modifications.
Recent advances in protein bioinformatics are also increasingly facilitating the identification of proteins from whole-protein analysis alone. This is more feasible when robust and high-quality databases are available. In these cases, if the database contains parent ion masses, product ion fragmentation profiles and instrument operating conditions, it may be possible to perform a tentative identification of the native protein based on this type of data alone.
Due to the inherent complexity of natural samples, both types of approaches and methodologies can profit from some level of purification, and/or pre-analysis labeling (e.g. chemical or metabolic labeling, such as isobaric or stable isotope labeling, respectively).
Moreover, when these methods are used in combination with genomic and/or transcriptomic data, it is possible to construct highly accurate functional annotations and identify post-translational modifications taking place in a cell at any given time.
Proteomics in cancer research
In the postgenomic era, researchers continue to search for answers to the most challenging pathologies. Proteomics has emerged as a promising field to address the current limitations of genomic studies.
Due to the wealth of information provided by mass spectrometry technologies, researchers are increasingly turning to these approaches in the search for new biomarkers.
Cancer research is one of the areas where the search for new biomarkers is most critical. Mortality from cancer often results less from a lack of effective treatments than from a lack of effective diagnostics that detect cancer at an early stage.
Advanced proteomics can help us find those markers because, in tumors, proteins direct growth, invasion, metastasis, cell-to-cell interactions, and response to therapies. Malignant transformation involves significant changes in protein expression within a single cell and its subsequent clonal progeny.
Early-stage cancer markers have historically been identified using conventional protein analysis techniques, such as ELISA, western blot or gel electrophoresis. Cancer research has since transitioned from these conventional methods to mass spectrometry-based approaches.
Many of the initial proteomic profiling studies date from the late 2000s and have identified many new cancer markers, including, among others:
- Eosinophil-derived neurotoxin – an ovarian cancer marker discovered from urine profiling with nano-LC-ESI-MS/MS
- Antithrombin III – a CNS lymphoma marker discovered from cerebrospinal fluid profiling by LC-TOF-MS/MS
- MMP-9, DJ-1, and A1BG – three pancreatic cancer markers discovered from peritoneal fluid profiling by gel electrophoresis separation, followed by trypsin digestion and MALDI-TOF-MS/MS
Concluding remarks
Modern proteomics based on mass spectrometry analysis is increasingly used to expand our knowledge of the role and expression profiles of specific proteins at the cell and organism levels.
Currently, many methods are available to separate, ionize, perform mass analysis, and subsequently identify proteins in simple and complex samples. MALDI-TOF-MS and LC-ESI-MS/MS are two of the most commonly used methods for protein analysis.
Although both methods are laborious and time-consuming, they continue to drive the discovery of new biomarkers associated with several diseases including cancer.
These new biomarkers have the potential to support the development of more accurate and sensitive diagnostic techniques and to aid in the prevention of diseases such as cancer or Alzheimer’s, where early detection is of utmost importance.