Sunday, July 22, 2007

Darwin to Central Dogma: How We Get Protein from the Genetic Code. Part II

So, let's talk about the Central Dogma and start by moving beyond the introductory material to the meat of the subject in this, the second part of this series.

Transcription

As you can see from the first (left) overview figure in Part I, the molecular pathways of genetic information, which are analogous to strings of letters (nucleotide or amino acid alphabets) forming different kinds of sentences in different kinds of languages, carry this information from DNA or RNA to protein, but never really from protein to nucleic acid. Transcription is the name given to the transfer of DNA "sentences" into RNA "sentences"; in turn, these code for specific amino acids in polypeptide chains (in many cases this code is redundant; more on that later).




Where do we begin understanding this process of gene to protein, transcription and translation? With the source, DNA. But first, we must define some terms.




DNA stands for deoxyribonucleic acid (structural model at left) and is found in pro- and eu-karyotic cells. Each DNA molecule is composed of many smaller molecules linked together in a long, double-stranded helical chain. Eukaryotic DNA (our focus) is linear, while DNA is often in a ringed or looped form with no loose ends in prokaryotic cells. The smaller molecules (monomers) that make up DNA are called nucleotides, and in DNA these come in four basic forms, A, C, G, and T, which stand for adenine, cytosine, guanine, and thymine. DNA monomers are grouped into two different categories, purines and pyrimidines; however, they all share the same basic structure, being composed of a nitrogenous base, a deoxyribose (five-carbon, or pentose) sugar, and a phosphate region.


RNA, or ribonucleic acid, is also linear and found in eukaryotic and prokaryotic cells. The structure of its monomers is similar to that of DNA monomers, except that the sugar carries the oxygen-containing hydroxyl group that the DNA sugar is missing (hence the name "deoxy-" ribose)--or, plainly, regular ribose is incorporated into RNA monomers. In RNA molecules (polymers), the "alphabet" of the RNA strands ("sentences") is similar to that for DNA, with one exception: uracil (U) is used instead of thymine. Last, but not least, proteins are polymers of amino acids, organic compounds containing carboxyl and amino groups attached to a central carbon.



Now, transcription.


One form of RNA, called messenger RNA (mRNA), is the runner that carries information between DNA and the protein-synthesizing machinery. So information doesn't just go from DNA to some generic RNA to protein; it goes from DNA to mRNA first. This is accomplished by a series of specific transcription steps--promotion, factor addition, transcription, and termination--with a similar general form in eukaryotes and prokaryotes; however, the two groups differ in how transcription is stopped. Before describing this difference, I will describe the similarities in transcription processes.



Transcription, like DNA replication, is carried out by a special enzyme, RNA polymerase. There are several types of RNA polymerases (RNAPs) in eukaryotic cells, called RNAP I, RNAP II, and RNAP III. RNAP II carries out transcription of DNA into mRNA. Furthermore, only certain regions of DNA are transcribed. How? DNA sequences contain segments known as "promoters" that signal RNAP II (sometimes called Pol II) to begin transcription of the "downstream" DNA (5'-->3'), a specific set of nucleotides called the transcription unit.



The promoter region contains a "TATA box," a TATA sequence around 25 nucleotides from the beginning of the transcription unit (the transcribed stretch of DNA). Successful transcription requires a group of transcription factors to bind the DNA before RNAP II can start working. Once this event takes place, RNAP II comes in and binds the DNA strand along with these other molecules, the double helix is unwound, and RNAP II begins putting together RNA nucleotides against one strand of the DNA double helix (the template strand) to form the mRNA product, or mRNA transcript.
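For the programmers in the audience, the base-matching part of that last step can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the function name `transcribe` and the example template are made up for illustration, the template strand is written 3'-->5' so the mRNA comes out reading 5'-->3', and promoters, transcription factors, and the polymerase itself are ignored.

```python
# Base pairing used when RNA is assembled against a DNA template strand.
# Note that A in the template pairs with U (not T) in the RNA product.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') built against a template strand given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


# Hypothetical nine-nucleotide template, for illustration only.
print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

Each template base simply dictates its complementary RNA base, which is why the transcript is a faithful "sentence" copied from the DNA.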



Now for the differences in pro- and eukaryotic organisms... RNAP II (aka Pol II) continues working downstream along the DNA template, matching nucleotides to the template and elongating the mRNA transcript. However, at the end of the transcription unit, the process has one of (at least) two fates, depending on the type of organism in which transcription is occurring. When RNA polymerase reaches the end of the template DNA in bacteria, for instance, a termination sequence of nucleotides signals the polymerase to pop off of the DNA molecule and transcription is halted. In contrast, when RNA polymerase II reaches the end of the transcription unit in humans, for example, a special polyadenylation sequence, AAUAAA, signals nearby proteins to clip off the mRNA transcript, while the polymerase keeps chugging along until it basically (oddly) falls right off the DNA molecule. All of this takes place in the nucleus, with great speed and great precision!
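The eukaryotic "clip" can be mimicked with a simple string search. The sketch below is a loose illustration, not a model of the real cleavage proteins: `cleave_at_poly_a_signal` is a made-up name, and the `offset` parameter (how far past AAUAAA the cut falls) is an assumption I picked for the example.

```python
POLY_A_SIGNAL = "AAUAAA"


def cleave_at_poly_a_signal(transcript: str, offset: int = 15) -> str:
    """Cut the transcript `offset` nucleotides past the first AAUAAA.

    `offset` is an illustrative assumption, not a measured value.
    """
    signal_start = transcript.find(POLY_A_SIGNAL)
    if signal_start == -1:
        return transcript  # no signal found: leave the transcript uncut
    cut = signal_start + len(POLY_A_SIGNAL) + offset
    return transcript[:cut]


# Hypothetical transcript: short body, the signal, then 20 trailing C's.
growing = "AUGGCC" + "AAUAAA" + "C" * 20
print(cleave_at_poly_a_signal(growing, offset=5))  # AUGGCCAAUAAACCCCC
```

Everything downstream of the cut is the stretch the polymerase keeps "chugging along" on after the transcript has already been released.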








RNA processing



You may assume the first mRNA transcript is the one that gets translated into protein. Directly from the DNA, baby. This is a nice story, except for the fact that this isn't what really happens! What really happens is, the first mRNA transcript is actually the pre-mRNA. It must first be manipulated by proteins in the nucleus before shipping to the extra-nuclear cytosol where it will direct protein synthesis. So, what exactly does this processing entail? What makes this process different from transcription in other life forms, say prokaryotes?



Messenger RNA is manipulated at both ends during processing. At the 5' end, a modified guanine nucleotide is added, called the 5' cap. At the other, 3' end of the mRNA transcript, the poly-A tail is added just beyond the polyadenylation sequence (AAUAAA, just mentioned). The new 3' end consists of a repeat of 50-250 A's. These end modifications aid in extra-nuclear transport, ribosome attachment, and protection of the fledgling transcript from hydrolysis. So, from left to right, the mRNA is composed of the 5' cap, the mRNA regions, and the poly-A tail. Within the main body of the mRNA, however, some regions of nucleic acid are fated for near-future translation, while others represent untranslated regions, or UTRs. Protein-coding segments of mRNA are signaled at each end by start (AUG) and stop codons (UAA, UGA, UAG). Codons, in turn, are three-nucleotide groups read by the protein synthesis machinery, and they play a pivotal role in the translation of mRNA into polypeptide chains. We'll turn our attention to that in just a moment, after discussing the most bizarre part of mRNA processing, RNA splicing.
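To make the start/stop idea concrete, here is a rough Python sketch that pulls the protein-coding segment out of an mRNA string. It assumes the simplest possible rule (begin at the first AUG, read three letters at a time, end at the first in-frame stop codon); the function name `coding_region` and the example sequence are hypothetical, and real cells are pickier about which AUG they use.

```python
STOP_CODONS = {"UAA", "UGA", "UAG"}


def coding_region(mrna: str) -> str:
    """Return the stretch from the first AUG through the first in-frame stop.

    Returns an empty string if there is no AUG or no in-frame stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    # Step through the sequence one codon (three nucleotides) at a time.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return ""


# Hypothetical mRNA: 5' UTR "GGC", coding region, then 3' UTR and some A's.
print(coding_region("GGCAUGGCCUGGUAAGCAAAAA"))  # AUGGCCUGGUAA
```

Whatever sits before the AUG and after the stop codon in this toy example plays the role of the UTRs.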



Pre-mRNAs also contain regions of nucleic acid sequence that are not expressed, called intervening sequences, or introns, which interrupt the rest of the transcript. Conversely, exons are the expressed regions of genetic material that remain in the finished mRNA; they include the translated regions (TRs) that direct protein synthesis as well as the UTRs at either end. RNA splicing is the term given to the portion of mRNA processing that cuts introns out of mRNA transcripts and joins the flanking exons together. So, we see that pre-mRNAs contain 5' caps, UTRs, TRs, polyadenylation signals, and poly-A tails, with introns interspersed among the exons! Hard to keep track of all that, right? That's just the way it is. After RNA processing is completed, including RNA splicing (which I forgot to mention is carried out by complexes of proteins and small RNAs called spliceosomes), the mature mRNA transcript is simpler, consisting of a 5' cap, short UTR, TR or coding region, another short UTR, and the poly-A tail.
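If it helps, splicing can be pictured as cutting known intervals out of a string and gluing the rest back together. The sketch below assumes we are simply handed the intron positions as (start, end) index pairs; the name `splice` and the example sequence are my own inventions, and the real spliceosome finds its cut sites from sequence signals rather than from a list.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as non-overlapping half-open (start, end) index pairs."""
    mature = []
    position = 0
    for start, end in sorted(introns):
        mature.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                           # skip over the intron itself
    mature.append(pre_mrna[position:])           # keep the final exon
    return "".join(mature)


# Hypothetical pre-mRNA: the intron is written in lowercase for readability.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UGGUAA"
print(splice(pre_mrna, [(6, 18)]))  # AUGGCCUGGUAA
```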



So, this raises the question, "If machinery [spliceosomes] is in place to remove segments called introns, isn't this disadvantageous, a waste of energy? Why hasn't evolution selected against intron survival in animal genomes, when this seems to be the most parsimonious modus operandi?" Brilliant. Recent inquiry has shed a great deal of light on this subject, and it appears that introns are in place for a non-random reason. Instead of representing a random, useless, or vestigial process/pattern, introns remain in eukaryotic genomes because they help get mRNA transcripts out of the nucleus, but also for a more important reason. Some gene regions may be treated by the splicing machinery as introns or exons for different reasons; in turn, this leads to differential RNA splicing called "alternative RNA splicing," which confers upon single genes the ability to code for more than one protein. In effect, a consequence of alternative RNA splicing is that genomes can generate a far greater variety of protein products per unit nucleotide length. Also, equally important is the idea of exon shuffling, which says that introns spacing exons provide more length of genome (albeit non-coding length), increasing the probability that crossover events will rearrange exon ordering while maintaining the coding sequences, giving rise to new sets of exons (mutant alleles) and thus new proteins, which gives natural selection more material to work with!!
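Alternative splicing is easy to picture in code: one set of exons, more than one way to stitch them together. This is a toy sketch with invented exon names and sequences (`E1`, `E2`, `E3`) and a hypothetical helper `mature_transcript`; it says nothing about how a cell decides which exons to keep.

```python
def mature_transcript(exons, included):
    """Join the named exons, in the order given, into one mature mRNA string."""
    return "".join(exons[name] for name in included)


# Hypothetical three-exon gene.
exons = {"E1": "AUGGCC", "E2": "GAUUAC", "E3": "UGGUAA"}

isoform_a = mature_transcript(exons, ["E1", "E2", "E3"])  # all exons kept
isoform_b = mature_transcript(exons, ["E1", "E3"])        # E2 spliced out

print(isoform_a)  # AUGGCCGAUUACUGGUAA
print(isoform_b)  # AUGGCCUGGUAA
```

Two different messages, and therefore potentially two different proteins, come out of the same stretch of genome.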



Translation




Good public speakers often give the advice to begin with an overview, a roadmap for the content to follow, but to guard your secrets for "money slides" that come later, say in a PowerPoint. I'm skipping that advice on the translation part here. Why? Translation is so simple, it would waste your time to overview. So what does it entail? Translation is the process of transferring molecular genetic information from processed mRNA to amino acid chains. Translation is accomplished by transfer RNA (tRNA) and ribosomes and involves three steps: (1) attachment of mRNA to ribosomal subunits and proteins and initiation of amino acid chain translation; (2) attachment of tRNAs to mRNA using anticodon sequence regions, and ribosome addition of tRNA amino acids to the lengthening polypeptide chain; and (3) termination of the translation process.

Now, you realize there are multiple types of RNA molecules. It is also important to realize physically what this process of translation entails, which requires knowing something extra about the RNA and proteins involved. RNA, as you know, is single-stranded nucleic acid in composition. Also, RNA doesn't form one long double helix like DNA. However, RNA does not take the simple linear form suggested by textbook cartoons and the fact that it is single-stranded; instead, RNA molecules fold into complex structures, such as hairpin loops, based on pairings of nucleotides from different regions of the strand. Transfer RNA is no different in this respect. While often depicted in simple two-dimensional form, tRNA has a complex and specific three-dimensional structure with four base-paired regions and three loops of exposed nucleotides. At the 3' end of each tRNA is an amino acid attachment site. One nucleotide loop contains a three-base sequence called the anticodon.

Translation requires tRNA processing before normal progress can be made, just like mRNA transcripts require processing before joining in the translation equation. While mRNAs (and other types of RNAs) are processed in the nucleus, it is in the cytosol that amino acids (AAs) are "activated," or joined to free tRNAs, by enzymes called aminoacyl-tRNA synthetases, producing the processed tRNA product called an aminoacyl tRNA or activated tRNA. Aminoacyl-tRNA synthetase enzymes have active sites for amino acids and tRNAs and catalyze reactions attaching the two molecules through covalent bond formation. The addition of amino acids to tRNAs is energetically expensive, costing cells energy in the form of ATP floating in the cytosol. After AAs and tRNAs are joined, the finished product pops out of the aminoacyl-tRNA synthetase active site and into the cytosol, where the new complex is available to the translation machinery. An important point is that each tRNA must bind with a specific AA; therefore, because there are 20 standard amino acids, the animal cell contains 20 different types of these synthetases, one for each amino acid, generating the aminoacyl tRNAs necessary for translation of mRNA phrases into analogous phrases in the amino acid language.

So, enough of this background already! What about the act of translation, how complex is that? Biochemically, very complex. Generally, easy to remember with some practice.

Let's look at this in terms of materials and methods. Materials required for translation include the small and large ribosomal subunits (ribosomal RNA plus protein) that make up ribosomes, aminoacyl tRNAs, mRNA transcripts, GTP, and a host of protein binding and release factors that guide the translational processes at the surface of the ribosome. Now, the method.

During the initiation phase of translation, the small ribosomal subunit (ribosomal RNA made in the nucleolus and associated protein) binds, along with the initiator tRNA (containing the anticodon complementary to the mRNA start codon), to an mRNA transcript. The initiator tRNA is an aminoacyl tRNA with a methionine amino acid group attached and an anticodon sequence of UAC, which makes it possible for this initial tRNA to bind the start codon, AUG, by joining bases with hydrogen bonds. After this complex is formed, a large ribosomal subunit moves in opposite the small subunit and GTP is hydrolyzed to provide energy fueling the attachment of the initiator tRNA to a special site (fold) on the large ribosomal subunit. Note, there are three different sites on the large ribosomal subunit, called the A site, P site, and E site. The initiator tRNA binds to the P site as a result of this energy input. That's the end of initiation.
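The codon-anticodon match is just base complementarity, which a few lines of Python can show. This is a simplified sketch: `anticodon_for` is a made-up helper, the anticodon is written so that it lines up letter-for-letter under the codon, and wobble pairing at the third position is ignored.

```python
# RNA-RNA base pairing between a codon and its anticodon.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon aligned base-for-base under the given codon."""
    return "".join(RNA_PAIR[base] for base in codon)


print(anticodon_for("AUG"))  # UAC
```

That UAC is the anticodon carried by the initiator tRNA described above.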

The translational elongation phase is exactly what the name implies. As in mRNA transcription, the elongation phase of translation represents the expansion of the polymer product, in this case an amino acid chain. At the end of the initiation phase, both ribosomal subunits, the methionine aminoacyl tRNA, and the mRNA transcript are bound together with several protein complexes. Elongation is set up because the A and E sites of the large ribosomal subunit are exposed. Elongation is a cyclic process involving the following steps: (a) binding of an incoming aminoacyl tRNA to the A site, (b) attachment of the polypeptide initiator (methionine, or the last amino acid added) to the next amino acid at the tip of the just-arrived activated tRNA (set into the A site in step a), and (c) shifting tRNAs over one site to the left (see figure below paragraph), which moves the chain-containing tRNA to the P site and the original, now-empty tRNA to the E, or "exit," site, from which point it pops off the ribosome. This three-step process is carried out, moving the mRNA transcript 5' to 3' through a nook in the ribosome, and continues until a stop codon comes up in the A site.


Translation is terminated when the codon just before the stop codon is translated into amino acid, such that the stop codon is the next available in the A site. A protein release factor steps into the A site, hydrolyzing the bond between the tRNA holding the AA chain at its end (in the P site) and the AA chain itself, causing the two to separate and the whole translation assembly to dissociate into its respective parts. And that's it!
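Stripped of all its biochemistry, the elongate-until-stop loop looks something like the Python sketch below. It uses the standard genetic code with one-letter amino acid abbreviations; the function name `translate` and the example message are my own, and the sketch assumes it is handed a coding region that already begins at the start codon.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order at each codon position.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_region: str) -> str:
    """Read codons 5'->3' and build the amino acid chain until a stop codon."""
    protein = []
    for i in range(0, len(coding_region) - 2, 3):
        amino_acid = CODON_TABLE[coding_region[i:i + 3]]
        if amino_acid == "*":  # stop codon: the chain is released
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical message: Met-Ala-Trp followed by a stop codon.
print(translate("AUGGCCUGGUAA"))  # MAW
```

The table also shows the redundancy mentioned at the outset: 64 possible codons cover only 20 amino acids plus the stop signals, so most amino acids are spelled by more than one codon.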

Significance

Understanding how genetic information is shuttled at the molecular level from DNA to proteins is the foundation of molecular biology and an integrative area of impactful research with implications for evolution, development, and world health. The structural complexity of the ribosomal machinery alone has taken over forty years to decipher. Just last year, the 2006 Nobel Prize in Chemistry was awarded to Roger Kornberg of Stanford University for his study of how DNA is transcribed into mRNA, which has yielded stunning crystallographic and digital images of all the biochemical factors involved in this process (see this link and this link).

Studies of transcription and translation are medically important, especially from the perspective of pharmaceutical applications. For example, studies of translation have helped explain how drugs such as streptomycin halt bacterial protein synthesis at the ribosome, which makes them useful for fighting human bacterial infections and associated diseases.

Last, but not least, transcription and translation, in essence the arrows in the Central Dogma of Molecular Biology overview figure from Part I, matter to our understanding of evolutionary processes and evolutionary biology, its applied and pure scientific frontiers. As I explained in Part I, neither Darwin nor his contemporaries had access to a robust field of genetics as we do today. Darwin's cousin, Francis Galton, was one of the leading scientists in proto-genetic studies during their lifetimes, and his interests and those of other scientists lacked the Mendelian focus that gave rise to so much greater an understanding of heredity in the 20th Century, leading to the Central Dogma. Had these men (and maybe some women scientists of their day, if they existed) known what we know now, theirs would be our more fully-orbed view of evolution. Evolution cannot happen without starting material (origins of life, not our concern), without life itself. More importantly, however, evolution could not have given (and continue giving) rise to the tremendous diversity of life we now see around us without genetic variation, a product of errors in processes incorporated into the Central Dogma.

In other words, no mutation means no evolution, and mutations arise due to slips in DNA replication, mRNA transcription, amino acid translation, and expression regulating processes. Without transcription and translation, hereditary information could not be transformed into the structural and functional complexity of living things. Alternatively, living things could not evolve, could not flourish with modification, without stochastic changes in the hereditary information and its flow through biological organisms. More on evolution and mutation next time. Later. ~ JB

Follow this link to information about translation and the Nobel Prize.
