Systematics and Phytogeography
Centre for Plant Biodiversity Research, Commonwealth Scientific and Industrial Research Organisation, Division of Plant Industry, Canberra, ACT 2601 Australia; School of Botany and Zoology, The Australian National University, Canberra, ACT 2601 Australia
Received for publication January 3, 2002. Accepted for publication May 3, 2002.
| ABSTRACT |
|---|
Key Words: chloroplast noncoding DNA; group II introns; molecular evolution; phylogenetic analysis; RNA structure; rpl16 intron
| INTRODUCTION |
|---|
One of the most exciting features of group II introns as phylogenetic tools is the uniformity of their structure and function. It should be clear in the following discussion that functional requirements induce structural constraints on group II introns and that these constraints may contribute to heterogeneous mutation patterns across G2 intron sequences. Understanding the connection between structure, function, and evolutionary constraints in a G2 intron is therefore fundamental to improving all levels of phylogenetic analysis based on G2 intron sequence data.
The information presented here is intended to assist molecular systematists in the use of G2 intron sequences for phylogeny estimation in higher plants. This review does not cover the special case of the trnL intron, the sole group I intron in the chloroplast genome; it is expected, however, that most of the methodological approaches described here for G2 intron analysis will apply to similarly structured RNA molecules, including group I introns.
| GROUP II INTRON STRUCTURE AND FUNCTION |
|---|
Whereas group I introns are found in all genomes of prokaryotic and eukaryotic organisms, group II introns are restricted to plant and fungal organelles and certain prokaryotes of cyanobacterial and proteobacterial lineages. Mitochondrial genomes of plants and fungi have their own set of group II introns that appear to differ historically from those found in chloroplast genomes (Michel and Ferat, 1995; Toor, Hausner, and Zimmerly, 2001). It has been demonstrated that each organelle may have maintained its unique group II intron assembly by vertical descent since the incorporation of the organelle into the eukaryotic cell (Toor, Hausner, and Zimmerly, 2001), and there is no contradictory evidence as yet to reject this conclusion. The situation is markedly different in organellar group I introns, several of which seem to have repeatedly and independently invaded mitochondrial genomes (Turmel et al., 1995; Cho et al., 1998; Cho and Palmer, 1999; Goddard and Burt, 1999; Holst-Jensen et al., 1999; Palmer et al., 2000). The sole group I intron in plastomes, that of the trnL gene, may predate endosymbiosis and is not thought to derive from post-endosymbiotic invasion of the genome (Besendahl et al., 2000).
Function
The primary function of a group II intron, whether in a chloroplast, mitochondrion, or prokaryote genome, is to self-direct its extrication from gene transcripts prior to translation of the mRNA into a protein. This process requires two rounds of autocatalytic chemical reactions, termed "splicing reactions." Splicing refers to the capacity of the intron to break the ribonucleic acid chain of the pre-mRNA transcript at the exon boundaries and reconnect the strand after the intron's removal. Splicing completely excises the intron from the disrupted gene transcript, allowing the pre-mRNA to continue its maturation pathway and subsequent translation. Failure to properly or efficiently remove the intron from the transcript prevents further transcript processing and translation; the protein is not synthesized, and presumably both the organism and the intron are strongly selected against.
These reactions define the primary functional phase of an intron and occur while the intron and host gene are a pre-mRNA transcript. Thus, in terms of evolutionary constraints, changes in the intron DNA sequence should be considered in terms of its RNA counterpart. This detail has significant implications when using G2 introns for comparative sequence analysis and phylogeny construction.
The two stages of intron-directed cis-splicing reactions are as follows (Jacquier, 1996; Podar, Perlman, and Padgett, 1998; Holländer and Kück, 1999; Jurica and Stoddard, 1999; Costa, Michel, and Westhof, 2000). The first stage consists of the folding of a pre-mRNA intron transcript into its secondary and tertiary structural formation, called a "ribozyme" (a term designating a catalytic RNA). This folding brings a single adenine in domain VI (Fig. 1) against the now neighboring intron/exon boundaries. The proximity of the adenine to the G1 nucleotide (the first nucleotide of the 5' end of the intron) triggers a nucleophilic attack, and a transesterification reaction cleaves the ribonucleic acid at the 5' intron/exon boundary to form a structure known as a "lariat" (Schmelzer and Müller, 1987; Jacquier and Jacquesson-Breuleux, 1991). The second stage of splicing involves another transesterification reaction at the 3' intron/exon boundary, rejoining the exon pre-mRNA fragments and releasing the intron, still in lariat form.
Toor, Hausner, and Zimmerly (2001) surveyed 39 maturase ORFs from prokaryotic and organellar G2 introns and identified six phylogenetic lineages that correspond closely with individual G2 intron structural categories. For example, all maturases from class IIA1 mitochondrial introns belong to a single maturase lineage. Furthermore, their survey of 142 known group II introns revealed that most prokaryote G2 introns have ORFs, about half of lower eukaryote G2 introns maintain ORFs, but only two of the 42 higher eukaryote organellar G2 introns still have ORFs coding for a functional maturase (matK in the chloroplast, matR in the mitochondrion). Their data suggest a model of G2 intron evolution in which nearly all chloroplast and mitochondrial G2 introns have lost their functional maturases, and those G2 introns still maintaining them have each coevolved with their maturase since ancient times. These findings concur with those of an earlier study by Mohr, Perlman, and Lambowitz (1993).
Chloroplast introns have managed to persist despite the loss of their unique maturases. They still seem to require an RNP complex for efficient splicing in vivo (e.g., Holländer and Kück, 1999), but it is thought that the maturase matK can successfully form RNP complexes with any of the G2 introns in the chloroplast (Mohr, Perlman, and Lambowitz, 1993; Ems et al., 1995; Vogel, Börner, and Hess, 1999). This arrangement may have freed the remaining G2 introns in the plastome from having to maintain their own ORFs, a situation that could in part be responsible for the high levels of sequence variation noted in many chloroplast intron domain IVs (e.g., Learn et al., 1992; Downie et al., 1998b; Downie, Katz-Downie, and Watson, 2000). Not surprisingly, there is evidence that many chloroplast G2 introns contain degenerate maturase ORFs in their domain IV helix (Toor, Hausner, and Zimmerly, 2001).
Maturases are also reverse transcriptases and have several reverse transcriptase (RT) domains in their coding sequence. This enables some G2 introns to move about in their host genomes (Lambowitz and Belfort, 1993; Mueller et al., 1993; Zimmerly et al., 1995b; Eickbush, 1999; Jurica and Stoddard, 1999). Mobile G2 introns are mostly known from prokaryotes and yeast, and several examples of G2 intron transposition have been demonstrated within the yeast mitochondrial genome (Mueller et al., 1993; Lazowska, Meunier, and Macadre, 1994; Moran et al., 1995; Yang et al., 1998; Sellem, Begel, and Sainsard-Chanet, 2000). Besides having complete RT domains, the typical maturase of a mobile group II intron contains a "zinc finger" that aids in targeting by sequence recognition (a process known as "homing"; Jurica and Stoddard, 1999; Mohr et al., 2000). It is this feature of some G2 introns that has made them candidates for medical applications, such as gene therapy in humans (Tanner, 1999; Guo et al., 2000; Mohr et al., 2000).
Mobility in chloroplast group II introns has not been detected. The lack of a zinc finger and complete reverse transcriptase domains in matK (Mohr, Perlman, and Lambowitz, 1993; Young and dePamphilis, 2000) is consistent with the expectation that matK RNP complexes do not possess homing capabilities.
A recent discovery has suggested that splicing of at least some chloroplast G2 introns in maize may also involve two nuclear-coded gene products, CRS1 and CRS2 (Jenkins, Kulhanek, and Barkan, 1997; Vogel, Börner, and Hess, 1999; Till et al., 2001). It is suspected that CRS1 is a required cofactor in atpF splicing reactions and that CRS2 may be a cofactor in group IIB intron RNPs, the most frequent class of introns in the plastome.
There is sound in vivo and in vitro experimental evidence that group II intron function is largely consistent in both the mitochondrion and the chloroplast (Herdenberger, Holländer, and Kück, 1994; Holländer and Kück, 1998, 1999). One in vivo system developed by Herdenberger, Holländer, and Kück (1994) for point mutation studies of group II introns introduces the mitochondrial rI1 group II intron from the green alga Scenedesmus obliquus into the chloroplast gene tscA of Chlamydomonas reinhardtii. The inserted intron efficiently completes its splicing reactions in the chloroplast, enabling proper translation of the tscA protein. Such studies highlight the probable universality of mechanisms involved in G2 intron splicing reactions.
Function is inextricably linked to structure in G2 introns. We can therefore infer that site-specific mutations that terminate function in one G2 intron will likely have the same effect in other G2 introns if such mutations occur in homologous structural positions. As discussed in the final section of this paper, this concept may have important applications when using G2 introns for molecular phylogenetics.
Structure
Jacquier (1996) estimated that a G2 intron would need no fewer than 600 nucleotides to maintain all structural features involved in proper splicing. The standard group II intron folding model was created by identifying conserved secondary structures of folded RNA intron sequences among a wide variety of organisms and genomes (Michel and Dujon, 1983; Michel, Umesono, and Ozeki, 1989; Michel and Ferat, 1995). The Michel, Umesono, and Ozeki (1989) model remains the best estimate of G2 intron structure and has largely been validated by in vitro and in vivo experimental investigations, including point mutation studies (Peebles et al., 1995; Abramovitz, Friedman, and Pyle, 1996; Holländer and Kück, 1999), chemical footprinting (Konforti, Liu, and Pyle, 1998; Costa, Michel, and Westhof, 2000), and NAIM analysis (Boudvillain and Pyle, 1998; Boudvillain, de Lencastre, and Pyle, 2000). Updated detailed models of the four group II intron structural categories can be found in Toor, Hausner, and Zimmerly (2001).
Although group II introns from organisms as diverse as cyanobacteria, Euglena, higher plants, and fungi share little in the way of nucleotide sequence similarity, sequences from each organism can be folded into the same core secondary structure. The Michel, Umesono, and Ozeki (1989) model has six main domains, multiple subdomains, and a nomenclatural system (Fig. 1). The general structure consists of six major structural helices that radiate from a "central wheel" of single-stranded RNA segments. Domain I (D1; Table 2) is the most complex, with many structurally important subhelices, and it typically comprises more than half of the total intron sequence. This domain interacts strongly at the tertiary level with domains V and VI (Boudvillain and Pyle, 1998; Konforti, Liu, and Pyle, 1998) and with external binding sites in the flanking exons. Domains II (D2) and III (D3) are considerably shorter and vary in length in plants (Learn et al., 1992). These domains seem to contribute relatively little to tertiary structure and splicing efficiency (Kwakman et al., 1989; Koch et al., 1992; Konforti, Liu, and Pyle, 1998). Domain IV (D4) can be quite large in chloroplast G2 introns, and in the trnK intron it is the site of the maturase ORF, matK. Domain V (D5) is the most highly restricted in terms of length and sequence variation (Michel, Umesono, and Ozeki, 1989; Learn et al., 1992) and is almost always 34 nucleotides in length in plants. Domain V is not known to possess any binding sites with the mRNA substrate, and its high degree of conservation is most likely due to its fundamental role in ribozyme folding (Peebles et al., 1995; Jacquier, 1996; Konforti, Liu, and Pyle, 1998). Domain VI (D6) has tertiary interactions with domains I and V (Koch et al., 1992; Dib-Hajj et al., 1993; Podar and Perlman, 1999) and may vary in length, usually in its terminal loop.
There are several intraribozymic interactive sites in a group II intron sequence that assist in giving the mature ribozyme its functional tertiary structure. Those proposed or identified by experimental evidence are indicated by Greek symbols in Fig. 1 (see Table 3 for details). Some of these interactions are essential for splicing, an example being
'. Holländer and Kück (1999)
were able to demonstrate in vivo that splicing of an intron in the chloroplast depends on the ability of
' to form a Watson-Crick (canonical) pairing. The
' interaction conserves the sequence identity of the terminal helix in domain V and the bulge in D1d1 (Peebles et al., 1995
). Other interactions, such as
', may not be as essential, for the absence of this interaction does not significantly diminish splicing efficiency in those introns investigated (Koch et al., 1992
).
| MITOCHONDRIAL GROUP II INTRONS |
|---|
Two factors of mtDNA evolution may be partly responsible for the apparent difficulty in developing mitochondrial group II introns as phylogenetic tools. The first factor is the slow rate of synonymous substitution in mitochondrial DNA, which is estimated to be almost five times slower than that of chloroplast DNA (Wolfe, Li, and Sharp, 1987; Schuster and Brennicke, 1994). The second factor is the frequency of recombination in plant mtDNA that can sometimes relocate exon and intron elements of a disrupted gene to separate regions of the genome. The result can be a fragmented intron, of which one or more domains are scattered through the genome (Chapdelaine and Bonen, 1991; Wissinger, Schuster, and Brennicke, 1991; Knoop and Brennicke, 1993; Malek and Knoop, 1998; Sainsard-Chanet, Begel, and d'Aubenton-Carafa, 1998). One such recombination event gave rise to a "tripartite" intron in Oenothera berteriana (Knoop, Altwasser, and Brennicke, 1997), in which three separate fragments of a group II intron must now be brought together as post-transcriptional pre-mRNAs to form a functional splicing ribozyme. The reaction is referred to as trans-splicing (see Bonen, 1993; Knoop and Brennicke, 1993; Doetsch et al., 2001) and is thought to be the general mechanism for the splicing of all fragmented G2 introns in the mitochondrial genome. A targeted mitochondrial intron that is fragmented could make polymerase chain reaction (PCR) amplification difficult or impractical.
| CHLOROPLAST INTRON LOSS IN ANGIOSPERMS |
|---|
Several higher plant lineages have been extensively surveyed for the presence of chloroplast group II introns. The rpl2 intron has the most widespread reported losses, with absences in members of at least 17 angiosperm families: Aizoaceae, Amaranthaceae, Basellaceae, Cactaceae, Caryophyllaceae, Chenopodiaceae, Convolvulaceae, Cuscutaceae, Didiereaceae, Droseraceae, Fabaceae, Geraniaceae, Menyanthaceae, Nyctaginaceae, Phytolaccaceae, Portulacaceae, and Saxifragaceae (Downie et al., 1991; Doyle, Doyle, and Palmer, 1995; Lai et al., 1997). The rpl16 intron is absent in certain members of the Geraniaceae, Goodeniaceae, and Plumbaginaceae (Downie and Palmer, 1994; Campagna and Downie, 1998). The rpoC1 intron is missing in members of the Aizoaceae, Cactaceae, Fabaceae, Goodeniaceae, Passifloraceae, and Poaceae (Downie, Llanas, and Katz-Downie, 1996; Katayama and Ogihara, 1996; Downie et al., 1998a). The rps12 intron has been lost in at least three members of Anemone (Hoot and Palmer, 1994). The absence of the rps16 intron in Epifagus (Wolfe, Morden, and Palmer, 1992) and several genera in Fabaceae (Downie and Palmer, 1992; Doyle, Doyle, and Palmer, 1995) is due to loss of the rps16 gene itself in these plastomes.
| MUTATION PATTERNS IN GROUP II INTRONS |
|---|
For example, conservation of amino acids in a protein can lead to codon-specific substitution patterns in protein-coding sequences due to the flexibility of the genetic code (e.g., Reeves, 1992; Olmstead, Reeves, and Yen, 1998; Berg, 1999; McClellan, 2000). Conservation of the active site in the RuBisCo protein complex restricts the number of mutable sites in the rbcL sequence of plant chloroplast genomes (Kellogg and Juliano, 1997). Conformational requirements of ribosomal RNA can influence the degree, distribution, and nature of nucleotide change in rDNA (e.g., Hickson et al., 1996; Soltis and Soltis, 1998; Hershkovitz, Zimmer, and Hahn, 1999). Finally, group II intron function in higher plants may create mutation rate variation among ribozyme structures experiencing differing functional constraints (Learn et al., 1992; Clegg et al., 1994; Downie et al., 1998b).
Heterogeneous mutation patterns in the rpl16 intron in Myoporaceae
As part of an ongoing phylogenetic analysis of Myoporaceae (a probable lineage of Scrophulariaceae sensu Olmstead et al. [2001]), the chloroplast rpl16 intron was sequenced by the author for 46 taxa representing nearly 30 morphologically defined lineages. The rpl16 intron is among the fastest evolving sequence regions in the plastome (Wolfe, Li, and Sharp, 1987; Small et al., 1998; Downie, Katz-Downie, and Watson, 2000) and is often used for inter- and infrageneric phylogeny estimation in plants (Table 1). Mutation patterns were assessed across the entire sequence as well as on a partition by partition basis. The results were presented at a recent conference (Kelchner et al., 2000) and will be summarized here to illustrate the heterogeneous manner of mutation accumulation in a chloroplast group II intron.
The mutation data were compiled by a direct tally of mutations across a matrix of aligned sequences. In general, a tally of mutations is not an optimal approach to mutation pattern assessment for at least three reasons: first, a tally cannot accommodate superimposed mutation events; second, it does not take into account any influence that historical relationship may have on the distribution of character state variation; and third, it does not adequately test the possibility that observed heterogeneity in mutation patterns may still be the result of a largely stochastic mutation process under a uniform model. However, in this study sequence variation is very low, what variation exists is largely autapomorphic, and there is no available independently derived phylogeny for the taxa. A statistical test of the difference between observed and expected mutation patterns is not readily applicable, for without a model of mutation and a phylogeny to map change upon, it is difficult to determine expected values for the manner and distribution of mutations in these sequences.
The probable recency of the family's origin and the abundance of autapomorphic change in the rpl16 intron sequences provide an interesting opportunity for estimating mutational "tendencies" in a group II intron. The majority of observed rpl16 intron mutations across Myoporaceae are autapomorphic (110 substitutions), and potentially informative character state transformations are relatively few in number (38 substitutions). Tallying mutations across such a sequence matrix should minimize the potential influence of hierarchical structure on observed mutation patterns. From the very low p distance values between sequences (0.00% to 1.73%) we would expect that superimposed substitutions are limited in number (but certainly not impossible; see Kelchner and Clark, 1997).
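The p distance mentioned above is simply the proportion of compared sites at which two aligned sequences differ. A minimal sketch follows, assuming that gap and ambiguity characters (`-`, `?`, `N`) are excluded from the comparison; the function name and that exclusion rule are illustrative assumptions, not the procedure used in the study.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a character in `skip_chars` are ignored
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue  # site cannot be compared
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

Because p distance counts only observed differences, it cannot register superimposed substitutions at the same site, which is why it is informative here mainly because the values are so low.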
Sequence alignment followed the criterion of Kelchner (2000), which integrates structural and mutation class arguments for character homology with the conventional sequence similarity approach. Secondary structures for each sequence in its RNA form were determined using the domain-by-domain folding method (see below, Techniques: Inferring G2 intron secondary structures). Three data partitions were considered: partitioning by each of six G2 intron domains (D1–D6); partitioning by four structural categories of stem, loop, bulge, and single-strand interhelix sequence (in the manner of Vawter and Brown, 1993); and a partition consisting of the entire intron sequence.
All rpl16 intron nucleotide characters in Myoporaceae were readily assigned to domain partitions because of the distinctiveness of domain boundary sequences (Michel, Umesono, and Ozeki, 1989). The classification of nucleotides into four structural categories was more difficult. Multiple minimum free energy foldings exist for terminal loops in helices D3 and D4. In these cases, nucleotides not decisively placed in a structural category were classified as "ambiguous" and removed from the analysis. Difficulties notwithstanding, 822 of the 953 nucleotides in the aligned matrix (86.25%) could be assigned to a structural class.
Rate heterogeneity (α parameter) for the unpartitioned data set was estimated under the HKY85 + Γ likelihood model. The tree and parameter estimation analysis took nearly three weeks on a G3 computer using PAUP*4 beta 4 (Swofford, 1998). Substitution class frequencies, base composition, and distribution of variable and potentially informative characters were calculated for all partitions.
If selective constraints are consistent across all nucleotide sites in a G2 intron sequence, then each subset (partition) of an intron sequence should reflect the mutation pattern of the entire sequence as a whole. Any strong deviation of mutation patterns among partitions or between a partition and the entire sequence should indicate the presence of heterogeneous mutation processes in the data. Furthermore, if the group II intron sequences were under minimal selective constraints and evolving in a neutral fashion, we would expect to find nearly equal base composition (i.e., 25% frequency of each nucleotide), a more or less equal distribution of substitutions among sites, and about twice as many transversions as transitions. Each character partition, if sufficiently large in sample size, would also be expected to show these patterns of mutation.
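The expectation of about twice as many transversions as transitions follows from counting the possible substitution types when every change is equally likely. The sketch below enumerates them; the function names are illustrative only.

```python
from itertools import permutations

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_class(old, new):
    """Classify a nucleotide substitution as a transition or a transversion."""
    if old == new:
        raise ValueError("not a substitution")
    pair = {old, new}
    # Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_possible_classes():
    """Count the directed substitution types among A, C, G, and T."""
    counts = {"transition": 0, "transversion": 0}
    for old, new in permutations("ACGT", 2):
        counts[substitution_class(old, new)] += 1
    return counts
```

Of the 12 possible directed substitutions, 4 are transitions and 8 are transversions, so an unconstrained, equal-rate process would be expected to yield a transition : transversion ratio near 1 : 2.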
The results of the mutation pattern assessment for the rpl16 intron in Myoporaceae are presented in Fig. 2. Overall, mutation patterns are not consistent across partitions, suggesting that a heterogeneous mutation process underlies these data. Several of the following points are particularly interesting in the context of phylogenetic analysis.
There is also evidence of variation in substitution types by partition, as well as unequal frequency of substitution classes across the entire sequence. The most common substitution class in the matrix is A/G, which is ten times more frequent than C/G substitutions (Fig. 2C). Transitions are more frequent than transversions when averaged across the matrix, but there is variation in the degree of the transition : transversion ratio between partitions. The stem category of structural partitions shows a very high transition rate, nearly four times higher than the transition rate across the entire matrix (Fig. 2D). This is perhaps the most dramatic example in the study of a structural partition deviating from an averaged sequence value for a mutation category.
In terms of base composition, loops, bulges, and interhelical sequences are particularly rich in A, loops having almost twice the A content of stems (Fig. 2E). In the domain partitions, domains I–IV and domain VI are all high in A/T content, although domain IV has nearly twice the frequency of T as domain V (Fig. 2F). Base composition in domain V approaches equivalency for all nucleotide states, perhaps due to strong functional constraints resulting in a relative increase in G/C content.
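Base composition by partition can be computed by labeling each alignment column with its structural category and pooling nucleotides within each label. The following is a minimal sketch under that assumption; the one-character-per-column label string (for example `S` for stem, `L` for loop) is a hypothetical encoding chosen for illustration.

```python
from collections import Counter, defaultdict


def base_composition_by_partition(sequences, partition_labels):
    """Return nucleotide frequencies for each partition label.

    `partition_labels` holds one label character per alignment column
    (a hypothetical encoding used only in this sketch).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for base, label in zip(seq.upper(), partition_labels):
            if base in ("A", "C", "G", "T"):  # skip gaps and ambiguity codes
                counts[label][base] += 1
    frequencies = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        frequencies[label] = {b: counter[b] / total for b in "ACGT"}
    return frequencies
```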
Although this was not a statistical test of mutation dynamics, the variation of mutation patterns between subsets of a group II intron sequence is consistent with the findings of other researchers (e.g., Learn et al., 1992; Downie, Katz-Downie, and Cho, 1996; Downie, Katz-Downie, and Watson, 2000) and suggests that heterogeneous modes of mutation may be a general feature of group II introns. The presence of heterogeneous processes in G2 intron sequence data has important implications for their use in phylogenetic analysis (see below, Alignment and analysis).
Parameter values in likelihood analysis try to account for such mutational biases, but are usually estimated in a likelihood framework using the entire sequence. Figure 2, however, illustrates that parameter values derived from the entire sequence may differ from those estimates derived within specific partitions. For example, a transition : transversion ratio for the entire rpl16 intron data in Myoporaceae is 1.37 : 1, but for nucleotides in the stem category this ratio is more than 5 : 1. Applying the total sequence average of 1.37 to an analysis of Myoporaceae rpl16 intron sequences would be treating 57% of the categorized nucleotides (those in RNA stem positions) under an improper value for transition rate.
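The contrast between a whole-sequence ratio and a partition-specific ratio can be illustrated with a simple tally. The sketch below counts transitions and transversions per structural label by comparing each sequence with a single reference sequence; that reference-based comparison and the label encoding are simplifying assumptions for illustration and do not reproduce the tally used in the study.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = frozenset("ACGT")


def is_transition(a, b):
    """True if a and b are both purines or both pyrimidines."""
    pair = {a, b}
    return pair <= PURINES or pair <= PYRIMIDINES


def tally_by_partition(reference, sequences, partition_labels):
    """Tally transitions ('ts') and transversions ('tv') per partition label."""
    tallies = {}
    for seq in sequences:
        for ref_base, base, label in zip(reference, seq, partition_labels):
            if ref_base not in BASES or base not in BASES or ref_base == base:
                continue  # skip gaps, ambiguity codes, and unchanged sites
            kind = "ts" if is_transition(ref_base, base) else "tv"
            tallies.setdefault(label, Counter())[kind] += 1
    return tallies


def ts_tv_ratio(counter):
    """Transition : transversion ratio; infinite if no transversions were seen."""
    return counter["ts"] / counter["tv"] if counter["tv"] else float("inf")
```

Pooling the per-label counters gives the whole-sequence ratio, so the same tally shows how far an individual partition departs from the average.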
Constraints on sequence evolution
Site-specific limitations on character state transformation
Several sites in G2 intron sequences experience a restriction in potential character states due to specific functional requirements of a nucleotide or secondary structure. Many of these sites have been tested experimentally by point-mutation studies to assess their influence on splicing reaction efficiency. Figure 1 indicates the many sites that are thought to be restricted solely to purines (R) or pyrimidines (Y). Most of these nucleotides are involved in tertiary interactions with other regions of the ribozyme. Other nucleotides are highly conserved in all group II introns and are examples of the "invariable" character in phylogenetic analysis (see Lockhart et al., 1996; Steel, Huson, and Lockhart, 2000). Two specific cases include the A in the D6 bulge (marked in bold with an asterisk, Fig. 1) and the primary 5' intron nucleotide G (also referred to as the "G1" nucleotide; Holländer and Kück, 1999). Both nucleotides are involved in the transesterification reactions that cleave the pre-mRNA substrate, and any change in character state for either site will prevent splicing.
Mikheeva et al. (2000) recently investigated the conservation of the sequential GA nucleotide couplet just upstream of domain III (Fig. 1). The GA couplet is present at this structural location in almost all group II introns, which suggests these nucleotides possess a functional importance. Mikheeva et al. (2000) found that the dinucleotide contributes to the second step in intron splicing reactions and occupies an important spatial position in the ribozyme tertiary structure. Alteration or deletion of the GA led to highly reduced splicing efficiency.
The
' interaction must pair canonically (G-C, A-U) or significant reduction in splicing efficiency results (Holländer and Kück, 1999
). We may therefore infer that any change in one
nucleotide must be accompanied by a simultaneous and compatible mutation in its
partner to retain Watson-Crick pairing of these two sites. Site mutation studies of the D5 loop sequence, GAAA, greatly reduced splicing efficiency (Chanfreau and Jacquier, 1994
), probably by limiting its
' tertiary interaction with domain I (Costa and Michel, 1995
; Jacquier, 1996
).
Group II intron loops consistently demonstrate very high A and T content in comparison to the relatively G/C rich stem nucleotides, a feature reminiscent of loop sequence in group I introns and mitochondrial and nuclear RNAs (e.g., Ballard et al., 1998; Gutell et al., 2000). Additional character state restrictions that may be present in RNA stems are discussed below (High transition rates in stems).
G-U "wobble" base pairing
The G-U, or "wobble," base pair is a non-Watson-Crick association that is fundamental to nearly every class of RNA, including introns (Varani and McClain, 2000). Group II intron folding models invoke a great number of G-U pairings, emphasizing why it is essential to fold intron DNA sequences as RNA transcripts. In their review of G-U pairing and its importance in biological systems, Varani and McClain (2000) list several properties of the pairing that may be invaluable for catalytic RNA. Among these properties are the conformational flexibility that permits "sharp turns" in RNA structures, the positioning of metal ions at active sites, increased electronegativity in the major groove of paired nucleic acid strands to create a recognition site by induced fit or chemical identity, and provision of a thermodynamically viable alternative to canonical base pairing. In group II introns, some G-U pairing can be highly conserved, two examples being the functionally important G-U pair in domain V (Peebles et al., 1995; Abramovitz, Friedman, and Pyle, 1996; Konforti, Liu, and Pyle, 1998; see Fig. 1) and the G-U pairs often surrounding the branch site (the A bulge) in domain VI (Chu et al., 1998).
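The need to treat intron DNA as an RNA transcript when assessing stems can be shown with a small pairing check. The sketch below transcribes two DNA strands, aligns them antiparallel, and labels each position as Watson-Crick, wobble, or unpaired; the function names are illustrative, and no thermodynamic folding is attempted.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def transcribe(dna):
    """Return the RNA form of a DNA sequence (T replaced by U)."""
    return dna.upper().replace("T", "U")


def pair_type(x, y):
    """Classify a pair of RNA nucleotides."""
    if (x, y) in WATSON_CRICK:
        return "watson-crick"
    if (x, y) in WOBBLE:
        return "wobble"
    return "unpaired"


def stem_pairs(five_prime, three_prime):
    """Classify each position of a putative stem.

    Both strands are given 5' to 3'; the second is reversed so that the
    strands are compared in antiparallel orientation.
    """
    strand_a = transcribe(five_prime)
    strand_b = transcribe(three_prime)[::-1]
    if len(strand_a) != len(strand_b):
        raise ValueError("stem strands must be the same length")
    return [pair_type(x, y) for x, y in zip(strand_a, strand_b)]
```

For example, `stem_pairs("GGTC", "GATC")` reports a wobble pair at the second position, a site that would look like a G/T mismatch if the strands were compared as DNA.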
Though less energetically stable, C-A pairing may also be prevalent in group II intron structures. Varani and McClain (2000) suggest that C-A bonding can provide many of the structural features of the G-U pairing. More importantly, perhaps, for mutation dynamics in G2 introns, C-A pairs may be of minimal hindrance in the formation of key stem structures and therefore may not be strongly selected against in certain structural positions.
High transition rates in stems
Stem nucleotide substitutions may be more likely to persist in group II intron sequences if they are transitions. This can be understood in terms of selective constraints conserving stem structure. If stem formation in a ribozyme must be maintained for a functional reason, substitutions occurring in stem nucleotides should only persist if they do not significantly reduce the likelihood of proper stem formation. In nuclear RNA, it has been reasoned that mutation leading to nonpairing nucleotides within a stem must eventually result in compensatory mutation to maintain stem structure (Wheeler and Honeycutt, 1988; Dixon and Hillis, 1993; Muse, 1995; Springer, Hollar, and Burk, 1995; Hickson et al., 1996). It is proposed that such compensatory mutations happen in a step-wise process over time (Rousset, Pelandakis, and Solignac, 1991; Kraus et al., 1992; Gatesy et al., 1994), which requires the mispairing to persist until a compensatory mutation can take place.
Consider a case, however, in which an RNA structure may be so highly constrained that any mispairing would result in an altered structure and loss of function. For example, the domain V helix of group II introns is nearly invariable in its stem and loop lengths and is an essential pillar in the tertiary folding of the intron ribozyme (Fig. 3). Point mutation studies of this domain reveal a link between precise structure formation and splicing efficiency of the intron (Peebles et al., 1995; Holländer and Kück, 1999). In such a structure, selection may only tolerate those mutations that maintain immediate pairing after the substitution; in other words, it would not be possible in such cases to achieve compensatory mutation by a stepwise process.
If correct, this reasoning provides at least one possible explanation for the low number or total absence of observed compensatory mutations in some group II intron studies (e.g., Laroche and Bousquet, 1999; this paper) and the high rate of transitions in intron stems. It also suggests that any RNA sequence demonstrating exceptionally high transition rates may be under intense structural conservation as a stem structure.
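The link between immediate pairing and transitions can be checked by simple enumeration. The sketch below is illustrative only: it assumes that a stem position tolerates Watson-Crick pairs plus the G-U wobble pair (the C-A pairing discussed earlier is left out for simplicity), and the function names are hypothetical. It lists every single-nucleotide substitution that converts one allowed pair directly into another.

```python
PURINES = {"A", "G"}

# Assumption: pairs able to hold a stem position are Watson-Crick plus G-U wobble.
PAIRING = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_transition(old, new):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return old != new and (old in PURINES) == (new in PURINES)


def pair_preserving_substitutions():
    """List single-base changes that turn one allowed pair into another."""
    kept = []
    for five, three in sorted(PAIRING):
        for new in "ACGU":
            # Substitute the 5' partner and test whether pairing survives.
            if new != five and (new, three) in PAIRING:
                kept.append(((five, three), (new, three), is_transition(five, new)))
            # Substitute the 3' partner and test whether pairing survives.
            if new != three and (five, new) in PAIRING:
                kept.append(((five, three), (five, new), is_transition(three, new)))
    return kept


for before, after, transition in pair_preserving_substitutions():
    kind = "transition" if transition else "transversion"
    print("-".join(before), "->", "-".join(after), kind)
```

Under these assumptions every pair-preserving substitution is a transition, passing through a G-U intermediate. This agrees with the argument that strongly constrained stems should show an excess of transitions.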
Positional rate heterogeneity
As a consequence of variable functional constraints on different structural features, we would expect a certain heterogeneity in mutation rates per site in group II intron sequence data. The phenomenon is referred to as "positional" (Steel and Penny, 2000) or "among-site" rate heterogeneity (Kuhner and Felsenstein, 1994; Yang, 1994, 1996). Some sites are immutable in G2 introns (discussed above); others, such as nonpairing nucleotides in RNA, may experience particularly high levels of substitution (e.g., Hickson et al., 1996; Downie, Katz-Downie, and Watson, 2000; and Fig. 2B).
Under a neutral evolution hypothesis, site mutation probabilities are expected to follow a normal distribution. When positional rate heterogeneity is present, site mutation probabilities more closely follow a gamma distribution. The α parameter of likelihood/distance models describes the shape of a gamma distribution function for site-mutation probabilities. A low α value (near zero) indicates a highly skewed distribution of mutation rates and strong positional rate heterogeneity; a higher value describes a more equal mutation probability per site (Yang, 1994; Swofford et al., 1996).
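The effect of the shape parameter can be illustrated by drawing relative site rates from a gamma distribution whose mean is fixed at 1. This is a minimal sketch and not the estimation procedure used in the study. The α values, the number of sites, and the 0.1 threshold for a "nearly invariant" site are arbitrary choices made for illustration, and the function names are hypothetical.

```python
import random


def sample_site_rates(alpha, n_sites, seed=0):
    """Draw relative rates from a gamma distribution with shape alpha and mean 1."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1/alpha fixes the mean at 1.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def fraction_nearly_invariant(rates, threshold=0.1):
    """Share of sites evolving at less than `threshold` times the mean rate."""
    return sum(r < threshold for r in rates) / len(rates)


for alpha in (0.3, 1.0, 50.0):
    rates = sample_site_rates(alpha, 20000)
    mean = sum(rates) / len(rates)
    share = fraction_nearly_invariant(rates)
    print(f"alpha={alpha}: mean rate={mean:.2f}, nearly invariant share={share:.2f}")
```

With a small α, many sites draw rates close to zero while a few evolve very quickly. As α grows, the rates concentrate around the mean, which corresponds to the "infinite" α case of equal rates at all sites.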
We might expect some substitution rate heterogeneity among sites in group II intron sequences, because structural constraints may give rise to heterogeneous mutation processes. Interestingly, for the 46 rpl16 intron sequences in Myoporaceae, the α parameter estimate under an HKY85 + Γ model was "infinite." An infinite value for the α parameter signifies that site mutation probability is equivalent for all sites; in other words, the estimation method has not detected significant positional rate heterogeneity, as assessed by a full-sequence estimate of the α parameter under the HKY85 + Γ model in Myoporaceae.
Does this mean a group II intron may have no significant positional rate differences in its sequence? The reality of site mutation rates in a G2 intron may be more complex than it first appears. The author ran the same likelihood analysis (with α parameter estimated under the given model) on the aligned sequences in Table 4 for the conserved regions of rpl16 intron sequences for 21 higher plants. This time, the α parameter estimate was 0.367, indicating significant positional rate heterogeneity in these partitioned sequences from structurally conserved regions. Therefore, at this high taxonomic level across angiosperms, a partition of conserved characters in a group II intron shows a skew in site mutation probabilities that is not detectable in a lower-level analysis of complete intron sequences.
It may therefore be advisable to examine the α parameter in terms of both the entire sequence and each constituent partition (structural and domain) before incorporating an α parameter estimate in a likelihood analysis of G2 intron sequences.
Linked mutations in G2 introns
Nonindependence of nucleotide characters is plainly as much an issue in group II intron sequences as it is in any sequence underlying a structured molecule (Kjer, 1995; Huelsenbeck and Neilsen, 1999; Kelchner, 2000; Tufféry and Darlu, 2000; Felsenstein, 2001). Nucleotide sites in conserved intron secondary structures evolve in conjunction with their pairing nucleotides in an RNA stem. In the 46 Myoporaceae taxa of this study, 57% of all categorized nucleotides in each rpl16 intron sequence occurred in stem formations, illustrating the extent of nonindependent characters in a G2 intron sequence.
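The proportion of paired, and therefore nonindependent, positions can be tallied from a secondary structure annotation. The sketch below assumes the structure is written in dot-bracket notation, a common convention that this paper does not itself use. The toy structure is invented for illustration and is not an rpl16 intron fold.

```python
def paired_positions(dot_bracket):
    """Return (i, j) index pairs for matching brackets in a dot-bracket string."""
    stack, pairs = [], []
    for index, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def stem_fraction(dot_bracket):
    """Fraction of positions that are base-paired, i.e. not independent characters."""
    return 2 * len(paired_positions(dot_bracket)) / len(dot_bracket)


# Hypothetical toy structure: two hairpins separated by a short linker.
toy_structure = "((((....))))..(((...)))"
print(paired_positions(toy_structure))
print(round(stem_fraction(toy_structure), 2))
```

The list of index pairs identifies which alignment columns would need to be treated jointly, for example under a doublet model, rather than as independent characters.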
Huelsenbeck and Neilsen (1999) discuss a form of character nonindependence that involves correlated mutation events through time; for example, one mutation may increase the likelihood of additional mutations that are linked with the primary event. Temporally correlated mutations that may occur in group II introns include length variation due to increased slipped-strand mispairing activity in a region of numerous adjacent sequence repeat units (Kelchner, 2000). Another example, described by Kelchner and Wendel (1996), is the series of minute inversion events linked with the formation of a hairpin just downstream of helix D1d2 in the rpl16 intron. Both situations may give rise to accelerated rates of mutation due to the presence of a "mutational trigger" (Kelchner and Clark, 1997), a specific sequence pattern that increases the likelihood of subsequent mutation events linked to that sequence pattern (see also Graham et al., 2000).
Nonindependence of characters in a sequence data set can be somewhat alleviated by the application of complex models of character evolution during phylogeny estimation and evaluation of clade support. Compensating for nonindependent characters in G2 intron sequence analysis is discussed in the section ALIGNMENT AND ANALYSIS.
| TECHNIQUES |
|---|
|
|
|---|
Highly conserved structural features of a G2 intron may serve as excellent sites for internal primers, such as the 3' primary stem sequence of domain III and the adjacent 5' primary stem sequence of domain IV. Although this is a case of how intron structure can be exploited for more efficient PCR reactions, some phenomena related to intron structure may negatively affect amplification of double-stranded intronic DNA. Lengthy stem structures can form in single-stranded DNA template during the PCR reaction, particularly if the DNA version of the RNA stem is composed solely of canonical pairings (G-C, A-T). If such a stem has a strong (exceptionally negative) ΔG value, this structure in the PCR template could make amplification and sequencing difficult.
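Candidate stems of this kind can be flagged before primer design by scanning the template for perfect inverted repeats. The sketch below is a rough screen under stated assumptions: it considers only canonical G-C and A-T pairings, does not compute ΔG (no thermodynamic parameters are given here), and its stem and loop length limits are arbitrary. The function names and the toy template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA string; unknown symbols become '?' and never match."""
    return "".join(COMPLEMENT.get(base, "?") for base in reversed(seq))


def hairpin_stems(seq, min_stem=6, min_loop=3, max_loop=12):
    """Return (start, stem_length, loop_length) for every perfect inverted repeat found.

    Nested or overlapping stems are reported separately; no filtering is applied.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for stem in range(min_stem, (len(seq) - start) // 2 + 1):
            left = seq[start:start + stem]
            target = reverse_complement(left)
            for loop in range(min_loop, max_loop + 1):
                right_start = start + stem + loop
                right = seq[right_start:right_start + stem]
                if len(right) == stem and right == target:
                    hits.append((start, stem, loop))
    return hits


# Hypothetical template: a 9-bp stem closed by a 5-nt loop, flanked by TT.
template = "TTGGCATGCAAGAAAATTGCATGCCTT"
for start, stem, loop in hairpin_stems(template):
    print(f"start={start} stem={stem} bp loop={loop} nt")
```

Long stems reported by such a screen would then be natural candidates for a proper free-energy calculation with dedicated folding software.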
One may address this difficulty in much the same way as secondary structure-based problems are countered in ITS and other rDNA. Baldwin et al. (1995) suggest using high-temperature PCR and sequencing reactions in such cases to assist in dissociating secondary structures in the template. Dimethyl sulfoxide (DMSO) can also be helpful in limiting structure formation (Winship, 1989), as can formamide (Zhang, Reading, and Deisseroth, 1992) and mild detergents (Bechmann, Luke