Systematics
Department of Plant Sciences, University of Oxford, South Parks Rd., Oxford, OX1 3RB, UK
Received for publication October 2, 2001. Accepted for publication January 17, 2002.
ABSTRACT
Key Words: allopolyploid; domestication; Fabaceae; hybridization; Leucaena; nrDNA; pseudogene; rDNA
INTRODUCTION
In a recent monographic treatment of the genus, Hughes (1998a)
showed Leucaena to comprise 24 species, including two named hybrids, along with six infraspecific taxa. The genus ranges from Texas, in the United States, to central Peru in South America, with the greatest diversity of species in south-central Mexico and northwest Central America. All species are small- to medium-sized trees that grow mainly in seasonally dry deciduous tropical forests and to a lesser extent in semi-arid thorn scrub forest, dry mid-elevation matorral, and, in the north, subtropical or warm temperate habitats. Several species of Leucaena are widely cultivated for the production of livestock feed, green manure, small wood products, and for soil conservation (Pound and Martínez-Cairo, 1983
; National Academy of Sciences, 1984
; Brewbaker, 1987
; Hughes, 1998b
), and one species, L. leucocephala, is pantropically naturalized and weedy (Hughes and Jones, 1999
), making Leucaena one of the most common and familiar trees of the tropics. Leucaena is also an interesting genus for evaluating ideas about indigenous plant domestication processes. The indigenous use of the unripe seeds of Leucaena species for food in many parts of south-central Mexico has been widely documented (Whitaker and Cutler, 1966
; Zárate, 1987
, 1994
, 1997
, 1998
, 1999
; Casas and Caballero, 1996
; Hughes, 1998b
), although the full extent and implications of this indigenous use in terms of cultivation, translocation, incipient domestication, and spontaneous hybridization, are only now starting to be more fully understood (Hughes, 1998b
; Zárate, 1998
, 1999
).
The occurrence of hybrids and allopolyploid species with their reticulate as opposed to divergent histories complicates conventional analyses of species relationships. Joint application of cpDNA and nrDNA markers is well suited to unraveling reticulate from divergent relationships (e.g., Soltis, Doyle, and Soltis, 1992
). The chloroplast genome is usually nonrecombining and uniparentally inherited, making it useful for tracking haplotype lineages and distinguishing maternal from paternal parents. In contrast, nuclear ribosomal DNA (nrDNA) provides recombining, biparentally inherited markers, potentially identifying hybrid origins that may not be revealed by analysis of cpDNA data alone. Previous work to estimate species relationships within Leucaena and understand polyploid origins has relied on morphological data (Zárate, 1994
; Hughes, 1998a
), with its inherent limitations for detecting reticulations and disentangling them from divergent relationships (McDade, 1990
, 1992
, 1995
; Rieseberg and Ellstrand, 1993
; Rieseberg, 1995
; Hughes, 1998a
), on analysis of cpDNA restriction fragment data (Harris et al., 1994
), which, given the maternal inheritance of the chloroplast genome in Leucaena (S. A. Harris, unpublished data), are also of limited value as a sole source of evidence for estimating species relationships or detecting hybrids, and on cytological data (Hartman et al., 2000
). To address this gap, a new DNA sequence data set for the 5.8S subunit and flanking internal transcribed spacer regions (ITS 1 and ITS 2) of nrDNA has been assembled for a substantial subset of the accessions used in Harris et al.'s (1994)
cpDNA study.
Unlike the sole use of cpDNA data, nrDNA data alone can provide direct evidence of reticulate evolution if concerted evolution fails to act across the repeat units contributed by different parent species (Doyle, Doyle, and Brown, 1990
; Baldwin et al., 1995
; Buckler and Holtsford, 1996a
; Waters and Schaal, 1996
; Hershkovitz, Zimmer, and Hahn, 1999
; Zhang and Sang, 1999
), and there is a growing number of reports of intraspecific and intra-accession ITS polymorphism potentially attributable to interspecific hybridization (Suh et al., 1993
; Sang, Crawford, and Stuessy, 1995
; O'Kane, Schaal, and Al-Shebaz, 1996
; Buckler, Ippolito, and Holtsford, 1997
; Campbell et al., 1997
; Emshwhiller and Doyle, 1998
; Jobst, King, and Hemleben, 1998
; Fuertes-Aguilar, Rosello, and Feliner, 1999
; Kuzoff et al., 1999
; Vargas et al., 1999
; Widmer and Baltisberger, 1999
; Gaut et al., 2000
). Conversely, there are reports that suggest that concerted evolution has proceeded to homogenize ITS repeat units, even in recent allopolyploids (Wendel, Schnabel, and Seelanan, 1995
; Ainouche and Bayer, 1997
). However, it is also apparent that detecting divergent repeat types, especially where they occur at low frequencies, may not be straightforward, suggesting that unless specific search strategies are used, divergent repeat types may be missed (e.g., Buckler, Ippolito, and Holtsford, 1997
; Lim et al., 2000
). Where concerted evolution has proceeded such that only a single copy type is present, direct evidence of hybrid parentage is lost, but nrDNA may still provide important evidence of hybrid parentage when compared with other data (e.g., cpDNA). In this paper we analyze the ITS data and explore the implications of these data in combination with a reanalysis of the cpDNA restriction fragment length polymorphism (RFLP) and morphological data for understanding diploid species relationships and polyploid origins within Leucaena. We present evidence of divergent ITS paralogues, including putative pseudogenes within accessions of four tetraploid and one diploid species of Leucaena, and discuss what this means for the origins of these species.
MATERIALS AND METHODS
Outgroup selection
Desmanthus fruticosus and Schleinitzia novoguineensis were chosen as outgroups based on a series of recent morphological and DNA sequence analyses (Luckow, 1997
; Hughes, 1998a
; Luckow, White, and Bruneau, 2000
; C. E. Hughes et al., unpublished data) that consistently placed Desmanthus Willd. and Schleinitzia Warburg ex Nevling & Niezgoda as sister groups in a clade that is sister group to Leucaena. These three genera, together with the poorly known monotypic genus Kanaloa Lorence & Wood, form the recently re-circumscribed informal Leucaena group within the tribe Mimoseae (Luckow, 1997
; Luckow, White, and Bruneau, 2000
). The original cpDNA analysis (Harris et al., 1994
) included an additional outgroup, Microlobius foetidus (Jacq.) Sousa & Andrade, but recent analyses (Luckow, White, and Bruneau, 2000
) indicate that Microlobius C. Presl. is distantly related, and it is not included in the present study. Inclusion of Microlobius created significant restriction site mapping difficulties in the original cpDNA analysis, prompting adoption of the fragment occurrence analysis approach used in that study (see below). Furthermore, inclusion of ITS sequences of distantly related mimosoid taxa (C. E. Hughes et al., unpublished data) significantly complicates alignment of the more variable regions in ITS 1 and ITS 2 and would have necessitated omission of part of the ITS 1 sequence data from the analysis.
Morphology
The morphological data matrix used here comprises 24 characters and is the same as that presented by Hughes (1998a
: Table 5) with the following modifications. Firstly, in the interest of maximizing taxon matching in the combined data set, two outgroup species, Calliandropsis nervosus (Britton & Rose) H. M. Hern. & Guinet and Desmanthus balsensis J. L. Contr., were omitted. This means that the following five characters from the original 29-character morphology data matrix are no longer potentially informative: character 3, brachyblasts present/absent; 8, involucel present/absent; 12, staminodial flowers present/absent; 13, floral bracts peltate or sessile; 27, pod dehiscence. Secondly, new chromosome counts (Cardoso, Schifino-Wittmann, and Bodanse-Zanettini, 2000
; Schifino-Wittmann et al., 2000
) have been added to replace data previously missing from the original matrix (character 29 in Hughes, 1998a
).
DNA extraction
DNAs were extracted from fresh leaves of greenhouse-grown plants (from seed), herbarium specimens, or silica-gel-dried samples of field-collected leaf material (details at http://ajbsupp.botany.org/v89/hughes/hughes-voucher.doc). DNA isolation followed the cetyltrimethyl ammonium bromide (CTAB) technique of Doyle and Doyle (1987)
. Most samples were further purified using caesium chloride gradients (Maniatis, Fritsch, and Sambrook, 1982
), and DNAs were resuspended in tris-EDTA (TE) or water and stored at -20°C.
Chloroplast DNA restriction data
The cpDNA restriction site data used in this study were previously reported by Harris et al. (1994)
. Fourteen 6-base pair (bp) cutters (BamH-I, Bcl-I, Bgl-II, Bsc-I, EcoR-I, EcoR-V, Hind-III, Nru-I, Nsi-I, Pst-I, Pvu-II, Sac-I, Stu-I, and Xho-I) were used to digest total DNA, and the digests were probed with six Vigna Savi chloroplast DNA sequences (MB1, MB2, MB3, MB5 + MB7, MB9, MB11 + MB12; Palmer and Thompson, 1981
) for a total of 84 probe-enzyme combinations (listed at: http://ajbsupp.botany.org/v89/hughes/hughes-cpDNA.doc). For the purpose of this study, the original autoradiograms were rescored for the presence/absence of restriction sites (rather than fragments, as previously treated by Harris et al., 1994
) to minimize the potential problem of scoring two fragments resulting from one restriction site as independent characters (Bremer, 1991
). In the present analysis, all nonidentical accessions were retained as terminals, where previously Harris et al. (1994)
, with a few exceptions, had treated variation between accessions of a taxon as single polymorphic terminals.
Nuclear ribosomal DNA ITS
Polymerase chain reactions (PCR) were run using Qiagen (Qiagen, Crawley, West Sussex, UK) Taq polymerase (final concentrations: about 1.5 units Taq, 100 µmol/L of each dNTP, 1%[v/v] PCR buffer, and 1%[v/v] Q solution, and 0.5 µmol/L of each primer). Amplifications were performed on a Progene thermocycler (Techne Limited, Cambridge, UK). Several combinations of ITS4/ITS5 (White et al., 1990
) and 17SE/26SE (Sun et al., 1994
) primers were used to obtain amplifications from all the taxa of interest. All amplifications began with a 3-min 94°C denaturation step, followed by 35 rounds of (1) 1 min at 94°C denaturation; (2) 1 min annealing at 48°C (primer combinations ITS4 + ITS5 and 17SE + ITS4), or 53°C (primer combination ITS5 + 26SE); and (3) a 1-min 72°C extension. The PCR products were cleaned using the Concert Purification System (Life Technologies, Paisley, UK) or Qiagen Gel Extraction Kits for direct sequencing or cloning. Both strands were sequenced using the PCR primers and "Big Dye" termination chemistry (Applied Biosystems, Warrington, UK). The PCR band polymorphism or "dirty" sequence traces for several templates identified the potential for heterogeneous copy types. These products were cloned (pGEM; Promega, Madison, Wisconsin, USA) using one-half the reaction volume described by the manufacturer. Clones were screened for the presence of an ITS insert using the PCR amplification primers and subsequently sequenced. In order to detect the range of maintained ITS polymorphism within accessions of all three subspecies of L. leucocephala, a more elaborate amplification/restriction digestion procedure using an Acc-I restriction site identified in the 5.8S subunit of some L. leucocephala sequences was required (see below).
Sequence alignment
Sequence fragments were edited and joined into contigs using Sequencher (Gene Codes, Ann Arbor, Michigan, USA). Complete sequences were provisionally aligned using ClustalX version 1.8 (Thompson et al., 1997
) and then adjusted by eye in WinClada (Nixon, 1999a
). ClustalX default parameters for multiple alignments were changed to a gap opening cost of eight and gap extension cost of six to generate reasonable starting alignments. Contiguous gaps were scored as characters using the "simple gap coding" method formalized by Simmons and Ochoterena (2000)
. Sequences are available in GenBank (accession numbers are available on the American Journal of Botany Supplementary Data Site at: http://ajbsupp.botany.org/v89/hughes/hughes-voucher.doc).
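The gap-coding step described above can be illustrated with a minimal Python sketch. It assumes the usual reading of simple gap coding: each gap with a distinct pair of start and end positions becomes one presence/absence character, and a sequence whose longer gap completely spans that gap is scored as inapplicable. The function names and the toy alignment are hypothetical, and terminal gaps are not treated specially here; this is not the procedure actually run in the study.

```python
def find_gaps(seq):
    """Return half-open (start, end) spans for each contiguous run of '-'."""
    gaps, start = [], None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(seq)))
    return gaps


def simple_gap_coding(alignment):
    """alignment: dict of name -> aligned sequence (all the same length).

    Returns (gap_characters, coded), where gap_characters is the sorted list
    of distinct gap spans and coded maps each name to a string of
    '1' (gap present), '0' (gap absent), or '-' (inapplicable).
    """
    per_seq = {name: set(find_gaps(seq)) for name, seq in alignment.items()}
    gap_characters = sorted(set().union(*per_seq.values()))
    coded = {}
    for name, gaps in per_seq.items():
        row = []
        for start, end in gap_characters:
            if (start, end) in gaps:
                row.append("1")
            elif any(gs <= start and end <= ge for gs, ge in gaps):
                row.append("-")  # a longer gap in this sequence spans the character
            else:
                row.append("0")
        coded[name] = "".join(row)
    return gap_characters, coded


# Toy alignment (hypothetical sequences, not Leucaena data).
toy = {"a": "ACGT--ACGT", "b": "ACG----CGT", "c": "ACGTTTACGT"}
characters, codes = simple_gap_coding(toy)
```

In the toy alignment the two distinct gaps yield two characters; sequence b, whose four-base gap spans the two-base gap seen in sequence a, is scored as inapplicable for that shorter gap.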
The ITS pseudogenes
A number of ITS studies have reported the occurrence of potentially nonfunctional pseudogene sequences (Buckler, Ippolito, and Holtsford, 1997
; Yang et al., 1999
; Hartmann, Nason, and Bhattacharya, 2001
). Most attempts to distinguish pseudogenes from functional copies have used pairwise comparisons of base pair differences and the occurrence of insertions/deletions (indels) across sequences of the normally highly conserved 5.8S subunit, with 5.8S variability taken as an indicator of nonfunctionality (Buckler, Ippolito, and Holtsford, 1997
; Hershkovitz, Zimmer, and Hahn, 1999
; Yang et al., 1999
). Other criteria, such as stability of secondary structure and substitution rates at methylation sites, were also used by Buckler, Ippolito, and Holtsford (1997)
. They concluded that individual criteria may not be sufficient to identify pseudogenes unambiguously and recommended examination of a suite of sequence characteristics.
Here we have attempted to identify putative pseudogenes using two approaches. The first approach used two different types of pairwise comparisons. First, we identified all base pair and indel differences within the 5.8S region between the outgroup Desmanthus fruticosus and all the other sequences. The absolute number of 5.8S base pair differences, or presence of indels, were assessed as indicators of potential pseudogenes. Second, instead of simply counting absolute differences in the 5.8S region, the percentage of total ITS variation contributed by the 5.8S was calculated by dividing the 5.8S contribution by the total number of base pair differences across the ITS region (ITS 1, 5.8S, ITS 2). The observed percentages of the overall ITS region (corrected for length differences due to indels) made up by the 5.8S were then compared to values that would be expected for a relatively unconstrained 5.8S; i.e., if the 5.8S region contributed variation levels similar to the ITS 1 and ITS 2, the sequence was considered to have been released from selective constraints and to be a putative pseudogene.
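The two pairwise measures can be sketched as follows. This is a minimal illustration only: the region boundaries, function names, and toy sequences are hypothetical, and columns containing a gap in either sequence are simply skipped rather than corrected for as in the study.

```python
def count_differences(ref, query, start, end):
    """Count substitution differences in alignment columns [start, end),
    skipping any column where either sequence has a gap."""
    return sum(
        1
        for r, q in zip(ref[start:end], query[start:end])
        if r != "-" and q != "-" and r != q
    )


def percent_from_58s(ref, query, regions):
    """regions: dict with keys 'ITS1', '5.8S', 'ITS2' mapping to (start, end).

    Returns (per-region difference counts, percentage of all differences
    that fall within the 5.8S region).
    """
    diffs = {
        name: count_differences(ref, query, start, end)
        for name, (start, end) in regions.items()
    }
    total = sum(diffs.values())
    pct = 100.0 * diffs["5.8S"] / total if total else 0.0
    return diffs, pct


# Toy example: a 15-column alignment with three 5-column regions (hypothetical).
regions = {"ITS1": (0, 5), "5.8S": (5, 10), "ITS2": (10, 15)}
outgroup = "AAAAACCCCCGGGGG"
query = "ATTAACCCCAGG-GT"
diffs, pct = percent_from_58s(outgroup, query, regions)
```

Here the first measure is the raw 5.8S count (`diffs["5.8S"]`), and the second is `pct`, which would be compared with the share of total sequence length made up by the 5.8S.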
Alongside these two pairwise comparison approaches, a tree-based approach to identify possible pseudogenes was also used. This method is based on the principle that constrained (e.g., 5.8S) and relatively unconstrained (ITS 1 and ITS 2) regions can be compared to determine whether a branch has changed in a manner consistent with a functional or nonfunctional copy. If the region that is typically highly constrained is changing at a rate similar to the relatively unconstrained region, the pattern of substitution is not consistent with functionality. The percentage of variation across the entire sequence contributed by the constrained region should be much less than the representative base contribution, i.e., length, of the constrained region. Thus, if a constrained region such as the 5.8S subunit represents X percentage of the total sequence length we would expect the variation contributed by the constrained region to be roughly X for a pseudogene branch and much less than X for a branch changing in a manner consistent with function. In this case it was expected that a functional 5.8S region would show a much lower rate of change than the two ITS regions, which are considered to be relatively unconstrained and more freely evolving (although short <26-bp conserved ITS regions have been reported by Liu and Schardl, 1994
; Buckler and Holtsford, 1996b
; Gernandt and Liston, 1999
; Hershkovitz, Zimmer, and Hahn, 1999
). The observed percentage of 5.8S change for all branches of ten or more steps was calculated by summing the total number of 5.8S substitutions unambiguously optimized (including autapomorphies) to the branch and then dividing this value by the total number of substitutions (ITS1, 5.8S, and ITS2) optimized along the branch (e.g., 12 5.8S substitutions ÷ 45 total substitutions = 27%). An expected pseudogene percentage for the branch was then calculated by dividing the total length of 5.8S sequence optimized to the branch by the total length of the entire sequence optimized to the branch (e.g., 164 bp 5.8S ÷ 600-bp ITS region = 27% sequence contribution from the 5.8S). Highly constrained, presumably functional, 5.8S regions should be easily detectable from those consistent with lack of function that are changing at rates equivalent to ITS 1 and ITS 2. These pseudogene detection methods are the focus of more detailed discussion elsewhere (C. D. Bailey et al., unpublished data).
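The observed and expected percentages for a branch reduce to two ratios, which the following Python sketch reproduces using the worked figures given above. The function name is hypothetical; the ten-step minimum follows the text, and no decision threshold is implied because none is stated.

```python
def branch_58s_summary(subs_58s, subs_total, len_58s, len_total, min_steps=10):
    """Return (observed %, expected pseudogene %) for one branch.

    observed: share of the substitutions on the branch that fall in the 5.8S.
    expected: share of the sequence length made up by the 5.8S, i.e. the
              value expected if the 5.8S changed as freely as ITS 1 and ITS 2.
    Branches shorter than min_steps are not evaluated and return None.
    """
    if subs_total < min_steps:
        return None
    observed = 100.0 * subs_58s / subs_total
    expected = 100.0 * len_58s / len_total
    return observed, expected


# Worked figures from the text: 12 of 45 substitutions; 164 bp of 600 bp.
observed, expected = branch_58s_summary(12, 45, 164, 600)
print(f"observed {observed:.0f}%, expected {expected:.0f}%")
```

An observed value close to the expected one is the pattern described for a pseudogene branch; an observed value far below it is the pattern described for a functional copy.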
The limited size of the combined 5.8S, ITS 1, and ITS 2 region reduces the scope for statistical testing of any of these comparisons, whether absolute or tree-based. We used t tests to assess the significance of the differences between mean numbers of 5.8S substitutions and percentages of ITS variation from the 5.8S for putative functional and pseudogene copies presented in Table 1.
Prior to simultaneous analysis of the combined data, each partition of the combined matrix (i.e., morphology, ITS, and cpDNA) was tested one against another for matrix compatibility using the incongruence length difference (ILD) test (Mickevich and Farris, 1981
; Farris et al., 1995
) implemented in WinClada (Nixon, 1999a
). Each of the three pairwise comparisons was made using 1000 random partitions, each analyzed with ten random addition sequences holding one tree per random addition sequence followed by swapping to a maximum of 100 equally most parsimonious trees (program commands: 1000 replicates; mult*10/per replicate; hold/1 per random addition; max*; hold100 per replicate).
In practical terms, the combined matrix was constructed using a concatenation approach (Nixon and Carpenter, 1996
) fusing individual accessions into a single potentially polymorphic terminal representing each species/infraspecific taxon. For example, the L. pulverulenta terminal with 828 characters combined from ITS (708), cpDNA (96), and morphology (24) encompasses character information from all six L. pulverulenta accessions studied, even if an accession was only present in one of the three matrices. Multiple character states for individual characters were scored as subset or full polymorphisms to encompass precisely all the variation observed for a taxon.
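The fusion of accessions into one potentially polymorphic terminal can be sketched in Python as below. This is an illustrative reading of the description, not the WinClada procedure itself: each character of the fused terminal holds the set of all states observed across accessions, and a partition in which the taxon was not sampled is filled with missing data. The names and the toy rows are hypothetical.

```python
MISSING = "?"


def fuse_partition(rows, n_chars):
    """Fuse the accession rows of one partition into a single row of state sets.

    rows: list of equal-length state sequences (may be empty if the taxon
          was not sampled for this partition).
    """
    fused = []
    for i in range(n_chars):
        states = {row[i] for row in rows if row[i] != MISSING}
        fused.append(states or {MISSING})
    return fused


def build_terminal(partitions):
    """partitions: list of (n_chars, rows) in concatenation order."""
    terminal = []
    for n_chars, rows in partitions:
        terminal.extend(fuse_partition(rows, n_chars))
    return terminal


# Toy taxon: two ITS accessions, no cpDNA accession, one morphology row.
its_rows = ["AC?", "AT?"]
cpdna_rows = []
morph_rows = ["01"]
terminal = build_terminal([(3, its_rows), (2, cpdna_rows), (2, morph_rows)])
```

The second ITS character becomes the polymorphism {C, T}, while the unsampled cpDNA characters remain missing.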
Phylogenetic analysis
All characters were scored as unordered and equally weighted. Parsimony-based analyses were conducted with NONA (Goloboff, 2000
) generated from WinClada (Nixon, 1999a
) using 1000 random addition sequences, tree bisection and reconnection (TBR), holding 100 trees per replication, and attempting to swap to completion (program commands: hold/100; mult*1000; max*). Preliminary analysis of the cpDNA data suggested that swapping all equally most parsimonious trees to completion would not be possible. Therefore, the parsimony ratchet (Nixon, 1999b
) was also used in an attempt to search a greater portion of tree space than is typically explored using a standard random addition sequence approach. This method involves iterative character weighted and unweighted steps holding few trees per replication in conjunction with many replications to search more efficiently for most parsimonious trees among tree islands (Nixon, 1999b
). Ratchets were conducted using NONA (Goloboff, 2000
) generated from WinClada (Nixon, 1999a
). Following the guidelines presented by Nixon (1999b)
, 100 iterations per ratchet were performed perturbing 20 (about 20%) of the informative characters (weighted step), constraining 10% of the nodes, and holding one tree per iteration. One hundred of these ratchet replicates were run on the cpDNA matrix.
The strict consensus bootstrap approach, which only considers clades to be supported if they are present in all of the equally most parsimonious trees identified within a replicate, was used here as it provides a more accurate and conservative measure of branch support than the more traditional "within replicate" measures (Davis et al., 1998
). One thousand strict consensus bootstrap replicates each comprising ten random addition sequences and holding 100 trees (program commands: hold/100; mult*10) were spawned from WinClada into NONA.
RESULTS
Concatenation of the morphology, cpDNA, and ITS data sets into a single combined diploid data matrix and fusion of accessions into 22 single species/infraspecific terminals left 164 potentially informative characters (data matrix available at http://ajbsupp.botany.org/v89/hughes/hughes-matrix1.txt). Standard parsimony analysis identified eight equally most parsimonious trees (length, L = 325; consistency index, CI = 0.60; retention index, RI = 0.70), and the strict consensus is presented in Fig. 1. The combined analysis supports a monophyletic Leucaena (100% bootstrap) with three main clades resolved within the genus. When compared to results of separate analyses of the individual data sets (data not shown), the simultaneous analysis showed greater resolution, especially within Clade 1 (albeit with most subclades only weakly or moderately supported), more robust support for the three main clades, and a novel placement of L. cuspidata (whose position was unstable or unresolved in analyses of the three individual data sets) within Clade 3, albeit this last also with only moderate bootstrap support. Relationships among the three main clades remained unresolved in the combined analysis.
Standard parsimony analysis was interrupted when 105 000 trees (L = 177, CI = 0.54, RI = 0.85) had been discovered. From the 100 parsimony ratchet runs, 4897 (of 20 000 saved) additional equally most parsimonious trees were found (L = 177). The strict consensus trees calculated from these two analyses were identical (Fig. 2), suggesting that additional searching would be unlikely to identify shorter trees or cause further loss of resolution.
These results are largely consistent with the earlier cpDNA analysis of Harris et al. (1994)
. However, beyond the substantial revision of names used by Harris et al. (1994)
there are also a number of minor differences in resolution and placement of a few taxa. Differences of this magnitude are to be expected given the differences in methods used, i.e., fragment occurrences rescored as presence/absence of mapped restriction sites, treatment of all accessions as separate terminals rather than as single composite polymorphic terminals, and removal of a number of putative hybrid accessions as well as L. shannonii4 and L. diversifolia8, which represented probable contaminant DNAs. Placement of the tetraploid species largely mirrors that found in the earlier cpDNA gene tree of Harris et al. (1994)
with strong bootstrap support (98%) for Clade 3. Within Clade 3, the two tetraploid species L. diversifolia and L. leucocephala were placed in a group with the single diploid species L. pulverulenta and L. pallida in Clade 2 as sister to L. pueblana. However, one important difference in this analysis was the placement of L. involucrata in Clade 2 with moderate 73% bootstrap support.
nrDNA ITS
In the early rounds of sequencing, 35 Leucaena accessions produced clean, readily readable traces with few or no polymorphisms. However, preliminary sequencing from about 15 accessions of four of the five tetraploid species, L. confertiflora (both subspecies), L. involucrata, L. leucocephala (all three subspecies), L. pallida, as well as diploid L. pulverulenta, and Schleinitzia novoguineensis, produced "dirty" or overlapping traces, suggesting that heterogeneous ITS arrays might be present (e.g., Baldwin et al., 1995
; Buckler, Ippolito, and Holtsford, 1997
; Hershkovitz, Zimmer, and Hahn, 1999
). These PCR products were cloned and resequenced. Intra-accession size variation between clones warranted sequencing of the two or more size classes and at least two clones were sequenced from each accession. Approximately 15 accessions added in the later stages of the study were cloned without preliminary sequencing of PCR products to further sample potential ITS diversity.
Preliminary analysis of cloned sequences of the three allopolyploid L. leucocephala subspecies identified two divergent ITS types. Eight L. leucocephala accessions grouped with what was then a single accession of L. pulverulenta-6, a result that was in line with the maternally inherited cpDNA restriction site analysis (Fig. 2). However, a single highly divergent clone from L. leucocephala ssp. leucocephala-1 grouped with L. lanceolata ssp. sousae-1. Six of the eight L. leucocephala sequences in the core L. leucocephala clade included an Acc-I restriction site in the 5.8S subunit that was absent from all other ITS haplotypes. In order to explore whether other ITS types might be found in other L. leucocephala accessions, 10–20 clones from each of five accessions (L. leucocephala ssp. glabrata-5,6; L. leucocephala ssp. ixtahuacana-1,2; and L. leucocephala ssp. leucocephala-3) were screened for the presence/absence of the Acc-I restriction site. A single clone from L. leucocephala ssp. leucocephala-1 lacked the Acc-I site. The precloning PCR products from these accessions were digested to clarify whether other types might have been amplified; all showed little or no uncut PCR product. Alternative PCR strategies and cocktails were explored but never produced significant amplification of the uncut type in the presence of the Acc-I-digestible type. To further investigate the possible extent of polymorphism, genomic DNAs from accessions of each of the L. leucocephala subspecies were then digested to remove haplotypes containing the Acc-I site as potential PCR templates. Given that the Acc-I restriction enzyme is methylation insensitive, digestion should have removed virtually all Acc-I types and would thus not tend to introduce any bias towards methylated haplotypes. The cleaved genomic DNAs were re-amplified, cloned, and further screened with Acc-I. Subsequent sequencing of clones lacking the Acc-I restriction site identified four additional L. leucocephala sequences representing each of the subspecies that grouped with L. lanceolata ssp. sousae-1 outside the previously identified core L. leucocephala group.
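The Acc-I screen was carried out by restriction digestion, but the same presence/absence call can be made in silico on sequenced clones. The Python sketch below assumes the standard Acc-I recognition sequence GTMKAC (M = A or C, K = G or T), which is taken from general enzyme references rather than from this paper; the clone names and sequences are hypothetical.

```python
import re

# Assumed Acc-I recognition sequence GTMKAC (M = A/C, K = G/T).
# The site is palindromic, so scanning one strand is sufficient.
ACC_I_SITE = re.compile(r"GT[AC][GT]AC")


def has_acc_i_site(sequence):
    """Return True if the sequence contains an Acc-I site.
    Alignment gaps are removed and case is ignored."""
    cleaned = sequence.upper().replace("-", "")
    return ACC_I_SITE.search(cleaned) is not None


def screen_clones(clones):
    """clones: dict of clone name -> sequence. Returns name -> bool."""
    return {name: has_acc_i_site(seq) for name, seq in clones.items()}


# Hypothetical clone fragments for illustration only.
toy_clones = {
    "clone_1": "aagtcgactt",   # contains GTCGAC
    "clone_2": "AAGTTTACTT",   # GTTTAC does not match GTMKAC
    "clone_3": "CCGTA-TACGG",  # GTATAC once the gap is removed
}
results = screen_clones(toy_clones)
```

Clones scored `False` correspond to the uncut class that was selected for sequencing in the procedure described above.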
Internal transcribed spacer variation was also detected within accessions of three other tetraploids, L. confertiflora, L. involucrata, and L. pallida, as well as in the diploid species L. pulverulenta in the initial round of cloning and sequencing.
A total of 87 ITS sequences from 65 accessions were generated for the ITS analysis. Alignment and indel coding were relatively straightforward. The final matrix included 671 aligned bases representing 309 potentially informative substitution characters and 37 potentially informative gap characters (data matrix available at http://ajbsupp.botany.org/v89/hughes/hughes-matrix3.txt). Standard parsimony analysis swapped to completion and identified 3618 equally most parsimonious trees (L = 885, CI = 0.54, RI = 0.85); the strict consensus of these trees is presented in Fig. 3.
One unexpected feature of the ITS analysis is the strongly supported (100% bootstrap; six unique synapomorphies) grouping of two of the three accessions of L. collinsii ssp. zacapana with the two accessions of L. magnifica. The third accession of L. collinsii ssp. zacapana is placed in a clade with L. collinsii ssp. collinsii and L. trichandra.
Pseudogenes
In both of the pairwise comparisons used to identify pseudogenes, all other sequences were compared against the Desmanthus fruticosus outgroup sequence. This assumes that the Desmanthus sequence itself represents a functional ITS copy type. Several observations suggest that this assumption is reasonable. Firstly, the Desmanthus 5.8S sequence differs from previously published sequences in GenBank across divergent eudicot families including Fabaceae, Scrophulariaceae, Araliaceae, Lythraceae, and Solanaceae, by only one or two base pairs. Secondly, maximally divergent putative functional copies of Leucaena (identified by comparison with D. fruticosus) also differed from members of these same families by at most two base pairs. In contrast, minimally divergent putative 5.8S pseudogenes differed from their closest matching previously published GenBank sequences by at least ten nucleotides across the 5.8S region (maximally divergent types differed by as many as 23 sites). The marked discrepancies between low sequence variation in presumed functional copies across widely divergent taxonomic groups and high variation in presumed pseudogenes support the functionality hypothesis for the Desmanthus sequence.
Based on the assumption that the Desmanthus sequence represents a functional type, simple pairwise comparisons of all 5.8S sequences identified 26 potential pseudogene sequences from the 87 ITS sequences (Table 1; Fig. 4). The division between putatively functional and nonfunctional types was a discrete one given this measure. Putative functional copies differed from Desmanthus by a maximum of 5 bp, while sequences with 11–20 differences were interpreted as potential pseudogenes, and the difference between the mean values for putative functional and pseudogene copies is highly significant (Table 1). The discrepancies between functional and nonfunctional types are further exaggerated by the occurrences of multibase deletions (13–31 bp) in the 5.8S subunit in four of the presumed pseudogene sequences (Table 1). Deletions from the highly constrained functional 5.8S subunit are often taken as an indicator of lack of function (e.g., Buckler, Ippolito, and Holtsford, 1997
).
The tree-based approach to distinguishing functionality identified four clades containing potential pseudogenes whose constituent sequences correspond to those identified in the pairwise comparisons (Fig. 4). The observed percentage divergences for the 5.8S regions of the putative nonfunctional copies closely match the expected values for a pseudogene sequence (assuming equal rates of change across the entire ITS region) in three of the four pseudogene clades (Fig. 4). One subclade of pseudogene clade D (L. leucocephala ssp. leucocephala-3b and 1c) showed a lower than expected level of variation.
Accurate detection of potential pseudogene sequences is important in order to be able to assess how comprehensively nrDNA diversity has been sampled. We have detected and sequenced no functional nrDNA copies for three out of the five L. pulverulenta accessions, three out of the nine L. leucocephala accessions, and one out of five L. confertiflora accessions suggesting that ITS diversity still remains undersampled in this study. However, all taxa except L. leucocephala ssp. leucocephala are represented by at least one putatively functional nrDNA copy.
DISCUSSION
The great morphological diversity among diploid species within Clade 1 stands in contrast to the marked lack of molecular variation and resolution within this clade in the cpDNA and ITS gene trees (Figs. 2 and 3). The Clade 1 diploid species (Fig. 1) encompass the full range of quantitative leaf diversity and all three pollen types found within the genus as a whole, as well as diverse flowering shoot and anther gland types (Hughes, 1998a
), but the separate cpDNA and ITS gene trees provide minimal resolution within Clade 1. Comparable examples of morphological change outstripping molecular change have been found in other groups (e.g., Oxalis, Emshwhiller and Doyle, 1998
; Aframomum, Harris et al., 2000
). This means that the resolution within Clade 1 in the simultaneous analysis, albeit much of it weakly supported, is mainly provided by the morphological data, and the Clade 1 topology (Fig. 1) mirrors that found in the analysis of morphology alone (Hughes, 1998a
; data not shown). The relevance of the simultaneous analysis including morphology is also demonstrated by the strong support for inclusion of L. matudae in Clade 2 and L. greggii and L. retusa in Clade 3 (Fig. 1), which contrasts with their unresolved placement in the cpDNA gene tree (Fig. 2) and weakly supported placement in the ITS gene tree (Fig. 3).
Leucaena magnifica and L. collinsii ssp. zacapana
The unexpected placement of two accessions of L. magnifica in a strongly supported group with two accessions of L. collinsii ssp. zacapana in the ITS gene tree is not mirrored in the cpDNA analysis, where L. magnifica was placed in a weakly supported group with L. shannonii and several accessions of L. trichandra. In the morphological analysis (data not shown), L. magnifica was placed in a clade with L. shannonii, L. salvadorensis, and L. lempirana, while in the combined analysis (Fig. 1) it was placed as sister to L. shannonii. Since its discovery in 1984, L. magnifica has always been considered to be either a sister species or, as originally described, a subspecies of L. shannonii (Hughes, 1991, 1998a; Harris et al., 1994). However, two other studies provide evidence suggesting a possible association between L. magnifica and L. collinsii ssp. zacapana. Firstly, isozyme studies by Chamberlain (1993) showed that one population of L. magnifica shared isocitrate dehydrogenase (IDH) isozyme patterns with L. collinsii ssp. zacapana that were not present in other nearby L. magnifica populations, nor in parapatric populations of L. shannonii. Secondly, Harris (1995) presented an analysis of RAPD data that grouped L. collinsii ssp. zacapana with L. magnifica (referred to in that study as L. shannonii ssp. magnifica). Taken together, these data suggest possible gene exchange between L. magnifica and L. collinsii ssp. zacapana. The distributions of these two taxa confirm that gene exchange between them is a possibility. Leucaena magnifica is endemic to a small area in the Department of Chiquimula in southeast Guatemala, adjoining the distribution of L. collinsii ssp. zacapana, which is endemic to the Motagua Valley system (distribution maps in Hughes, 1998a). Populations of the two taxa occur in close proximity to each other near the villages of Ipala, San Jose La Arada, and El Carrizal, 10–20 km south of Chiquimula. Further work to improve population sampling and resolution and support within Clade 1 is needed to shed light on the potentially reticulate relationships of L. magnifica.
Internal transcribed spacer polymorphism
Identification of heterogeneous intra-individual nrDNA arrays is a critical issue for understanding ITS gene trees and has important implications for inferring species phylogenies (Sanderson and Doyle, 1992; Buckler, Ippolito, and Holtsford, 1997; Denduangboripant and Cronk, 2000). Early ITS studies rarely detected multiple types within individuals (Baldwin et al., 1995), even in allopolyploids (e.g., Wendel, Schnabel, and Seelanan, 1995; Ainouche and Bayer, 1997; Yang et al., 1999). However, reports of ITS diversity within genomes are now much more common (e.g., Suh et al., 1993; Sang, Crawford, and Stuessy, 1995; O'Kane, Schaal, and Al-Shehbaz, 1996; Campbell et al., 1997; Emshwiller and Doyle, 1998; Jobst, King, and Hemleben, 1998; Fuertes-Aguilar, Rosello, and Feliner, 1999; Kuzoff et al., 1999; Vargas et al., 1999; Widmer and Baltisberger, 1999; Gaut et al., 2000), suggesting that such variation may be the rule rather than the exception (Buckler, Ippolito, and Holtsford, 1997; Hershkovitz, Zimmer, and Hahn, 1999). At the same time, it is increasingly clear that detection of intra-accession ITS variation may not always be straightforward. The effects of PCR selection, PCR drift, secondary structure, and copy number (e.g., Baldwin et al., 1995; Buckler, Ippolito, and Holtsford, 1997; Hershkovitz, Zimmer, and Hahn, 1999; Lim et al., 2000) mean that direct sequencing of pooled PCR amplification products, and even Southern analysis, may fail to detect ITS diversity within genomes (e.g., Volkov et al., 1999). Our attempts to sample ITS diversity within accessions of the tetraploid L. leucocephala, for which an elaborate amplification/restriction digestion procedure was needed to identify the range of maintained polymorphism, bear this out. Other recent studies have also revealed some of the complexities and difficulties associated with sampling ITS diversity (Buckler, Ippolito, and Holtsford, 1997; Lim et al., 2000; Hartmann, Nason, and Bhattacharya, 2001), suggesting that directed strategies using more sensitive techniques, such as those used here, are needed. Given that most nrDNA studies have not used such strategies (e.g., Kovarík et al., 1996; Hershkovitz, Zimmer, and Hahn, 1999; Volkov et al., 1999; Yang et al., 1999), negative results, especially for known polyploids, need to be interpreted with caution.
With few exceptions, heterogeneous intra-individual ITS arrays have been associated with polyploidy or multiple nucleolar organizing regions (NORs) (e.g., Campbell et al., 1997; Hershkovitz, Zimmer, and Hahn, 1999). The question remains why such polymorphisms persist in the face of concerted evolution, which in many cases, and even for some apparently recently derived allopolyploids (e.g., Wendel, Schnabel, and Seelanan, 1995; Ainouche and Bayer, 1997; Yang et al., 1999), appears to be highly effective across ITS homeologues (Baldwin et al., 1995). Internal transcribed spacer polymorphisms may persist when concerted evolution is not fast enough to eliminate different repeat types in the face of high rates of mutation or gene flow/migration (e.g., Hartmann, Nason, and Bhattacharya, 2001) or recent interspecific hybridization (Campbell et al., 1997). Concerted evolution has been suggested to proceed faster within than between rDNA loci (Arnheim, 1983; O'Kane, Schaal, and Al-Shehbaz, 1996; Campbell et al., 1997; Hershkovitz, Zimmer, and Hahn, 1999), so for at least some allopolyploids, concerted evolution may proceed independently in each of the parental genomic contributions (Suh et al., 1993), allowing two different nrDNA types to persist for longer than if the species were a typical diploid (Campbell et al., 1997). Concerted evolution can also be disrupted by loss of sexual recombination or by location of nrDNA loci on nonhomologous chromosomes (Campbell et al., 1997).
Obviously one or more of these mechanisms may be involved in maintaining ITS polymorphism in Leucaena. With the exception of the diploid L. pulverulenta, all the Leucaena ITS polymorphisms occur in polyploids, suggesting that maintained polymorphisms were mostly associated with mechanisms linked to polyploidy (Thompson and Lumaret, 1992; Soltis and Soltis, 1999). There is circumstantial ethnobotanical and biogeographic evidence to suggest that L. leucocephala may be a recent hybrid (Hughes, 1998a) and therefore that concerted evolution within this species has simply not reached completion. However, this alone is unlikely to explain the ITS diversity identified within all the polyploid Leucaena species or even all the variation found within L. leucocephala. The ITS polymorphism for L. leucocephala that exists within Clade 2 is most parsimoniously interpreted as having been passed to L. leucocephala from one of its diploid parents, L. pulverulenta (further discussion below). In this case, multiple NORs on nonhomologous chromosomes in L. pulverulenta, and its allopolyploid derivative, are the most likely cause of the maintained polymorphism. Cytogenetic evidence to support this hypothesis was found by Hartman et al. (2000), who identified six major and two minor NORs in L. leucocephala.
For the other Leucaena polyploids there is neither a clear indication of recent origin, nor of an obvious pattern of maintained polymorphism derived from a diploid progenitor. In these cases, and for polymorphism identified in L. leucocephala that was not obviously derived from a diploid progenitor, persistence of ITS variation is more likely attributable to limited parental genome interaction in the combined genomes of these polyploid Leucaena species.
Pseudogene identification
The discovery of intra-accession ITS polymorphism raised the possibility that some sequences represent nonfunctional nrDNA pseudogenes (e.g., Buckler, Ippolito, and Holtsford, 1997). The two pairwise tests used here agreed with respect to which sequences represent potential pseudogenes. However, the second test, based on the relative contribution of 5.8S variation, was less decisive because the variation across the putative pseudogene 5.8S subunit was somewhat lower than would be expected for relatively unconstrained variation (Table 1). Buckler, Ippolito, and Holtsford (1997), using Kimura distances, observed similarly lower than expected levels of 5.8S variation among putative pseudogenes in Gossypium, Nicotiana, Tripsacum/Oryza, Winteraceae, and Zea. They suggested two possible explanations for these discrepancies. First, they pointed out that when two functional nrDNA arrays diverge (within a genome), the ITS regions will diverge faster than the 5.8S subunit, until functionality is lost, as discussed by Baldwin et al. (1995). Second, they observed that the base composition substitution model for the ITS vs. 5.8S comparisons might be too simple.
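For readers unfamiliar with the distance measure mentioned above, the following is a minimal sketch of a pairwise Kimura distance calculation. It assumes that "Kimura distances" refers to the Kimura two-parameter model, which is not stated explicitly here, and the function name and example sequences are hypothetical. It is an illustration only, not the procedure used by Buckler, Ippolito, and Holtsford (1997) or in this study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Alignment columns containing a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not comparable
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Applied separately to the ITS 1, 5.8S, and ITS 2 portions of an alignment, such a function yields the per-region distances whose relative sizes a pairwise comparison of this kind depends on.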
The tree-based approach presented here for characterizing pseudogenes identified four clades containing potential pseudogenes. These included all the putative pseudogenes identified by the pairwise comparisons (Fig. 4; Table 1). The percentage of variation from the 5.8S subunit was calculated on all branches longer than ten steps. Shorter branches were not considered because the level of variation was too small to provide a meaningful comparison. What is striking is that nearly all pseudogene branches analyzed in this way showed levels of variation in their 5.8S region close to that expected for ITS 1 and ITS 2, contrary to the comparable pairwise comparison method. Thus, the tree-based approach removed from consideration the possibility that ITS 1 and ITS 2 variation, prior to silencing, might be confounding estimates of relative 5.8S variation in those putative pseudogenes (Buckler, Ippolito, and Holtsford, 1997).
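The arithmetic behind this branch-by-branch comparison can be sketched as follows. The region lengths, branch step counts, and all names below are hypothetical placeholders rather than values from this study; the sketch assumes only what is stated above, namely that branches of ten steps or fewer are set aside and that an unconstrained (pseudogene) branch is expected to accumulate changes in the 5.8S subunit at the same rate as in ITS 1 and ITS 2.

```python
from dataclasses import dataclass

# Hypothetical aligned region lengths in base pairs (not from the paper).
REGION_LENGTHS = {"ITS1": 250, "5.8S": 160, "ITS2": 230}
MIN_BRANCH_STEPS = 10  # only branches longer than this are assessed


@dataclass
class Branch:
    """Character-state changes (steps) mapped to one branch, by region."""
    name: str
    steps_its1: int
    steps_58s: int
    steps_its2: int

    @property
    def total_steps(self) -> int:
        return self.steps_its1 + self.steps_58s + self.steps_its2


def expected_58s_share(lengths=REGION_LENGTHS) -> float:
    """Share of changes expected in 5.8S if rates are equal across regions."""
    return lengths["5.8S"] / sum(lengths.values())


def observed_58s_share(branch: Branch, min_steps: int = MIN_BRANCH_STEPS):
    """Observed 5.8S share of a branch's steps, or None if it is too short."""
    if branch.total_steps <= min_steps:
        return None
    return branch.steps_58s / branch.total_steps


if __name__ == "__main__":
    for branch in (Branch("hypothetical_long", 18, 12, 16),
                   Branch("hypothetical_short", 3, 1, 2)):
        share = observed_58s_share(branch)
        label = "not assessed" if share is None else f"{share:.1%}"
        print(f"{branch.name}: observed {label}, "
              f"expected {expected_58s_share():.1%}")
```

A branch whose observed share falls well below the expected share would be read as showing residual constraint on the 5.8S subunit, whereas a share close to the expected value is consistent with a loss of function.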
One pseudogene lineage (the L. leucocephala ssp. leucocephala-1b and 3c subclade in pseudogene Clade D) showed a lower percentage of 5.8S variation than would be expected for a pseudogene. Given that this branch has a length of 46 steps, this result is unlikely to be due to random bias caused by short branch length. Base substitution model discrepancies, as suggested by Buckler, Ippolito, and Holtsford (1997), provide a potential explanation. However, in this case we cannot rule out the possibility that ITS 1 and ITS 2 variation, prior to silencing, might still be confounding estimates of relative 5.8S variation (Buckler, Ippolito, and Holtsford, 1997).
Nearly all nonpseudogene clades and some subclades that include pseudogenes were subtended by branches that were too short (≤10 steps) to assess subsequent behavior of derived branches. Within the clades that contain pseudogenes, branches derived from a pseudogene node were all considered to be potential pseudogenes, although this need not be the case. Additional character information might suggest whether potential reversions back to functionality following a pseudogene event would be possible.
Phylogenetic analysis of pseudogenes
Potential nrDNA pseudogenes are sometimes removed a priori from phylogenetic consideration (e.g., Yang et al., 1999). Inability to align sequences is one good reason for excluding them. Another concern associated with the inclusion of pseudogenes in phylogenetic analyses is the potentially spurious placement of terminals due to long-branch attraction (Felsenstein, 1978). While this is clearly a legitimate worry, it is not, a priori, a reason to exclude pseudogenes. The Leucaena ITS gene tree including all functional and potential pseudogene sequences (Fig. 4) shows three groupings, none of which were strongly supported, that may be the result of long-branch attraction; viz. two of the branches supporting a close association of the three basal sequences in pseudogene Clade A, two branches supporting the three L. leucocephala sequences in pseudogene Clade D, and the terminal branches of the L. pallida/L. involucrata subclade of pseudogene Clade B. These all have long branches subtended by relatively short and weakly supported nodes, and placement of these long-branch sequences should be viewed with caution.
In order to assess the effect of the pseudogene sequences on the phylogeny of functional ITS copies, an analysis excluding potential pseudogene sequences was conducted. The strict consensus of this analysis does not differ in topology (minus the pseudogene sequences) from Fig. 3, except in the placement of the L. leucocephala ssp. glabrata-4b sequence, which is transferred from Clade 1 to an unresolved position relative to the three major clades. Exclusion of pseudogenes from the ITS analysis does provide higher bootstrap support for the major clades (data also shown on Fig. 3) although this could be affected by the reduced number of terminals in the analysis.
We believe that inclusion of all available relevant information should provide the most complete understanding of gene diversification and that this is essential for inferring accurate species phylogenies. Inclusion of pseudogenes is potentially even more critical and useful when trying to unravel reticulate relationships among hybrid/allopolyploid taxa where duplication of function may lead to pseudogene formation. This is borne out by the current analysis of pseudogene sequences. First, inclusion of pseudogene sequences revealed the greater extent of ITS polymorphism, providing additional insights into polyploid origins. Second, the resolution provided among accessions of L. leucocephala and L. pulverulenta in pseudogene Clade A is greater than that revealed by functional copies and provides possible evidence of multiple origins of tetraploid L. leucocephala.
Implications for polyploid species origins
The ITS data set provides significant new insights into the origins of the five tetraploid species of Leucaena (summarized in Fig. 5), particularly when viewed alongside the simultaneous analysis of diploid species (Fig. 1) and the maternally inherited cpDNA gene tree (Fig. 2). Each polyploid Leucaena species had at least one sequence whose placement on the ITS gene tree was congruent with the cpDNA gene tree. Additional ITS variation, in som