Genetics and Molecular Biology
Department of Biology, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061 USA
Received for publication March 23, 2001. Accepted for publication August 16, 2001.
| ABSTRACT |
|---|
Key words: molecular evolution; Oryza; Oryza barthii; Oryza glaberrima; Poaceae; prolamin genes; protein
| INTRODUCTION |
|---|
100 kDa (kilodaltons). Hilu and Esen (1988)
The molecular diversity and patterns of evolution of the 10–16-kDa gene families are not well studied in the oryzoid-bambusoid grasses in spite of their possible ancestral status due to the basal position of the species in which they occur. Published work on the structure of the 13- and 16-kDa prolamin genes is limited to the Asian rice species O. sativa L. and the 10-kDa genes to O. sativa, its proposed ancestor O. rufipogon Griff., and related O. longistaminata Chev. et Roehr. The 10–16-kDa genes are simpler in structure than most of the other prolamin gene families and encode low molecular mass (LMM) prolamin. Due to their small size and ancestral status, accumulation of sequence data on genes encoding the LMM prolamins could provide valuable information on the evolution of other prolamin multigene families, the phylogeny of the groups possessing them, and the controversial issue of early evolution in the grass family.
Oryza includes 22 species distributed in Asia, Australia, Africa, and the Americas (Vaughan, 1994). Two of these species are the domesticated Asian rice (O. sativa) and African rice (O. glaberrima). The former is globally the second most important crop. Both rice species are diploids (x = 12) and are considered to possess the AA genome (Tsunoda and Takahashi, 1984). It has been proposed that the Asian rice evolved from O. rufipogon and the African rice from O. barthii, two wild species that are widely distributed in Asia and Africa, respectively (Vaughan, 1994). Oryza sativa and O. rufipogon represent the sativa complex, while O. glaberrima and O. barthii represent the glaberrima complex (Cordess, Second, and Delsney, 1990). The evolutionary relationship between the two groups has been disputed. Nayar (1973) proposed a recent (10th century AD) and direct origin of O. glaberrima from O. sativa plants introduced to Africa. Chang (1976) indicated a common origin for the Asian and African rice groups from different populations of the O. perennis complex (to which O. longistaminata belongs) in Asia and Africa. In contrast, Cordess, Second, and Delsney (1990) proposed a distinct evolution for the two complexes, in which the Asian complex evolved in Africa from a common ancestor with the African species O. longistaminata, followed by dispersal to Asia, while the African glaberrima complex evolved independently, sharing a "more ancient ancestor" with the other complex. The data generated in this study will be used to shed some light on the evolution of the sativa and glaberrima groups.
This study explores the degree and patterns of substitution in the 10-kDa and 16-kDa prolamin genes in closely related species as a first step in assessing their utility in evolutionary studies and in providing an insight into the evolution of these multigene families. Consequently, domesticated African rice Oryza glaberrima Steud. and related wild species O. barthii A. Chev. were examined and contrasted with available sequences for the genus.
| MATERIALS AND METHODS |
|---|
DNA isolation and PCR amplification
Whole genomic DNA was isolated from fresh leaves following Hilu (1994) and used as template in the polymerase chain reaction (PCR). The design of the primers used for the amplification of the 10-kDa prolamin genes was based on the cDNA sequence of Barbier and Ishihama (1990) to complement two relatively conserved sectors flanking most of the coding region. The primers match the 5' region leader sequence and a unique sequence at the 3' end. Five primers were designed for the amplification of the 16-kDa prolamin genes (Fig. 1) because our alignment of available sequences showed that the genes fall into two groups that differ by an internal 15-bp insertion and a number of mutations at the 3' end. The sequences, relative positions, and sources of the primers are noted in Fig. 1. Primers OSa and OSb amplify the whole coding region of the "rice major" prolamin genes (Kim and Okita, 1988a; GenBank number X60979) as they complement sequences close to the 5' and 3' ends. To amplify the whole coding region of the 16-kDa long form (the form containing the 15-bp insertion), the OSa primer designed for the 5' end of rice major was used because of the conserved nature of the region in both forms. However, a new primer, OSc (Fig. 1), was designed for the 3' end based on the RICPROL17A clone (Kim and Okita, 1988a; GenBank number AA751313). For specific amplification of the long form, two internal primers, OSdF and OSdR (Fig. 1), were designed to complement the 15-bp insertion at positions 232–246 unique to this form of the 16-kDa genes. The latter two primers were designed to amplify in different directions when used in combination with the primers designed for the 5' and 3' regions (primers OSa and OSc). BamHI and EcoRI sites were added to all primers to improve efficiency of DNA cloning. DNA synthesis was carried out using 2.5 units of Taq DNA polymerase, 0.2 µg total DNA, 10 pmol/L of each primer, and 2.5 mmol/L MgCl2 in a final volume of 50 µL. Thirty cycles of amplification were carried out in a PTC-100 thermal cycler (MJ Research, Watertown, Massachusetts, USA) under the conditions of 1 min denaturation at 95°C, 1 min annealing at 45°C, and 3 min DNA synthesis at 72°C.
32P-ATP following the procedure recommended for T4 kinase (Promega, Madison, Wisconsin, USA) and used as a hybridization probe. For the estimation of gene copy number, EcoRI and HindIII genomic DNA digests for the two species were used in a Southern blot following the procedure cited above. The 10-kDa and 16-kDa rice prolamin cDNA inserts were labeled with 32P using the Random Primers DNA Labeling System (Promega, Madison, Wisconsin, USA) and used as hybridization probes. The membranes were hybridized with the radioactive probe overnight at 65°C in 3x SSC (0.15 mol/L NaCl, 0.015 mol/L sodium citrate), 20 mmol/L phosphate buffer pH 7.0, 7% SDS, 10x Denhardt's solution, and 100 µg/mL salmon sperm DNA. Membranes were washed twice in 2x SSC for 15 min each, once each in 1x and 0.5x SSC for 10 min, and exposed to Kodak XAR-5 film.
Cloning, nucleotide sequencing, and data analysis
Prolamin-positive PCR products (verified by the Southern blot assay) were isolated by low-melt agarose gel purification and the phenol-chloroform extraction method of Appels and Lagudah (unpublished data) as described in Hilu and Stalker (1995). The purified inserts were cloned into the BamHI/EcoRI site of the pUC18R vector for further mapping and sequencing. Clones with inserts were identified using the blue-white colony screening method (Sambrook, Fritsch, and Maniatis, 1989). Positive clones were sequenced on an ABI Prism™ 377 Automated DNA Sequencer using the Taq polymerase DyeDeoxy™ terminator cycle sequencing method (PE Biosystems, Foster City, California) with pUC primers. Alignment of sequences, generation of statistical data, and deduction of amino acid composition were carried out using various options of the DNASTAR computer program (DNASTAR, Madison, Wisconsin, USA). Phylogenetic analyses of the 10-kDa sequences, rooted with the Phyllostachys aurea sequence (Hilu and Sharova, 1998), and of the 16-kDa sequences of the glaberrima group, rooted with rice major 16, were carried out with Fitch parsimony (MP) as implemented in PAUP*4.0b5 (Swofford, 2001) using heuristic searches consisting of 1000 replicates of random stepwise addition of taxa with MULPARS on and tree-bisection-reconnection (TBR) branch swapping. To take into consideration the different patterns of evolution in the 16-kDa genes, the data were also analyzed with maximum likelihood (ML) using the HKY85 model (Hasegawa, Kishino, and Yano, 1985) and assuming empirical base frequencies, a transition/transversion ratio of 1.26 (estimated from MP trees), a proportion of invariable sites of 0.45, and equal rates for all sites. Heuristic search settings were similar to those of the MP analysis.
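The Fitch parsimony criterion used to score trees in the MP analysis can be illustrated with a minimal Python sketch. It counts the minimum number of state changes that an alignment requires on a given rooted binary tree. This is not the PAUP* implementation and performs no tree search; the tree, taxon names, and sequences are hypothetical toy data chosen only to show the counting rule.

```python
def fitch_score(tree, states):
    """Return (state_set, steps) for one site on a rooted binary tree.

    tree:   a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping leaf name -> nucleotide observed at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_score(tree[0], states)
    right_set, right_steps = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_steps + right_steps
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, alignment):
    """Sum Fitch steps over all columns of an alignment (dict name -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_score(tree, column)[1]
    return total


# Hypothetical toy data, not sequences from this study.
toy_alignment = {
    "outgroup": "ACGTA",
    "taxonA": "ACGTT",
    "taxonB": "ACCTT",
    "taxonC": "GCCTT",
}
toy_tree = ("outgroup", ("taxonA", ("taxonB", "taxonC")))
alt_tree = (("outgroup", "taxonC"), ("taxonA", "taxonB"))

print(tree_length(toy_tree, toy_alignment))  # shorter tree
print(tree_length(alt_tree, toy_alignment))  # longer tree
```

A heuristic search such as the one described above evaluates this kind of length for many candidate topologies and retains the shortest ones.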
| RESULTS |
|---|
Two long and one short 10-kDa sequence were generated for O. barthii (Fig. 2). The two long sequences (Ob10-8-10-4 and Ob10-8-10-9) were identical, while the short sequence differed from the long sequences in the presence of three mutations. The DNA sequences were translated into 132 amino acids for the long sequences and 93 for the short one. The amino acid composition of these genes clearly reflects their sulfur-rich structure (Fig. 4). The three mutations in the short sequence of O. barthii did not interrupt the reading frame, but two of them were nonsynonymous, resulting in mutations at the amino acid level from glutamine to arginine and leucine to histidine. The four clones sequenced for O. glaberrima represent two long and two short sequences (Fig. 2). Three of the sequences of O. glaberrima (one long and two short) were 99–100% similar to each other at the nucleotide and amino acid levels. One of these short sequences was identical to the long form, while the other differed from them by two nucleotide substitutions that resulted in synonymous mutations at the amino acid levels. These three sequences of O. glaberrima were quite similar (98.6–99.7%) to the 10-kDa genes sequenced from O. barthii. The fourth (long) sequence (clone Og10-8-10-8 of O. glaberrima), however, was relatively divergent, showing 92.5–93.5% similarity to the other three sequences of O. glaberrima. This distinct sequence also had a unique TGA insertion that corresponded to one we found in the GenBank sequence of O. longistaminata (Fig. 2). The deduced amino acid sequence of this distinct form was 90% similar to the long sequence and its identical form of the short sequence, only 84% similar to the second short sequence, and 84–89% similar to the 10-kDa prolamins of O. barthii. Phylogenetic analysis of the 10-kDa sequences resulted in an almost complete polytomy due to the extremely low sequence variation, and, thus, it is not discussed further.
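The kind of comparison summarized above (percentage similarity between aligned sequences, and whether a nucleotide substitution is synonymous or nonsynonymous) can be sketched in Python as follows. The two short sequences are hypothetical and were built only so that they contain a glutamine-to-arginine change, a leucine-to-histidine change, and one silent change; the actual codons in the clones are not given in the text, and the study itself used the DNASTAR package rather than this code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def percent_identity(seq_a, seq_b):
    """Percent identical positions between two aligned sequences, ignoring gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


def classify_substitutions(cds_a, cds_b):
    """Compare two ungapped, in-frame coding sequences codon by codon."""
    results = []
    for i in range(0, len(cds_a) - 2, 3):
        codon_a, codon_b = cds_a[i:i + 3], cds_b[i:i + 3]
        if codon_a == codon_b:
            continue
        aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
        kind = "synonymous" if aa_a == aa_b else "nonsynonymous"
        results.append((i // 3 + 1, codon_a, codon_b, aa_a, aa_b, kind))
    return results


# Hypothetical sequences (Met-Gln-Leu-Ala versus Met-Arg-His-Ala).
reference = "ATGCAACTTGCC"
variant = "ATGCGACATGCT"

print(percent_identity(reference, variant))
for row in classify_substitutions(reference, variant):
    print(row)
```

Codon-by-codon classification of this kind assumes at most simple differences within each codon; codons that differ at several positions would need a pathway-based method to count synonymous and nonsynonymous changes separately.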
T produces TAG stop codons in both positions). The MP analysis of the 16-kDa sequences resulted in six most parsimonious trees that are 109 steps long and have consistency and retention indices of 0.91 and 0.89, respectively. Of the 486 aligned positions, 89 (18%) were variable, and 48 (54%) of those were parsimony informative. The consensus tree is shown in Fig. 5. The ML analysis resulted in a single tree (not shown) that is similar in topology to one of the MP trees and differs from the consensus in resolving all the O. glaberrima clones in a single lineage.
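As a rough illustration of how variable and parsimony-informative positions are tallied, the Python sketch below counts both for a small alignment and then recomputes the proportions from the counts given above (89 variable positions out of 486 aligned, 48 of them informative). The toy alignment is hypothetical; a site is treated as parsimony informative when at least two different nucleotides each occur in at least two sequences, which is the usual definition and is assumed here.

```python
from collections import Counter


def site_summary(alignment):
    """Return (n_sites, n_variable, n_parsimony_informative) for aligned sequences."""
    n_sites = len(alignment[0])
    variable = 0
    informative = 0
    for i in range(n_sites):
        # Count only unambiguous nucleotides; gaps and other symbols are skipped.
        counts = Counter(seq[i] for seq in alignment if seq[i] in "ACGT")
        if len(counts) > 1:
            variable += 1
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return n_sites, variable, informative


# Hypothetical toy alignment, not data from this study.
toy = ["ACGTAC", "ACGTTC", "ACCTTA", "GCCTTA"]
print(site_summary(toy))

# Proportions implied by the counts reported in the text.
pct_variable = round(100 * 89 / 486, 1)
pct_informative = round(100 * 48 / 89, 1)
print(pct_variable, pct_informative)
```

With the reported counts, roughly 18% of aligned positions are variable and about 54% of the variable positions are parsimony informative.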
To demonstrate whether or not O. glaberrima and O. barthii have the long form of the 16-kDa prolamins reported for O. sativa, PCR amplification was carried out on genomic DNA using primer combinations OSa-OSdR and OSdF-OSc specific for this form. A single band of 250 bp was amplified for each of the OSa-OSdR and OSdF-OSc primer combinations. The 250 bp products represent the 5' and 3' sections of the gene. The oligonucleotide probe specific to the insertion unique to the long form hybridized strongly to the two products, confirming the presence of the long 16-kDa form in O. barthii and O. glaberrima. Figure 6 shows the amplification and Southern blot results for the 5' section of the coding region.
| DISCUSSION |
|---|
83–89% similarity. Our alignment of the 10-kDa GenBank sequences of rice, O. sativa and its proposed progenitor O. rufipogon, resulted in 100% identity. These sequence similarity values can be contrasted with the relatively lower similarity values (87–96%) calculated between O. longistaminata and the other four species. The two rice crops, thus, appear to have 10-kDa gene sequences almost identical to their proposed respective wild progenitors. These data present further evidence in support of the proposed direct evolution of the Asian rice from O. rufipogon and the African rice from O. barthii (Oka, 1974
Compared to available sequences for 10-kDa prolamin genes in Oryza, the sequence obtained for the O. glaberrima clone Og10-8-10-8 appeared distinct, differing from the other sequences obtained for this species by 14 point mutations and 11 amino acid substitutions. The clone had an in-codon TGA insertion that was translated to an additional methionine and changed the next amino acid from lysine to asparagine. Thus, clone Og10-8-10-8 of O. glaberrima represents a distinct form of the 10-kDa prolamin gene family. The distribution and potential evolutionary implication of this form in oryzoid grasses need to be examined.
In contrast with the 10-kDa, the 16-kDa gene sequences were considerably more variable (Table 1; Fig. 3). The sequence variability was higher among the O. glaberrima as they form a polytomy at the base of the MP tree (Fig. 5). Although the ML resolved the O. glaberrima in one lineage, the branch leading to it was short. In contrast, the O. barthii sequences appeared in a clade supported by 80% bootstrap (Fig. 5). The 16-kDa sequences display unique mutations characteristic of an individual sequence or a group of sequences (Fig. 3). For instance, clones Ob16-9-4-8 and Obarh16-9-4-3 were 99% similar and appeared in a clade supported by 100% bootstrap (Table 1, Fig. 5) and possess unique mutations that are not found in the other clones of O. barthii, O. glaberrima, or O. sativa (Fig. 3). Clone Og16-9-4-4 has a number of indels at the 3' end (starting at position 370; Fig. 3), changing the reading frame and making the coding region much shorter than the other 16-kDa genes characterized in Oryza. This gene is distinct and produces a truncated protein. Therefore, it appears that several forms (loci) of the 16-kDa gene family are present.
Comparative sequence analysis among 10-kDa and 16-kDa genes in Oryza
To understand the patterns of nucleotide substitutions and their impact on amino acid composition, sequences generated here and others obtained from GenBank and other published works (Kim and Okita, 1988a, b; Masumura et al., 1989; Barbier and Ishihama, 1990; Feng et al., 1990) were aligned (Figs. 2, 3). Among the 10-kDa sequences, clone Og10-8-10-8 of O. glaberrima remains quite distinct from the others by 13 point mutations and the three-nucleotide insertion (Fig. 2). The insertion was shared only with another African species, O. longistaminata, presenting a synapomorphy of potential geographic and taxonomic significance. The remaining clones were either identical to one another or differed by 12 mutations (Fig. 2). These mutations were nonsynonymous for the most part, resulting in seven amino acid substitutions and pointing to relatively relaxed selectional constraints at the protein level. In fact, the fluctuation in amino acid composition was, by far, more pronounced in the 10-kDa than the 16-kDa genes (Fig. 4). The overall low number of nucleotide substitutions observed in the different forms of the 10-kDa genes underscores the conserved nature of this gene family and points to their potential utility in understanding the phylogenetic relationships above the species level.
Compared with the 10-kDa genes, the pattern of variation in the 16-kDa gene is more complex. A comparative alignment of the 16-kDa sequences from O. glaberrima, O. barthii, and O. sativa showed a range of similarities from 76.0 to 100% at the nucleotide level and 74.5–100% at the amino acid level (Table 1). The two sequences chosen to represent the diversity of the 16-kDa genes in O. sativa, the RICPROL17A sequence and rice major prolamin (Kim and Okita, 1988a), were different from each other. The RICPROL17A sequence is 18 bp and six amino acids longer than the rice major prolamin (Fig. 3), contains an insertion at positions 232–246, and has two indels at positions 250–251 and 265. These indels represent a frame shift followed soon by a restoration. A closer look at the carboxy termini of the RICPROL17A and the rice major form showed that they are different in structure, while the amino termini were identical. This hypothesis was confirmed when primer combinations OSa/OSb and OSa/OSdR produced high PCR yield while OSdF/OSb failed. Thus, the two represent different families of 16-kDa prolamin genes. We will refer to the RICPROL17A as the "16-kDa long form." In spite of the presence of mutational differences between the two 16-kDa sequences of O. sativa, they differ from the glaberrima group (O. glaberrima and O. barthii) by two major indels (positions 362–367, 448–486) and several synapomorphic mutations (Fig. 3).
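The statement that the two small indels cause a frame shift that is soon restored can be checked with simple arithmetic on the reading frame. The Python sketch below tracks the cumulative frame offset (modulo 3) after each length difference. The lengths are read from the position ranges in the text (232–246 is 15 bp, 250–251 is 2 bp, 265 is 1 bp); treating all three as gains in RICPROL17A relative to rice major is an assumption, although it is consistent with the reported 18-bp and six-amino-acid difference.

```python
def frame_track(indels):
    """Return (position, cumulative frame offset mod 3) after each indel.

    indels: iterable of (start_position, net_length_change_in_bp).
    """
    offset = 0
    track = []
    for start, delta in sorted(indels):
        offset = (offset + delta) % 3
        track.append((start, offset))
    return track


# Lengths taken from the position ranges in the text; the direction
# (gain in RICPROL17A relative to rice major) is assumed.
indels = [(232, 15), (250, 2), (265, 1)]

track = frame_track(indels)
total_bp = sum(delta for _, delta in indels)

print(track)                     # frame offset after each event
print(total_bp, total_bp // 3)   # total extra bp and extra codons
```

Under these assumptions the 15-bp insertion leaves the frame unchanged, the 2-bp difference shifts it, and the 1-bp difference at position 265 brings the offset back to zero, so only the codons between positions 250 and 265 are read in a shifted frame.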
The O. barthii and O. glaberrima 16-kDa sequences displayed considerably higher sequence similarities to the 16-kDa rice major (84–94%) than to the RICPROL17A clone (72%), revealing the same deletion found in the former. The presence of the long form of the 16-kDa prolamin in O. barthii and O. glaberrima was evident when primer combinations OSa/OSc, OSa/OSdR, and OSdF/OSc produced high PCR yield. This was further confirmed when a Southern blot of the PCR products was probed with a 5' end-labeled oligonucleotide specific to the unique region found in the GenBank sequences. The probe hybridized strongly with the PCR-amplified products obtained using the primers specific to the long form (Fig. 6).
Variation in the 10-kDa and 16-kDa genes also corresponds to variation in amino acid composition (Fig. 4). The most variable amino acids were alanine, cysteine, glycine, isoleucine, valine, serine, and threonine. Although variation in amino acid frequencies was noticeable in both gene families and the 16-kDa genes showed overall higher rates of nonsynonymous substitutions, the magnitude of variation in the 10-kDa genes was quite pronounced in alanine, isoleucine, serine, and threonine (Fig. 4). It is worth noting that the latter two amino acids and the variable tyrosine are involved in protein phosphorylation. In spite of this variation, the overall amino acid profiles were prolamin specific (Fig. 3), i.e., rich in hydrophobic and uncharged and poor in charged amino acids.
The overall intragenomic homogenization of these multigene families can be achieved through concerted evolution (Zimmer et al., 1980) operating through molecular-drive mechanisms such as biased gene conversion and unequal crossing over, among others (reviewed in Elder and Turner, 1995). However, the presence of a large copy number of those genes per genome may allow for a certain number of mutations that are not functionally deleterious to be fixed and also for the presence of pseudogenes at relatively low frequencies. Such diversity within a gene family underscores the need to differentiate between orthologous and paralogous genes (Sanderson and Doyle, 1992; Buckler, Ippolito, and Holtsford, 1997), a prerequisite for segregating species trees from gene trees in molecular evolutionary studies based on multigene families. Due to the high sequence similarities among most of the 16-kDa clones (excluding rice 16), the emergence of O. barthii and O. glaberrima in distinct clades does not appear to be based on paralogous loci.
The two multigene families encoding the 10- and 16-kDa prolamin appear to evolve at different rates. The 10-kDa genes are conserved at the intrageneric level but display an appreciable degree of divergence among genera (Hilu and Sharova, 1998). The large number of substitutions (26%) and length mutations underscores the potential application of the 16-kDa gene family in assessing variation and resolving evolutionary patterns at the population level. The ML as well as MP analyses underscore the utility of these genes among closely related species (Fig. 5). These valuable attributes render the prolamin genes potentially useful molecules in evolutionary studies at different taxonomic levels. However, discrimination among families of divergent paralogs is necessary to avoid inaccurate organismal phylogenies (Buckler, Ippolito, and Holtsford, 1997). A notable example here is rice 16, which represents a distinct 16-kDa gene family and is thus a paralog; hence, it was excluded from the analysis. The maintenance of prolamin-specific amino acid profiles in both gene families in spite of an appreciable number of nonsynonymous mutations reflects a certain degree of selectional constraint operating on these genes. Consequently, a study tracing the subsequent evolution of these gene families in the Poaceae is quite conceivable and would have the potential to provide information on evolution of multigene families, a basic feature of plant and animal nuclear genomes.
| LITERATURE CITED |
|---|
Barker, N. P., H. P. Linder, and E. H. Harley. 1995. Polyphyly of Arundinoideae (Poaceae): evidence from rbcL sequence data. Systematic Botany 20: 423-435.
Buckler, E. S., A. Ippolito, and T. P. Holtsford. 1997. The evolution of ribosomal DNA: divergent paralogues and phylogenetic implications. Genetics 145: 821-832.
Chang, T. T. 1976. The origin, evolution, cultivation, dissemination, and diversification of Asian and African rices. Euphytica 25: 425-441.
Clark, L. G., W. Zhang, and J. F. Wendel. 1995. A phylogeny of the grass family (Poaceae) based on ndhF sequence data. Systematic Botany 20: 436-460.
Cordess, F., G. Second, and M. Delsney. 1990. Ribosomal gene spacer length variability in cultivated and wild rice species. Theoretical and Applied Genetics 79: 81-88.
Elder, J. F., and B. J. Turner. 1995. Concerted evolution of repetitive DNA sequences in eukaryotes. Quarterly Review of Biology 70: 297-320.
Esen, A., and K. W. Hilu. 1989. Immunological affinities among subfamilies of the Poaceae. American Journal of Botany 76: 196-203.
Feng, G., L. Wen, J. K. Huang, B. S. Shorrosh, S. Muthukrishnan, and G. R. Reeck. 1990. Nucleotide sequence of a cloned rice genomic DNA fragment that encodes a 10-kDa prolamin polypeptide. Nucleic Acids Research 18: 683.
Gojobori, T., W.-H. Li, and D. Graur. 1982. Patterns of nucleotide substitution in pseudogenes and functional genes. Journal of Molecular Evolution 18: 360-369.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of a mitochondrial DNA. Journal of Molecular Evolution 21: 160-174.
Hilu, K. W. 1994. Evidence from RAPD markers in the evolution of Echinochloa millets (Poaceae). Plant Systematics and Evolution 189: 147-157.
Hilu, K. W., L. A. Alice, and H. Liang. 1999. Phylogeny of Poaceae inferred from matK sequences. Annals of the Missouri Botanical Garden 86: 835-851.
Hilu, K. W., and A. Esen. 1988. Prolamin size diversity in the Poaceae. Biochemical Systematics and Ecology 16: 457-465.
Hilu, K. W., and L. Sharova. 1998. Cloning and characterization of two prolamin genes in the bamboo grass Phyllostachys aurea. American Journal of Botany 85: 1033-1037.
Hilu, K. W., and H. T. Stalker. 1995. Genetic relationship between peanuts and wild species of Arachis section Arachis: evidence from RAPD. Plant Systematics and Evolution 198: 167-178.
Kim, W. T., and T. W. Okita. 1988a. Nucleotide and primary sequence of a major rice prolamin. FEBS Letters 231: 308-310.
Kim, W. T., and T. W. Okita. 1988b. Structure, expression, and heterogeneity of the rice seed prolamins. Plant Physiology 88: 649-655.
Masumura, T., T. Hibino, K. Kidzu, N. Mitsukawa, K. Tanaka, and S. Fujii. 1990. Cloning and characterization of a cDNA encoding a rice 13 kDa prolamin. Molecular and General Genetics 221: 1-7.
Masumura, T., D. Shibata, T. Hibino, T. Kato, K. Kawabe, G. Takeba, K. Tanaka, and S. Fujii. 1989. cDNA cloning of an mRNA encoding a sulfur-rich 10 kDa prolamin polypeptide in rice seeds. Plant Molecular Biology 12: 123-130.
Nayar, N. M. 1973. Origin and cytogenetics of rice. In E. W. Caspari [ed.], Advances in genetics, 153-292. Academic Press, London, UK.
Oka, H. I. 1974. Experimental studies on the origin of cultivated rice. Genetics 78: 475-486.
Reed, K. C., and D. A. Mann. 1985. Rapid alkaline transfer of DNA from agarose gels to nylon membranes. Nucleic Acids Research 13: 7207-7221.
Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, USA.
Sanderson, M. J., and J. J. Doyle. 1992. Reconstruction of organismal and gene phylogenies from data on multigene families: concerted evolution, homoplasy, and confidence. Systematic Biology 41: 4-17.
Shewry, P. R., J. A. Napier, and A. S. Tatham. 1995. Seed storage proteins: structures and biosynthesis. Plant Cell 7: 945-956.
Swofford, D. L. 2001. PAUP*: phylogenetic analysis using parsimony, 4.0b5. Sinauer, Sunderland, Massachusetts, USA.
Tsunoda, S., and N. Takahashi. 1984. Biology of rice. Japan Scientific Society Press, Tokyo, Japan.
Vaughan, D. A. 1994. The wild relatives of rice. International Rice Research Institute, Los Banos, Philippines.
Zimmer, E. A., S. L. Martin, S. M. Beverley, Y. W. Kan, and A. C. Wilson. 1980. Rapid duplication and loss of genes coding for the alpha chains of hemoglobin. Proceedings of the National Academy of Sciences, USA 77: 2158-2162.