Abstract: Variability in sorghum germplasm is an invaluable input for sustaining and improving sorghum productivity. A wide range of variability in phenotypic traits exists among landraces in Uganda. However, the diversity of this germplasm has not been described at the molecular level, which hinders its use in modern plant improvement programs. This study was therefore undertaken to classify 241 sorghum accessions collected from different agro-ecological regions based on genetic distances estimated using 21 Simple Sequence Repeat (SSR) markers. The SSR primers were highly polymorphic, with an average Polymorphic Information Content (PIC) of 0.65 (range 0.09-0.89). A total of 205 alleles (9.8 alleles per locus), including a number of rare alleles, were observed across all the accessions, which provides an opportunity for generating a comprehensive fingerprint database. Gene diversity ranged from 0.09-0.90 with an average of 0.68. Average heterozygosity was 0.18 (range 0.00-0.92). Analysis of molecular variance showed that variation was higher within races and agro-ecological zones than among them, which indicates the significance of gene flow. Cluster analysis delineated the accessions into two distinct clusters, each with seven sub-clusters, mainly according to agro-ecological zone. Sub-clusters IA and IB had the most distinct accessions, which could be utilized in pre-breeding programmes aimed at overcoming yield barriers. The results confirm the ability of SSR markers to discern variability and can also serve as a guide for germplasm collection and conservation strategies.
INTRODUCTION
Sorghum (Sorghum bicolor [L.] Moench) evolved and was domesticated in Eastern Africa (Purseglove, 1988; Reddy et al., 2004; Dillon et al., 2007). This region, especially Ethiopia and Sudan, contains a tremendous amount of genetic diversity for sorghum (Doggett, 1988; Purseglove, 1988; Mukuru, 1993; Grenier et al., 2004). Sorghum is rated as the third most important cereal in Uganda after maize and finger millet (Ebiyau et al., 2005). The crop is grown mainly in the south western highlands and in the lowland areas of the Eastern and Northern regions of Uganda. In the lowland areas, which experience drought conditions, sorghum is the crop of choice because of its ability to tolerate drought and salt toxicity; consequently, it is known as a food security crop (Iqbal et al., 2000; Naeem et al., 2002; Ebiyau et al., 2005). The crop is largely grown as a subsistence food crop and its yield is very low, estimated at 650 kg ha-1, although up to 3000 kg ha-1 is attainable (DeVries and Toenninssen, 2001). The sorghum grain is consumed mainly as a thick or thin porridge, or processed into traditional beer (Mukuru, 1993; Ebiyau and Oryokot, 2001; Grenier et al., 2004). Recently, a newly released variety (Epuripur) with excellent brewing qualities has been commercialized for production of beer in the country. Sorghum grains have high levels of protein and energy and are used in the manufacture of infant foods and ice-cream cones, as well as animal and poultry feeds in the feed industry (Ebiyau et al., 2005; Bharadwaj et al., 2011; Kigozi et al., 2011). The sorghum present in Uganda possesses variable morphological and agronomic traits such as maturity, plant height, plant pigmentation, mid rib colour, panicle length and width, panicle compactness and shape, glume colour, and grain colour, size and weight (Mukuru, 1993).
However, the level of sorghum diversity in the country has not been described and is poorly understood at the molecular level. It is therefore under-utilized in modern sorghum improvement, largely because of the difficulty of identifying useful genetic variants hidden in the background of low yielding local varieties or lines (Tanksley and McCouch, 1997). Landraces can serve as sources of new desirable traits to enhance performance of germplasm under abiotic stresses such as drought, low soil fertility and acid soils (Beck et al., 1997). The improvement of crop genetic resources depends on continuous infusions of wild relatives and traditional varieties and on the use of modern breeding techniques. These processes all require an assessment of diversity at some level in order to select highly productive varieties (Mondini et al., 2009). Diversity is important for a breeding program since it directly affects the potential for genetic gain through selection (Hasanuzzaman et al., 2002; Kotal et al., 2010). It also allows the plant breeder to classify germplasm into heterotic groups to maximize heterosis (Menz et al., 2004). Genetic diversity within and between populations is routinely assessed using morphological, biochemical and molecular techniques. Although morphological characterization has traditionally been used to assess genetic variation, the genetic information provided by morphological characters is often limited and the expression of quantitative traits is subject to strong environmental influence (Rao, 2004; Mondini et al., 2009; De Vicente and Fulton, 2003). Biochemical methods based on seed protein and enzyme electrophoresis have been useful in the analysis of genetic diversity as they reveal differences between seed storage proteins or enzymes encoded by different alleles at one (allozymes) or more gene loci (isozymes).
Biochemical methods eliminate the environmental influence. However, their usefulness is limited by their inability to detect low levels of variation and by the small number of enzymes available, so the resolution of diversity is limited (Rao, 2004; De Vicente and Fulton, 2003). Molecular marker techniques comprising various DNA markers have been employed in the analysis of variation. These techniques are based on restriction and hybridization of nucleic acids, on the Polymerase Chain Reaction (PCR), or on both, and work by highlighting differences (polymorphisms) within a nucleic acid sequence between different individuals. Molecular markers offer numerous advantages over conventional phenotype-based alternatives: they are not affected by environmental factors, are applicable to any part of the genome, do not possess pleiotropic or epistatic effects and are able to distinguish polymorphisms that do not produce phenotypic variation. Many DNA based marker techniques such as Restriction Fragment Length Polymorphisms (RFLP), Random Amplified Polymorphic DNA (RAPD), Simple Sequence Repeats (SSR) and Amplified Fragment Length Polymorphism (AFLP) have been used in ecological, evolutionary, taxonomical, phylogenetic and genetic studies in the plant sciences (Ayad et al., 1997; De Vicente and Fulton, 2003; Karp et al., 1997; Rao, 2004; Mondini et al., 2009). Simple Sequence Repeats (SSRs) or microsatellites are highly versatile genetic markers because of their co-dominant inheritance, high abundance, enormous extent of allelic diversity, ease of assessing SSR size variation through PCR with pairs of flanking primers and high reproducibility (Mondini et al., 2009; De Vicente and Fulton, 2003; Agarwal et al., 2008; Karp et al., 1996).
To gain an insight into the extent of sorghum variability in Uganda, a molecular study of 241 sorghum accessions from the eastern, western and northern parts of the country was undertaken based on genetic distances using a panel of SSR markers.
MATERIALS AND METHODS
A total of 241 sorghum accessions consisting of 236 landraces and 5 released varieties were collected from farmers' fields in the eastern, northern and western parts of the country during January and February 2007 (Table 1). In each field, two heads of each phenotypically different sorghum genotype were collected.
In 2008, DNA was isolated from shoots and roots of 4 day old sorghum seedlings grown in an incubator (Heraeus CO2-auto-zero incubator at 30°C) using a modified Cetyltrimethylammonium Bromide (CTAB) protocol (Mace et al., 2004). Three to four sorghum seedlings from each accession were sampled and used for DNA extraction. The roots and shoots were crushed together with CTAB buffer using a Geno/Grinder 2000, SP2100-115 at 1X rate for 20 min. DNA quality was assessed by running 1 μL of each DNA sample on a 0.8% agarose gel, whereas DNA quantification was done using a NanoDrop spectrophotometer. DNA samples were diluted to 10 ng μL-1 and used to perform the polymerase chain reaction (Fig. 1, 2). Twenty one Simple Sequence Repeat (SSR) markers (Table 2) were used for the amplification of the DNA samples.
To carry out amplification, a 5 μL PCR mix consisting of 10 ng of DNA, 1X reaction buffer, 1.5 mM MgCl2, 0.1 mM dNTPs, 0.2 U Taq polymerase, 0.2 pmol each of forward and reverse primers and 2.23 μL of sterile water was amplified in a GeneAmp PCR system 9700 (Applied Biosystems, Foster City, CA, USA). The touch-down PCR programme consisted of one 15 min denaturation followed by 10 cycles of 94°C for 30 sec, 61°C for 45 sec and 72°C for 60 sec, then 30 cycles of 94°C for 30 sec, 54°C for 45 sec and 72°C for 60 sec. A final extension of 20 min at 72°C was included.
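The thermal profile described above can be written down as a small data structure, which makes it easy to check the number of amplification cycles and the minimum run time. This is only an illustrative sketch: the names `PROFILE`, `amplification_cycles` and `total_hold_minutes` are hypothetical, ramp times between steps are not reported and are ignored, and the temperature of the initial denaturation is left as `None` because it is not stated.

```python
# Thermal profile as reported in the text. Each step is (temperature_C, seconds).
# The initial denaturation temperature is not stated, so it is kept as None.
PROFILE = [
    {"name": "initial denaturation", "cycles": 1, "steps": [(None, 15 * 60)]},
    {"name": "stage 1 (61 C annealing)", "cycles": 10,
     "steps": [(94, 30), (61, 45), (72, 60)]},
    {"name": "stage 2 (54 C annealing)", "cycles": 30,
     "steps": [(94, 30), (54, 45), (72, 60)]},
    {"name": "final extension", "cycles": 1, "steps": [(72, 20 * 60)]},
]


def amplification_cycles(profile):
    """Count cycles in the three-step (denature/anneal/extend) stages only."""
    return sum(stage["cycles"] for stage in profile if len(stage["steps"]) == 3)


def total_hold_minutes(profile):
    """Sum of programmed hold times in minutes; ramp times are ignored."""
    seconds = sum(
        stage["cycles"] * sum(sec for _, sec in stage["steps"])
        for stage in profile
    )
    return seconds / 60


if __name__ == "__main__":
    print(amplification_cycles(PROFILE))  # 40
    print(total_hold_minutes(PROFILE))    # 125.0
```

Under these assumptions the programme has 40 amplification cycles and a little over two hours of programmed hold time; the real run would be longer once ramping is included.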
Table 1: | List of germplasm accessions along with their biological status, geographical region and agro-ecological zone |
WMAF and S: Western mid altitude farmlands and the Semuliki flats, CWS: Central wooded savannah, NWF-WS: Northwestern farmlands and wooded savannah, WNF: West Nile farmlands, NMF: Northern moist farmlands, NECG-BF: Northeastern central grassland and bush farmland | |
Amplification products were resolved on an ABI 3730 DNA analyzer system (Applied Biosystems, Foster City, CA, USA). This is a cost-effective, precise and accurate system for analyses involving many primers and genotypes. After amplification, the PCR products were first separated on a 2% agarose gel prestained with GelRed nucleic acid stain and visualised under UV light to verify amplification and to determine the amount of product to be co-loaded for fragment analysis (Fig. 3). Products from any three primers with different dye labels were pooled into groups (co-load groups) based on their respective agarose band strength and the resolution capacity of the dye labels.
Table 2: | List of the primer combinations used in the study |
Fig. 1: | Undiluted DNA quality and yield from sorghum seedlings |
Fig. 2: | DNA dilution to 10 ng μL-1 for PCR |
Fig. 3: | PCR products for the 21 optimized markers |
Fig. 4(a-b): | Electropherogram generated by GeneMapper showing (a) Homozygous and (b) Heterozygous status |
One microliter from each co-load group was added to 8 μL of HiDi containing 0.108 μL of GSLIZ500 internal size standard (Applied Biosystems, Foster City, CA, USA) and centrifuged at 1,000 RCF (Relative Centrifugal Force) for thorough mixing. Centrifuged samples were denatured in a thermocycler for 3 min at 94°C and immediately placed on ice for 3 min before capillary electrophoresis. Fragment analysis was performed on an ABI 3730 (Applied Biosystems, Foster City, CA, USA), and allele sizing and detection of homozygotes and heterozygotes were done with GeneMapper software 4.0 (Applied Biosystems, Foster City, CA, USA) (Fig. 4). Data were then exported to Microsoft Excel for sorting before analysis.
Polymorphic Information Content (PIC) values for each primer set were calculated using the algorithm described by Smith et al. (1997) as:
$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} f_i^{2}$$
where f_i is the frequency of the ith allele and n is the number of alleles at the locus. PIC provides an estimate of the discriminatory power of a locus by taking into account not only the number of alleles that are expressed but also the relative frequencies of those alleles. PIC values range from 0 (monomorphic) to 1 (very highly discriminative, with many alleles in equal frequencies).
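As a worked illustration of the PIC calculation, the following minimal sketch assumes the form PIC = 1 − Σ f_i², with f_i the frequency of the ith allele at a locus. The function names and the allele sizes in the usage example are hypothetical and are not taken from the study data.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for a list of allele calls at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no allele calls supplied")
    return {allele: n / total for allele, n in counts.items()}


def pic(alleles):
    """PIC = 1 - sum(f_i^2), assuming this is the form intended in the text."""
    freqs = allele_frequencies(alleles)
    return 1.0 - sum(f * f for f in freqs.values())


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) at a single SSR locus.
    print(pic([180] * 10))                # monomorphic locus
    print(pic([180, 180, 184, 184]))      # two alleles, equal frequency
    print(pic([180, 184, 188, 192]))      # four alleles, equal frequency
```

With equal allele frequencies the value is 1 − 1/n, so it approaches 1 only as the number of equally frequent alleles grows, which matches the interpretation of PIC given in the text.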
Statistical analysis of allelic data was performed using PowerMarker Ver. 3.25 to obtain the total number of alleles per marker, allele size range, abundant and rare alleles, gene diversity, heterozygosity and PIC. Tree construction was performed following the Neighbour Joining method using the dissimilarity indices in DARwin software 5.0.158 (Perrier and Jacquemoud-Collet, 2006). Bootstrap analysis with 1000 replicates was performed to assess node support.
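The per-locus summary statistics named above can be illustrated with a short sketch. It assumes diploid genotypes scored as pairs of allele sizes, uses the uncorrected gene diversity 1 − Σ p_i² and defines observed heterozygosity as the proportion of scored individuals carrying two different alleles. The function names and the example genotypes are hypothetical, and packages such as PowerMarker may apply sample-size or inbreeding corrections that this sketch does not attempt to reproduce.

```python
from collections import Counter


def gene_diversity(genotypes):
    """Uncorrected gene diversity, 1 - sum(p_i^2), for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples; None marks missing data.
    """
    alleles = [a for g in genotypes if g is not None for a in g]
    if not alleles:
        raise ValueError("no scored genotypes")
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def observed_heterozygosity(genotypes):
    """Proportion of scored individuals with two different alleles."""
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("no scored genotypes")
    return sum(1 for a, b in scored if a != b) / len(scored)


if __name__ == "__main__":
    # Hypothetical genotypes at one locus: three homozygotes, one
    # heterozygote and one missing call.
    locus = [(100, 100), (100, 100), (104, 104), (100, 104), None]
    print(gene_diversity(locus))           # 0.46875
    print(observed_heterozygosity(locus))  # 0.25
```

A gene diversity well above the observed heterozygosity, as in this toy example, is the pattern expected for a predominantly inbreeding crop.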
RESULTS AND DISCUSSION
In this study, the 21 SSR primers located on different chromosomes were highly polymorphic (Table 3) and therefore provided a powerful assay for discriminating genetic diversity among sorghum accessions. This is in agreement with the findings of Agrama and Tuinstra (2003). The higher level of polymorphism associated with SSR markers may be a function of the unique replication slippage mechanism responsible for generating SSR allelic diversity (Pejic et al., 1998). For the 21 SSR markers, 205 putative alleles were observed, ranging from 2-23 per locus with an average of 9.8 alleles per locus. Based on the allele frequencies, PIC values were estimated for the SSR loci analysed. The PIC over the 21 SSR markers ranged from 0.09-0.89 with an average of 0.65. Compared with the findings of Pu et al. (2009), this further suggests that SSR markers are more informative and detect more alleles than Random Amplified Polymorphic DNA (RAPD) markers.
Similarly high PIC values have been reported for chickpea microsatellite analysis by Udupa et al. (1999), Upadhyaya et al. (2008) and Bharadwaj et al. (2011), who attributed this to polymorphism of the TAA motif. The PIC of an SSR marker provides an estimate of the discriminatory power of that marker and thus of its usefulness in genetic analysis (Smith et al., 1997; Abu Assar et al., 2005; Bharadwaj et al., 2011). The mean PIC value of the SSR markers in this study was in the range reported by Smith et al. (1997), who used acrylamide gels for allele detection. However, capillary electrophoresis (used in this study) gives better resolution and can distinguish alleles that differ by as little as two base pairs. The findings of this study confirm the effectiveness of capillary electrophoresis in allele detection. Di- and tri-repeat containing markers gave higher PIC values than penta- and hexa-repeat markers.
Table 3: | Diversity indicators revealed by 21 SSR primer combinations used in the study |
H: Heterozygosity, GD: Gene diversity, PIC: Polymorphic Information Content. Sources: a: Unpublished, CIRAD; b: Taramino et al. (1997); c: Schloss et al. (2002); d: Kong et al. (2000); e: Bhattramakki et al. (2000); f: Unpublished, ICRISAT |
Fig. 5: | Variation of PIC with number of markers |
Similar results have been reported by Smith et al. (1997) for di-repeats, however, such repeats are associated with stutter bands that complicate accurate genotyping. The use of Polymorphic Information Content (PIC) to evaluate the number of SSR markers needed to provide sufficient information on allele diversity in a given dataset (Fregene et al., 2003) was applied in this study. The curve (Fig. 5) revealed that little or no increase in PIC is obtainable with more than ten markers as earlier reported by Folkertsma et al. (2005).
The average gene diversity observed was high. This could be attributed to the fact that, although sorghum is predominantly inbreeding, the gene pool as a whole maintains a high level of allelic variation (Folkertsma et al., 2005). The number of alleles per locus ranged from 2-23, which may be a result of different locus specific mutation rates (Estoup et al., 2002). This reflects strong differences in allelic diversity between SSR loci, which affects the estimation of genetic diversity because the diversity index of Nei (1973) depends on both the number of alleles per locus and their respective frequencies (McCouch et al., 1997).
Besides locus specific mutation rates, the number of alleles per locus and gene diversity can be affected by size homoplasy, which occurs when different copies of a locus are identical in state although they are not identical by descent (Estoup et al., 2002). The high variability associated with di-nucleotide repeat containing SSRs is a result of higher mutation rates among such repeats (Casa et al., 2005).
MsbCIR and XTXP SSRs usually had more alleles than XCUP loci and this could probably be due to differences in the SSR origins (Casa et al., 2005). XTXP markers are reported to be extracted from either small-insert genomic libraries (Brown et al., 1996; Kong et al., 2000) or bacterial artificial chromosome end sequences (Bhattramakki et al., 2000). These loci therefore were more likely to include non coding regions than the XCUP SSRs that are mainly developed from low-copy RFLP probe sequences located primarily near or in genes (Schloss et al., 2002). Rare SSR alleles were observed (data not shown) and this provides a great opportunity for generation of a comprehensive fingerprint database (Bharadwaj et al., 2011).
Analysis of molecular variance (AMOVA) indicated that within-race variation was higher than among-race variation. The p-value was significant, indicating that the differentiation among races, although small, was statistically significant. The Fixation index (FST) was 0.036, signifying little genetic differentiation on the basis of race (Table 4).
Fig. 6: | NJ phylogenetic tree using SSR markers in 241 sorghum accessions. Bootstrap values (≥40) are indicated at the node of each cluster |
Table 4: | Analysis of molecular variance (AMOVA) for races |
FST: 0.036; p-value: 0.00000 ± 0.00000 |
Table 5: | Analysis of molecular variance (AMOVA) for agro-ecologies |
FST: 0.053; p-value: 0.00000 ± 0.00000 |
Similarly, analysis of molecular variance for agro-ecologies indicated that variation within agro-ecologies was higher than variation among agro-ecologies. The results generally reveal that variation is higher from one accession to another and lower from one region to another. This could be due to adaptation of varieties or accessions to their respective agro-ecological conditions, aided by their utility value to the farmer, as suggested by Mujaju and Chakauya (2008). This is an important guide to the development of conservation strategies and to the choice of where best to preserve accessions, whether in situ or on farm (Mujaju and Chakauya, 2008). The high similarity between agro-ecological zones could also be due to the role of Non Governmental Organizations (NGOs) in seed distribution. Almost all the agro-ecological zones sampled were either directly or indirectly affected by the rebel movement of the Lord's Resistance Army and as such had many NGOs distributing aid, including seeds, to Internally Displaced People (IDPs) as well as returnees. Similar observations were made in Zimbabwe by Mafa (1999) and Chakauya et al. (2006). The p-value was significant, indicating that the differentiation on the basis of agro-ecologies was statistically significant. The FST index was 0.053, which shows moderate genetic differentiation (Table 5). This level of diversity could be caused by frequent gene flow among regions as a consequence of seed exchanges among farmers, by a restriction of the intensity of genetic drift due to a high effective population size, or by a common heritage of the materials (Dje et al., 1999). Gene flow through pollen could also be explained by the fact that farmers grow different accessions mixed in the same field and may not have a spatial strategy that would limit pollen flow among accessions (Barnaud et al., 2008).
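The way an FST value follows from AMOVA variance components, and how the values of 0.036 and 0.053 map onto the qualitative labels used here, can be sketched as follows. This assumes a single-level AMOVA in which FST is the among-group variance divided by the total variance, and it uses the qualitative thresholds commonly attributed to Wright (1978). The variance components in the usage example are hypothetical, chosen only so that the ratio equals the reported FST; they are not the values in Tables 4 and 5.

```python
def fst_from_variance_components(var_among, var_within):
    """FST as the among-group share of total variance (one-level AMOVA)."""
    total = var_among + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_among / total


def differentiation_category(fst):
    """Qualitative label using thresholds commonly attributed to Wright (1978)."""
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "great"
    return "very great"


if __name__ == "__main__":
    # Hypothetical variance components, not taken from Tables 4 and 5.
    fst_races = fst_from_variance_components(0.036, 0.964)
    fst_zones = fst_from_variance_components(0.053, 0.947)
    print(round(fst_races, 3), differentiation_category(fst_races))
    print(round(fst_zones, 3), differentiation_category(fst_zones))
```

Read this way, an FST of 0.036 means that roughly 3.6% of the molecular variance lies among races and the remainder within them, which is why a statistically significant p-value can coexist with little differentiation.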
The genetic dissimilarity matrix was analysed using the neighbor joining clustering algorithm in DARwin 5.0.158 software (Fig. 6). The radial tree representation delineated the accessions into two distinct clusters, mainly according to agro-ecological zone (Table 6).
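The text does not state which dissimilarity index was used to build the matrix. As an illustration only, the sketch below assumes a simple matching index for diploid allelic data, in which the dissimilarity between two accessions is one minus the average proportion of alleles shared per locus. The function names and the example genotypes are hypothetical, and loci with a missing call in either accession are skipped.

```python
from collections import Counter


def matching_alleles(g1, g2):
    """Number of alleles shared by two genotypes (multiset intersection)."""
    return sum((Counter(g1) & Counter(g2)).values())


def simple_matching_dissimilarity(acc1, acc2, ploidy=2):
    """1 - mean proportion of shared alleles over loci scored in both accessions.

    acc1, acc2: lists of genotypes, one per locus, each a tuple of allele
    sizes; None marks a missing call.
    """
    loci = [(a, b) for a, b in zip(acc1, acc2) if a is not None and b is not None]
    if not loci:
        raise ValueError("no loci scored in both accessions")
    matched = sum(matching_alleles(a, b) for a, b in loci)
    return 1.0 - matched / (ploidy * len(loci))


if __name__ == "__main__":
    # Hypothetical two-locus genotypes for two accessions.
    acc_a = [(100, 100), (200, 204)]
    acc_b = [(100, 100), (200, 200)]
    print(simple_matching_dissimilarity(acc_a, acc_b))  # 0.25
```

Computing this value for every pair of accessions would give a symmetric matrix of the kind that a neighbor joining algorithm takes as input.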
Table 6: | Cluster analysis based on DARwin grouping of the 241 sorghum accessions |
Accessions in cluster I were mainly from the northern moist farmland and central wooded savannah agro-ecological zones, while accessions in cluster II were mainly from the Northeastern central grassland and bush farmland zone. The two arms of the radial tree are quite diverse, indicating variability at the molecular level between accessions from the northern moist farmland and central wooded savannah zones and those from the Northeastern central grassland and bush farmland zone. Cluster I has seven sub-clusters, with sub-cluster IA appearing the most distinct, although it is close to sub-cluster IB. Accessions in cluster I were distinct from the rest of the accessions, with sub-clusters IA and IB being the most distinct, and this offers opportunities that could be exploited in pre-breeding programmes. Sub-cluster IC-1 comprised accessions from the northern moist farmland and Northeastern central grassland and bush farmland agro-ecological zones, while IC-2 comprised accessions from the central wooded savannah and the western mid altitude farmlands and Semuliki flats zones. The occurrence of distinct groups of sorghum accessions, as revealed by SSR marker analysis, can be utilized effectively in pre-breeding efforts to overcome yield barriers (Nass and Paterniani, 2000; Bharadwaj et al., 2011). Substantial genetic gains may be obtained if these accessions are incorporated into the breeding programme. The contribution of this diversity to farmers' livelihoods needs to be investigated for the different agro-ecological zones. This study also highlights the ability of SSR markers to discern genetic variation.
ACKNOWLEDGMENTS
We thank Mwathi Margaret of International Crops Research Institute for the Semi-arid Tropics (ICRISAT) in Nairobi, Sarah Seruwagi of Makerere University Biotechnology Laboratory and Moses Biruma of National Agricultural Research Organization (NARO) for the technical assistance. This study was funded by the East African Regional Network for Biotechnology, Biosafety and Biotechnology Policy Development programme (BIO-EARN).