Abstract: Background and Objective: With the advent of high-throughput genotyping platforms, association studies have become important tools for finding genomic regions of interest in breeding programs because they offer greater accuracy than other tools. The aim of this work was to map genomic regions associated with grain maturation in common maize lines. Materials and Methods: For linkage disequilibrium mapping, 72 lines were previously genotyped for SNP markers on the 600K platform and their respective genotypic values were predicted for male and female flowering and area below the moisture curve. The association between the SNP markers and the traits was analyzed using a mixed linear model and stepwise multiple regression. Results: The significant associations detected for male and female flowering were distributed across all chromosomes, with a higher concentration in genomic regions of chromosomes 1, 2, 3, 5, 9 and 10. For the area below the moisture curve, a smaller number of significant associations was found, concentrated on chromosomes 1, 2, 3, 6, 9 and 10 and absent on chromosomes 4 and 8. Stepwise analysis yielded complete models that account for 79, 93 and 56% of the variation in the genotypic values, respectively, with the identification of genomic regions predominantly on chromosomes 1 and 3. Conclusion: The detection of similar and distinct genomic regions for these traits reveals the potential of the significant associations detected on chromosomes 1 and 3 for obtaining the germplasm maturity required in breeding programs.
INTRODUCTION
The maize seed market has undergone major changes, mainly in the producing regions of southern Brazil, where the planting seasons of the summer crop and of the second crop have been brought forward1. In this context, breeding programs continue to seek hybrids with greater yield potential, but with maturity different from that of the hybrids previously commercialized. Thus, knowledge of the germplasm and an understanding of the genetic control of maturation are of fundamental importance for optimizing the selection of lines and speeding up the development of competitive hybrids.
The earliness of a maize line or hybrid is determined primarily by the number of days to flowering and the rate of grain water loss after physiological maturity2-4. The genetic control of the number of days to flowering in maize has been investigated in several studies, which detected different magnitudes and types of gene action in the expression of the trait and found QTLs in all maize linkage groups5,6. The rate of grain moisture loss has received little genetic study because it is labor intensive to measure and highly influenced by environmental conditions7. Thus, it is of utmost importance to map the genetic factors and to estimate gene effects and interactions in order to allow selection based on genomic information8.
Molecular markers have been quite effective in the study of genetically controlled traits because they detect polymorphisms directly at the molecular level and are not influenced by the environment9. The development of single nucleotide polymorphism (SNP) markers, large-scale genotyping at an affordable cost and increased computational capacity allowed the development of new genetic mapping methods, including association genetics, which enables the researcher to map traits in the germplasm without the need to generate crosses7.
In this analysis strategy, mapping is based on the linkage disequilibrium in a population of unrelated individuals: the linkage disequilibrium between the marker and the QTL, which reflects variation generated by mutations and amplified by recombination during the evolution of the species, is statistically evaluated10. The association genetics strategy in plant breeding was initially proposed for maize by Bernardo and Yu11. Using this strategy, Durand et al.12 found a set of genes closely linked to flowering. The objective of this work was to identify genomic regions associated with the number of days to flowering and the loss of moisture after physiological maturity in common maize lines.
MATERIALS AND METHODS
Plant materials: The experiments were carried out in 2014-2015 in the experimental field and in the Biotechnology Laboratory of the company COODETEC-Cooperativa Central de Pesquisa Agrícola (lat 24°53'8.54"S, long 53°32'4.72"W and alt 678 m asl), in Cascavel-PR, Brazil. The association mapping was conducted on a panel of 72 maize lines from the Maize Improvement Program of this company. The association analysis was divided into two stages: the first was the genotyping of the lines and the second the phenotyping for days to male (DMF) and female (DFF) flowering and area below the moisture curve (ABMC).
Genotyping: The genomic DNA of the 72 lines from the association panel was obtained from seeds using the method described by McDonald et al.13 and genotyping with SNP markers was performed with the Axiom™ Genome-Wide Maize 600 Array Platform at Affymetrix (Santa Clara, USA). Of the 616,201 SNP markers contained in the array, the 418,287 markers that gave high-resolution results and were polymorphic in the samples were used in the analyses.
Phenotyping: The experimental trial for phenotypic analysis was carried out in an incomplete block design with three replications in the 2014/15 crop season at the experimental farm of COODETEC, in Cascavel, Parana, Brazil. The experimental unit consisted of four 4 m rows, with 20 plants per row and a row spacing of 0.76 m. The lateral borders and the borders between blocks were sown with a commercial hybrid.
The number of days to male (DMF) and female (DFF) flowering was determined at stage R1 on the whole plot, by recording the date on which 51% of the plants showed pollen shedding (DMF) and exposed style-stigmas (DFF).
Ear collection for the determination of moisture content began when the earliest lines reached stage R6 (physiological maturity) and had around 30% moisture. Three ears were randomly sampled at each collection, and moisture was measured with an automatic moisture meter (Motomco® 999 ES), with a maximum reading of 35.8%. The ears were collected without removing the husk, placed in "onion type" mesh plastic bags, labeled and transported quickly to a protected environment for moisture measurement. In total, seven collections were performed at 4-day intervals, totaling 28 days of grain moisture measurements. The grain moisture percentages for the seven samples were plotted to estimate the area below the moisture curve (ABMC) using the methodology described by Shaner and Finney14.
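The area below the moisture curve can be illustrated with the trapezoidal rule, the form in which the Shaner and Finney area-under-the-curve calculation is usually written. The sketch below is only an illustration under that assumption: the function name `area_below_curve`, the sampling days and the moisture readings are hypothetical and are not data from this study.

```python
def area_below_curve(days, moisture):
    """Trapezoidal area below a moisture curve.

    days     -- sampling days in increasing order (hypothetical values)
    moisture -- grain moisture (%) measured on each sampling day
    """
    if len(days) != len(moisture) or len(days) < 2:
        raise ValueError("need at least two paired observations")
    area = 0.0
    for i in range(len(days) - 1):
        mean_moisture = (moisture[i] + moisture[i + 1]) / 2.0
        interval = days[i + 1] - days[i]
        area += mean_moisture * interval
    return area


# Hypothetical example: seven collections at 4-day intervals.
days = [4, 8, 12, 16, 20, 24, 28]
moisture = [30.0, 28.0, 26.0, 24.0, 22.0, 20.0, 18.0]
abmc = area_below_curve(days, moisture)
print(abmc)
```

Under this formulation, a line that dries faster accumulates a smaller area, so lower ABMC values indicate faster moisture loss.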
Data for DMF, DFF and ABMC were submitted to genotypic analysis for the prediction of genotypic values, using the SELEGEN-REML/BLUP software15, model 21.
Population structure analysis: In order to avoid spurious associations between SNP markers and the traits evaluated, population structure analysis was performed first. The analysis used the Markov Chain Monte Carlo (MCMC) algorithm for the generalized Bayesian model, as implemented in the InStruct program16. The algorithm implemented in InStruct does not require the assumption of Hardy-Weinberg equilibrium in the population.
In order to obtain good convergence, the allele frequencies were estimated for each simulated number of subpopulations (k) and, subsequently, the probability that each line belongs to a subpopulation k. For this purpose, 5000 burn-in iterations followed by 50,000 iterations (run length) were used, testing eight possible numbers of subgroups (k). The best value of k was the one with the lowest deviance information criterion (DIC) among all simulated k.
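The selection rule for k can be sketched in a few lines. The DIC values below are invented for illustration, the range of k from 1 to 8 is an assumption (the text states only that eight values were tested), and the helper `best_k` is hypothetical rather than part of InStruct.

```python
def best_k(dic_by_k):
    """Return the number of subpopulations with the lowest DIC."""
    if not dic_by_k:
        raise ValueError("no DIC values supplied")
    return min(dic_by_k, key=dic_by_k.get)


# Hypothetical DIC values for eight tested values of k.
dic_by_k = {
    1: 5210.4, 2: 5105.7, 3: 5042.9, 4: 5068.3,
    5: 5090.1, 6: 5123.8, 7: 5160.2, 8: 5201.5,
}
chosen_k = best_k(dic_by_k)
print(chosen_k)
```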
Association analysis: The association between the SNP markers and the traits of interest (DMF, DFF and ABMC) was analyzed in TASSEL17 version 5.2.12 using a mixed linear model (MLM), followed by multiple regression analysis with stepwise model selection in JMP18.
The values of -log10(P) obtained in the MLM analysis were used to draw the Manhattan plots. SNP markers significant at the 0.1 or 0.5% level in the MLM analysis were used in the multiple regression analysis with stepwise model selection, with an entry and exit probability of 5%, in order to obtain a linear model containing only non-redundant markers.
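The transformation used for the Manhattan plots and the significance filter applied before the stepwise regression can be sketched as follows. This is a minimal illustration, not the TASSEL or JMP procedure: the marker names and P values are hypothetical, and the functions `neg_log10` and `significant_markers` are names introduced here.

```python
import math


def neg_log10(p_value):
    """Return -log10(P), the y-axis value of a Manhattan plot."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P value must be in (0, 1]")
    return -math.log10(p_value)


def significant_markers(p_values, alpha):
    """Return marker names with P <= alpha, sorted by name."""
    return sorted(name for name, p in p_values.items() if p <= alpha)


# Hypothetical MLM P values for four markers.
p_values = {"SNP_A": 0.00004, "SNP_B": 0.0008, "SNP_C": 0.003, "SNP_D": 0.02}

strict = significant_markers(p_values, 0.001)   # 0.1% level
relaxed = significant_markers(p_values, 0.005)  # 0.5% level
manhattan_y = {name: neg_log10(p) for name, p in p_values.items()}
print(strict, relaxed)
```

Only the markers that pass the chosen threshold would then be offered to the stepwise model selection, which keeps the number of candidate predictors manageable.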
RESULTS AND DISCUSSION
Phenotypic analysis: The accuracy estimates of the field test were higher than 98% and the coefficients of experimental variation did not exceed 2.3 (Table 1).
Table 1: | Predicted genotypic values and genetic parameters for days of male flowering (DMF), days for female flowering (DFF) and area below the moisture curve (ABMC) estimated by the REML/BLUP (SELEGEN, Model 21) evaluated in maize germplasm (Cascavel/PR, harvest 2014/15) |
IC: Confidence interval, Gv: Genotypic variance, Rv: Residual variance, Pv: Phenotypic variance, h2: Heritability, Gvc (%): Genotypic variance coefficient, Rvc (%): Residual variance coefficient |
Accuracy measures the correlation between predicted genetic values and true values. Trials with accuracy above 90% are considered to have excellent experimental accuracy15. This indicates that the phenotypic evaluations reflect the genotype of the plants with high precision, a condition that is fundamental to success in identifying any association that may exist between molecular markers and phenotypic traits.
The mean number of days to male flowering (DMF), number of days to female flowering (DFF) and area below the moisture curve (ABMC) were 68.2, 69.4 and 836, respectively. The genotypic values for DMF ranged from 58.5 to 77.7 days, with a confidence interval of 1.3 days, close to the range obtained for DFF, 58.5 to 79.7 days, with a confidence interval of 1.5 days. For ABMC, the range was 606.4 to 976.9, with a confidence interval of 25.6 (Table 1).
On the whole, the genotypic values were similar to the phenotypic means, which can be explained by the high accuracy of the experiment and the high ratio between the genetic and residual coefficients of variation (CVgi/CVe), indicating that genetic effects made the greatest contribution to the expression of the traits studied and that the environment had little influence.
The heritabilities observed for the traits studied were high (above 97%) (Table 1). Camara et al.19 also found high heritability values for DMF and DFF, of 94.12 and 95.88%, respectively. Heritability estimates aid plant breeding by allowing the maximum possible genetic gain for the trait of interest to be determined, so that superior genotypes can be selected and obtained with less time and fewer resources20. In addition, heritability estimates are important in QTL mapping because they indicate the accuracy of the estimates of the average genotypic values that will be used in the analyses21.
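The link between heritability and accuracy can be illustrated with a small calculation. The sketch assumes two common textbook forms, which may differ from what SELEGEN model 21 computes: heritability on a genotype-mean basis as Gv / (Gv + Rv / r) for r replications, and selective accuracy as the square root of that heritability. The variance values and function names are hypothetical.

```python
import math


def mean_heritability(gv, rv, replications):
    """Genotype-mean heritability, assuming Pv = Gv + Rv / r."""
    if gv < 0 or rv < 0 or replications < 1:
        raise ValueError("variances must be >= 0 and replications >= 1")
    return gv / (gv + rv / replications)


def selective_accuracy(h2_mean):
    """Accuracy, assumed here to be the square root of mean heritability."""
    return math.sqrt(h2_mean)


# Hypothetical variance components for a trial with three replications.
h2 = mean_heritability(gv=20.0, rv=1.5, replications=3)
acc = selective_accuracy(h2)
print(round(h2, 4), round(acc, 4))
```

Under these assumptions, a heritability above 0.97 corresponds to an accuracy above 0.98, which is in line with the magnitudes reported for this trial.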
Association analysis: In the association analysis using the mixed linear model, 1432, 1279 and 203 SNP markers were identified as associated with DMF, DFF and ABMC, respectively, distributed across the 10 maize chromosomes (Fig. 1). These results show, in a preliminary way, that ABMC is associated with a smaller number of genomic regions than DMF and DFF, despite the strong genetic correlation between them (rg = 0.8). Genomic regions on all chromosomes are associated with DMF and DFF, more frequently on chromosomes 1, 2, 3, 5, 9 and 10. For ABMC, the highest frequency of associations is on chromosomes 1, 2, 3, 6, 9 and 10. No association of ABMC with SNP markers on chromosomes 4 and 8 was observed.
The number of days for flowering has been extensively studied in several plant species22,23, including maize. In the case of maize, the number of days for flowering is one of the most important characteristics for the evaluation of lineages and hybrids5,6,24-27.
Several studies have identified many genomic regions associated with the number of days to flowering in maize, similar to the results obtained in the present study, regardless of the mapping strategy employed. Buckler et al.5 studied the genetic architecture of days to flowering in maize and identified several QTLs with small additive effects. The authors found that the sharing of QTLs among populations depends on the genetic constitution of the founder lines.
Li et al.6 used three panels of lines with a broad genetic background and detected approximately 1000 SNPs associated with the number of days to flowering. Most of these markers were mapped in genomic regions related to 220 candidate genes, including Vgt27, located on chromosome 8, and ZmCCT25, located on chromosome 10. These genes are related to the transition from the vegetative to the reproductive phase and to the photoperiod response, respectively. These results confirm the complexity of the trait and indicate the need to identify the associations that contribute most to its determination and that can be used in a breeding program through marker-assisted selection.
In order to reduce the redundancy among the associations detected by the mixed linear model (MLM) analysis, the significant markers were used in a multiple regression analysis with stepwise model selection. According to Schuster and Cruz28, this statistical method seeks the model that best explains the variation of each trait, including the main associated variables without redundancy. The use of stepwise regression is usually limited by the large number of SNPs29, but when the number of markers is reduced by applying some type of filter, the possibility of false-positive identification can be reduced30,31. In this work, the number of SNP markers was reduced by lowering the probability threshold for selecting the markers used in the multiple regression.
In the model obtained from the stepwise regression analysis for DMF, six markers on chromosomes 1, 2 and 3 were included, which explain 79% of the variation observed in DMF (Table 2). The markers AX-91579515 and AX-90671783, on chromosomes 3 and 1 respectively, accounted for 45% of the variation observed.
For the DFF characteristic, seven markers from chromosomes 1 and 3 were included in the model and they explained 93% of the variation observed (Table 2). Markers AX-90583917 and AX-90830961 explain 66% of the variation.
Fig. 1(a-c): | Manhattan plot analysis of association in maize using LMM (a) Days for male flowering (DMF), (b) Days for female flowering (DFF) and (c) Area below the moisture curve (ABMC) |
Table 2: | Designation and position of the SNP markers on the respective chromosomes, effect and percentage of the phenotypic variance explained by the markers (R²) based on the stepwise multiple regression analysis for DFF, DMF and ABMC |
DMF: Days for male flowering, DFF: Days for female flowering, ABMC: Area below the moisture curve |
Comparing the markers identified for DMF and DFF shows that no marker is shared between these highly correlated traits (r = 0.92), but AX-90829656 lies 438 kb from the marker with the greatest effect on the number of days to male flowering. In addition, the stepwise regression model included the markers AX-90583917 and AX-90674467, which, even though they are mapped only 165 kb apart, are both important for the determination of DFF. These results are partially in agreement with those of Yang et al.32, who identified genomic regions associated with maize flowering on virtually all chromosomes but found that the markers with significant effect were distinct for male and female flowering, including five on chromosome 1 and two on chromosome 3.
In the stepwise analysis for ABMC, a model with six markers explained 56% of the variation observed. The markers are mapped on chromosomes 1, 3, 9 and 10 (Table 2). Despite the high genetic correlation between ABMC and DMF (r = 0.81), only the marker AX-90671783, mapped on chromosome 1, was common to these traits. In addition, marker AX-90830960, mapped in a genomic region of chromosome 3 and with an effect on ABMC, is located only 55 kb from AX-90830961, which has a significant association with DFF (r = 0.78).
Considering the markers that were significant in the stepwise analysis on the different chromosomes, two genomic regions of chromosome 1 are involved in the determination of these traits related to grain maturation in maize (Table 2). The marker AX-91456741, which is associated with DMF, represents one of these regions. In the other region of the same chromosome, the marker AX-90671783 was significant for DMF and ABMC, and the markers AX-90583917 and AX-90674467, which are very close to each other (165.3 kb), are associated with DFF. On chromosome 3, two genomic regions are involved in the determination of ABMC, represented by the markers AX-90794386 and AX-91559262, respectively. There is a third genomic region associated with DMF (marker AX-90821080) and a fourth region that influences all three traits evaluated. Within the latter, the markers AX-91579515 and AX-90829656, involved in the determination of DMF and DFF, respectively, are relatively close, 438.4 kb apart. In addition, the markers AX-90830961 and AX-90830960, also mapped on chromosome 3, are even closer (54.9 kb) and are associated with DFF and ABMC, respectively.
Different studies, including a recent one by Li et al.6, have confirmed the importance of genomic regions on chromosomes 8 and 10 that are associated with flowering in maize but were not detected in the present work. On chromosome 8, the Vgt genes6 and ZCN833 are mapped, associated with the transition between the vegetative and reproductive phases. In a study of the Vgt1 allele, which advances flowering by 100 GDD in maize plants under high-latitude climatic conditions, the allele was found to be rare in tropical genotypes and to provide an adaptive advantage to temperate genotypes24. The ZmCCT gene, mapped on chromosome 10, is considered one of the most important genetic components of the photoperiod response25; it carries alleles for day-length response and promotes an increase in the number of days to flowering6,32. However, Hung et al.25 reinforce the occurrence of maize lines containing ZmCCT alleles, including some adapted to tropical regions, that are insensitive to photoperiod and must have been selected by the people who domesticated maize.
The results obtained help to explain the high correlations between the traits assessed. The majority of the markers most strongly associated with the traits are located on chromosomes 1 and 3. Genes that are linked on these chromosomes result in a high genetic correlation between the traits, because linked genes are more likely to be transmitted together during meiosis. Genetic linkage is a transient cause of correlation34, since linkage can be broken by recombination and the genes then reach linkage equilibrium.
Some of the markers that are associated with different traits and lie close together may be associated with the same gene. In that case the gene is pleiotropic, that is, it affects more than one trait. Pleiotropy is a permanent cause of correlation35. Probable examples of pleiotropy on chromosome 3 are marker AX-91579515, associated with DMF, which lies 438.4 kb from marker AX-90829656, associated with DFF, and marker AX-90830960, associated with ABMC, which lies 54.9 kb from marker AX-90830961, associated with DFF. On chromosome 1, marker AX-90671783 is associated with both ABMC and DMF and is also an example of probable pleiotropy.
CONCLUSION
The results obtained in this work reveal the importance of genomic regions of tropical maize on chromosomes 1 and 3 for male and female flowering and the rate of moisture loss. However, validation in temperate maize germplasm is necessary before these markers can be used in a marker-assisted selection (MAS) program.
SIGNIFICANCE STATEMENT
This study identifies genomic regions associated with grain moisture loss in maize and is the first to use GWAS for this trait. It provides novel findings on maize maturation by associating, for the first time, flowering time with the loss of moisture in maize grains. The results will contribute to current knowledge of maize maturation. Maturity is one of the most important breeding targets in almost all maize-producing areas of the world. In addition, associating maturity with the rate of moisture loss will give breeders new tools to produce earlier inbreds and hybrids. The markers identified in this work, associated with QTLs for flowering time and moisture loss, can be validated in different genetic backgrounds and then used in marker-assisted selection in plant breeding.