
Research Article


Evaluating the Performance of 16 Egyptian Wheat Varieties Using Self-Organizing Map (SOM) and Cluster Analysis

O.M. Ibrahim,
M.M. Tawfik,
Elham A. Badr
and
Asal M. Wali



ABSTRACT

To evaluate the performance of 16 wheat varieties based on agronomic parameters using a self-organizing map and cluster analysis, a field experiment was conducted during the 2013/2014 and 2014/2015 winter seasons at the farm of the National Research Center at Nubaria district, Albehira Governorate, Egypt. In cluster analysis, all the studied characters were used to construct a distance matrix using the Euclidean coefficient, which was then used to generate a dendrogram showing dissimilarity among all the wheat varieties. The distance matrix for the 16 wheat varieties revealed that dissimilarity ranged from 0.62 between Gemmiza 10 and Beniswef 5 to 7.73 between Beniswef 6 and Sakha 93, which reveals the diversity among the wheat varieties. Cluster analysis classified the 16 varieties into nine clusters, whereas the self-organizing map classified the 16 varieties into 11 clusters, which accounted for 95% of the variation. The clusters in the SOM consist of nodes, where varieties in the same node are more similar than varieties in different nodes of the same cluster. Likewise, varieties in the same cluster are more similar than varieties in different clusters. The results revealed that varieties with higher grain and straw yield were also higher in plant height (cm), number of spikes per square meter, spike length (cm) and number of grains per spike, suggesting that grain and straw yield were more strongly correlated with those parameters than with the other parameters. The results also suggested that the self-organizing map classifies varieties clearly and is more interpretable than cluster analysis.





Received: October 24, 2015;
Accepted: December 05, 2015;
Published: January 15, 2016


INTRODUCTION
Artificial Neural Networks (ANNs) have been considered a powerful modeling technique in the agricultural sciences (Ibrahim, 2012, 2013, 2015). A Self-Organizing Map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a lower-dimensional representation of the inputs, called a map. The goal in assessing the performance and variability of varieties is to extract useful information from high-dimensional data sets. Clustering large data sets has been a domain of classical statistical methods. More recently, the Self-Organizing Map (SOM) (Kohonen, 2001) has been proposed as an approach for classifying high-dimensional data sets. Compared with other clustering algorithms, the SOM has the greatest visualization capability, and detailed information can be read from its outputs because the visualized outputs are easy to interpret. Traditional cluster analysis, on the other hand, has limited visualization capability. Clustering methods may involve a variety of algorithms but almost invariably build distinct, self-contained clusters (Curry et al., 2001), whereas the neurons of the SOM are not mutually exclusive. This means that the final feature map, instead of showing several distinct clusters with differing characteristics, shows neighboring nodes which have many similar characteristics but differ perhaps on one or two, or in degree of intensity of characteristics (Curry et al., 2001). Traditional cluster analysis may not be sufficient for analyzing data containing many cases and a large number of variables, whereas the SOM is considered an effective method for dealing with high-dimensional data. Traditional cluster analysis methods are designed under strict assumptions of certain statistical distributions, whereas no such assumptions are needed when applying the SOM.
For instance, continuous variables should satisfy the normal distribution assumption and categorical variables should satisfy the multinomial distribution assumption in order to perform cluster analysis effectively (Norusis, 2004). Furthermore, the number of clusters must be known at the start of the K-means clustering method. In contrast, the number of clusters is not a prerequisite at the initial stage of the SOM, and the appropriate number of clusters is shown directly by the result itself. The sorting ability of traditional cluster analysis is an important problem for the reliability of its solutions. The SOM can remedy that problem because the U-matrix does not give any results when there are no obvious clustering relations in the original space; thus, unreasonable arbitrary classification can be avoided (Zhang and Li, 1993). The present study was conducted to assess the performance and variability of 16 Egyptian wheat varieties under sandy soil conditions by using a self-organizing map artificial neural network and traditional cluster analysis.
MATERIALS AND METHODS
A field experiment was carried out during the winter seasons of 2013/2014 and 2014/2015 at the Research and Production Station, National Research Centre, El-Nubaria Province, El-Behaira Governorate, Egypt, to assess the performance and variability among 16 wheat varieties grown under newly reclaimed sandy soil. The experimental design was a randomized complete block design. Plot area was 10.5 m^{2} (3.5 m long and 3 m wide). Phosphorus fertilizer was added before sowing at the rate of 31 kg P_{2}O_{5} per fed as calcium superphosphate (15.5% P_{2}O_{5}), potassium was added at the rate of 24 kg K per fed as potassium sulphate (48% K_{2}SO_{4}) and nitrogen fertilizer was applied at the rate of 75 kg N per fed in the form of ammonium nitrate (33.5% N) in three equal doses: 20 days after sowing, at tillering and at heading. Sowing took place in mid-November; the plots were irrigated immediately after sowing using a sprinkler irrigation system, and water was applied every 5 days. At harvest, the two central rows were harvested and the following characters were recorded: grain yield (tons per fed), straw yield (tons per fed), biological yield (tons per fed), harvest index (economical yield/biological yield×100), number of grains per spike, dry weight of grains per spike (g), 1000-grain weight (g) and number of spikes.
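The harvest index defined above is a simple ratio. A minimal sketch follows; the function name and the numeric inputs are hypothetical and are not data from this study.

```python
def harvest_index(economic_yield, biological_yield):
    """Harvest index (%) = economic (grain) yield / biological yield x 100."""
    if biological_yield <= 0:
        raise ValueError("biological yield must be positive")
    return economic_yield / biological_yield * 100.0


# Hypothetical values in tons per feddan, for illustration only.
hi_percent = harvest_index(economic_yield=2.0, biological_yield=5.0)
print(f"Harvest index: {hi_percent:.1f}%")
```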
Seven agronomic parameters [plant height (cm), number of spikes per square meter, spike length (cm), number of grains per spike, 1000-grain weight (g), grain yield (tons per fed.) and straw yield (tons per fed.)] were used as inputs to classify the varieties. They were normalized to the range of 0-1 using a logistic function before being provided to the self-organizing map model, and standardized before being provided to the cluster analysis. In this study, cluster analysis begins by placing each variety in a cluster by itself. At each stage of the analysis, the distance criterion by which varieties are separated is relaxed in order to link the two most similar clusters, until all of the varieties are joined in a complete classification tree. The cluster analysis was performed using the Ward method with the Euclidean distance coefficient to evaluate dissimilarity among all the varieties. Before performing cluster analysis, the data were standardized by subtracting the mean from each value and then dividing by the standard deviation (El Kramany et al., 2009; Ibrahim et al., 2011; Bakry et al., 2014). The self-organizing map is a realistic model of biological brain function (Kohonen, 2001). The SOM consists of input and output layers composed of neurons that serve as the computational units of the network. The input and output layers are connected by connection weights. When the inputs (the seven agronomic parameters), x_{s}, were given to the network, the distance between the weight vector, w_{i}, and the input vector, x_{s}, was calculated as the Euclidean distance, ||x_{s} - w_{i}||. The output layer consists of 16 neurons in a two-dimensional hexagonal lattice connected via weights with the inputs (Vesanto and Alhoniemi, 2000; Bacao et al., 2005). Learning in the SOM is iterative and can be conducted with a subset of the data or with the entire set of data vectors. Prior to learning, the connection weights of the map units (w_{i}) are initialized with random values.
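The two preprocessing steps described above can be sketched as follows. This is an illustration under stated assumptions: the text does not give the exact logistic form, so a logistic function of the z-score is assumed here, and the sample (n-1) standard deviation is assumed for standardization. The input values are hypothetical.

```python
import math
import statistics


def standardize(values):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample (n-1) standard deviation
    return [(v - mean) / sd for v in values]


def logistic_normalize(values):
    """Map values into the open interval (0, 1).

    Assumption: a logistic function applied to the z-score; the exact
    form used in the study is not stated.
    """
    return [1.0 / (1.0 + math.exp(-z)) for z in standardize(values)]


# Hypothetical values of one parameter across four varieties.
parameter = [1.8, 2.0, 2.1, 2.4]
z = standardize(parameter)         # input to cluster analysis
u = logistic_normalize(parameter)  # input to the SOM
print([round(v, 3) for v in z])
print([round(v, 3) for v in u])
```

Each parameter would be transformed separately, so that no single variable dominates the distance calculations because of its units.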
During the learning phase, each input vector (x_{s}) is presented to the network and the Euclidean distances between x_{s} and the weight vectors of all nodes in the network are computed. The node (q) with the shortest Euclidean distance, commonly known as the Best Matching Unit (BMU), is selected as the winner:
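A form of this winner-selection rule consistent with the symbols defined in the text is given here. It is the standard best-matching-unit expression, offered as a reconstruction and not necessarily the authors' exact typesetting; n (the number of input parameters, seven in this study) is introduced as an assumption.

```latex
q = \arg\min_{j} \lVert x_{s} - w_{j} \rVert
  = \arg\min_{j} \sqrt{\sum_{i=1}^{n} \left( x_{si} - w_{ji} \right)^{2}}
```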
where q is the winning neuron, and x_{si} and w_{ji} are the ith element of the input vector x_{s} and the ith weight of neuron j, respectively. This winning neuron becomes the centre of an update neighborhood. The update neighborhood is the area within which nodes and their associated weights are updated according to the Kohonen rule, so that each weight vector moves toward the input pattern. In this way, the nodes in a self-organizing map compete to best represent the particular input sample. This process is repeated for every input sample as the samples are introduced sequentially to the SOM. During this iterative process, the rate at which the winning nodes converge to the input samples is termed the learning rate (α^{s}). Throughout learning, the learning rate and the size of the update neighborhood (the update radius) decrease, so that the initial generalized patterns are progressively refined. After the learning phase, the SOM consists of a number of vectors, with similar vectors nearby and dissimilar vectors further apart (Richardson et al., 2003; Mingoti and Lima, 2006). Without normalization, the variable with the largest range would dominate the map organization.
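The learning procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the software used in the study. The rectangular 4 x 4 grid (the study used a hexagonal lattice), the Gaussian neighborhood, the linear decay schedules, and all parameter values and input data are assumptions made for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Node whose weight vector has the smallest Euclidean distance to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows=4, cols=4, epochs=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a small SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1), matching inputs normalized to 0-1.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decreases
        radius = max(radius0 * (1.0 - frac), 0.5)  # update radius decreases
        for x in rng.sample(data, len(data)):      # samples presented in turn
            q = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - q[0]) ** 2 + (node[1] - q[1]) ** 2
                # Gaussian neighborhood centred on the winner (assumption).
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # move weight toward input
    return weights


# Hypothetical inputs already scaled to 0-1 (three parameters, four samples).
data = [[0.10, 0.10, 0.20], [0.15, 0.12, 0.18],
        [0.90, 0.85, 0.80], [0.88, 0.90, 0.82]]
weights = train_som(data)
for x in data:
    print(x, "->", best_matching_unit(weights, x))
```

After training, samples that map to the same or to adjacent nodes are the ones the map treats as similar.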
RESULTS AND DISCUSSION
The seven studied agronomic parameters were used to construct a dissimilarity matrix using the Euclidean coefficient (Table 1), which was then used to generate a dendrogram (tree diagram) showing dissimilarity among all the varieties (Fig. 1). The distance matrix in Table 1 reveals that dissimilarity ranged from 0.62 between Gemmiza10 and Beniswef5 to 7.03 between Sakha93 and Beniswef6, which reveals the agronomic diversity among the varieties.
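A dissimilarity matrix of this kind can be computed as sketched below. The variety labels and standardized values are hypothetical and serve only to show how the most and least similar pairs are read from the matrix.

```python
import math


def euclidean_distance_matrix(rows):
    """Symmetric matrix of pairwise Euclidean distances between row vectors."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Hypothetical standardized parameter values for three varieties.
standardized = {
    "Variety A": [0.2, -0.5, 1.1],
    "Variety B": [0.3, -0.4, 1.0],
    "Variety C": [-1.5, 1.2, -0.8],
}
names = list(standardized)
D = euclidean_distance_matrix([standardized[name] for name in names])

# Smallest and largest off-diagonal entries give the range of dissimilarity.
pairs = [(D[i][j], names[i], names[j])
         for i in range(len(names)) for j in range(i + 1, len(names))]
closest, farthest = min(pairs), max(pairs)
print("most similar:", closest[1], closest[2], round(closest[0], 2))
print("least similar:", farthest[1], farthest[2], round(farthest[0], 2))
```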
Table 1: Distance matrix based on Euclidean dissimilarity coefficient for the 16 wheat varieties

Table 2: Wheat variety groups obtained from cluster analysis and the mean values of their agronomic parameters

C 1: Sids1, Sids12, Sids13; C 2: Sakha94, Gemmiza7; C 3: Gemmiza10, Beniswef5, Shandweel1; C 4: Beniswef6, Misr2; C 5: Gemmiza9, Giza171; C 6: Sakha93; C 7: Giza168; C 8: Gemmiza11 and C 9: Sohag3
Figure 1 displays the tree diagram, which provides a graphical view of the clusters. As the number of branches grows from the root toward the bottom, R^{2} approaches 1 and the distance approaches 0. The 9 clusters (branches of the tree) account for 95% of the variation among all the varieties; in other words, 9 clusters are necessary to explain 95% of the variation among the varieties.
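The relationship between the number of clusters and R^{2} can be illustrated with a naive Ward agglomeration. This is a sketch under assumptions: R^{2} is taken as 1 minus the ratio of the within-cluster sum of squares to the total sum of squares, the data points are hypothetical, and the implementation favors readability over speed.

```python
def sum_sq(points):
    """Sum of squared deviations of the points from their centroid."""
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    return sum((p[i] - centroid[i]) ** 2 for p in points for i in range(dim))


def ward_r_squared(points):
    """Agglomerate with Ward's criterion; return {number of clusters: R^2}."""
    clusters = [[p] for p in points]  # start with one cluster per point
    total = sum_sq(points)
    r2 = {}
    while True:
        within = sum(sum_sq(c) for c in clusters)
        r2[len(clusters)] = 1.0 - within / total
        if len(clusters) == 1:
            return r2
        # Merge the pair whose union adds the least within-cluster variation.
        candidates = [(sum_sq(a + b) - sum_sq(a) - sum_sq(b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j]
        _, i, j = min(candidates)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)


def clusters_needed(r2, threshold=0.95):
    """Smallest number of clusters whose R^2 reaches the threshold."""
    return min(k for k, value in r2.items() if value >= threshold)


# Hypothetical two-parameter data for five items.
points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
r2 = ward_r_squared(points)
for k in sorted(r2):
    print(k, round(r2[k], 4))
print("clusters needed for 95%:", clusters_needed(r2))
```

With one cluster per item R^{2} equals 1, and with a single cluster it equals 0, which matches the behavior described for the dendrogram.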
Cluster analysis has been recommended as a suitable method for classifying data (Mohammadi and Prasanna, 2003). Based on the cluster analysis in Fig. 1, the 16 varieties can be divided into 9 clusters according to the studied agronomic characters, as shown in Table 2. Cluster number 6 (Sakha93) was the highest in grain yield (2.27 tons per fed.), number of grains per spike (65.33), spike length (12.37 cm) and number of spikes per square meter (496.33), which indicates a relationship between grain yield and number of grains per spike, spike length and number of spikes per square meter. On the other hand, the third cluster (Gemmiza10, Beniswef5 and Shandweel1) and the fourth cluster (Beniswef6 and Misr2) were the lowest in grain yield (2.02 tons per fed.), number of grains per spike (60.33 and 58.33), 1000-grain weight (47.73 and 49.53 g) and number of spikes per square meter (455 and 441.17). This confirms the relationship between grain yield and both number of grains per spike and number of spikes per square meter. The rest of the clusters had intermediate values of grain yield. Every cluster can be represented by any variety belonging to that cluster, which will be useful in reducing the number of varieties tested in the next assessment. The current findings of cluster analysis are in agreement with those obtained by Bakry et al. (2014), El Kramany et al. (2009) and Ibrahim et al. (2011), who reported that agronomic parameters were useful in clustering flax, triticale and barley varieties, respectively, using traditional cluster analysis.
Based on the SOM in Fig. 2 and 3, the 16 varieties were divided into 11 clusters. The clusters consist of nodes, where varieties in the same node are more similar than varieties in different nodes of the same cluster (Ibrahim et al., 2013). Likewise, varieties in the same cluster are more similar than varieties in different clusters. Based on the seven studied agronomic parameters, as shown in Table 3, cluster number 11 (Giza168 and Sakha93) was the highest in grain yield (2.25 tons per fed.), spike length (12.2 cm) and number of grains per spike (64.33), while cluster number 4 (Beniswef6 and Misr2) and cluster number 9 (Shandweel1) were the lowest in grain yield (2.02 tons per fed.), number of grains per spike (60.33 and 58.33), 1000-grain weight (47.73 and 49.53 g) and number of spikes per square meter (455 and 441.17).
Table 3: Wheat variety groups obtained from the self-organizing map and the mean values of their agronomic parameters

C 1: Gemmiza9, Giza171; C 2: Sids12, Sids13; C 3: Beniswef5, Sids1; C 4: Beniswef6, Misr2; C 5: Gemmiza7; C 6: Gemmiza10; C 7: Sohag3; C 8: Gemmiza11; C 9: Shandweel1; C 10: Sakha94 and C 11: Giza168, Sakha93

Fig. 2(a-d):
Visualization of varieties in the trained SOM in color scale according to agronomic parameters: (a) SOM Ward clusters, (b) Grain yield (tons per fed), (c) Straw yield (tons per fed) and (d) Plant height. Red color: High value, Blue color: Low value

Fig. 3(a-d):
Visualization of varieties in the trained SOM in color scale according to agronomic parameters: (a) Spike length (cm), (b) No. of spikes per square meter, (c) No. of grains per spike and (d) 1000-grain weight (g). Red color: High value, Blue color: Low value
This reflects the relationship between grain yield and its components. This information cannot be quickly extracted from hierarchical cluster analysis, either from the distance matrix or from the dendrogram, until the average of each group is calculated manually for all the parameters; in contrast, this information is illustrated visually on the SOM maps. The results of the current study are in agreement with those obtained by Ibrahim et al. (2013, 2015), who stated that the self-organizing map classified treatments and varieties clearly and was more interpretable than cluster analysis.
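The manual averaging step mentioned above, which turns cluster memberships into per-cluster means such as those in Tables 2 and 3, can be sketched as follows. The variety labels, cluster assignments and parameter values are hypothetical.

```python
from collections import defaultdict


def cluster_means(values, labels):
    """Mean of each parameter within each cluster.

    values: {variety: [param1, param2, ...]}
    labels: {variety: cluster id}
    """
    groups = defaultdict(list)
    for variety, row in values.items():
        groups[labels[variety]].append(row)
    return {cluster: [sum(col) / len(col) for col in zip(*rows)]
            for cluster, rows in groups.items()}


# Hypothetical data: [grain yield, number of grains per spike].
values = {"Variety A": [2.0, 60.0],
          "Variety B": [2.2, 64.0],
          "Variety C": [2.1, 62.0]}
labels = {"Variety A": 1, "Variety B": 2, "Variety C": 1}

means = cluster_means(values, labels)
for cluster in sorted(means):
    print(cluster, [round(m, 2) for m in means[cluster]])
```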
CONCLUSION
Self-Organizing Map (SOM) and hierarchical cluster analysis were used to classify 16 wheat varieties. Seven agronomic parameters were used to determine the performance and variability among the wheat varieties. The SOM showed high performance in visualizing agronomic data. The trained SOM efficiently classified the wheat varieties according to gradients of the input agronomic variables and displayed the distribution of each input variable. The SOM also showed high performance in analyzing the relationships among the agronomic variables and could consequently be used as a tool to extract relationships between them. The SOM could be an alternative tool to traditional cluster analysis in fields such as crop science. The results obtained have shown that the agronomic parameters are very useful for the initial description of varieties.

REFERENCES 
1: Bacao, F., V. Lobo and M. Painho, 2005. The self-organizing map, the Geo-SOM and relevant variants for geosciences. Comput. Geosci., 31: 155-163.
2: Bakry, A.B., O.M. Ibrahim, T.A.E. Elewa and M.F. El-Karamany, 2014. Performance assessment of some flax (Linum usitatissimum L.) varieties using cluster analysis under sandy soil conditions. Agric. Sci., 5: 677-686.
3: Curry, B., F. Davies, P. Phillips, M. Evans and L. Moutinho, 2001. The Kohonen self-organizing map: An application to the study of strategic groups in the UK hotel industry. Expert Syst., 18: 19-31.
4: El Kramany, M.F., O.M. Ibrahim, S.F. El Habbasha and N.I. Ashour, 2009. Screening of 40 Triticale (X Triticosecale wittmack) genotypes under sandy soil conditions. J. Applied Sci. Res., 5: 33-39.
5: Ibrahim, O.M., 2012. Simulation of Barley grain yield using artificial neural networks and multiple linear regression models. Egypt. J. Applied Sci., 27: 1-11.
6: Ibrahim, O.M., 2013. A comparison of methods for assessing the relative importance of input variables in artificial neural networks. J. Applied Sci. Res., 9: 5692-5700.
7: Ibrahim, O.M., 2015. Evaluating the effect of salinity on corn grain yield using multilayer perceptron neural network. Global J. Adv. Res., 2: 400-411.
8: Ibrahim, O.M., A.A. Gaafar, A.M. Wali and M.M. Tawfik, 2015. Assessing the performance and variability of some sugar beet varieties using self-organizing map artificial neural network and cluster analysis. Int. J. ChemTech Res., 8: 12-19.
9: Ibrahim, O.M., A.T. Thalooth and E.A. Badr, 2013. Application of Self Organizing Map (SOM) to classify treatments of the first order interaction: A comparison to analysis of variance. World Applied Sci. J., 25: 1464-1468.
10: Ibrahim, O.M., M.H. Mohamed, M.M. Tawfik and E.A. Badr, 2011. Genetic diversity assessment of barley (Hordeum vulgare L.) genotypes using cluster analysis. Int. J. Acad. Res., 3: 81-85.
11: Kohonen, T., 2001. Self-Organizing Maps. Springer, New York, USA., ISBN: 9783642569272, Pages: 502.
12: Mohammadi, S.A. and B.M. Prasanna, 2003. Analysis of genetic diversity in crop plants: Salient statistical tools and considerations. Crop Sci., 43: 1235-1248.
13: Norusis, M.J., 2004. SPSS 13.0 Advanced Statistical Procedures Companion. Prentice Hall, Upper Saddle River, N.J., ISBN-13: 9780131865402, Pages: 368.
14: Richardson, A.J., C. Risien and F.A. Shillington, 2003. Using self-organizing maps to identify patterns in satellite imagery. Prog. Oceanogr., 59: 223-239.
15: Mingoti, S.A. and J.O. Lima, 2006. Comparing SOM neural network with fuzzy c-means, K-means and traditional hierarchical clustering algorithms. Eur. J. Operat. Res., 174: 1742-1759.
16: Vesanto, J. and E. Alhoniemi, 2000. Clustering of the self-organizing map. IEEE Trans. Neural Networks, 11: 586-600.
17: Zhang, X. and Y. Li, 1993. Self-organizing map as a new method for clustering and data analysis. Proceedings of the International Joint Conference on Neural Networks, Volume 3, October 25-29, 1993, Nagoya, Japan, pp: 2448-2451.



