Research Article

Multivariate Classification of China’s Regional Energy Consumption Pattern

Enebeli Emmanuel Emeka, Cheng Jinhua and Xu Xiaoping

The unbalanced energy consumption mix in China and its environmental consequences have been widely studied and remain controversial. With an energy consumption structure clearly dominated by coal, the emission of CO2 into the atmosphere is unavoidable, posing a future danger to the environment and to the health of the citizenry. Before formulating policy reforms that can check this danger or promote alternative energy sources, it is essential to identify the regions with dense energy consumption. This study examines the regional energy consumption pattern of 30 Chinese regions and uses cluster analysis to investigate whether there is evidence of distinct groups of regions based on their energy consumption indicators. Ningxia and Qinghai were identified as outliers and were excluded from the initial analysis. Four cluster groups were obtained using Ward’s hierarchical clustering method, with 7, 13, 4 and 4 members, respectively. The resulting cluster groups were statistically evaluated against their Gross Regional Products using the General Linear Model (GLM) procedure and the result was significant at the 5% level. Further analysis shows that Ningxia and Qinghai could be considered under cluster 4 for policy formulation and implementation purposes.


  How to cite this article:

Enebeli Emmanuel Emeka, Cheng Jinhua and Xu Xiaoping, 2009. Multivariate Classification of China’s Regional Energy Consumption Pattern. Journal of Applied Sciences, 9: 2576-2583.

DOI: 10.3923/jas.2009.2576.2583



Energy is one of the most critical resources for modern life and for the economic growth of any nation. It encompasses crude oil, gas, coal, hydro-power and uranium used in generating electricity, heat and other forms of energy that power homes, businesses and transportation. It is described as an engine of economic growth, since many production and consumption activities require energy as an input. Over the years, as China has emerged as one of the world’s largest economic players, its GDP and energy consumption have risen together (Fig. 1).

China’s economic growth has been accompanied by energy consumption growth averaging 5% per year, with electricity consumption growing by 8% annually. China accounted for 9.6% of global energy consumption in 1997 and this share is projected to reach 16.1% by 2020 (Cole, 2003). China’s demand for energy has surged mainly to fuel its expanding industrial and commercial sectors as well as its households. In 2004, China accounted for 37.8% of the world’s solid energy consumption, 4.4% of liquid energy consumption, 7% of electricity consumption and 1.4% of gas consumption, compared with 23.3% of solid energy, 4.3% of liquid energy, 2.5% of electricity and 1.3% of gas consumption in 2000 (National Bureau of Statistics, 2008). Today, China is the second largest consumer of energy products in the world behind the United States, consuming about 1.7 billion tons of coal equivalent annually (Crompton and Wu, 2005).

China’s total energy consumption has been predominantly made up of coal, and China is today the world’s largest producer and consumer of coal. Along with the growth in energy consumption, China’s CO2 (carbon dioxide) emissions have grown rapidly. At present, it contributes about 13.5% of global CO2 emissions, which makes it the world’s second largest CO2 emitter after the United States (Zhang, 2000; Crompton and Wu, 2005; Fisher-Vanden et al., 2006). This situation has imposed a high cost on the economy in terms of the environmental pollution associated with excessive use of coal. With growing environmental concern among the Chinese people, the energy industry and policy makers are under tremendous pressure to shift the structure of energy consumption from coal to cleaner sources such as natural gas and hydroelectricity and to formulate policies that discourage excessive use of energy-intensive appliances that are detrimental to the environment.

Fig. 1: China’s GDP and energy consumption in value terms (1978-2006). Yuan: Chinese national currency, SCE: Standard coal equivalents. Source: National Bureau of Statistics (1989, 1999 and 2007)

This study examines the regional energy consumption pattern of 30 Chinese regions and investigates whether there is evidence of distinct groups of regions based on their energy consumption pattern. The result is expected to create awareness of China’s dense energy consumption regions. In addition, it offers a logical demarcation showing which regions are more likely to undergo reforms or to be classified as policy-target regions when policies are formulated and implemented to bring balance to the energy consumption structure, in harmony with the environment and the growing economy.


The trend of energy consumption in China is closely related to that of its output. The amount of electricity generated and used by the various regions moves in the same direction as their capital formation and gross regional products. Figure 2 shows these three variables for the 30 regions under consideration.

Intuitively, a causal relationship exists among these variables indicating that similar reasons may lead to changes in their trend or their concentration in any particular region. Two main reasons for continuous rise in China’s energy consumption are discussed below.

Increase in the number of industries: The growth in China’s GDP is substantially caused by its increase in industrial production resulting from increase in the number of industries and improved productivity. Most government industrial policies in China are designed to encourage domestic industries with the intention of creating more jobs and encouraging greater output for domestic consumption and exports.

Table 1: Industry concentration (number of firms in China)
Adapted from: Rosen and Houser, 2007. Source: Beijing Kang Kal Information and Consulting from ISI Energy Markets. *2002 number is from a February 2003 survey

With China’s population of over a billion and the openness of the economy to foreign investors, multinational companies are attracted by the booming industry and the lower cost of production. These factors have greatly contributed to the growth of energy-intensive heavy industry (Table 1). Rather than importing from abroad, China now manufactures for itself more of the energy-intensive basic products, such as steel and aluminum, that are used in the construction of roads and buildings and for export.

Industry currently accounts for over 70% of final energy consumption in China, while the residential, commercial and transportation sectors account for 10, 2 and 7%, respectively (Rosen and Houser, 2007).

Changes in domestic consumption and pattern of consumption: The domestic market for these industrial products in China is also growing strongly. With a higher rate of urbanization and increased employment opportunities, Chinese households are reaching income levels at which energy-intensive consumer goods are within reach. China’s strong economic growth has also brought about demographic changes.

Fig. 2: Average regional gross domestic product, gross capital formation and electricity consumption for China’s regions (2005-2006). GRP: Gross regional product, GCF: Gross capital formation, ELEC: Electricity consumption. Source: National Bureau of Statistics, China (2006 and 2007)

Currently, 40.5% of the population lives in urban areas and the urban population grows at an average rate of 1.4% per year. More importantly, urban populations consume approximately 35 times more energy than rural populations, contributing significantly to rising energy consumption. A rising middle class also means higher energy demand, as individuals seek higher living standards, more air travel and more cars on the roads. In addition, household ownership of air conditioners increased from 11.6% in 1990 to 61.8% in 2003.

Rising living standards have promoted energy consumption in the transport sector as well. The market for personal cars is growing strongly: about 2.6 million sedans were produced in 2004, a rise of about 26% from 2003. At the beginning of this decade, there were 20 million vehicles in China; the number is expected to rise to between 120 and 130 million by 2020. Energy demand for all road transport is projected to grow by 4.6% per year from 2004 to 2030 (Fredriksen, 2006).


Cluster analysis: Ward’s hierarchical method: To search for possible clusters among the regions based on their energy consumption indicators, this study adopts Ward’s hierarchical clustering method because of the advantages it has over other methods. There are two main ways of searching for clusters: hierarchical and nonhierarchical methods. Nonhierarchical clustering (partitioning) methods do not fit the purpose of this study because the researcher must choose the initial set of cluster seed points before clusters are built around each seed (Johnson, 1998). Thus, the researcher’s choice of initial seeds may affect the nature of the resulting clusters. In hierarchical methods, no initial cluster seeds need to be known. These methods attempt to find good clusters, that is, g cluster groups among n observations, through a sequential process: at each step, an observation or a cluster of observations is merged into another cluster. As the process continues, the number of clusters shrinks and the clusters themselves grow larger. The resulting clusters form a tree-like structure that can be viewed using a dendrogram.
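As an illustration of the hierarchical procedure described above, the following is a minimal sketch in Python, assuming the SciPy library is available. The data are synthetic placeholders rather than the values analyzed in this study, and the names low, high, merges and labels are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: 8 "regions" x 3 indicators on very different scales.
# These numbers are placeholders, not the study's data.
rng = np.random.default_rng(0)
low = rng.normal(loc=[1.0, 1000.0, 2.0], scale=[0.05, 30.0, 0.1], size=(4, 3))
high = rng.normal(loc=[3.0, 3000.0, 6.0], scale=[0.05, 30.0, 0.1], size=(4, 3))
X = np.vstack([low, high])

# Standardize each indicator to mean 0 and variance 1 before clustering.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Agglomerative clustering with Ward's criterion: each row of `merges`
# records one merge step (the two clusters joined, the merge height and
# the size of the new cluster).
merges = linkage(Z, method="ward")

# Cut the tree so that at most two clusters remain.
labels = fcluster(merges, t=2, criterion="maxclust")
print(labels)
```

With n observations the procedure performs n - 1 merges, so merges has 7 rows here. No seed points are supplied at any stage, which is the property that motivates the hierarchical approach.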

Hierarchical clustering methods frequently used by researchers are single linkage, complete linkage, average linkage, centroid, median, flexible beta and Ward’s method (see Rencher, 2002, for a detailed description of the various methods). Sometimes a heuristic approach is required to determine which method is most suitable for a given data set. However, Ward’s method is widely recommended. Milligan (1980) compared 15 different clustering procedures; some of the resulting observations are summarized below.

Single linkage tends to be myopic. An object can be added to a cluster as long as it is close to any one of the other objects in the cluster, even if it is relatively far from all the others. Thus, single linkage has a tendency to produce long, stringy clusters and nonconvex cluster shapes. In most cases, however, the naturally occurring modes in data tend to be convex and compact, which better reflects internal homogeneity. Compared with single linkage, complete linkage is more likely to produce convex clusters of comparable diameter, but it is highly sensitive to outliers in the data, which can distort the shape of the cluster solution. The average linkage method is a compromise between single and complete linkage. It comes closest to fitting a tree that satisfies a least squares minimization criterion. Instead of using the minimum distance (single linkage) or the maximum distance (complete linkage), it defines the distance between two clusters as the average of the distances between their members.
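The difference between the three linkage rules can be made concrete with a small sketch. The function name linkage_distances and the two toy clusters are hypothetical, chosen only to illustrate the minimum, maximum and average of the pairwise distances.

```python
from itertools import product
from math import dist


def linkage_distances(cluster_a, cluster_b):
    """Return the single, complete and average linkage distances
    between two clusters given as lists of coordinate tuples."""
    pairwise = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return {
        "single": min(pairwise),                    # closest pair
        "complete": max(pairwise),                  # farthest pair
        "average": sum(pairwise) / len(pairwise),   # mean over all pairs
    }


# Toy clusters on a line (illustrative values only).
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
d = linkage_distances(a, b)
print(d)  # {'single': 2.0, 'complete': 5.0, 'average': 3.5}
```

The four pairwise distances are 3, 5, 2 and 4, so single linkage sees the clusters as 2 apart, complete linkage as 5 apart and average linkage as 3.5 apart.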

By contrast, Ward’s method adopts a slightly different procedure. Instead of joining the two closest clusters, it seeks to join the two whose merger leads to the smallest within-cluster sum of squares. This tends to produce equal-sized clusters that are likely to be convex. Rencher (2002) explains that if AB is the cluster obtained by combining clusters A and B, the within-cluster sums of distances are:
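For reference, a sketch of these three sums in the usual notation is given here. It assumes that y_i denotes the observation vector of the i-th point in a cluster and that \bar{y}_A, \bar{y}_B and \bar{y}_{AB} denote the corresponding cluster mean vectors; these symbols are introduced for illustration, following the general presentation in Rencher (2002).

```latex
\begin{align*}
SSE_A    &= \sum_{i=1}^{n_A} (y_i - \bar{y}_A)'(y_i - \bar{y}_A), \\
SSE_B    &= \sum_{i=1}^{n_B} (y_i - \bar{y}_B)'(y_i - \bar{y}_B), \\
SSE_{AB} &= \sum_{i=1}^{n_{AB}} (y_i - \bar{y}_{AB})'(y_i - \bar{y}_{AB}),
\end{align*}
```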

where, nA, nB and nAB = nA+nB are the numbers of points in A, B and AB, respectively.

Since these sums of distances are equivalent to within-cluster sums of squares, they are denoted by SSEA, SSEB and SSEAB. Ward’s method joins the two clusters A and B that minimize the increase in SSE, defined as:
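As a sketch in standard notation, with \bar{y}_A and \bar{y}_B assumed to denote the mean vectors of clusters A and B and I_{AB} used here as a label for the increase, the quantity can be written as below. The second equality is the well-known identity showing that the increase depends only on the cluster sizes and on the distance between the two cluster means.

```latex
\[
I_{AB} = SSE_{AB} - (SSE_A + SSE_B)
       = \frac{n_A n_B}{n_A + n_B}\,
         (\bar{y}_A - \bar{y}_B)'(\bar{y}_A - \bar{y}_B)
\]
```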


Other methods used for analysis in this study are the principal component plot and the General Linear Model (GLM) procedure. When p>2, that is, when more than two variables are measured on each experimental unit, a principal component analysis can be carried out to verify whether the data actually fall within a reduced dimensional space. If the effective dimensionality can be brought down to two, the first two principal component scores of each experimental unit can be plotted and clusters can then be identified visually from the plot. The GLM procedure uses analysis of variance and covariance to relate one or several continuous dependent variables to one or several independent variables. In this case, the dependent variable is Gross Regional Product (GRP), while the independent variable is the discrete cluster group obtained from the cluster analysis.
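The principal component step can be sketched as follows, assuming NumPy is available. The data are synthetic (28 units, three variables, the third built to be nearly a combination of the first two) and are not the study's data; the variable names are hypothetical.

```python
import numpy as np

# Hypothetical data: 28 units x 3 variables. The third variable is
# constructed to lie close to the plane spanned by the first two.
rng = np.random.default_rng(1)
base = rng.normal(size=(28, 2))
third = 0.7 * base[:, 0] + 0.7 * base[:, 1] + rng.normal(scale=0.1, size=28)
X = np.column_stack([base, third])

# Standardize, then take the eigen-decomposition of the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)

# eigh returns eigenvalues in ascending order; sort them descending.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative proportion of total variability explained by the components.
cumulative = np.cumsum(eigvals) / eigvals.sum()

# Scores on the first two principal components, ready for a scatter plot.
scores = Z @ eigvecs[:, :2]
print(cumulative)
```

When the second entry of cumulative is close to one, the data lie nearly in a plane and a plot of the two columns of scores is a reasonable visual check on a cluster solution.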

Data description: The data used for clustering the regions are panel data made up of three variables, for the 30 regions under consideration, that indicate their levels of energy consumption. The variables are energy consumption per unit of gross regional product (equivalent value), measured in tons of SCE per 10,000 Yuan; electricity consumption per unit of gross regional product (equivalent value), measured in kWh per 10,000 Yuan; and energy consumption per unit of industrial value-added (equivalent value), measured in tons of SCE per 10,000 Yuan (the three variables are henceforth referred to in this study as EC, EIC and ECI, respectively). In addition, data on each region’s GRP are used in a statistical test to validate the cluster groups formed. All the data used for the study were obtained from the China Statistical Yearbook published by the National Bureau of Statistics (2006, 2007) and are based on the average of the 2005 and 2006 annual figures.


The empirical results presented herein were obtained using the SAS statistical package. An initial univariate analysis of each of the variables (EC, EIC and ECI) showed that Ningxia and Qinghai were clear outliers for EC and EIC. Both were therefore excluded from the initial data set. In addition, the data were standardized before the cluster analysis was performed, because the variables have very different standard deviations (Table 2) and are not measured on the same scale. Otherwise, variables with larger numerical values would carry more weight in identifying the clusters.
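A minimal sketch of the standardization step is given below, using only the Python standard library. The raw values are hypothetical placeholders on deliberately different scales. The sketch also checks a consequence of standardizing three variables to unit variance: the root-mean-square distance between observations equals the square root of 2 × 3, about 2.44949, which matches the value reported with the cluster procedure summary.

```python
from itertools import combinations
from math import dist, sqrt
from statistics import mean, stdev


def standardize(column):
    """Rescale a list of values to mean 0 and sample variance 1."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


# Hypothetical raw values for three indicators (not the study's data).
ec = [0.8, 1.2, 1.5, 2.9, 4.1]
eic = [900.0, 1100.0, 1400.0, 2500.0, 5200.0]
eci = [1.5, 2.2, 2.6, 4.8, 9.0]

z_columns = [standardize(col) for col in (ec, eic, eci)]
rows = list(zip(*z_columns))  # one standardized 3-vector per observation

# Root-mean-square Euclidean distance over all pairs of observations.
squared = [dist(p, q) ** 2 for p, q in combinations(rows, 2)]
rms_distance = sqrt(mean(squared))
print(round(rms_distance, 5))  # 2.44949
```

After standardization each indicator contributes equally to the Euclidean distance, so no single variable dominates the clustering merely because of its unit of measurement.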

Table 2: Descriptive statistics for panel data of concern (average of 2005-2006)

Fig. 3: Dendrogram showing regional clusters

Table 3: The cluster procedure
The data have been standardized to mean 0 and variance 1, Root-mean-square total-sample SD = 1, Root-mean-square distance between observations = 2.44949

Figure 3 and Table 3 show, respectively, the resulting dendrogram for the regions and the summary of the cluster procedure using Ward’s hierarchical clustering method.

The results in Table 3 show the mean±SD for each variable as well as measures of skewness, kurtosis and bimodality. The bimodality index, b, is derived from the skewness and kurtosis as follows:
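As a sketch, assuming the form of the bimodality coefficient reported by the SAS CLUSTER procedure (with the kurtosis expressed as excess kurtosis), the index can be written as:

```latex
\[
b = \frac{m_3^{2} + 1}{m_4 + \dfrac{3(n-1)^{2}}{(n-2)(n-3)}}
\]
```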

where, n is the number of observations, m3 is the skewness and m4 is the kurtosis. The values of b are greater than zero and at most one (0 < b ≤ 1).

Values of b greater than 0.55 may indicate bimodal or multimodal marginal distributions (Der and Everitt, 2002). The value b = 0.55 is the expected value of the statistic for a uniform distribution, while the expected value for a single normal distribution is 0.33 (Donoghue, 1994). The statistic takes on values less than or equal to 0.33 when no mean differences are present. It is assumed that b values larger than 0.55 are likely to reflect true subgroup structure. This implies that the result for EIC (b = 0.7022) suggests multimodality, and the value for EC (b = 0.5425) is very close to 0.55. Both suggest possible clustering in the data. The cumulative column in the eigenvalues of the correlation matrix section shows that two eigenvalues account for 96.44% of the total variability in the measured variables. This implies that the measured variables nearly fall within a two-dimensional subspace (a plane) of the three-dimensional sample space, so that a plot of the first two principal component scores should be very useful for clustering these data. Accordingly, Fig. 4 shows the plot of the first two principal components of the data. The number of cluster groups (g = 4) was chosen subjectively by drawing a vertical line on the dendrogram in Fig. 3 at the point 0.05 on the semi-partial R² axis (d = 0.05).
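The reference values of 0.55 and 0.33 quoted above can be checked numerically. The sketch below assumes the SAS-style bimodality coefficient with m4 taken as excess kurtosis; the function name bimodality is hypothetical. It plugs in the theoretical skewness and excess kurtosis of the uniform distribution (0 and -1.2) and of the normal distribution (0 and 0) for a large n.

```python
def bimodality(m3, m4, n):
    """Bimodality coefficient from skewness m3, excess kurtosis m4
    and sample size n (form assumed to follow SAS PROC CLUSTER)."""
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (m3 ** 2 + 1.0) / (m4 + correction)


n_large = 10**6  # large n, so the small-sample correction is close to 3

b_uniform = bimodality(0.0, -1.2, n_large)  # uniform distribution
b_normal = bimodality(0.0, 0.0, n_large)    # normal distribution

print(round(b_uniform, 4), round(b_normal, 4))
```

The two results are approximately 5/9 and 1/3, consistent with the thresholds of 0.55 and 0.33 used to judge the EC and EIC values.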

Fig. 4: Graph of the first two principal component scores (g = 4)

Table 4: Four-cluster solution obtained using Ward’s clustering method
d = 0.05

Table 5: The GLM procedure
Class level information: Class: Cluster, Levels: 4, Values: 1 2 3 4, No. of observation: 28

In evaluating the choice of cluster groups, the plot of the first two principal components shows clearly that the four clusters are fairly well separated. It can be observed from the plot that the regions in clusters 1, 2 and 3 form tighter clusters than the regions in cluster 4. Cluster 4 is an elongated, highly dispersed cluster that could not be merged with any of the other three clusters. Table 4 shows the regions grouped together according to the output of Ward’s clustering method.

The test for a cluster difference in Gross Regional Products (GRP) using a one-way analysis of variance further justifies the choice of the cluster groups. The procedure employed for this purpose is the General Linear Model (GLM) and the result is shown in Table 5.

Fig. 5: Abridged dendrogram showing regions in cluster 4 with the inclusion of Ningxia and Qinghai regions

The overall F test is significant (F = 3.05, p = 0.0478), providing fair evidence of a difference in GRP among the four clusters at the 5% significance level. The Type I and Type III sums of squares in this analysis were the same because there were no missing cells in the data; the two are typically unequal when the data are unbalanced. Thus, the sums-of-squares test supports treating the clusters as distinct groups, since F = 3.05 and p = 0.0478 at the 5% (α = 0.05) significance level.
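The reported p-value can be reproduced from the F statistic and the degrees of freedom implied by four clusters and 28 observations. The sketch below assumes SciPy is available; the variable names are hypothetical.

```python
from scipy.stats import f

f_value = 3.05        # overall F statistic reported above
n_clusters = 4
n_regions = 28        # 30 regions minus the two excluded outliers

df_between = n_clusters - 1          # 3
df_within = n_regions - n_clusters   # 24

# Upper-tail probability of the F distribution (survival function).
p_value = f.sf(f_value, df_between, df_within)
print(round(p_value, 4))
```

The result falls just below 0.05, in line with the reported p = 0.0478, which shows how close the test is to the chosen significance threshold.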

Setting aside the outlier status of Ningxia and Qinghai observed in the initial univariate analysis, the entire data set, including these two regions, was analyzed again using Ward’s hierarchical clustering method. The result shows that Ningxia and Qinghai could be considered under cluster 4 for the purpose of policy implementation that requires all the regions to be grouped into segments based on their energy consumption pattern. Hence, further discussion of cluster 4 in this study includes Ningxia and Qinghai. Figure 5 is an abridged dendrogram showing the regions in cluster 4 together with Ningxia and Qinghai.


The clusters obtained in this study are meaningful: they come close to providing a ‘natural’ grouping for the regions. For instance, Crompton and Wu (2005) classified 13 Chinese regions into developed and developing regions in order to illustrate their energy consumption per capita in 2002. Beijing, Tianjin, Shanghai, Jiangsu, Zhejiang, Shandong, Fujian and Guangdong were regarded as developed regions, while Anhui, Guangxi, Guizhou, Yunnan and Shaanxi were classified as developing regions. All the regions in cluster 1 fall within the developed regions, which also include Shandong; cluster 2 contains Shandong and all the regions classified as developing, with the exception of Yunnan, which falls under cluster 3. Ironically, the regions in cluster 4 form the lowest energy consumption group in this analysis. The close similarity between the classification by Crompton and Wu (2005) and the result obtained in this study, despite the difference in terminology, can be explained by the fact that the main factors used to categorize regions as developed or developing are themselves indicators or drivers of energy consumption. Such factors include the size of the working population or the per capita income of the populace, the number of industries found in the region, the number of modern houses and the quality of road networks.

Ma et al. (2009), who analyzed the substitution possibilities among energy, capital and labor for China, classified the regions considered into 7 broad groups. In their study, group 6 comprised Chongqing, Sichuan, Shaanxi, Gansu, Guizhou and Yunnan, while group 7 comprised Inner Mongolia, Qinghai, Ningxia and Xinjiang. Of these two groups, the first contains two regions and the second contains three regions, including Ningxia and Qinghai, that belong to cluster 4 of this study. Based on the classification of Ma et al. (2009) and the geographical closeness of the regions within cluster 4, it is reasonable to accommodate Ningxia and Qinghai in cluster 4, as the further analysis suggests.

Comparatively, total electricity consumption in 2005, measured in 100 million kWh, was 10,385.22 for cluster 1; 10,281.5 for cluster 2; 3,964.91 for cluster 3 and 3,610.34 for cluster 4. Yuan et al. (2007) noted that more than half of all the coal consumed in China is used for electricity production; in 2003, 53.5% of the coal was used for electricity generation. If this presumption is applied to the clusters in this study, it suggests that more than half of the electricity consumed by the regions in each cluster is generated from coal. Heavy reliance on coal to generate electricity causes severe environmental degradation and air pollution by emitting CO2, which is one of the main greenhouse gases. If the present growth in energy consumption in China continues without substantial changes in the current mix, China will inevitably be unable to realize sustainable development.
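To put the cluster totals quoted above on a common footing, the short sketch below converts them to percentage shares of the four-cluster total. Only the four figures from the text are used; the variable names are hypothetical.

```python
# Electricity consumption in 2005 by cluster, in 100 million kWh,
# as quoted in the text.
consumption = {1: 10385.22, 2: 10281.5, 3: 3964.91, 4: 3610.34}

total = sum(consumption.values())
shares = {c: 100.0 * v / total for c, v in consumption.items()}

for cluster, share in shares.items():
    print(f"cluster {cluster}: {share:.1f}% of the four-cluster total")
```

On these figures, clusters 1 and 2 together account for roughly 73% of the electricity consumed by the four clusters, with each of the two taking a little over a third.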

Thus, the clusters obtained in this study stand as an alternative grouping for the regions, in place of geopolitical classification, as used in Ma et al. (2009), or mere perception of the regions, when searching for energy substitution possibilities for China. They were formed from the regions’ energy consumption indicators: energy consumption per unit of gross regional product, electricity consumption per unit of gross regional product and energy consumption per unit of industrial value-added. They provide a means of measuring and comparing the regions’ energy intensity levels and a possible segmentation for setting energy policies that directly address the rising threat posed by the intense consumption levels of some regions, especially those in cluster 1.

On the other hand, the Chinese government and stakeholders have invested heavily in treating industrial pollution and in funding afforestation programs in the regions. In 2006, the total amount invested in the treatment of industrial pollution was 48.4 billion Yuan, while investment in fixed assets for afforestation amounted to 49.3 billion Yuan, compared with 23.5 and 16.1 billion Yuan, respectively, in 2000. Of the total amount invested in the treatment of industrial pollution in 2006, 27.9% went to the regions in cluster 1, while 37.4, 17.6 and 17.1% went to the regions in clusters 2, 3 and 4, respectively. Although these efforts to control the environmental pollution caused by intensive energy consumption have accomplished some of their goals, more remains to be done.

China needs more comprehensive and feasible policy measures to tackle the foreseeable health hazards and environmental pollution that may result from energy use by its regions. A segmented energy-use tax structure should be enacted, whereby regions within the same cluster, as formulated in this study, pay similar taxes or tariffs on the energy they consume. This would go a long way toward restoring atmospheric balance across all the regions. A change in the current energy mix remains inevitable for China’s economic development. Incentives should be given to promote the development of renewable energy sources (such as hydro, wind, biomass and solar). The newly constructed Three Gorges Dam at Sanxia on the upper Yangtze River, a huge hydro-power station, should be effectively and efficiently utilized. It is estimated that this dam will generate about 25,000 MW when fully utilized.

Improvement of coal mining and combustion technology should also be of greater concern to the Chinese government and to stakeholders in the Chinese energy industry. This would increase efficiency and decrease pollution in the course of mining, processing, transporting and using coal. For example, combined cycle is an electricity-generation technology in which additional electricity is produced from the otherwise-lost waste heat exiting from gas (combustion) turbines (Austin, 2005). The benefits of combined cycle generation are greater efficiency, realized through increased energy use per unit of input, and relatively lower emissions than traditional coal-based generation.


This study investigated the existence of possible distinct clusters among 30 Chinese regions based on their energy consumption pattern. The results showed that the regions can plausibly be grouped into four clusters.

1:  Austin, A., 2005. Energy and Power in China: Domestic Regulation and Foreign Policy. Foreign Policy Centre, London, UK.

2:  Cole, B.D., 2003. Oil for the Lamps of China-Beijing’s 21st Century Search for Energy. McNair Paper 67. National Defense University Press, Washington, DC. USA.

3:  Crompton, P. and Y. Wu, 2005. Energy consumption in China: Past trends and future directions. Energy Econ., 27: 195-208.

4:  Fisher-Vanden, K., G.H. Jefferson, M. Jingkui and X. Jianyi, 2006. Technology development and energy productivity in China. Energy Econ., 28: 690-705.

5:  Fredriksen, K.A., 2006. China’s Role in the World: Is China a Responsible Stakeholder? US China Economic and Security Review Commission, USA.

6:  Johnson, D.E., 1998. Applied Multivariate Methods for Data Analysts. Higher Education Press, USA., pp: 319-396.

7:  Ma, H., L. Oxley and J. Gibson, 2009. Substitution possibilities and determinants of energy intensity for China. Energy Policy, 37: 1793-1804.

8:  Milligan, G.W., 1980. An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45: 325-342.

9:  National Bureau of Statistics, 2008. International Statistics Yearbook. China Statistics Press, Beijing, ISBN: 978-7-5037-5353-4.

10:  Rencher, A.C., 2002. Methods of Multivariate Analysis. 2nd Edn., John Wiley and Sons, New York, pp: 451-503.

11:  Rosen, D.H. and T. Houser, 2007. China Energy: A Guide for the Perplexed. Peterson Institute for International Economics, China.

12:  Yuan, J., C. Zhao, S. Yu and Z. Hu, 2007. Electricity consumption and economic growth in China: Cointegration and co-feature analysis. Energy Econ., 29: 1179-1191.

13:  Zhang, Z.X., 2000. Can China afford to commit itself an emissions cap? An economic and political analysis. Energy Econ., 22: 587-614.

14:  Donoghue, J.R., 1994. Variable screening for cluster analysis. Educational Testing Service, Princeton, New Jersey.

15:  Der, G. and B. Everitt, 2002. A Handbook of Statistical Analysis Using SAS. 2nd Edn., Chapman and Hall, USA., ISBN: 1-5848-8245-X, pp: 101-116.

16:  National Bureau of Statistics, 1989. China Statistical Yearbook. China Statistics Press, Beijing, ISBN: 7-5037-0209-9.

17:  National Bureau of Statistics, 1999. China Statistical Yearbook. China Statistics Press, Beijing, ISBN: 7-5037-2920-1.

18:  National Bureau of Statistics, 2006. China Statistical Yearbook. China Statistics Press, Beijing, ISBN: 7-5037-5001-4.

19:  National Bureau of Statistics, 2007. China Statistical Yearbook. China Statistics Press, Beijing, ISBN: 978-7-5037-5124-0.
