Research Article

Pseudo-additive Entropies of Degree-q and the Tsallis Entropy

B.H. Lavenda and J. Dunning-Davies

The Tsallis entropy is shown to be a pseudo-additive entropy of degree-q that information scientists have been using for almost forty years. Neither is it a unique solution to the nonadditive functional equation from which random entropies are derived. Notions of additivity, extensivity and homogeneity are clarified. The relation between mean code lengths in coding theory and various expressions for average entropies is discussed.


  How to cite this article:

B.H. Lavenda and J. Dunning-Davies, 2005. Pseudo-additive Entropies of Degree-q and the Tsallis Entropy. Journal of Applied Sciences, 5: 315-322.

DOI: 10.3923/jas.2005.315.322



In 1988 Tsallis[1] published a study containing an expression for the entropy which differed from the usual one used in statistical mechanics. Prior to this, the Rényi entropy was used as an interpolation formula that connected the Hartley-Boltzmann entropy to the Shannon-Gibbs entropy. Although the Rényi entropy is additive, it lacks many other properties that characterize the Shannon-Gibbs entropy. For example, the Rényi entropy is neither subadditive nor recursive, nor does it possess the branching and sum properties[2]. The so-called Tsallis entropy fills this gap: while nonadditive, it has many other properties that resemble those of the Shannon-Gibbs entropy. It is no wonder, then, that this entropy is regarded as filling an important gap.

Yet it appears odd, to say the least, that information scientists would have left such a gaping void in their analysis of entropy functions. A closer analysis of the literature reveals that this is not the case: a normalized Tsallis entropy seems to have first appeared in a 1967 paper by Havrda and Charvát[3], who introduced the normalized ‘Tsallis’ entropy

$$S_{n,q}(p_1,\ldots,p_n) = \frac{\sum_{i=1}^{n} p_i^{q} - 1}{2^{1-q} - 1} \qquad (1)$$


for a complete set of probabilities and parameter q>0, but q ≠ 1. The requirement q>0 is necessary in order that (1) possess the fundamental property of the entropy, that is, that it be a concave function. According to Tsallis[4], only for q>0 is the entropy, (1), said to be expansible[2] [cf. (6) below].
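As a numerical illustration, the sketch below evaluates the normalized entropy of degree-q, assuming the Havrda-Charvát form S = (Σ pi^q - 1)/(2^(1-q) - 1). The function names are illustrative and do not come from the original paper. It checks that this normalization assigns one unit of entropy to a fair binary choice for any admissible q, and that the value approaches the Shannon entropy in bits as q→1.

```python
import math

def entropy_degree_q(p, q):
    """Normalized entropy of degree q (assumed Havrda-Charvat form), q > 0, q != 1."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    # Zero-probability outcomes contribute nothing for q > 0.
    return (sum(pi ** q for pi in p if pi > 0) - 1.0) / (2.0 ** (1.0 - q) - 1.0)

def shannon_bits(p):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]
print(entropy_degree_q([0.5, 0.5], 0.3))                # fair coin: one unit for any q
print(entropy_degree_q(p, 1.000001), shannon_bits(p))   # nearly equal close to q = 1
```

Adding an outcome of probability zero leaves the computed value unchanged, which is the expansibility property referred to above.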


The properties used to characterize the entropy are[2,5].



where the nonnegative n-tuple, (p) = (p1,..., pn), forms a complete probability distribution. For ordinary means, the n-tuple, (x) = (x1,..., xn), represents a set of nonnegative numbers which constitute a set of independent variables. The main difficulty in proving theorems on characterizing entropy functions in information theory is that the ‘independent variables’, (x), are not independent of their ‘weights’, (p)[6].

Coding theory derives the functional dependencies in a very elegant way through optimization. The entropies S(xi) represent the costs of encoding a sequence of lengths xi, whose probabilities are pi. Minimizing the mean length associated with the cost function, expressed as the weighted mean of the cost function, gives the optimal codeword lengths xi as functions of their probabilities, pi. Consequently, the entropies that result when the xi are evaluated at their optimal values by expressing them in terms of their probabilities, pi, constitute lower bounds to the mean lengths for the cost function.





where [ ] denotes an arbitrary permutation of the indices on the probabilities. For the entropy, the symmetry property (4) means that it should not depend upon the order in which the outcomes are labelled.

The sum property


where f is a measurable function on ]0, 1[.



meaning that the entropy should not change when an outcome of probability zero is added.

Recursivity of degree-q


asserting that if a choice is split into two successive choices, the original entropy will be the weighted sum of the individual entropies. Recursivity implies the branching property by requiring at the same time the additivity of the entropy as well as the weighting of the different entropies by their corresponding probabilities[7].





Additivity of degree-q


for any two complete sets of probabilities, (p) and (q). As late as 1999, Tsallis[4] refers to (10) as exhibiting “a property which has apparently never been focused before and which we shall refer to as the composability property”. Here, composability means something different than in information theory[2], in that it “concerns the nontrivial fact that the entropy S(A+B) of a system composed of two independent subsystems A and B can be calculated from the entropies S(A) and S(B) of the subsystems, without any need of the microscopic knowledge about A and B, other than the knowledge of some generic universality class, herein the nonextensive universality class, represented by the entropic index q...”[4].
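The additivity of degree-q can be checked numerically. The sketch below assumes the normalized form S = (Σ pi^q - 1)/(2^(1-q) - 1) and, as a consequence of that form, the composition rule S(joint) = S(p) + S(r) + (2^(1-q) - 1) S(p) S(r) for independent experiments. The second distribution is called r in the code to avoid a clash with the parameter q; all names are illustrative.

```python
from itertools import product

def entropy_degree_q(p, q):
    """Normalized entropy of degree q (assumed Havrda-Charvat form)."""
    return (sum(x ** q for x in p) - 1.0) / (2.0 ** (1.0 - q) - 1.0)

def joint_independent(p, r):
    """Joint distribution of two independent experiments."""
    return [pi * rj for pi, rj in product(p, r)]

def additivity_defect(p, r, q):
    """S(joint) minus S(p) + S(r) + (2^(1-q) - 1) S(p) S(r)."""
    sp, sr = entropy_degree_q(p, q), entropy_degree_q(r, q)
    coeff = 2.0 ** (1.0 - q) - 1.0
    return entropy_degree_q(joint_independent(p, r), q) - (sp + sr + coeff * sp * sr)

p = [0.7, 0.2, 0.1]
r = [0.6, 0.4]
for q in (0.5, 2.0, 3.0):
    print(q, additivity_defect(p, r, q))   # each defect vanishes up to rounding
```

The multiplicative term is positive for q<1 and negative for q>1, which is the origin of the super- and subadditive behaviour discussed in the text.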

However, the additive entropy of degree-q, (1), is not the only solution to the functional equation (10) for q ≠ 1. The average entropy:


also satisfies (10), with the only difference that (1-q)/q replaces the coefficient in the multiplicative term[8]. Since the weighted mean of degree-q is homogeneous, the pseudo-additive entropy (11) is a first-order homogeneous function of (p), SAn,q (λp1,... ,λpn) = λSAn,q (p1,... , pn). It can be derived by averaging the same solution to the functional equation (10), in the case q ≠ 1, as that used to derive the Tsallis entropy, except with a different exponent and normalizing factor, under the constraint that the probability distribution is complete[9]. Although the pseudo-additive entropy (11) lacks the property of recursivity, (7), it is monotonic, continuous and concave for all positive values of q. Weighted means have been shown to be measures of the extent of a distribution[10,11] and (11) relates the entropy to the weighted mean rather than to the more familiar logarithm of the weighted mean, as in the case of the Shannon and Rényi entropies.

Tsallis, in fact, associates additivity with extensivity in the sense that for independent subsystems


According to Tsallis[4], superadditivity, q<1, would correspond to superextensivity and subadditivity, q>1, would correspond to subextensivity. According to Callen[12], extensive parameters have values in a composite system that are equal to the sum of the values in each of the systems. Anything that is not extensive is labelled intensive, although Tsallis would not agree [cf. (30) below]. For instance if we consider black-body radiation in a cavity of volume V, having an internal energy, U and magnify it λ times, the resulting entropy


will be λ times the original entropy, S(U, V), where σ is the Stefan-Boltzmann constant. Whereas extensivity involves magnifying all the extensive variables by the same proportion, additivity in the sense of being superadditive or subadditive deals with a subclass of extensive variables, because the condition of extensivity of the entropy imposes that the determinant formed from the second derivatives of the entropy vanish[13]. The entropy of black-body radiation, (13), is extensive yet it is subadditive in either of the extensive variables. The property of subadditivity is what Lorentz used to show how interactions lead to a continual increase in entropy[13]. This is a simple consequence of Minkowski’s inequality,

where, u = U/V is the energy density. Hence, (sub-or super-) extensivity is something very different from (sub-or super-) additivity.
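The distinction between extensivity and subadditivity can be illustrated with the black-body entropy. The sketch below assumes the standard form S = (4/3) σ^(1/4) V^(1/4) U^(3/4), with the constant set to 1 for convenience; the function name and the numerical values are illustrative only.

```python
def blackbody_entropy(U, V, sigma=1.0):
    """Assumed form S = (4/3) * sigma**(1/4) * V**(1/4) * U**(3/4)."""
    return (4.0 / 3.0) * sigma ** 0.25 * V ** 0.25 * U ** 0.75

U, V, lam = 2.0, 5.0, 3.0

# Extensivity: magnifying both U and V by lam magnifies S by lam.
scaled = blackbody_entropy(lam * U, lam * V)
reference = lam * blackbody_entropy(U, V)

# Subadditivity in U alone, at fixed V.
U1, U2 = 0.5, 1.5
whole = blackbody_entropy(U1 + U2, V)
parts = blackbody_entropy(U1, V) + blackbody_entropy(U2, V)

print(scaled, reference)   # equal up to rounding
print(whole, parts)        # whole is smaller than the sum of the parts
```

The same comparison with V split at fixed U gives the analogous inequality, since V enters with the exponent 1/4.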

Strong additivity of degree-q


where, qij is the conditional probability. Strong additivity of degree-q describes the situation in which the sets of outcomes of two experiments are not independent. Additivity of degree-q, (10), follows from strong additivity by setting q1k = q2k =… = qmk = qk and taking (1) into consideration[2].

A doubly stochastic matrix (qij), where, m = n, is used in majorization to distribute things, like income, more evenly[14] and this leads to an increase in entropy. For if



it follows from the convexity of Ψ= x ln x, or



since We may say that p majorizes q, p>q if and only if (15) holds for some doubly stochastic matrix (qij)[15]. A more even spread of incomes increases the entropy. Here we are at the limits of equilibrium thermodynamics because we are invoking a mechanism for the increase in entropy, which in the case of incomes means taking from the rich and giving to the poor[16]. This restricts q in the ‘Tsallis’ entropy to ]0, 1[. Values of q in ]1, 2[ show an opposing tendency of balayage or sweeping out[17]. Whereas averaging tends to decrease inequality, balayage tends to increase it[16].
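The effect of averaging with a doubly stochastic matrix can be seen in a small example. The matrix and the distribution below are hypothetical, chosen only so that every row and every column sums to one; the sketch compares the Shannon entropy before and after averaging.

```python
import math

def shannon(p):
    """Shannon entropy with natural logarithm."""
    return -sum(x * math.log(x) for x in p if x > 0)

def apply_matrix(matrix, p):
    """Averaged distribution with components sum_j matrix[i][j] * p[j]."""
    return [sum(row[j] * p[j] for j in range(len(p))) for row in matrix]

# Hypothetical doubly stochastic matrix: rows and columns each sum to 1.
Q = [[0.6, 0.3, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]
p = [0.8, 0.15, 0.05]          # very uneven distribution
averaged = apply_matrix(Q, p)

print(averaged, sum(averaged))
print(shannon(p), shannon(averaged))   # entropy grows after averaging
```

The averaged distribution is more evenly spread than the original one, and its entropy is correspondingly larger.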

Yet Tsallis[4] refers to processes with q<1, i.e. pi^q > pi, as rare events and to q>1, i.e. pi^q < pi, as frequent events. However, only in the case where q<1 will the Shannon entropy, (16), be a lower bound to other entropies, like the Rényi entropy


which is the negative logarithm of the weighted mean of pi^(q-1). The Rényi entropy has the attributes of reducing to the Shannon-Gibbs entropy, (16), in the limit as q→1 and to the Hartley-Boltzmann entropy


in the case of equal a priori probabilities pi = 1/n. This leads to the property of



for any given integer n≥2. The right-hand side of (19) should be a monotonic increasing function of n. As we have seen, the tendency of the entropy to increase as the distribution becomes more uniform is due to the property of concavity (2). Hence, it would appear that processes with q<1 would be compatible with the second law of thermodynamics, rather than being rare exceptions to it!
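The limiting behaviour of the Rényi entropy quoted above can be verified directly. The sketch assumes the usual definition, ln(Σ pi^q)/(1-q), with natural logarithms; the names are illustrative.

```python
import math

def renyi(p, q):
    """Renyi entropy of order q (natural logarithm), q > 0, q != 1."""
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

p = [0.5, 0.3, 0.2]
print(renyi(p, 0.999999), shannon(p))            # q -> 1 recovers Shannon-Gibbs

for n in (2, 3, 4, 5):
    uniform = [1.0 / n] * n
    print(n, renyi(uniform, 0.5), math.log(n))   # equal probabilities give ln n

# For q < 1 the Shannon entropy is a lower bound; for q > 1 it is an upper bound.
print(renyi(p, 0.5) >= shannon(p), renyi(p, 2.0) <= shannon(p))
```

For equal probabilities the value ln n increases monotonically with n, as required of the right-hand side of (19).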

Continuity: The entropy is a continuous function of its n variables. Small changes in the probability cause correspondingly small changes in the entropy. Additive entropies of degree-q are small for small probabilities, i.e.,


The analogy between coding theory and entropy functions has long been known[18]. If k1,...,kn are the lengths of codewords of a uniquely decipherable code with D symbols then the average codeword length


is bounded from below by the Shannon-Gibbs entropy (16) if the logarithm is to the base D. The optimal codeword length is ki = -ln pi, which represents the information content in event Ei. When D = 2 and pi = 1/2, the event contains exactly one bit of information.
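A short example makes the bound concrete. The sketch below assumes a binary alphabet (D = 2) and a distribution whose probabilities are powers of 1/2, so that the optimal lengths are integers; the Kraft sum is used as the condition for unique decipherability. Function names are illustrative.

```python
import math

def average_length(p, k):
    """Average codeword length, sum_i p_i * k_i."""
    return sum(pi * ki for pi, ki in zip(p, k))

def kraft_sum(k, D=2):
    """Kraft sum; a uniquely decipherable code needs this to be at most 1."""
    return sum(D ** (-ki) for ki in k)

def entropy_base(p, D=2):
    """Shannon-Gibbs entropy with logarithm to the base D."""
    return -sum(pi * math.log(pi) / math.log(D) for pi in p if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]
optimal = [-math.log2(pi) for pi in p]   # lengths 1, 2, 3, 3
fixed = [2, 2, 2, 2]                     # fixed-length code for comparison

print(kraft_sum(optimal), average_length(p, optimal), entropy_base(p))
print(kraft_sum(fixed), average_length(p, fixed))
```

The optimal lengths reach the entropy bound, while the fixed-length code, although uniquely decipherable, has a larger average length.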

Ordinarily, one tries to keep the average codeword length (20) small, but it cannot be made smaller than the Shannon-Gibbs entropy. An economical code has frequently occurring messages with large pi and small ki. Rare messages are those with small pi and large ki. The solution ki = -ln pi has the disadvantage that the codeword length is very great if the probability of the symbol is very small. A better measure of the codeword length would be:


where τ = (1-q)/q, thereby limiting q to the interval [0, 1]. As τ→∞, the limit of (21) is the largest of the ki, independent of pi. Therefore, if q is small enough, or τ large enough, the very large ki’s will contribute very strongly to the average codeword length (21), thus keeping it from being small even for very small pi. The optimal codeword length is now:

showing that the Rényi entropy is the lower bound to the average codeword length (21)[19]. Just as the pi are the optimum probabilities for the Shannon-Gibbs entropy, the optimum probabilities for the Rényi entropy are the so-called escort probabilities,


As pi→0, the optimum value of ki is asymptotic to -q ln pi so that the optimum length is less than -ln pi for q<1 and sufficiently small pi. This provides additional support for keeping q within the interval [0,1][17].
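The coding result can be checked numerically under an explicit assumption about the form of (21). The sketch takes the exponential mean length to be (1/τ) log_D Σ pi D^(τ ki), as in Campbell's coding theorem, and the optimal lengths to be minus the logarithm of the escort probabilities pi^q/Σ pj^q; both forms, and all function names, are assumptions made for illustration.

```python
import math

def campbell_length(p, k, tau, D=2):
    """Exponential mean length, assumed form (1/tau) * log_D sum_i p_i D^(tau k_i)."""
    total = sum(pi * D ** (tau * ki) for pi, ki in zip(p, k))
    return math.log(total) / math.log(D) / tau

def renyi(p, q, D=2):
    """Renyi entropy of order q with logarithm to the base D."""
    return math.log(sum(pi ** q for pi in p)) / math.log(D) / (1.0 - q)

def escort(p, q):
    """Escort probabilities p_i^q / sum_j p_j^q."""
    z = sum(pi ** q for pi in p)
    return [pi ** q / z for pi in p]

p = [0.6, 0.25, 0.1, 0.05]
q = 0.5
tau = (1.0 - q) / q
k_opt = [-math.log2(e) for e in escort(p, q)]    # real-valued optimal lengths
k_shannon = [-math.log2(pi) for pi in p]         # lengths optimal for tau -> 0

print(campbell_length(p, k_opt, tau), renyi(p, q))   # equal up to rounding
print(campbell_length(p, k_shannon, tau))            # lies above the Renyi bound
print(k_opt[-1], k_shannon[-1])                      # rare symbol gets a shorter word
```

With these assumptions the escort-based lengths attain the Rényi bound exactly, and the least probable symbol receives a shorter codeword than -log pi would assign.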

Although the Rényi entropy is additive, it does not have the other properties listed above; for instance, it is not recursive and has neither the branching property nor the sum property. It is precisely the ‘Tsallis’ entropy which fills the gap: while not additive, it has many of the other properties that an entropy should have[20]. Therefore, in many ways the additive entropy of degree-q, (1), is closer to the Shannon entropy, (16), than the Rényi entropy is[21]. The so-called additive entropies of degree-q can be written as:


where the function f is a solution to the functional equation:

subject to f(0) = f(1), which was rederived by Curado and Tsallis[22]. The property of additivity of degree-q, (10), was referred to by them as pseudo-additivity, omitting the original references. What these authors appear to have missed are the properties of strong additivity, (14), and recursivity of degree-q, (7). These properties can be proven by direct calculation using the normalized additive entropy of degree-q, (1). Additive entropies of degree-q≥1 are also subadditive.

Moreover, additive entropies of degree-q satisfy the sum property, (5) where,


Only for q>0 will (24) and consequently (1), be concave since

where the prime stands for differentiation with respect to pi. This is contrary to the claim that the additive entropy of degree-q is “extremized for all values of q”[1]. It can easily be shown that the concavity property

implies the monotonic increase in the entropy (19). Setting pi = 1/n and using the sum property (5) lead to

showing that Sn,q(1/n,... , 1/n) is maximal.
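The maximal property of the uniform distribution can be probed numerically. The sketch below assumes the normalized form S = (Σ pi^q - 1)/(2^(1-q) - 1) and compares the uniform value with randomly drawn distributions; it is a spot check under that assumption, not a proof, and the names are illustrative.

```python
import random

def entropy_degree_q(p, q):
    """Normalized entropy of degree q (assumed Havrda-Charvat form)."""
    return (sum(x ** q for x in p) - 1.0) / (2.0 ** (1.0 - q) - 1.0)

def smallest_gap_to_uniform(n, q, trials=1000, seed=0):
    """Smallest value of S(uniform) - S(random p) over randomly drawn distributions."""
    rng = random.Random(seed)          # fixed seed keeps the run reproducible
    top = entropy_degree_q([1.0 / n] * n, q)
    gaps = []
    for _ in range(trials):
        weights = [rng.random() + 1e-12 for _ in range(n)]
        total = sum(weights)
        gaps.append(top - entropy_degree_q([w / total for w in weights], q))
    return min(gaps)

def uniform_values(q, sizes):
    """Entropy of the uniform distribution for each number of outcomes in sizes."""
    return [entropy_degree_q([1.0 / n] * n, q) for n in sizes]

for q in (0.5, 2.0):
    print(q, smallest_gap_to_uniform(4, q), uniform_values(q, range(2, 7)))
```

In both cases the smallest gap stays nonnegative and the uniform values grow with the number of outcomes, in line with (19).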

In order to obtain explicit expressions for the probabilities, Tsallis and collaborators maximized their non-normalized entropy


with respect to certain constraints. Taking their cue from Jaynes’[23] formalism of maximum entropy, (25) was to be maximized with respect to the finite norm[24]

and the so-called q average of the second moment[22]


The latter condition was introduced because the variance of the distribution did not exist and the weights, (pq), have been referred to as ‘escort’ probabilities [cf. (22) above]. The resulting distribution is almost identical to Student’s distribution


where, (3-q)/(q-1) is the number of degrees of freedom and μ is the Lagrange multiplier for the constraint (26)[25].
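The identification with Student's distribution can be checked by comparing shapes. The sketch assumes that the maximizing distribution has the unnormalized form [1 + μ(q-1)x²]^(-1/(q-1)) and compares it with the unnormalized Student density (1 + x²/ν)^(-(ν+1)/2) for ν = (3-q)/(q-1). The assumed forms and the choice of μ are illustrative.

```python
def degrees_of_freedom(q):
    """nu = (3 - q)/(q - 1), the value quoted in the text, for 1 < q < 3."""
    return (3.0 - q) / (q - 1.0)

def q_density_shape(x, q, mu):
    """Unnormalized maximizer, assumed form [1 + mu (q - 1) x^2]^(-1/(q - 1))."""
    return (1.0 + mu * (q - 1.0) * x * x) ** (-1.0 / (q - 1.0))

def student_shape(x, nu):
    """Unnormalized Student density (1 + x^2/nu)^(-(nu + 1)/2)."""
    return (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)

q = 1.5
nu = degrees_of_freedom(q)          # 3 degrees of freedom for q = 1.5
mu = 1.0 / (nu * (q - 1.0))         # multiplier chosen so that mu (q - 1) = 1/nu
for x in (0.0, 0.5, 2.0, 10.0):
    print(x, q_density_shape(x, q, mu), student_shape(x, nu))
```

The exponents agree because (ν+1)/2 = 1/(q-1) when ν = (3-q)/(q-1), so the two shapes coincide once the scale is matched.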

The Gaussian distribution is the only stable law with a finite variance; all the other stable laws have infinite variance. These stable laws have much larger tails than the normal law, which is responsible for the infinite nature of their variances. The initial distributions are given by the intensity of small jumps[26], where the intensity of jumps having the same sign as x and greater than x in absolute value is[27]


for x>1. For β<1, the generalized random process, which is of a Poisson nature, produces only positive jumps, whose intensity (28) is always increasing. No moments exist and the fact that


where λ is both positive and real, follows directly from Pólya’s theorem: if for each λ, Z(0) = 1, Z(λ)≥0, Z(λ) = Z(-λ), and Z(λ) is decreasing and continuous convex on the right half interval, then Z(λ) is a generating function[28]. Convexity is easily checked for 0<α≤1 and it is concluded that Z(λ) is a generating function. In other words,

exists for a positive argument of the Gamma function and that implies β<1.

This does not hold on the interval 1<β<2, where it makes sense to talk about a compensated sum of jumps, since a finite mean exists. In the limit β = 2, positive and negative jumps about the mean value become equally probable and the Wiener-Lévy process results, which is the normal limit. If one introduces a centering term in the expression, -λx, the same expression for the generating function, (29), is obtained to lowest power in λ, as λ→0 and x→∞, such that their product is finite.

These stable distributions, 0<β<1, (and quasistable ones, 1<β<2, because the effect of partial compensation of jumps introduces an arbitrary additive constant) are related to the process of super-diffusion, where the asymptotic behavior of the generalized Poisson process has independent increments with intensity[28]. For strictly stable processes, the super-diffusion packet spreads out faster than the packet of freely moving particles, while a quasi-stable distribution describes the random walk of a particle with a finite mean velocity. It was hoped that these tail distributions could be described by an additive entropy of degree-q, where the degree of additivity would be related to the exponent of the stable, or quasistable, distribution. Following the lead of maximum entropy, where the optimal distribution results from maximizing the entropy with all that is known about the system, the same would hold true for maximizing the additive entropy of degree-q. However, it was immediately realized that the variance of the distribution does not exist.

Comparing the derivative of the tail density (28) with (27) identifies β = (3-q)/(q-1), requiring the stable laws to fall in the domain 5/3<q<3[24]. However, it is precisely in the case in which we are ignorant of the variance that the Student distribution is used to replace the normal, since it has much fatter tails and only approaches the latter as the number of degrees of freedom increases without limit[27]. Just as the ratio of the difference of the mean of a sample and the mean of the distribution to the standard deviation is distributed normally, the same ratio with the standard deviation replaced by its estimator is distributed according to the Student distribution. This distribution, (27), was not unexpected, because it stands in the same relation to the normal law as the ‘Tsallis’ entropy, (25), does to the Shannon entropy in the limit as the number of degrees of freedom is allowed to increase without limit.

Whereas weighted means of order-q

do have physical relevance for different values of q, the so-called q-expectation

has no physical significance for values of q ≠ 1. Since the connection between statistical mechanics and thermodynamics lies in the association of average values with thermodynamic variables, the q-expectations would lead to incorrect averages. This explains why for Tsallis the internal energy of a composite system is not the same as the internal energies of the subsystems and makes the question “if we are willing to consider the nonadditivity of the entropy, why it is so strange to accept the same for the energy?"[29] completely meaningless. Yet, the zeroth law of thermodynamics and the derivation of the Tsallis nonintensive inverse temperature,


where, Uq is the q-expectation of the internal energy, rest on the fact that the total energy of the composite system is conserved[30].

It is as incorrect to speak of ‘Tsallis’ statistics[31] as it would be to talk of Rényi statistics. These expressions are mere interpolation formulas leading to statistically meaningful expressions for the entropy in certain well-defined limits. Whereas for the Rényi entropy the limits q→1 and q→0 give the Shannon-Gibbs and Hartley-Boltzmann entropies, respectively, without assuming equal probabilities, the additive entropy of degree-q reduces to the Shannon entropy in the limit as q→1, but it must further be assumed that the a priori probabilities are equal in order to reduce it to the Hartley-Boltzmann entropy. Hence, only the Rényi entropies are true interpolation formulas.
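The difference between the two families in these limits can be made explicit. The sketch below assumes the usual Rényi entropy with natural logarithms and the normalized entropy of degree-q, S = (Σ pi^q - 1)/(2^(1-q) - 1). It is one way of reading the statement above: for unequal probabilities the Rényi entropy tends to ln n as q→0, the entropy of degree-q tends instead to n-1, and the latter reproduces a logarithm of n only for equal probabilities as q→1.

```python
import math

def renyi(p, q):
    """Renyi entropy of order q (natural logarithm)."""
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

def entropy_degree_q(p, q):
    """Normalized entropy of degree q (assumed Havrda-Charvat form)."""
    return (sum(x ** q for x in p if x > 0) - 1.0) / (2.0 ** (1.0 - q) - 1.0)

p = [0.7, 0.2, 0.1]          # deliberately unequal probabilities
n = len(p)
tiny_q = 1e-9

print(renyi(p, tiny_q), math.log(n))                      # approaches ln n
print(entropy_degree_q(p, tiny_q), n - 1)                 # approaches n - 1 instead
print(entropy_degree_q([1.0 / n] * n, 1.000001), math.log2(n))  # equal probabilities, q -> 1
```

With this normalization the last line gives the logarithm of n to the base 2.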

Either the average of -ln pi, leading to the Shannon entropy, or the negative of the logarithm of the weighted mean of pi^(q-1), resulting in the Rényi entropy, will give the property of additivity[2]. Whereas the Shannon entropy is the negative of the logarithm of the geometric mean of the probabilities,


is the geometric mean, the Rényi entropy is the negative of the logarithm of the weighted mean


is the weighted mean of pi^(q-1). If the logarithm is to the base 2, the additive entropies of degree-q are exponentially related to the Rényi entropies of order-q by

which makes it apparent that they cannot be additive. But nonadditivity has nothing to do with nonextensivity.
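The exponential relation can be verified numerically. The sketch assumes the normalized form S = (Σ pi^q - 1)/(2^(1-q) - 1) and the Rényi entropy H with logarithm to the base 2, which together imply S = (2^((1-q)H) - 1)/(2^(1-q) - 1); the function names are illustrative.

```python
import math

def renyi_bits(p, q):
    """Renyi entropy of order q with logarithm to the base 2."""
    return math.log2(sum(x ** q for x in p)) / (1.0 - q)

def entropy_degree_q(p, q):
    """Normalized entropy of degree q (assumed Havrda-Charvat form)."""
    return (sum(x ** q for x in p) - 1.0) / (2.0 ** (1.0 - q) - 1.0)

def degree_q_from_renyi(h, q):
    """Assumed relation S = (2^((1 - q) h) - 1) / (2^(1 - q) - 1)."""
    return (2.0 ** ((1.0 - q) * h) - 1.0) / (2.0 ** (1.0 - q) - 1.0)

p = [0.5, 0.3, 0.2]
for q in (0.5, 2.0):
    h = renyi_bits(p, q)
    print(q, entropy_degree_q(p, q), degree_q_from_renyi(h, q))   # the two values agree
```

Because the Rényi entropy adds for independent experiments, its exponential multiplies, and this is what produces the cross term in the additivity of degree-q.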

As a concluding remark it may be of interest to note that undoubtedly the oldest expression for an additive entropy of degree-2 was introduced by Gini[32] in 1912, who used it as an index of diversity or inequality. Moreover, generalizations of additive entropies of degree-q are well-known. It has been claimed that “Tsallis changed the mathematical form of the definition of entropy and introduced a new parameter q"[33]. Generalizations that introduce additive entropies of degree-q+ri-1[34]

with n+1 parameters, should give even better results when it comes to curve fitting.

1:  Tsallis, C., 1988. Possible generalization of Boltzmann-Gibbs statistics. J. Statist. Phys., 52: 479-487.

2:  Tsallis, C., 1999. Nonextensive statistics: Theoretical, experimental and computational evidences and connections. Braz. J. Phys., 29: 1-35.

3:  Boekee, D.E. and J.C.A. van der Lubbe, 1980. The R-norm information measure. Inform. Control, 45: 136-155.

4:  Arimoto, S., 1971. Information-theoretical considerations on estimation problems. Inform. Control, 19: 181-194.

5:  Lavenda, B.H., 1998. Measures of information and error laws. Int. J. Theoret. Phys., 37: 3119-3137.

6:  Daroczy, Z., 1970. Generalized information functions. Inform. Control, 16: 36-51.

7:  Jaynes, E.T., 1957. Information theory and statistical mechanics. Phys. Rev., 106: 620-630.

8:  Tsallis, C., S.V.F. Levy, A.M.C. Souza and R. Maynard, 1995. Statistical mechanical foundation of the ubiquity of Levy distributions in nature. Phys. Rev. Lett., 75: 3589-3593.

9:  Lavenda, B.H., 2004. On the definition of fluctuating temperature. Open Syst. Inform. Dyn., 11: 139-146.

10:  Lavenda, B.H., 2004. The linear Ising model and its analytic continuation, random walk. Nuovo Cimento B, 119: 181-189.

11:  Tsallis, C., R.S. Mendes and A.R. Plastino, 1998. The role of constraints within generalized nonextensive statistics. Physica A, 261: 534-554.

12:  Abe, S., 1999. Correlation induced by Tsallis' nonextensivity. Physica A, 269: 403-409.

13:  Cho, A., 2003. Revisiting disorder and Tsallis statistics. Science, 300: 249-251.

14:  Cho, A., 2002. A fresh take on disorder or disorderly science. Science, 297: 1268-1269.

15:  Rathie, P.N., 1971. A generalization of the non-additive measures of uncertainty and information and their axiomatic characterizations. Kybernetika, 7: 125-125.

16:  Curado, E.M.F. and C. Tsallis, 1991. Generalized statistical mechanics: Connection with thermodynamics. J. Phys. A: Math. Gen., 24: L69-L72.

17:  Aczel, J. and Z. Daroczy, 1975. On the Measures of Information and Their Characterization. 1st Edn., Academic Press, New York.

18:  Mathai, A.M. and P.N. Rathie, 1975. Basic Concepts in Information Theory and Statistics. 1st Edn., Wiley, New York.

19:  Aczel, J., 1966. Lectures on Functional Equations and their Applications. 1st Edn., Academic Press, New York.

20:  Campbell, L.L., 1966. Exponential entropy as a measure of extent of a distribution. Probability Theory Related Fields, 5: 217-225.

21:  Callen, H.B., 1985. Thermodynamics and an Introduction to Thermostatistics. 2nd Edn., Wiley, New York.

22:  Lavenda, B.H., 1991. Statistical Physics: A Probabilistic Approach. 1st Edn., Wiley Interscience, New York.

23:  Marshall, A.W. and I. Olkin, 1979. Inequalities: Theory of Majorization and its Applications. 1st Edn., Academic Press, San Diego.

24:  Hardy, G., J.E. Littlewood and G. Polya, 1952. Inequalities. 2nd Edn., Cambridge University Press, Cambridge, pp: 31-68.

25:  Arnold, B.C., 1987. Majorization and the Lorenz Order: A Brief Introduction. 1st Edn., Springer Verlag, Berlin.

26:  Renyi, A., 1961. On measures of entropy and information. Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pp: 547-561.

27:  Campbell, L.L., 1965. A coding theorem and Renyi's entropy. Inform. Control, 8: 423-429.

28:  Aczel, J. and J. Dhombres, 1972. Functional Equations in Several Variables. 1st Edn., Cambridge Univ. Press, Cambridge, MA.

29:  Lavenda, B.H., 1998. The analogy between coding theory and multifractals. J. Phys. A, 31: 5651-5660.

30:  Lavenda, B.H., 2004. Information and coding discrimination of pseudo-additive entropies (PAE). Open Syst. Inform. Dyn., 11: 257-266.

31:  Souza, A.M.C. and C. Tsallis, 1997. Student's t- and r-distributions: A unified derivation from an entropic variational principle. Physica A, 236: 52-57.

32:  De Finetti, B., 1990. Theory of Probability. Vol. 2, John Wiley and Sons, New York.

33:  Chung, K.L., 1974. A Course in Probability Theory. 1st Edn., Academic Press, San Diego, USA.
