
Journal of Applied Sciences

Year: 2005 | Volume: 5 | Issue: 5 | Page No.: 920-927
DOI: 10.3923/jas.2005.920.927
Concerning Tsallis's Condition of Pseudo-additivity as a Definition of Non-extensivity and the Use of the Maximum Entropy Formalism
B.H. Lavenda and J. Dunning-Davies

Abstract: The pseudo-additive relation satisfied by the Tsallis entropy is not linked with the super- or sub-additive properties of the entropy. These latter properties, like concavity and convexity, are fundamentally geometric inequalities and cannot be reduced to equalities. The pseudo-additivity relation is, rather, a functional equation that determines the functional forms of the random entropies. A similar pseudo-additive relation is satisfied by the Arimoto entropy which is a first-order homogeneous form. No conclusions, based on the pseudo-additive functional equation, may be drawn about the extensive nature of systems from either the Tsallis or Arimoto entropy. Further, it is shown that Tsallis' statistical thermodynamic formulation of the non-additive entropy of degree-α is neither correct nor self-consistent.


How to cite this article
B.H. Lavenda and J. Dunning-Davies, 2005. Concerning Tsallis's Condition of Pseudo-additivity as a Definition of Non-extensivity and the Use of the Maximum Entropy Formalism. Journal of Applied Sciences, 5: 920-927.

Keywords: homogeneity, Gauss’s principle, non-extensivity, maximum entropy formalism, pseudo-additive entropies, Tsallis and Arimoto entropies

INTRODUCTION

According to Tsallis[1], the entropy of a composite system composed of two independent systems A and B is given by the functional equation:

(1)

where:

(2)

is the Tsallis entropy for all α>0, in units where k = 1. Also, the probability distribution P = (p1,....,pm) is complete. However, this pseudo-additive relation has nothing to do with the super- and sub-additivity properties of the entropy. These latter properties, like concavity and convexity, are couched in geometric inequalities and simply cannot be reduced to equalities. Rather, (1) is a functional equation that determines the functional forms of the random entropies. A similar pseudo-additive relation is satisfied by the Arimoto entropy which, unlike (2), is a first-order homogeneous form. Hence, no conclusions concerning the extensive nature of systems may be drawn from either the Tsallis or Arimoto entropies on the basis of a functional equation of the form (1).
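For concreteness, the pseudo-additive composition rule under discussion can be checked numerically. The following sketch assumes the standard Tsallis form S_α(P) = (1 − Σi pi^α)/(α − 1), with k = 1, together with the composition rule S_α(A+B) = S_α(A) + S_α(B) + (1 − α)S_α(A)S_α(B) usually quoted in the Tsallis literature for (1) and (2); the distributions used are arbitrary illustrative examples.

    import numpy as np

    def tsallis_entropy(p, alpha):
        # Standard Tsallis form with k = 1: S_alpha = (1 - sum_i p_i**alpha) / (alpha - 1)
        p = np.asarray(p, dtype=float)
        return (1.0 - np.sum(p**alpha)) / (alpha - 1.0)

    alpha = 0.5
    pA = np.array([0.2, 0.3, 0.5])      # arbitrary complete distribution for system A
    pB = np.array([0.6, 0.4])           # arbitrary complete distribution for system B
    pAB = np.outer(pA, pB).ravel()      # independence: p_ij(A+B) = p_i(A) * p_j(B)

    lhs = tsallis_entropy(pAB, alpha)
    rhs = (tsallis_entropy(pA, alpha) + tsallis_entropy(pB, alpha)
           + (1.0 - alpha) * tsallis_entropy(pA, alpha) * tsallis_entropy(pB, alpha))
    print(lhs, rhs)                     # the two sides agree to machine precision

The equality holds identically for factorised distributions; the point made here is that such an equality, by itself, says nothing about super- or sub-additivity.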

Further, it is well known that the maximum entropy formalism[2], the minimum discrimination information[3] and Gauss’ principle[4,5] all lead to the same results when a certain condition on the prior probability distribution is imposed[6]. All these methods lead to the same form of the posterior probability distribution, namely, the exponential family of distributions. Tsallis et al.[7] have attempted to adapt the maximum entropy formalism, which uses the Shannon entropy, to the pseudo-additive entropy (2). In order to obtain analytic expressions for the probabilities that maximise the non-additive entropy, they found it necessary to use escort probabilities[8] of the same power as the non-additive entropy. If the procedure they use is correct, then Gauss’ principle should yield the same optimum probabilities. However, it will be shown that the Tsallis result requires the prior probability distribution to be given by the same unphysical condition as the maximum entropy formalism, and requires the potential of the error law to vanish. The potential of the error law is what information theory refers to as the error[9]; that is, the difference between the inaccuracy and the entropy. Unless the ‘true’ probability distribution, P = {p(x1), p(x2),...,p(xm)}, coincides with the estimated probability distribution, Q = {q(x1), q(x2),...,q(xm)}, the error does not vanish. Moreover, the two averaging procedures, only one of which uses the escort probabilities explicitly, will be shown to be inequivalent, and the relation between the potential of the error law and the pseudo-additive entropy will be shown to require the latter to vanish when the former vanishes.

Condition of pseudo-additivity as a definition of non-extensivity: Supposedly[1], A and B are two independent systems in the sense that the probability of (A + B) factorises into those of A and B (that is, pij(A + B) = pi(A)pj(B)). It follows immediately that, since Sα≥0 (the non-negativity property) in all cases, the values α<1, α = 1 and α>1 respectively correspond to superadditivity (superextensivity), additivity (extensivity) and subadditivity (subextensivity). It does seem odd that criteria of super- and sub-additivity may be obtained via a functional equation rather than as geometric inequalities, as are the criteria for convexity and concavity. The subadditive property is the triangle inequality[10], whereas the geometrical interpretation of a concave function is one that never rises above its tangent plane at any point. There are classes of functions which are defined by inequalities that are weaker than convexity (concavity) and stronger than superadditivity (subadditivity)[11].
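For reference, the geometric inequalities alluded to here can be written out explicitly; the following are the standard definitions rather than the displayed equations of this paper:

    S(A + B) ≤ S(A) + S(B)                                   (subadditivity; the reversed inequality defines superadditivity)
    S(λx + (1 − λ)y) ≥ λS(x) + (1 − λ)S(y),  0 ≤ λ ≤ 1       (concavity; the reversed inequality defines convexity)

Neither condition can be captured by an equality of the form (1).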

To prove that (2) is always subadditive, consider the companions to Minkowski’s inequalities[12]:

(3)

for α>1 and

(4)

for 0<α<1, where, Q = (q1,....,qm) is another complete distribution. Inequality (3) is the condition for superadditivity, the negative of which is subadditive. Hence, for α>1 the Tsallis entropy is subadditive. Inequality (4) is the criterion for subadditivity and hence, the Tsallis entropy is subadditive for 0<α<1. This contradicts the conclusion, stated above, that the entropy (2) is superadditive for 0<α<1 on the basis of (1).
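These inequalities can be checked numerically. The sketch below assumes that the companion inequalities (3) and (4) take their usual form, Σi(pi + qi)^α ≥ Σi pi^α + Σi qi^α for α>1, with the inequality reversed for 0<α<1; the distributions are arbitrary examples.

    import numpy as np

    def power_sum(p, alpha):
        # sum_i p_i**alpha, the quantity from which the Tsallis entropy is built
        return np.sum(np.asarray(p, dtype=float)**alpha)

    p = np.array([0.2, 0.3, 0.5])
    q = np.array([0.1, 0.6, 0.3])

    for alpha in (0.5, 2.0):
        lhs = power_sum(p + q, alpha)
        rhs = power_sum(p, alpha) + power_sum(q, alpha)
        # alpha > 1: lhs >= rhs (the power sum is superadditive), so its negative is subadditive;
        # 0 < alpha < 1: lhs <= rhs (the power sum is subadditive).
        print(alpha, lhs, rhs)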

Actually, the functional equation (1) is incorrect and should be replaced by[12]:

(5)

given in terms of two sets of complete probability distributions P and Q. Expression (5) has nothing to say about the extensivity of the entropy or the lack thereof; rather, it is a functional equation which determines the form of the entropy. The pseudo-additive entropies are averages of the generators of power means. Power means are the only quasi-arithmetic means with a certain homogeneity property that will now be used to derive these random entropies.

Power means are first-order homogeneous[12]

(6)

where, x stands for a set of discrete variables, x1,....,xm. Since φ is defined up to a constant, set

(7)

Now (6) stands for

(8)

On account of the translational invariance of the weighted mean

(9)

where, a ≠ 0 and b are functions of the parameter λ. Since it is apparent from (8) that ψ(x) = φ(λx), then

(10)

However, if (7) is to be satisfied, b(λ) = φ(λ). Then, letting λ become another positive variable, y, (9) becomes the functional equation

(11)

Also,

(12)

Subtracting (12) from (11) leads to

(13)

where, c is a constant which both expressions must equal since each depends on a different independent variable. Solving (13) for a(y) and substituting back into (11) gives the functional equation

(14)

which is known in information science as the additivity property of degree-α.
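Although (14) is not written out here, the substitution made below in (17) indicates that it is presumably the familiar functional equation

    φ(xy) = φ(x) + φ(y) + c φ(x)φ(y),

since setting cφ(x) + 1 = g(x) turns this into the multiplicative Cauchy equation g(xy) = g(x)g(y), whose general measurable solution is g(x) = x^r.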

If c = 0, this reduces to the classical equation

whose most general solution is φ = A log x.

If x represents the probability p, then the best code length for an input symbol of probability p is

(15)

where, A = -1 and ni is the length of sequence i. Then, the average length corresponds to the average entropy[13]

(16)

which is an additive entropy of degree-1, namely, the Shannon entropy.

With c ≠ 0, insert cφ(x) + 1 = g(x) into (14) to give

(17)

Since the general solution of the functional Eq. 17 is g(x) = x^r, the random entropy function is

(18)

Since (18) has to reduce to (15) in the limit as -c → +0, it follows that r = -c. Hence, (18) becomes

(19)

where, c = 1 - α. The average entropy (19) is the pseudo-additive entropy of degree-α[14-21], (2), which reduces to the additive entropy (16) in the limit as α → 1.
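As a hedged reconstruction of the step from (18) to (2): if the random entropy implied by g(x) = x^r with r = -c = α - 1 is

    φ(p) = (p^(α−1) − 1)/(1 − α),

then averaging with respect to P gives

    Σi pi φ(pi) = (Σi pi^α − 1)/(1 − α) = (1 − Σi pi^α)/(α − 1),

which is the standard Tsallis form and which tends to the Shannon entropy −Σi pi log pi as α → 1.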

There is nothing unique about (19). If our only concern is that it reduces to (15) in the limit α → 1, then an equally acceptable candidate is:

(20)

If (20) is averaged with respect to Q, then P must be chosen in terms of Q such that the new P are normalised. In terms of a variational problem, it is necessary that

where, the constraint has been introduced using the Lagrange multiplier μ. Performing the variation leads to pi = (qi/μ)^(1/α) and, so that P be normalised,

Introducing these so-called ‘escort’ probabilities[8] into (20) and averaging gives

(21)

where, α = 1/β. The first-order homogeneous entropy (21) appears to have been introduced first by Arimoto[22] and subsequently given a complete characterisation by Boekee and van der Lubbe[23].
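A minimal sketch of the escort construction described above, assuming the escort probabilities take the form pi = qi^(1/α)/Σj qj^(1/α) obtained by normalising the variational solution pi = (qi/μ)^(1/α); the distribution Q is an arbitrary example.

    import numpy as np

    def escort(q, alpha):
        # Escort distribution: p_i = q_i**(1/alpha) / sum_j q_j**(1/alpha)
        q = np.asarray(q, dtype=float)
        w = q**(1.0 / alpha)
        return w / w.sum()

    q = np.array([0.1, 0.2, 0.3, 0.4])
    p = escort(q, 0.5)
    print(p, p.sum())    # a normalised distribution, as required by the Lagrange condition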

Just as the Tsallis entropy (2) obeys the pseudo-additivity relation (5), the Arimoto entropy (21) satisfies

(22)

Adopting Tsallis’ line of reasoning, it would follow from (22) that (21) is subadditive for β >1 and superadditive for β<1. However, according to Minkowski’s inequalities[12]

(23)

for β>1 and

(24)

for β<1. Thus, the quantity appearing in these inequalities is subadditive for β>1 and superadditive for β<1. Since the negative of the former and the positive of the latter are used to construct the Arimoto entropy, it may be concluded that it is always superadditive.

Moreover, concavity follows by setting pi = λai and qi = (1-λ)bi, where the sets of numbers ai and bi are non-negative and 0≤λ≤1. Then inequality (23) becomes the criterion for convexity of

for β>1 and

for β<1 is the condition for its concavity. Since the Arimoto entropy takes the negative of the former and positive of the latter, it is concave everywhere[23].

Similarly, the concavity of the Tsallis entropy (2) may be established from the observation that the sum Σi pi^α is convex for α>1 and concave for 0<α<1. Since the Tsallis entropy is formed from the negative of the former and the positive of the latter, it is concave for all α>0.
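The concavity claim can be illustrated numerically. The sketch below checks the defining inequality S_α(λP + (1 − λ)Q) ≥ λS_α(P) + (1 − λ)S_α(Q) for the standard Tsallis form at a few arbitrary values of α and λ.

    import numpy as np

    def tsallis_entropy(p, alpha):
        p = np.asarray(p, dtype=float)
        return (1.0 - np.sum(p**alpha)) / (alpha - 1.0)

    p = np.array([0.2, 0.3, 0.5])
    q = np.array([0.7, 0.2, 0.1])

    for alpha in (0.5, 2.0, 3.0):
        for lam in (0.25, 0.5, 0.75):
            mix = lam * p + (1.0 - lam) * q
            lhs = tsallis_entropy(mix, alpha)
            rhs = lam * tsallis_entropy(p, alpha) + (1.0 - lam) * tsallis_entropy(q, alpha)
            assert lhs >= rhs - 1e-12    # concavity holds for all alpha > 0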

Furthermore, since the entropy is defined to within a constant classically, the Arimoto entropy (21) is a first-order homogeneous function and yet it obeys the pseudo-additive relation (22), just as the Tsallis entropy (2) obeys the pseudo-additive relation (5). Hence, it may be concluded that this relation has nothing whatsoever to do with the extensivity properties, or lack thereof, of the pseudo-additive entropies.

Use of the maximum entropy formalism: Now let X be a random variable whose values x1, x2,...,xm are obtained in m independent trials. Prior to the observations, the distribution is Q and, after the observations, the unknown probability distribution is P. The observer has at his disposal the statistic

to help him formulate a guess as to the form of Q. Gauss’ principle assumes that the probability distribution P depends on a parameter a

(25)

such that the arithmetic mean is the maximum likelihood estimate of a. Furthermore, P will depend on the parameter a so that there is a value a0 for which p(xi; a0) = q(xi), the prior distribution.

The maximum likelihood estimate,

will lead to the exponential family of distributions when the log-likelihood function

The likelihood equation

where ψ(xi; a) = log p(xi; a), is the same as requiring

and any deviations in one will immediately lead to deviations in the other. Hence, they must be proportional to one another. Choosing the coefficient of proportionality as the second derivative of some appropriate scalar function V gives[6]

(26)

where, the prime denotes differentiation with respect to the argument. The scalar potential, V(a), must be independent of the xi because the left-hand side is a function of xi alone and a similar equation for xj would lead to a contradiction. The potential may be assumed such that V(a0) = 0 and consequently, (26) may be written as:

Integrating from a0 to a gives

where, λ(a) = V'(a).
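A hedged reconstruction of this integration, assuming in addition that λ(a0) = V'(a0) = 0, so that the prior corresponds to the unconstrained state: integrating (26) from a0 to a, with ψ(xi; a0) = log q(xi) and V(a0) = 0, gives

    log p(xi; a) − log q(xi) = ∫ from a0 to a of V''(u)(xi − u) du = λ(a)(xi − a) + V(a),

so that the exponential family (27) presumably reads p(xi; a) = q(xi) exp[λ(a)(xi − a) + V(a)]. Averaging with respect to P, and using the fact that the mean of the xi is a, reproduces the statement that the error V(a) is the difference between the inaccuracy and the Shannon entropy.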

Usually, the log-likelihood function is logarithmic and the exponential family of distributions

(27)

results. Averaging both sides with respect to the probability distribution P gives

In information theory, the first term is the negative of the Shannon entropy, the second term is the inaccuracy and the right-hand side is the error[9]. On the strength of Shannon’s inequality

(28)

the inaccuracy cannot be smaller than the Shannon entropy. Shannon’s inequality follows from the arithmetic-geometric mean inequality,

with xi = q(xi)/p(xi; a)

When Q is the uniform distribution, that is, q(xi) = 1/m for all i, Shannon’s inequality, (28), becomes

which has been referred to as the entropy reduction caused by the application of a constraint that produces a finite value of a[24]. S0(1/m) = log m is the maximum entropy and is known as the Hartley entropy in information theory. Classically, the entropy is defined to within a constant; only entropy differences are measurable.
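A small numerical illustration of Shannon's inequality (28) and of the entropy reduction for a uniform prior; the posterior distribution below is an arbitrary example.

    import numpy as np

    p = np.array([0.5, 0.3, 0.1, 0.1])     # arbitrary posterior distribution P
    q = np.full_like(p, 1.0 / p.size)      # uniform prior Q, q_i = 1/m

    shannon = -np.sum(p * np.log(p))       # Shannon entropy of P
    inaccuracy = -np.sum(p * np.log(q))    # equals log m for a uniform prior
    print(inaccuracy >= shannon)           # Shannon's inequality (28)
    print(np.log(p.size) - shannon)        # entropy reduction: log m - S(P) >= 0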

However, none of what has been said so far applies to equilibrium thermodynamics[5]. If (27) is averaged with respect to the Q distribution instead of the P distribution and Shannon’s inequality,

is used, a problem is encountered immediately because V(a) must be negative now. In statistical mechanics, q(xi) represents the surface of constant energy of a hypersphere of high dimensionality[25]. Because of its high dimensionality, the volume of the hypersphere lies very close to its surface so that q(xi) may be thought of as the volume of phase space occupied by the system. Averages are performed with respect to this non-normalisable prior probability distribution[25]. In order to keep the error V(a), which will be identified as the thermodynamic entropy, positive, it is necessary to introduce a sign change in (27). This sign change may be rationalised as follows: the exponential factor, will not overpower the rapidly increasing factor of the density of states, q(xi). However, the density of states cannot increase faster than a certain power of the radius, xi[25], of the phase space volume, which is proportional to xi^m in a hypersphere of m dimensions. What is required is an even more rapidly decreasing exponential factor

According to the Boltzmann-Planck interpretation, q(xi) is not a normalised probability but, rather, a ‘thermodynamic’ probability, being proportional to the volume of phase space occupied by the system. The (random) entropy S(xi) is defined as the logarithm of the thermodynamic probability

The phase average is given by:

The thermodynamic entropy is the phase space average of that is:

and its Legendre transform

defines the logarithm of the generating function, Z(λ). The inaccuracy now appears as the difference between the thermodynamic entropy and the average of the random entropies

(29)

The inequality follows from the fact that S increases in the wide sense and is concave. The expectation a may be taken with respect to either P or Q. The two averages must coincide, otherwise there would not be a single general thermodynamics, but, rather a “microcanonical thermodynamics” and a separate “canonical thermodynamics”[26]. Taken with respect to Q, (29) is Jensen’s inequality for a concave function, where the Q has positive components which are arbitrary otherwise. Taken with respect to the normalised P, (29) is the Jensen-Petrovic inequality[27], where for each j = 1,....,m.

The average of m variables is likely to be considerably greater than any of its components. Then, if S is increasing,

for j = 1,....,m. Multiplying by q(xi) and summing gives back (29). This does not mean that S(xi)/xi should not decrease: a sufficient condition for it is that S(xi)/xi should decrease.

The criteria for inequality attenuation[28] are that S(xi) be an increasing function and S(xi)/xi decrease, that is, it is anti-star shaped. Fluctuations give rise to inaccuracy (29) and in their absence, a function of the average equals an average of the function. Therefore, if the exponential probability distribution, (27), is to coincide with Gauss’ error law, written in terms of the concavity of the entropy,

(30)

where the intermediate point lies between xi and a, then sign changes are required. When this is done, (27) becomes

(31)

A comparison of these latter two equations shows that the entropy, S(a), is the potential, V(a), that determines the error law[5]. The concavity of the entropy ensures that the exponent will be negative and hence p(xi;a) will be less than unity. The parameter, λ(a), is still the derivative of the scalar potential, V(a), but since this potential now coincides with the thermodynamic entropy, S(a), the Lagrange multiplier λ(a) is identified as the intensive variable divided by the absolute temperature in the entropy representation.

Information theoretic entropies and the entropy reduction of the thermodynamics of extremes[24] are not amenable to the previous thermodynamic interpretation, where the entropy is defined as the logarithm of the volume of phase space occupied by the system. Since all the volume lies very near the surface in a thermodynamic system of high dimensionality, the volume of phase space will coincide with the surface area, which is referred to as the structure function[25].

In what was eventually a futile attempt to justify Tsallis’ formalism, Plastino and Plastino[29] considered a structure function of the form Ei^(m-1) for the energy. Assuming, for no stated reason, a bounded phase space whose total energy is E0, the Tsallis exponent was identified as α = (m-2)/(m-1) and the inverse temperature was defined as β = (m-1)/E0. What was not realised is that, in order to define a temperature, m must be much greater than 1, so that α ≡ 1. More precisely, m must be large enough to allow the use of Stirling’s formula[5]. Considering the claimed conditions under which Tsallis’ statistical mechanics is supposed to apply, it follows that it cannot be applied to thermodynamic systems: the systems to which it would apply would be too small to be capable of defining intensive quantities, such as temperature and pressure.

Consider the P and Q as two sets of complete probability distributions. For a given probability distribution, Q, seek the probability distribution P which resembles Q most closely. This is the minimum discrimination statistic of Kullback[3].

In order to derive the pseudo-additive entropies, the logarithm is replaced by the limit

and a similar relation for log q(xi), in the exponential law to give

(32)

Multiplying (32) by p(xi; a) and summing leads to[30]

(33)

where:

has been referred to as the inaccuracy[11].
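The replacement referred to above is presumably a deformed logarithm of the form ln_α(x) = (x^(1−α) − 1)/(1 − α), which tends to log x as α → 1; the precise sign convention used in (32) is not shown here, so the following sketch illustrates one standard choice, for which Σi pi ln_α(1/pi) reproduces the standard Tsallis form (1 − Σi pi^α)/(α − 1).

    import numpy as np

    def ln_alpha(x, alpha):
        # Deformed logarithm (x**(1 - alpha) - 1)/(1 - alpha); tends to log(x) as alpha -> 1
        return (x**(1.0 - alpha) - 1.0) / (1.0 - alpha)

    p = np.array([0.2, 0.3, 0.5])          # arbitrary complete distribution
    for alpha in (2.0, 1.5, 1.001):
        lhs = np.sum(p * ln_alpha(1.0 / p, alpha))       # sum_i p_i ln_alpha(1/p_i)
        rhs = (1.0 - np.sum(p**alpha)) / (alpha - 1.0)   # pseudo-additive entropy of degree alpha
        print(alpha, lhs, rhs)             # the two coincide and approach the Shannon entropy as alpha -> 1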

From now on, the dependence of the probability distribution P on the average (25) will be suppressed, because Tsallis statistical thermodynamics makes no pretence of statistical inference. The inaccuracy is a convex function of Q for given P, provided α≤2, and is defined as the sum of the entropy (2) and the error V(a). The inaccuracy has the property that

the negative of the error is what has been termed the entropy reduction[24].

The inequality in (33) follows from Hölder’s inequality. Consider the case when all the P are rational; then they may be expressed in the form and Q is the uniform distribution, q(xi) = 1/m for all i. (32) then becomes

because of Hölder’s inequalities[12]

with the inequality sign being reversed for α>1 so that inequality (33) always holds.

Tsallis et al.[7] found that the maximisation procedure for the pseudo-additive entropy (32) with respect to the constraint

(34)

using the escort probabilities[8], yields the stationary condition[7]

(35)

where:

(36)

and λ is the Lagrange multiplier for constraint (34). The normalisation condition for the p(xi) gives the partition function as

(37)

At best, (35) may be considered an implicit relation for the probabilities, since (36) contains the probabilities explicitly. In order to reduce Gauss’ principle (32) to something looking even vaguely like the ‘optimal’ probabilities (35) that maximise the Tsallis entropy in (33), it is necessary to:

(i) assume that P is an incomplete distribution,
(ii) set q(xi) = 1 for all i
and (iii) set V(a) = 0.

Then

where, the partition function is given by (37) and the escort probabilities (34) were used to define the parameter a, rather than the weighted average (25).

However, if (31) is taken and the approximation

introduced, with a similar expression for q(xi),

(38)

results. Setting q(xi) = 1 and V(a) =0 and requiring the probability distribution P be normalised result in (35) or, equivalently

Multiplying both sides by p^α(xi) and summing leads to

provided λα is given by (36). On the other hand, if both sides of (35) are raised to the power α, summed from 1 to m and rearranged, the result is

(39)

The only difference between the two forms of averaging is that, in the first case, use was made of the escort probability average, (34). Since the two results do not coincide, it must be concluded that there is something amiss with the escort probability average (34) since this is the one totally new aspect introduced into the considerations.

Moreover, if the α→1 limit is taken in (39)

results and this is not the correct expression for the partition function even in the unphysical case of a density of states equal to unity.

Finally, multiplying both sides of (38) by p^α(xi) and summing gives:

If q(xi) ≡ 1 for all i then

(40)

This illustrates the correspondence between the Shannon entropy and the potential V(a) in the α → 1 limit, which was alluded to above in the thermodynamic formulation that takes into account a non-normalised prior probability distribution. However, the prior probability distribution has been set equal to unity here, as in the maximum entropy method. Also, in order to derive the probability distribution (35) from Gauss’ principle, V had to be assumed identically equal to zero. Consequently, (40) would require the vanishing of the pseudo-additive entropy (2).

From all that has gone before, it must be concluded that Tsallis’ statistical thermodynamic formulation of his entropy (2) is neither self-consistent nor correct.

REFERENCES

  • Tsallis, C., 1999. Nonextensive statistics: Theoretical, experimental and computational evidences and connections. Braz. J. Phys., 29: 1-1.


  • Campbell, L.L., 1970. Equivalence of Gauss' principle and minimum discrimination information estimation of probabilities. Ann. Math. Stat., 41: 1011-1011.


  • Tsallis, C., R.S. Mendes and A.R. Plastino, 1998. The role of constraints within generalized nonextensive statistics. Physica A, 261: 534-554.


  • Kerridge, D.F., 1961. Inaccuracy and inference. J. R. Stat. Soc. Ser. B, 23: 184-194.


  • Bruckner, A.M. and E. Ostrow, 1962. Some function classes related to the class of convex functions. Pac. J. Math., 12: 1203-1203.


  • Campbell, L.L., 1996. Definition of entropy by means of a coding problem. Probability Theor. Related Fields, 6: 113-118.


  • Havrda, J. and F. Charvat, 1967. Quantification method of classification processes. Concept of structural α-entropy. Kybernetika, 3: 30-30.


  • Daroczy, Z., 1970. Generalized information function. Inform. Control, 16: 36-36.


  • Vajda, I., 1968. Axiomy α-entropie zobecněného pravděpodobnostního schematu. Kybernetika, 4: 105-105.


  • Aggarwal, N.L., Y. Cesari and C.F. Picard, 1972. Proprietes de branchement liees aux questionnaires de Campbell et a l'information de Renyi. C.R. Acad. Sci. Paris Ser. A, 275: 437-437.


  • Forte, B. and C.T. Ng, 1973. On characterisation of the entropies of degree β. Utilitas Math., 4: 193-193.


  • Aczel, J. and Z. Daroczy, 1975. On Measures of Information and their Characterizations. Academic Press, New York, ISBN-13: 9780080956244, Pages: 233


  • Tsallis, C., 1988. Possible generalization of boltzmann-gibbs statistics. J. Statist. Phys., 52: 479-487.


  • Arimoto, S., 1971. Information theoretical considerations on estimation problems. Inform. Control, 19: 181-181.


  • Boekee, D.E. and J.C.A. van der Lubbe, 1980. The R-norm information measure. Inform. Control, 45: 136-155.


  • Greene, R.F. and H.B. Callen, 1951. On the formalism of thermodynamic fluctuation theory. Phys. Rev., 83: 1231-1231.


  • Plastino, A. and A.R. Plastino, 1990. Tsallis entropy and Jaynes' information theory formalism. Braz. J. Phys., 29: 50-50.


  • Lavenda, B.H., 1998. Measures of information and error laws. Int. J. Theoret. Phys., 37: 3119-3137.


  • Pecaric, J.E., F. Proschan and Y.L. Tong, 1992. Convex Functions, Partial Orderings and Statistical Applications. Academic Press, San Diego


  • Arnold, B.C., 1987. Majorization and the Lorenz Order. Springer, Berlin


  • Jaynes, E.T., 1957. Information theory and statistical mechanics. Phys. Rev., 106: 620-630.


  • Kullback, S., 1959. Information Theory and Statistics. Wiley, New York


  • Keynes, J.M., 1921. Treatise on Probability. St. Martin's Press, New York


  • Lavenda, B.H., 1991. Statistical Physics: A Probabilistic Approach. Wiley-Interscience, New York


  • Beck, C. and F. Schlogl, 1993. Thermodynamics of Chaotic Systems. Cambridge University Press, Cambridge


  • Beckenbach, E.F. and R. Bellman, 1961. Inequalities. Springer, Berlin


  • Hardy, G., J.E. Littlewood and G. Polya, 1952. Inequalities. 2nd Edn., Cambridge University Press, Cambridge, pp: 31-68


  • Mathai, A.M. and P.N. Rathie, 1975. Basic Concepts in Information Theory and Statistics. 1st Edn., Wiley, New York


  • Lavenda, B.H., 1995. Thermodynamics of Extremes. Horwood, Chichester


  • Khinchin, A.I., 1949. Mathematical Foundations of Statistical Mechanics. Dover Publications, New York
