ABSTRACT
We obtain a Strong Law of Large Numbers (SLLN) for the reward process {Z(t), t≥0}, the cumulative reward gained by operating a Semi-Markov system during the time interval [0, t]. The important and striking point of this study is that we drop the usual assumption that rewards accrue at a constant rate in each state. In most applications this frequently used assumption is not realistic; we therefore deal with reward functions of general form. The SLLN obtained is in the sense that lim_{t→∞} Z(t)/t = α, a.s., for some real value α. The mild conditions for this SLLN are the existence of the means of the sojourn times and the integrability of the reward functions with respect to the sojourn time distributions. As is shown, the parameter α coincides with the slope parameter in the asymptotic representation of E[Z(t)], t→∞.
DOI: 10.3923/ajms.2008.57.62
URL: https://scialert.net/abstract/?doi=ajms.2008.57.62
INTRODUCTION
In this study we deal with the long run rate of reward processes defined on a Semi-Markov process, when the reward functions are not necessarily of linear form. The main difference between this work and similar studies lies in the structure of the functionals just mentioned. A Semi-Markov Process (SMP), or equivalently a Markov-Renewal Process (MRP), is an extension of a Markov Chain (MC) tied with ordinary renewal processes for each state. The successive states visited by this process form a Markov chain, while the sojourn time in each state is not constant: it depends probabilistically on the current state and may depend on the next state to be visited. It is well known that given the sequence of states, the sojourn times are independent. More details on Semi-Markov processes can be found in Cinlar (1969, 1975a, b), Pyke (1961a, b) and Teugels (1976).
For processes with regenerative structure, a cost or reward function is often introduced. The total cost or reward gained from operating the system, as a function of time, is called a reward process. In recent decades many authors have considered different aspects of reward processes. Some have dealt with the mean value of these processes; because the exact solutions obtained have complicated forms, most of these authors have turned to the limiting behavior of the mean value in the long run.
The definition of a reward process on a Semi-Markov process goes back to Pyke and Schaufele (1964), who considered functionals of the form:

Z(t) = Σ_{n=1}^{N(t)} f(J_{n−1}, J_n, Δ_{n−1})    (1)
Where, Jn and ξn are the state and epoch of the n-th transition and f is a real valued function. The mean and variance of these processes in the univariate and multivariate cases were obtained by Sumita and Masuda (1987) and Masuda and Sumita (1991), who considered a reward process of the form:
Z(t) = Σ_{n=0}^{N(t)−1} ρ(J_n) Δ_n + ρ(J_{N(t)}) (t − ξ_{N(t)})    (2)
Where, Δn = ξn+1−ξn is the sojourn time at state Jn and ρ(k) is a constant rate for state k. Note that Eq. 2 is a special case of Eq. 1, when f is of constant rate and hence linear in Δn. They obtained exact formulas as well as the asymptotic behavior of the mean value E[Z(t)], t≥0, in the univariate and multivariate cases. Soltani (1996) introduced the nonlinear reward process:
Z(t) = Σ_{n=0}^{N(t)−1} ρ(J_n, Δ_n) + ρ(J_{N(t)}, γ(t))    (3)
Where, γ is the age process. The function ρ in Eq. 3 is called a reward function; ρ(j, τ) measures the excess reward when time τ is spent in the state j. A closed form for the mean of this process, when the reward functions are assumed to be polynomials in time, was obtained by Soltani (1996). Khorshidian and Soltani (2002, 2003) obtained E[Z(t)] and Var[Z(t)], the expected value and variance of Z(t), as well as Σ(t), the covariance matrix for a multivariate reward process with components as in Eq. 3. Using Markov-renewal theory and for general ρ(·,·), Soltani and Khorshidian (1998) showed that
E[Z(t)] = α1 t + α0 + o(1), as t → ∞    (4)
Where, the constants α0 and α1 were fully specified. For reward functions of general form in the multivariate case, Khorshidian and Soltani (2002) used the supplementary variables technique and showed that, as t → ∞,
E[Z(t)] = A1 t + A0 + o(1),  Σ(t) = C1 t + C0 + o(1)    (5)
Moreover, they calculated the vectors A0 and A1 and matrices C0 and C1.
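To fix ideas, the reward process in Eq. 3 is straightforward to simulate. The sketch below uses a hypothetical two-state chain with exponential sojourn times and the nonlinear reward function ρ(j, τ) = c_j τ²; all parameter values are illustrative and are not taken from the papers cited above.

```python
import random

# Minimal simulation sketch of the reward process in Eq. 3.
# States 0 and 1; embedded-chain transition matrix P; exponential
# sojourn times with state-dependent means; nonlinear reward
# rho(j, tau) = c_j * tau**2 for time tau spent in state j.
# All numbers are hypothetical example values.

P = [[0.3, 0.7], [0.6, 0.4]]    # embedded Markov chain
MEAN_SOJOURN = [1.0, 2.0]       # mean sojourn times m_0, m_1
C = [1.0, 0.5]                  # reward coefficients c_0, c_1

def rho(j, tau):
    """Reward accumulated by spending time tau in state j."""
    return C[j] * tau ** 2

def simulate_Z(t, seed=0):
    """Z(t) = sum of rho(J_n, Delta_n) over completed sojourns in [0, t]
    plus rho(J_{N(t)}, gamma(t)) for the current, incomplete sojourn."""
    rng = random.Random(seed)
    state, clock, Z = 0, 0.0, 0.0
    while True:
        delta = rng.expovariate(1.0 / MEAN_SOJOURN[state])
        if clock + delta > t:              # sojourn straddles time t
            Z += rho(state, t - clock)     # age gamma(t) = t - xi_N(t)
            return Z
        Z += rho(state, delta)             # completed sojourn
        clock += delta
        state = 0 if rng.random() < P[state][0] else 1
```

Because ρ is quadratic rather than linear in the sojourn time, Z(t) here is not a special case of Eq. 2, which is exactly the setting the present paper addresses.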
In probability theory it is well known that for a sequence of random variables {Xn, n = 1, 2, ...} with partial means Yn = (X1 + ... + Xn)/n, even when both limits exist and are constant, lim_{n→∞} E[Yn] does not necessarily coincide with lim_{n→∞} Yn. In what follows we show that under some mild conditions lim_{t→∞} Z(t)/t = lim_{t→∞} E[Z(t)]/t; this is the SLLN we pursue.
To arrive at the stated strong law for the process in Eq. 3, since the reward functions ρ(·,·) are assumed to have general forms, it is natural to impose some conditions on them. Most of the conditions we need concern integrability of ρ, exactly what was required for calculating the moments of Z(t) by Soltani and Khorshidian (1998) and Khorshidian and Soltani (2002).
NOTATION AND PRELIMINARIES
Let {J(t), t≥0} be a SMP with a MRP, {(Jn, ξn), n = 0, 1, 2, ...}. The state space of {Jn} is assumed to be E = {s0, s1, ..., sm}. Jn and ξn are the state and epoch of the n-th transition, n≥0, with ξ0 = 0. Also let Δn = ξn+1−ξn denote the sojourn time at Jn. The transition probabilities are assumed to be

Q_{ij}(t) = P(J_{n+1} = j, Δ_n ≤ t | J_n = i) = p_{ij} F_i(t),

where p_{ij} are the transition probabilities of the embedded Markov chain and Fi(·) is the distribution of the sojourn time at state i. The mean sojourn time at state i is:

m_i = ∫_0^∞ τ dF_i(τ) = ∫_0^∞ (1 − F_i(τ)) dτ.
The invariant stationary distribution of the underlying MC is π = (π_{s0}, ..., π_{sm}), satisfying πP = π; it is also taken as the initial probability row vector, with Σ_{j∈E} πj = 1.
The number of transitions in [0, t] is denoted by N(t). It is well known that if the SMP is irreducible, recurrent and the sojourn times have finite means, then
lim_{t→∞} N(t)/t = (Σ_{j∈E} πj mj)^{−1} = δ, a.s.    (6)
The elapsed life of the process after the last transition is the age process γ(t) = t − ξ_{N(t)}.
Throughout the study we assume that the reward functions ρ in Eq. 3 are nonnegative. For reward functions with possibly negative values, all of the results below remain true without loss of generality by adding a finite constant M, provided that ρ(j, τ) ≥ −M for all j ∈ E and τ ≥ 0.
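The quantities π and δ defined above are easy to compute numerically. The following sketch, for a hypothetical three-state embedded chain (the matrix P and means m below are illustrative, not from the paper), obtains π as the left eigenvector of P at eigenvalue 1 and then evaluates δ as in Eq. 6:

```python
import numpy as np

# Sketch: stationary distribution pi of the embedded chain and the
# rate delta = 1 / sum_j pi_j m_j from Eq. 6.
# P and the mean sojourn times m are hypothetical example values.

P = np.array([[0.0, 0.5, 0.5],
              [0.4, 0.0, 0.6],
              [0.7, 0.3, 0.0]])
m = np.array([1.0, 2.0, 0.5])    # mean sojourn times m_j

def stationary(P):
    """Solve pi P = pi, sum(pi) = 1, via the left eigenvector at 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

pi = stationary(P)
delta = 1.0 / np.dot(pi, m)      # a.s. limit of N(t)/t in Eq. 6
```

Note that δ mixes the embedded chain (through π) with the sojourn-time distributions (through the means m_j), which is why both appear in the limit of Theorem 1.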
SLLN FOR REWARD PROCESSES
In this section, using limit theorems for MCs and under integrability of the reward functions ρ(j, ·) with respect to the sojourn time distributions Fj(·), we arrive at a SLLN for Z given by Eq. 3.
Let {(Jn, ξn); n = 0,1,2,...} be an irreducible MRP with finite state space E and corresponding SMP {J(t), t≥0}, and let π be the invariant stationary distribution of the underlying Markov chain {Jn, n≥0}. It is well known that when the state space is finite, the corresponding MC and MRP are positive recurrent and in this case,
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} f(J_k) = Σ_{j∈E} πj f(j), a.s.    (7)
for any real valued function f. In the next theorem we derive more general limiting results for summands similar to Eq. 7, with a bivariate reward function ρ: E × [0, ∞) → [0, ∞) in place of f and a MRP in place of the MC. Thereafter we show that a SLLN holds for {Z(t), t≥0}, in the sense that lim_{t→∞} Z(t)/t exists, under the assumptions that the integrals

θ0(j) = ∫_0^∞ ρ(j, τ) dF_j(τ)

are finite for all j ∈ E.
Theorem 1
Suppose that {(Jn, ξn), n≥0} is an irreducible recurrent MRP with finite state space E and stationary distribution π. Also suppose that max_{j∈E} mj < ∞ and θ0(j) < ∞ for all j ∈ E. Then

lim_{t→∞} Z(t)/t = δ Σ_{j∈E} πj θ0(j) = α, a.s.,

where, Z(t) is given by Eq. 3 and δ by Eq. 6.
Proof: First note that,

Z(t) = Σ_{n=0}^{N(t)−1} ρ(J_n, Δ_n) + ρ(J_{N(t)}, γ(t)),

and therefore,

Z(t)/t = (N(t)/t) · (1/N(t)) Σ_{n=0}^{N(t)−1} ρ(J_n, Δ_n) + ρ(J_{N(t)}, γ(t))/t.    (8)

Also the irreducibility assumption and finiteness of E imply that N(t) → ∞ a.s. as t → ∞, which together with Eq. 6 gives

lim_{t→∞} N(t)/t = δ, a.s.    (9)

Moreover,

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ρ(J_k, Δ_k) = Σ_{j∈E} πj θ0(j), a.s.,    (10)

by the fact that N(t) → ∞ as t → ∞ and the argument given below for Eq. 10. So by Eq. 9 and 10, and since the last term in Eq. 8 is negligible relative to t,

lim_{t→∞} Z(t)/t = δ Σ_{j∈E} πj θ0(j) = α, a.s.
It remains to show that Eq. 10 holds, in the sense that:

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ρ(J_k, Δ_k) = Σ_{j∈E} πj θ0(j), a.s.
To do this, denote by Nn(i) the number of visits to state i during the first n transitions of the MRP, and by k(1, i), k(2, i), ... the indices at which the process visits i. Also denote by Y_{k(1,i)}, Y_{k(2,i)}, ... the sojourn times in state i along this subsequence. Given J0, J1, J2, ..., the variables Y_{k(1,i)}, Y_{k(2,i)}, ... are IID with distribution Fi. Moreover, by positive recurrence of the underlying MC, Nn(i) → ∞ as n → ∞ for all i ∈ E, and therefore the SLLN for IID random variables gives:
lim_{n→∞} (1/Nn(i)) Σ_{r=1}^{Nn(i)} ρ(i, Y_{k(r,i)}) = ∫_0^∞ ρ(i, τ) dF_i(τ) = θ0(i), a.s.    (11)
Thus,
(1/n) Σ_{k=0}^{n−1} ρ(J_k, Δ_k) = Σ_{i∈E} (Nn(i)/n) · (1/Nn(i)) Σ_{r=1}^{Nn(i)} ρ(i, Y_{k(r,i)})    (12)
Since the MRP is assumed to be irreducible, recurrent with finite state space, we conclude that as n→∞, Nn(i)/n→πi and therefore Eq. 11 and 12 imply that:
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ρ(J_k, Δ_k) = Σ_{i∈E} πi θ0(i), a.s.    (13)
The proof is now complete.
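Theorem 1 can be checked numerically. In the hypothetical example below the two states simply alternate, so π = (1/2, 1/2); sojourns are exponential with means m = (1, 2) and ρ(j, τ) = c_j τ² with c = (1, 0.5). Then θ0(j) = 2 c_j m_j², so θ0 = (2, 4), δ = 1/(0.5·1 + 0.5·2) = 2/3 and α = δ(0.5·2 + 0.5·4) = 2. None of these parameter values come from the paper; they are chosen so that α is easy to compute by hand.

```python
import random

# Monte Carlo sketch of Theorem 1 on a hypothetical two-state model:
# alternating states (pi = (1/2, 1/2)), exponential sojourns with
# means M, and rho(j, tau) = C[j] * tau**2, giving alpha = 2 as
# computed in the lead-in above.

M = [1.0, 2.0]
C = [1.0, 0.5]

def rho(j, tau):
    return C[j] * tau ** 2

def long_run_rate(T, seed=1):
    """Simulate Z(T)/T for the alternating two-state semi-Markov process."""
    rng = random.Random(seed)
    state, clock, Z = 0, 0.0, 0.0
    while True:
        delta = rng.expovariate(1.0 / M[state])
        if clock + delta > T:
            Z += rho(state, T - clock)    # reward over the age gamma(T)
            return Z / T
        Z += rho(state, delta)
        clock += delta
        state = 1 - state                 # deterministic alternation

rate = long_run_rate(50_000)
# Theorem 1 predicts that rate is close to alpha = 2 for large T.
```

For growing T the simulated Z(T)/T stabilizes near α, exactly the a.s. convergence asserted by the theorem.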
Remark
As expected, the limiting value α in Theorem 1 coincides with the slope parameter α1 in Eq. 4, derived by Soltani and Khorshidian (1998) for the asymptotic representation of E[Z(t)], t→∞. It also coincides with the elements of the vector A1 in Eq. 5 for the multivariate {Z(t), t≥0}. Moreover, the assumptions on the reward functions and sojourn times for the above SLLN are exactly those assumed for the existence of E[Z(t)], t→∞ (Khorshidian and Soltani, 2002).
In Theorem 1, finiteness of the state space is used only to guarantee that the embedded MC has positive recurrent states. If, in the infinite state space case, we impose conditions under which the states are positive recurrent, then the result of the theorem still holds. We therefore arrive at the following corollary.
Corollary
Suppose that {(Jn, ξn), n≥0} is an irreducible positive recurrent MRP with infinite state space E and stationary distribution π. Also suppose that mj < ∞ and θ0(j) < ∞ for all j ∈ E. Then

lim_{t→∞} Z(t)/t = δ Σ_{j∈E} πj θ0(j) = α, a.s.
REFERENCES
- Khorshidian, K. and A.R. Soltani, 2002. Asymptotic behavior of multivariate reward processes with nonlinear reward functions. Bull. Iran. Math. Soc., 28: 1-17.
- Khorshidian, K. and A.R. Soltani, 2003. Exact solutions for covariance matrix of multivariate reward processes with analytic reward functions. J. Sci. I.R. Iran, 14: 173-183.
- Masuda, Y. and U. Sumita, 1991. A multivariate reward process defined on a semi-Markov process and its first-passage-time distributions. J. Applied Probab., 28: 360-373.
- Pyke, R., 1961a. Markov renewal processes: Definitions and preliminary properties. Ann. Math. Stat., 32: 1231-1242.
- Pyke, R., 1961b. Markov renewal processes with finitely many states. Ann. Math. Stat., 32: 1243-1259.
- Pyke, R. and R. Schaufele, 1964. Limit theorems for Markov renewal processes. Ann. Math. Stat., 35: 1746-1764.
- Soltani, A.R., 1996. Reward processes with nonlinear reward functions. J. Applied Probab., 33: 1011-1017.
- Soltani, A.R. and K. Khorshidian, 1998. Reward processes for semi-Markov processes: Asymptotic behaviour. J. Applied Probab., 35: 833-842.
- Sumita, U. and Y. Masuda, 1987. An alternative approach to the analysis of finite semi-Markov and related processes. Commun. Stat. Stochastic Models, 3: 67-87.
- Teugels, J.L., 1976. A bibliography on semi-Markov processes. J. Comput. Applied Math., 2: 125-144.