INTRODUCTION
Erythropoietin (EPO) is a hormone produced by specialized renal cells
or by regenerating human hepatic cells (Eckardt,
1996). These cells release EPO when the oxygen level in the kidney is low.
EPO then stimulates the bone marrow to produce more red blood cells, thereby increasing
the oxygen-carrying capacity of the blood.
Normal levels of EPO are in the range 0 to 19 mU mL^{-1} (milliunits per milliliter). Lower than normal levels of EPO are found in chronic renal failure, while elevated levels of EPO can be seen in polycythemia, a disorder in which there is an excess of red blood cells. Synthetic EPO is now produced through recombinant DNA technology in mammalian cell culture. Recombinant EPO is secreted from genetically engineered mammalian cells in a fermentation process, then recovered and purified as EPO bulk in a purification process to achieve the product characteristics specified by the manufacturer. Besides the highly complicated manufacturing process, the final step in EPO production is also hampered by the difficulty of measuring the product itself.
The ability to provide faster and easier measurement of fermentation variables is important for monitoring and minimizing variability in product quality. However, EPO concentration is usually measured offline through laboratory analysis, where the high cost of ELISA test kits and the tedious, time-consuming analysis via chromatography column are the biggest obstacles. Accurate prediction of EPO concentration can make it easier to formulate the EPO bulk into dosage form.
Since the 1980s, Artificial Neural Network (ANN) based software sensors have
gained trust as predictive tools, in which the desired output is predicted
from input information obtained from readily available online measurements (Golobic
et al., 2000; Bernard et al., 2000;
Cheruy, 1997). Where no sensor is available, a
software sensor may be designed using sampling and offline laboratory analysis.
It is a modeling approach that estimates difficult-to-measure variables from easy-to-measure
variables (Jianxu and Huihe, 2002).
The Radial Basis Function (RBF) network is a type of ANN and a popular alternative to the Multi Layer Perceptron (MLP) for modeling, control and optimization in bioprocesses. RBF networks have attracted interest because of properties such as localization, interpolation, cluster modeling and quasi-orthogonality.
There have been many successful applications of RBF networks as predictive
tools. A study by Moghadas and Choong (2008) on predicting
the maximum deflection of double layer grids using neural networks found
that RBF is better than backpropagation in terms of training time and approximation
errors. An RBF network was also proposed as a soft sensor for an automatic gantry crane system,
where the developed RBF model estimated the unmeasured state well and was
robust to parameter variations (Solihin et al., 2006).
The objective of this study is to develop a software sensor based on an artificial
neural network (ANN) for prediction of EPO production variables. This data-driven
modeling approach is employed to predict EPO concentration from other measured
variables such as biomass, substrates (glucose and L-glutamine) or a by-product
(ammonia).
The RBF network is selected for this study because it has a major advantage
over MLP: it can be trained using established linear regression
techniques, allowing fast convergence to the single global minimum for a given
set of fixed hidden node parameters (Dacosta et al.,
1997). In addition, RBF may be used with advantage for modeling a system
when only a limited number of experimental data is available (Lanouetta
et al., 1999).
The developed RBF model is of great importance because of its ability to predict variables under varying conditions. This study discusses the effect of the spread constant and the number of inputs on the performance of the RBF network in predicting EPO concentration.
MATERIALS AND METHODS
Model development
Structure and principle of RBF network: A radial basis function network consists
of three layers: an input layer, a single hidden layer of processing units
and an output layer. The hidden layer has an activation
function called the basis function. The most commonly used basis function is the Gaussian
basis function. The structure of an RBF network in its most basic form is shown in
Fig. 1.
The input layer is made up of source nodes whose number is equal to the dimension of the input vector u. The second layer is the hidden layer which is composed of nonlinear units that are connected directly to all of the nodes in the input layer. x is the desired output.

Fig. 1: Architecture of RBF network
Each hidden unit takes its input from all the nodes in the input layer. The hidden units contain a basis function, which has two parameters: center and width (or spread). Φ is the basis function and n is the number of cluster centers. The basis function is typically a Gaussian function whose spread σ corresponds to the variance; it has a peak at zero distance from the center and decreases as the distance from the center increases.
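The hidden-unit response and the network output described above can be sketched as follows. This is a minimal illustration, not the toolbox implementation: it assumes the common textbook form of the Gaussian, exp(-d^2 / (2σ^2)), and a linear output layer; the function names `gaussian_basis` and `rbf_output` are hypothetical.

```python
import math


def gaussian_basis(u, center, sigma):
    """Gaussian response of one hidden unit.

    Equals 1 when the input u sits on the center and decays as the
    Euclidean distance between u and the center grows.
    Assumed form: exp(-d^2 / (2 * sigma^2)).
    """
    dist_sq = sum((ui - ci) ** 2 for ui, ci in zip(u, center))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


def rbf_output(u, centers, weights, sigma, bias=0.0):
    """Network output: weighted sum of the n hidden-unit responses."""
    return bias + sum(
        w * gaussian_basis(u, c, sigma) for w, c in zip(weights, centers)
    )
```

The peak at zero distance and the decay away from the center follow directly from the exponential; the spread σ controls how quickly the response falls off.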
In this work, the Radial Basis Function Network (RBFN) model was developed using the
neural network toolbox of MATLAB 7.2. A feedforward radial basis function network with
a single hidden layer of nodes with a Gaussian density function was chosen. MATLAB
uses the Orthogonal Least Squares (OLS) algorithm to solve for the RBF centers
and the weights of the connections between the nodes in the hidden and output layers
(Chen et al., 1991). Besides an
error goal, the spread constant, σ, which determines the width of the receptive
fields, must also be specified when developing the RBFN model.
Data preprocessing: The data were obtained from Inno Biologics
Sdn Bhd. All input and output data were preprocessed and normalized between
zero and one using Eq. 1 to ensure that each input variable provides
an equal contribution to the network. This normalization method has been widely
used in various studies (Vanek et al., 2004;
Choi and Park, 2001):
x_{norm} = (x - x_{min}) / (x_{max} - x_{min})    (1)
where x_{norm} is the normalized value of variable x, and x_{max} and x_{min} are the maximum and minimum values of the variable, respectively.
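The scaling to the range zero to one described for Eq. 1 can be sketched in a few lines. This is a minimal illustration under the assumption of standard min-max scaling; the function name `min_max_normalize` and the sample values are hypothetical and not taken from the study.

```python
def min_max_normalize(values):
    """Scale a list of measurements linearly so min -> 0 and max -> 1."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # All values identical: no spread to scale, map everything to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Hypothetical glucose readings, for illustration only.
glucose = [2.0, 4.0, 6.0]
glucose_norm = min_max_normalize(glucose)
```

Each variable is scaled separately with its own minimum and maximum, so that variables measured in different units contribute on the same scale.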
RBF training and testing: Four input variables and one output variable were used in this work. The four input variables are biomass, glucose, L-glutamine and ammonia concentration, while the desired output of the model is erythropoietin concentration (Fig. 2).

Fig. 2: Architecture of RBF network for predicting EPO concentration
Four models (A, B, C and D) with different numbers of inputs are presented in this study. Model A has one input variable (biomass), model B has two (biomass and glucose), model C has three (biomass, glucose and L-glutamine) and model D has four (biomass, glucose, L-glutamine and ammonia).
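The four input configurations can be expressed as nested subsets of the measured variables. The sketch below is illustrative only; the names `INPUT_NAMES`, `MODEL_INPUT_COUNT` and `select_inputs`, and the sample values, are hypothetical.

```python
# Input variables in the order they are added from model A to model D.
INPUT_NAMES = ["biomass", "glucose", "L-glutamine", "ammonia"]

# Number of leading input variables used by each model.
MODEL_INPUT_COUNT = {"A": 1, "B": 2, "C": 3, "D": 4}


def select_inputs(sample, model):
    """Return the input vector of one sample for the given model."""
    names = INPUT_NAMES[: MODEL_INPUT_COUNT[model]]
    return [sample[name] for name in names]


# Hypothetical normalized sample, for illustration only.
sample = {"biomass": 0.4, "glucose": 0.7, "L-glutamine": 0.2, "ammonia": 0.9}
vector_c = select_inputs(sample, "C")
```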
In addition, the spread constant, σ, was varied until the model reached the minimum error index. This step was conducted to investigate the effect of the spread constant on the predictive performance of the RBF network. In this study, the predictive RBF model was designed using the newrbe function, which can produce a network with zero error on the training vectors. The function newrbe takes matrices of input vectors P and target vectors T and a spread constant SPREAD for the radial basis layer, and returns a network with weights and biases such that the outputs are exactly T when the inputs are P. The learning algorithm used in the newrb function is Orthogonal Least Squares.
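The idea of an exact design, in which the network reproduces the training targets T at the training inputs P, can be sketched as follows. This is not the toolbox code: it is a simplified illustration that assumes one hidden unit centered on each training vector, the textbook Gaussian exp(-d^2 / (2σ^2)), and no output bias, and it solves the resulting linear system by plain Gaussian elimination. All function names and the data are hypothetical.

```python
import math


def gaussian(u, c, sigma):
    """Gaussian basis response (assumed form exp(-d^2 / (2 sigma^2)))."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, c))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_exact_rbf(inputs, targets, sigma):
    """Place one hidden unit on every training vector and solve for the
    output weights that reproduce the targets exactly."""
    phi = [[gaussian(u, c, sigma) for c in inputs] for u in inputs]
    return solve_linear(phi, targets)


def predict(u, centers, weights, sigma):
    """Weighted sum of the hidden-unit responses for input u."""
    return sum(w * gaussian(u, c, sigma) for w, c in zip(weights, centers))


# Hypothetical normalized training data (not from the study).
P = [[0.0, 1.0], [0.5, 0.6], [1.0, 0.1]]
T = [0.1, 0.5, 0.9]
weights = fit_exact_rbf(P, T, sigma=0.6)
fitted = [predict(u, P, weights, 0.6) for u in P]
```

Because there are as many hidden units as training vectors, the training error is zero up to rounding, which is why the testing set, rather than the training set, is the meaningful check of predictive performance.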
Model performance as expressed through Error Index (EI): In the subsequent
analysis, the RBF network performance is expressed in terms of the error
index, EI (Eq. 2), because it provides a suitable measure of the
fitness of the model to the data (Rashid et al.,
2006):
where y represents the experimental (real) value of the output and ŷ is the predicted value.
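An error index of this kind can be computed as in the sketch below. The exact form of Eq. 2 is not reproduced here, so the definition used is an assumption: the root of the summed squared prediction errors divided by the summed squared experimental values, expressed as a percentage. The function name `error_index` is hypothetical.

```python
import math


def error_index(actual, predicted):
    """Error index in percent.

    ASSUMED definition (not confirmed by the source):
    EI = 100 * sqrt( sum((y - y_hat)^2) / sum(y^2) )
    """
    squared_error = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    squared_actual = sum(y ** 2 for y in actual)
    return 100.0 * math.sqrt(squared_error / squared_actual)
```

Under this definition, a perfect prediction gives an EI of zero, and the value grows with the size of the prediction errors relative to the magnitude of the measured output.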
Regression analysis between the network response and the corresponding target was performed to examine the network response in more detail. The coefficient of determination, R, was used as the indicator in this analysis.
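The indicator from the regression analysis can be sketched as below. This assumes that R is the linear (Pearson) correlation coefficient between the network response and the target, which is a common choice for this kind of analysis but is not stated explicitly in the text; the function name `correlation_r` is hypothetical.

```python
import math


def correlation_r(targets, predictions):
    """Pearson correlation between targets and network predictions.

    Assumed to be the indicator R; squaring it gives the coefficient of
    determination of the linear fit between the two series.
    """
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)
```

A value close to 1 indicates that the predictions follow the targets along a straight line; it does not by itself guarantee a small error, which is why it is reported together with the error index.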
RESULTS
The Error Index (EI) and coefficient of determination (R) of the training and testing
sets for each model were collected during the simulation process and are shown in
Table 1 and 2.

Fig. 3: (a, b) Regression analysis between predicted output and actual target for 1 input (model A) and 2 inputs (model B), respectively. (c, d) Regression analysis between predicted output and actual target for 3 inputs (model C) and 4 inputs (model D), respectively
As shown in Table 1 and 2, a network
with spread constant σ = 4 was found to be the optimum RBF network because
it fits every model well in terms of EI percentage and R value.
A regression analysis of predicted output and actual target was performed to
examine the model more closely. Figure 3a-d
illustrates the strong correlation between the values predicted by the RBF model
using the optimum spread constant and the actual values from the experimental
data.
Table 1: Error index and coefficient of determination for different numbers of inputs at selected spread constants, σ (training set)

Table 2: Error index and coefficient of determination for different numbers of inputs at selected spread constants, σ (testing set)


Fig. 4: Error index for different numbers of inputs at selected spread constants (training set)

Fig. 5: Error index for different numbers of inputs at selected spread constants (testing set)
The coefficients of determination, R, for models A, B, C and D were 0.94946, 0.99006,
0.99301 and 0.99128, respectively, which means that the models successfully
mapped the relationship between the input and output variables.
DISCUSSION
Effect of spread constant, σ: Models A, B, C and D were trained with several spread constants (0.2, 0.6, 4, 10 and 22). It was found that a smaller σ does not necessarily generate a smaller EI and a bigger σ does not necessarily result in a higher error index. Figure 4 indicates that a spread constant of σ = 0.2 produces a higher error index for every model compared to spread constants of 0.6, 4, 10 or 22.
In Fig. 5, although σ = 0.2 generates a very small error for model A (EI = 0.59%), the other models resulted in very high EI (15.01, 22.33 and 14.12%). With the spread constant set to 0.6 for the testing set, the lowest EI (0.11%) was obtained. However, only the one- and two-input models generated a small EI, while the models with more input variables gave a slightly higher EI. Thus, 0.2 and 0.6 are not suitable as the optimum spread constant for this predictive model.
In this case, spread constants of 0.2 and 0.6 may not be large enough for the receptive fields to overlap one another and amply cover the whole input range. Nevertheless, the spread should not be so large that there is no distinction between the outputs of different nodes in the same area of the input space.
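The trade-off between overlap and distinction can be illustrated by evaluating how strongly a hidden unit still responds to a neighbouring point for each spread constant tested. The sketch assumes the textbook Gaussian exp(-d^2 / (2σ^2)), which may be scaled differently from the toolbox's own definition of spread, and a hypothetical distance of 0.5 between neighbouring points in the normalized input space.

```python
import math


def activation(distance, sigma):
    """Response of a Gaussian unit at a given distance from its center
    (assumed form exp(-d^2 / (2 sigma^2)))."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


spreads = [0.2, 0.6, 4, 10, 22]  # spread constants tested in the study
distance = 0.5                   # hypothetical gap between neighbouring points

responses = {sigma: activation(distance, sigma) for sigma in spreads}
for sigma in spreads:
    print(f"sigma = {sigma:>4}: response at d = {distance} is {responses[sigma]:.4f}")
```

Under these assumptions, the response at the neighbouring point is below 0.05 for σ = 0.2, so adjacent receptive fields barely overlap, whereas for σ = 4 and above it exceeds 0.99, so neighbouring units respond almost identically and become hard to distinguish.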
Effect of input number: As shown in Fig. 5, the selected spread constant (σ = 4) gives a very small EI (0.28%) for one model (model D, testing set). However, when σ = 4 is applied to models A, B and C, higher EI values (1 to 7.57%) are obtained. This result shows that the number of inputs affects the predictive performance of the RBF network. At constant σ, a small number of inputs may give a high error index, which suggests that the error index depends on the number of input variables used in the model and tends to decrease as more inputs are included.
CONCLUSION
In this study, an RBF-based predictive model was proposed to predict EPO concentration from measured variables such as biomass, glucose, L-glutamine and ammonia concentration. The small error of the best predictive model shows that the newrb function, which applies the Orthogonal Least Squares (OLS) training algorithm, worked very well in determining the centers and the weights of the connections between the nodes in the hidden and output layers. The spread constant and the number of inputs do affect the predictive performance: the optimum spread constant for this model is 4 and the best number of inputs is three (biomass, glucose and L-glutamine concentration). The strong correlation between predicted and actual values, indicated by the high coefficient of determination, R, shows that the model successfully mapped the nonlinear relationship between the input and output variables. The model thus offers a fast and reliable prediction of EPO concentration.
ACKNOWLEDGMENTS
We are thankful to Inno Biologics Sdn Bhd for their contribution to data collection and to the Ministry of Science, Technology and Innovation (MOSTI) for financial support.