Software Sensor for Measuring Lactic Acid Concentration: Effect of Input Number and Node Number

Journal of Applied Sciences

Year: 2010 | Volume: 10 | Issue: 21 | Page No.: 2578-2583
DOI: 10.3923/jas.2010.2578.2583

R. Rashid, S.M. Mohamed Esivan, S.R. Radzali and A. Idris

Abstract: An Artificial Neural Network (ANN) approach was applied to develop a software sensor for the production of lactic acid from pineapple waste by Lactobacillus delbrueckii. Lactic acid is currently an important industrial material, and producing it from a renewable source such as pineapple waste introduces many disturbances that complicate measuring the quality of the lactic acid produced. An ANN was developed to predict the concentration of lactic acid using data collected from offline analysis. A multilayer perceptron (MLP) was used to map the input parameters to the output. Two variables were used as input parameters. Mean squared error (MSE) was used to evaluate the predictive performance of the MLP. Logsig and purelin were used as the activation functions and Levenberg-Marquardt was used as the training algorithm. The results showed that 2 inputs predict the concentration of lactic acid better than 1 input. The optimum structure found was 2-5-1.

How to cite this article

R. Rashid, S.M. Mohamed Esivan, S.R. Radzali and A. Idris, 2010. Software Sensor for Measuring Lactic Acid Concentration: Effect of Input Number and Node Number. Journal of Applied Sciences, 10: 2578-2583.

Keywords: pineapple waste, artificial neural network, lactic acid concentration, multiple inputs

INTRODUCTION

Lactic acid has become an important raw material for many industries. Conventionally, lactic acid has been a popular ingredient in the food industry. In the pharmaceutical industry, it is used as a raw material for surgical appliances; in the textile industry, it is used as one of the chemicals in dyes; and, most recently, it has been identified as a raw substrate for the production of biodegradable polymers (Idris and Suzana, 2006; John et al., 2007; Marques et al., 2008; Rao et al., 2008). Worldwide demand for lactic acid was predicted to reach 500,000 (metric) tonnes per year by 2010 (John et al., 2007).

Biological processes are known to be nonlinear. Supervision of a fermentation process must therefore keep certain variables within strict limits, since biological systems are highly sensitive to abnormal changes in operating conditions (Arauzo-Bravo et al., 2004). Lactic acid is currently an important industrial material, and producing it from a renewable source such as pineapple waste introduces many disturbances that complicate measuring the quality of the lactic acid produced.

Software sensor: A software sensor works on the basis of cause and effect, so the inherent biological relation between the measured and unmeasured states can significantly affect the prediction accuracy (Chen, 2006). A data-driven software sensor is an inferential model developed from process observations. The interactions among the measured signals can be used to calculate or estimate quantities that cannot be measured directly (Gonzaga et al., 2008).

Software sensors have been widely applied to estimate and predict quality measurements that are normally determined through infrequent sampling and off-line analysis, for example in fermentation (Kiviharju et al., 2008) and in distillation columns (Zamprogna et al., 2005). A software sensor maps the relationship between hard-to-measure properties (outputs) and easily measured properties (inputs). The main functions of an offline software sensor are usually to process data and to build the neural network model. The data are generally read from a data file, which is collected from the plant and stored in text format (Du et al., 2006; Lin et al., 2007).

The attractive features of software sensors are among the reasons they have gained interest from researchers. A software sensor is easy to implement, has low operating and maintenance costs and can overcome the time delays caused by offline analysis (Fortuna et al., 2005).

An accurate, fast and reliable measurement is essential to any process, including fermentation. Typically, measuring lactic acid concentration involves a multi-stage procedure that is both time-consuming and costly.

The most frequently used procedure for measuring lactic acid concentration is high pressure liquid chromatography (HPLC). The drawbacks of HPLC include the need for expensive equipment, the expertise required to operate it, the need for sample extraction, high production of waste solvent and slow effective processing time for samples owing to the removal of solvent and water from collected sample fractions (Rivier, 2000; Fogelman et al., 2009).

Consequently, the objective of this study is to develop a software sensor that is able to predict the lactic acid concentration under varying conditions. The effects of the number of inputs, the number of hidden nodes and the normalization method on the prediction of lactic acid concentration are investigated.

MATERIALS AND METHODS

The data used in this work were obtained from the experimental work conducted by Idris and Suzana (2006). In total, 8 process measurements are available: glucose concentration, temperature, initial pH, percent concentration of sodium alginate, bead diameter of sodium alginate, fermentation time, cell concentration (cell number) and lactic acid concentration. The input variables were selected based on process knowledge and on the reliability of the process measurements (Lin et al., 2007). The fermentation was conducted for 72 h and samples were taken every 4-8 h. There are 209 paired data points from 19 sets of experimental data.

Neural network design: Artificial Neural Networks (ANNs) have been used to solve a wide variety of problems in science and engineering, particularly in areas where conventional modeling methods do not perform well. Inspired by the biological neuron, a well-trained ANN can serve as a predictive model for a specific application. ANNs are used in diverse applications, including prediction, forecasting, optimization, medicine and manufacturing (Najafi et al., 2007). The ability to capture the relationship between input and output variables has made the ANN a very attractive tool.

A typical two-layer multilayer neural network structure was chosen (Cheroutre-Vialette and Lebert, 2002; Fortuna et al., 2005; Li and Li, 2006; Gueguim-Kana et al., 2007; Fan et al., 2004). Based on the literature, the logistic sigmoid (logsig) and linear (purelin) functions were selected as the activation functions in the hidden and output layers, respectively.
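The layer arrangement described above can be illustrated with a short forward-pass sketch. This is only a minimal illustration in plain Python, not the authors' implementation; the function names (`logsig`, `mlp_forward`) and all weight values are hypothetical and chosen so the result is easy to check by hand.

```python
import math

def logsig(z):
    """Logistic sigmoid, used here as the hidden-layer activation."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of an n-x-1 MLP: logsig hidden layer, linear output.

    x        : list of n inputs (assumed already normalized)
    w_hidden : one list of n weights per hidden node
    b_hidden : one bias per hidden node
    w_out    : one output weight per hidden node
    b_out    : output bias
    """
    hidden = [
        logsig(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    # Linear (purelin-style) output: the weighted sum is returned unchanged
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical 2-2-1 network with zero hidden weights and biases,
# so every hidden node outputs logsig(0) = 0.5
y = mlp_forward([0.3, 0.7], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0], 0.25)
print(y)  # 0.5 + 0.5 + 0.25
```

Because the output layer is linear, the network can produce values outside the range of the sigmoid, which is why this pairing is often used for regression-type targets such as a concentration.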


Fig. 1:	Configuration of the 2-input multilayer neural network model for predicting product concentration

This arrangement of activation functions is common in function approximation and modeling problems and yields better results (Rashid et al., 2006; Najafi et al., 2007). So that each input variable contributes equally in the ANN, the inputs in the model were preprocessed and scaled into a common numeric range, either [0 1] or [0.05 0.95]. The normalized value (xnorm) for each raw input/output data point (xi) was calculated as follows:

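$$x_{norm} = \frac{x_i - x_{min}}{x_{max} - x_{min}} \tag{1}$$

$$x_{norm} = 0.05 + 0.9\,\frac{x_i - x_{min}}{x_{max} - x_{min}} \tag{2}$$

where xmin and xmax are the minimum and maximum values of the raw data.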
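As a small worked illustration of scaling into the two stated ranges, [0 1] and [0.05 0.95], the sketch below applies a linear min-max mapping in plain Python. The helper names (`minmax_scale`, `minmax_unscale`) and the sample values are hypothetical and are not taken from the study's data.

```python
def minmax_scale(values, low=0.0, high=1.0):
    """Linearly map values so that min(values) -> low and max(values) -> high."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot scale a constant series")
    span = x_max - x_min
    return [low + (high - low) * (v - x_min) / span for v in values]

def minmax_unscale(scaled, x_min, x_max, low=0.0, high=1.0):
    """Invert minmax_scale to recover values in the original units."""
    return [x_min + (s - low) * (x_max - x_min) / (high - low) for s in scaled]

# Hypothetical raw values for illustration only (not data from the study)
glucose = [10.0, 25.0, 40.0, 70.0]
unit_range = minmax_scale(glucose)                # scaled into [0, 1]
narrow_range = minmax_scale(glucose, 0.05, 0.95)  # scaled into [0.05, 0.95]
print(unit_range)
print(narrow_range)
```

Predictions made on normalized targets need to be mapped back with the same xmin and xmax that were used for scaling, which is what `minmax_unscale` sketches.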

The performance goal (MSE) for training and testing was set to 0.001. In selecting the optimum model, smaller ANNs were given priority, since the complexity and size of the network are also important. A regression analysis between the network response and the corresponding targets was performed to investigate the network response in more detail. The Levenberg-Marquardt training algorithm (trainlm) was selected based on the literature (Rashid, 2004; Rashid et al., 2006; Herzog et al., 2009; Esnoz et al., 2006). Two structures/topologies were developed, 1-x-1 and 2-x-1 (the latter shown in Fig. 1), where x denotes the number of hidden nodes.
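To make the training procedure more concrete, the following is a simplified Levenberg-Marquardt sketch for an n-x-1 network with a logsig hidden layer and a linear output, stopping at an MSE goal of 0.001. It is an assumption-laden illustration using NumPy, not the trainlm routine used in the study: the Jacobian is approximated by finite differences, the damping schedule is a basic multiply/divide-by-10 rule, and the data are synthetic stand-ins. All function names are hypothetical.

```python
import numpy as np

def unpack(p, n_in, n_hid):
    """Split a flat parameter vector into the layers of an n_in-n_hid-1 MLP."""
    i = n_in * n_hid
    W1 = p[:i].reshape(n_hid, n_in)
    b1 = p[i:i + n_hid]
    W2 = p[i + n_hid:i + 2 * n_hid]
    b2 = p[-1]
    return W1, b1, W2, b2

def predict(p, X, n_hid):
    W1, b1, W2, b2 = unpack(p, X.shape[1], n_hid)
    H = 1.0 / (1.0 + np.exp(-(X @ W1.T + b1)))  # logsig hidden layer
    return H @ W2 + b2                           # linear output layer

def residuals(p, X, y, n_hid):
    return predict(p, X, n_hid) - y

def jacobian(p, X, y, n_hid, eps=1e-6):
    """Forward-difference Jacobian of the residuals (simple, not the fastest)."""
    r0 = residuals(p, X, y, n_hid)
    J = np.empty((r0.size, p.size))
    for k in range(p.size):
        step = p.copy()
        step[k] += eps
        J[:, k] = (residuals(step, X, y, n_hid) - r0) / eps
    return J

def train_lm(X, y, n_hid, goal=1e-3, max_iter=200, mu=1e-2, seed=0):
    """Levenberg-Marquardt: solve (J^T J + mu I) dp = -J^T r, adapting mu."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(-0.5, 0.5, size=X.shape[1] * n_hid + 2 * n_hid + 1)
    mse = np.mean(residuals(p, X, y, n_hid) ** 2)
    history = [mse]
    for _ in range(max_iter):
        if mse <= goal:
            break
        r = residuals(p, X, y, n_hid)
        J = jacobian(p, X, y, n_hid)
        dp = np.linalg.solve(J.T @ J + mu * np.eye(p.size), -J.T @ r)
        new_mse = np.mean(residuals(p + dp, X, y, n_hid) ** 2)
        if new_mse < mse:
            # Accept the step and reduce damping (closer to Gauss-Newton)
            p, mse, mu = p + dp, new_mse, max(mu / 10, 1e-10)
        else:
            # Reject the step and increase damping (closer to gradient descent)
            mu *= 10
        history.append(mse)
    return p, history

# Synthetic stand-in data (NOT the fermentation data from the study)
rng = np.random.default_rng(1)
X = rng.uniform(0.05, 0.95, size=(40, 2))
y = 0.05 + 0.9 * X[:, 0] * X[:, 1]
params, history = train_lm(X, y, n_hid=5)  # a 2-5-1 topology
print(len(params), history[0], history[-1])
```

Because a step is only accepted when it lowers the training MSE, the recorded error history never increases; this says nothing about test-set error, which has to be checked separately.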

RESULTS

In the present study, the effects of normalization using Eq. 1 and 2, of the number of hidden nodes and of the number of inputs were investigated. The results are tabulated in Tables 1 and 2. R in Tables 1 and 2 represents the correlation coefficient (R-value) between the network outputs and the targets.

Table 1:	Effect of the number of hidden nodes for (a) 1 input and (b) 2 inputs, using normalization Eq. 2

Table 2:	Effects of normalization method


Fig. 2:	Residual plot for model 1-7-1

To investigate the models more precisely, residual plots were produced (Fig. 2 and 4) and regression analyses of the network outputs against the desired targets were plotted (Fig. 3 and 5).
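The three quantities used to assess the models (MSE, the correlation coefficient R and the residuals) can be computed as in the following plain-Python sketch. The function names and the sample numbers are hypothetical; the residual sign convention (target minus output) is an assumption, since the source does not state it.

```python
import math

def mse(targets, outputs):
    """Mean squared error between targets and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def correlation(targets, outputs):
    """Pearson correlation coefficient (R-value) between targets and outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)

def residuals(targets, outputs):
    """Per-sample error, here taken as target minus output (assumed convention)."""
    return [t - o for t, o in zip(targets, outputs)]

# Hypothetical normalized values for illustration only
targets = [0.10, 0.40, 0.60, 0.90]
outputs = [0.12, 0.38, 0.63, 0.88]
print(mse(targets, outputs))
print(correlation(targets, outputs))
print(max(abs(e) for e in residuals(targets, outputs)))
```

A residual plot is then simply these per-sample errors plotted against the sample index or the target value; a random scatter within a narrow band is the pattern the discussion refers to.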

DISCUSSION

Effects of hidden nodes: In this study, hidden layer sizes in the range 1-17 nodes were employed and analyzed. In general, 5 and 7 hidden nodes gave better MLP performance for the topologies with 1 input and 2 inputs.
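One possible way to combine a sweep over hidden-layer sizes with the stated preference for smaller networks is sketched below. The selection rule (take the smallest network whose test MSE is within a relative tolerance of the best) and the 10% tolerance are assumptions made for illustration, not the procedure documented in the study, and the MSE values are placeholders rather than the values of Table 1.

```python
def pick_smallest_adequate(test_mse_by_nodes, tolerance=0.10):
    """Return the smallest hidden-layer size whose test MSE is within
    `tolerance` (relative) of the best test MSE found in the sweep."""
    best = min(test_mse_by_nodes.values())
    for nodes in sorted(test_mse_by_nodes):
        if test_mse_by_nodes[nodes] <= best * (1.0 + tolerance):
            return nodes

# Placeholder numbers for illustration only; they are NOT the values of Table 1
sweep = {2: 0.030, 4: 0.012, 6: 0.0041, 8: 0.0040, 12: 0.0090}
chosen = pick_smallest_adequate(sweep)
print(chosen)
```

With a tolerance of zero the rule reduces to picking the size with the lowest test MSE, so the tolerance is the knob that trades a little accuracy for a smaller network.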


Fig. 3:	Regression analysis (a) training set and (b) testing set, between the network response and the corresponding outputs of model 1-7-1


Fig. 4:	Residual plot for model 2-5-1

Poor MSE performance can be observed for small numbers of hidden nodes (1-4), as shown in Table 1. This might be because the model fails to differentiate between complex patterns, leading to only a linear estimate of the actual trend. With more than 7 hidden nodes, the structure fails to generalize well.


Fig. 5:	Regression analysis (a) training set and (b) testing set, between the network response and the corresponding outputs of model 2-5-1

As can be seen in Table 1, the predictive accuracy (MSE) on the test set is poor even though the predictive accuracy on the training set is good. A drop in the R-value on the test set is also noticeable.

Furthermore, over-fitting can be observed once the number of hidden nodes exceeds 7: too many hidden nodes make the network follow the noise in the data because of over-parameterization, leading to poor generalization to unseen data.
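The over-parameterization argument can be made concrete by counting adjustable parameters. The sketch below assumes a fully connected network with one bias per hidden node and per output node, which is the usual MLP convention but is not spelled out in the source; the function name is hypothetical. The figure of 209 is the number of paired data points reported in the methods.

```python
def mlp_parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of a fully connected n_inputs-n_hidden-n_outputs MLP."""
    hidden_layer = (n_inputs + 1) * n_hidden    # input weights + hidden biases
    output_layer = (n_hidden + 1) * n_outputs   # hidden weights + output biases
    return hidden_layer + output_layer

N_SAMPLES = 209  # paired data points reported in the methods

for n_hidden in (1, 5, 7, 17):
    n_params = mlp_parameter_count(2, n_hidden)
    ratio = N_SAMPLES / n_params
    print(f"2-{n_hidden}-1: {n_params} parameters, {ratio:.1f} samples per parameter")
```

Under these assumptions a 2-5-1 network has 21 parameters while a 2-17-1 network has 69, so the largest network has roughly three samples per parameter even before any data are set aside for testing.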

Effects of number of inputs: Two models were developed in the current study, 1-x-1 and 2-x-1. The 1-input model uses glucose concentration as the input variable to predict the concentration of lactic acid. The MSE for this model (Table 1a) decreased gradually to 0.0057 as the number of hidden nodes increased. 1-7-1 was the optimum structure, giving better predictive performance than the other structures. The residual plot for model 1-7-1 is shown in Fig. 2. The errors were scattered randomly between -0.2 and 0.2 (-0.2 < error < 0.2). Figure 3 shows the regression analysis for the 1-input model with structure 1-7-1. The correlation coefficient is 0.96148 for the training set and 0.96313 for the test set, as depicted in Fig. 3.

For the 2-input model, the input variables were glucose concentration (g L^-1) and cell number. The optimum structure, giving better performance than the other structures, was 2-5-1. Figure 4 shows the residual plot for structure 2-5-1. The errors were scattered randomly and fell in the range -0.15 to 0.15 (-0.15 < error < 0.15). The regression analysis in Fig. 5 shows a high correlation between the values predicted by the ANN model and the measured values from the experimental data. The R-value was 0.98887 for the training set and 0.98757 for the testing set, as depicted in Fig. 5.

The MSE values presented in Table 1b are mostly better than those presented in Table 1a. The regression analysis in Fig. 5 showed better correlation than that in Fig. 3. The range of error in the residual plot of Fig. 2 was larger than that in Fig. 4. Thus, having 2 inputs gives the ANN better predictive performance for lactic acid concentration.

Effects of normalization method: Normalization is one of the critical elements in developing a software sensor. It brings the input and output values to the same order of magnitude and prevents values of large magnitude from dominating [1,5]. Two normalization methods were studied here, using Eq. 1 and 2. There are significant differences between these two normalization methods. The MSE values presented in Table 2 for method 2 are better than those for method 1. The best MSE value for normalization method 1 is 0.0049, while for method 2 it is 0.0025. Thus, normalization Eq. 2 gave better results for both models than normalization Eq. 1.

CONCLUSION

In this study, the effects of the number of inputs, the number of hidden nodes and the normalization method were studied. The results showed that too few hidden nodes leave the network unable to differentiate complex patterns, while too many hidden nodes lead to over-fitting. In this case, having only 1 input is not sufficient to achieve good predictive performance.

For future work, prediction of lactic acid concentration can be carried out using other inputs besides glucose concentration and cell number, and using more than 2 inputs. Other types of ANN model can also be considered and compared with the MLP.

ACKNOWLEDGMENTS

This study was funded by the Ministry of Science, Technology and Innovation, Malaysia (MOSTI).

REFERENCES

Idris, A. and W. Suzana, 2006. Effect of sodium alginate concentration, bead diameter, initial pH and temperature on lactic acid production from pineapple waste using immobilized Lactobacillus delbrueckii. Process Biochem., 41: 1117-1123.

Araúzo-Bravo, M.J., J.M. Cano-Izquierdo, E. Gómez-Sanchez, M.J. López-Nieto, Y.A. Dimitriadis and J. López-Coronado, 2004. Automatization of a penicillin production process with soft sensors and an adaptive controller based on neuro fuzzy systems. Control Eng. Practice, 12: 1073-1090.

Chen, L.Z., 2006. On-line softsensor development for biomass measurement using dynamics neural network. Modeling Opt. Biotechnol. Processes, 15: 41-56.

Cheroutre-Vialette, M. and A. Lebert, 2002. Application of recurrent neural network to predict bacterial growth in dynamic conditions. Int. J. Food Microbiol., 73: 107-118.

Du, D., C. Wu, X. Luo and X. Zuo, 2006. Delay time identification and dynamic characteristics study on ANN soft sensor. Proceedings of the 6th International Conference on Intelligent Systems Design and Applications, Oct. 16-18, IEEE Computer Society Washington, DC, USA., pp: 42-45.

Esnoz, A., P.M. Periago, R. Conesa and A. Palop, 2006. Applications of neural networks to describe the combine effect of pH and NaCl on the heat resistance of Bacillus stearothermophilus. Int. J. Food Microbiol., 106: 153-158.

Fan, Y., K. Takayama, Y. Hattori and Y. Maitani, 2004. Formulation optimization of paclitaxel carried by PEGylated emulsions based on artificial neural network. Pharmaceut. Res., 21: 1692-1697.

Fogelman, K.D., E.E. Wikfors and R. Chen, 2009. Time delay for sample collection in chromatography systems. http://www.faqs.org/patents/app/20090050568.

Fortuna, L., S. Graziani and M.G. Xibilia, 2005. Soft sensors for product quality monitoring in debutanizer distillation columns. Control Eng. Practice, 13: 499-508.

Gueguim-Kana, E.B., J.K. Oloke, A. Lateef and M.G. Zebaze-Kana, 2007. Novel optimal temperature profile for acidification process of Lactobacillus bulgaricus and Streptococcus thermophilus in yoghurt fermentation using artificial neural network and genetic algorithm. J. Ind. Microbiol. Biotechnol., 34: 491-496.

Herzog, M.A., T. Marwala and P.S. Heyns, 2009. Machine and component residual life estimation through the application of neural networks. Reliabil. Eng. Syst. Safety, 94: 479-489.

John, R.P., K.M. Nampoothiri and A. Pandey, 2007. Fermentative production of lactic acid from biomass: An overview on process developments and future perspectives. Applied Microbiol. Biotechnol., 74: 524-534.

Kiviharju, K., K. Salonen, U. Moilanen and T. Eerikäinen, 2008. Biomass measurement online: The performance of in situ measurements and software sensor. J. Ind. Microbiol. Biotechnol., 35: 657-665.

Li, B. and L. Li, 2006. Artificial neural network based software sensor for yeast biomass concentration during industrial production. Proceedings of 2006 International Conference on Computational Intelligence and Security, Nov. 3-6, Guangzhou, pp: 955-958.

Lin, B., B. Recke, J.K.H. Knudsen and S.B. Jørgensen, 2007. A systematic approach for soft sensor development. Comput. Chem. Eng., 31: 419-425.

Marques, S., J.A.L. Santos, F.M. Gírio and J.C. Roseiro, 2008. Lactic acid production from recycled paper sludge by simultaneous saccharification and fermentation. Biochem. Eng. J., 41: 210-216.

Najafi, G., B. Ghobadian, T.F. Yusaf and H. Rahimi, 2007. Combustion analysis of a CI engine performance using waste cooking biodiesel fuel with an artificial neural network aid. Am. J. Applied Sci., 4: 756-764.

Rao, C.S., R.S. Prakasham, A.B. Rao and J.S. Yadav, 2008. Production of L (+) lactic acid by Lactobacillus delbrueckii immobilized in functionalized alginate matrices. World J. Microbiol. Biotechnol., 24: 1411-1415.

Rashid, R., H. Jamaluddin and N.A. Saidina Amin, 2006. Empirical and feed forward networks models of tapioca starch hydrolysis. Applied Artificial Intell., 20: 79-97.

Rivier, L., 2000. Techniques for analytical testing of unconventional samples. Best Practice Res. Endocrinol. Metabolism, 14: 147-165.

Rashid, R., 2004. Genetic algorithm and neural networks modeling of tapioca starch hydrolysis process. Ph.D. Thesis, Universiti Teknologi Malaysia.

Zamprogna, E., M. Barolo and D.E. Seborg, 2005. Optimal selection of soft sensor inputs for batch distillation columns using principal component analysis. J. Process Control, 15: 39-52.

Gonzaga, J.C.B., L.A.C. Meleiro, C. Kiang and R. Maciel Filho, 2008. ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process. Comput. Chem. Eng., (In Press).
