Abstract: This study compares the performance of (1) a Neural Network (NN) against a Support Vector Machine (SVM) and (2) a Genetic Algorithm (GA) against a Fuzzy Inference System (FIS), which were used to identify the specific (acute) spots in the speech of abnormal children who need targeted practice at those spots to overcome stuttering. The previous work followed three phases, (1) a training phase, (2) a testing phase and (3) an optimization phase, to identify the acute spot at which speech practice can be provided for the abnormal children. Training and optimization in the previous works were implemented using (1) NN and GA and (2) SVM and FIS. In this study, the performance of (1) NN and SVM and (2) GA and FIS is compared using the Matthews Correlation Coefficient (MCC). The proposed system is tested on a dataset generated from abnormal and normal female children and is implemented using MATLAB as the working platform. The aim is to provide a system that identifies the acute spots in an abnormal word where stuttering children need practice.
INTRODUCTION
In real-world environments, speech signal processing plays a vital role in the research community in general and has a specific role in the case of children who stutter. Experts believe that stuttering is caused by a combination of factors. First, genetics is believed to play a part: stuttering appears to result from inherited (genetic) abnormalities in the language centres of the brain. Second, children who have developmental delays or other speech problems are more likely to stutter. Third, environmental factors can have an influence. Speech synthesis among children helps in identifying the acute spot in the speech of abnormal children.
In this study, the speech of abnormal children is compared with the speech of normal children in order to enhance it. This work is useful for speech practitioners, who can improve the speech of abnormal children by knowing where speech practice is required. The Mel Frequency Cepstrum Coefficients (MFCC) of the abnormal and normal speech were extracted. Principal Components Analysis (PCA) is applied to reduce the dimensionality of the MFCC features, which are then used to train a Neural Network (NN) and a Support Vector Machine (SVM) separately to identify the abnormal word (Bharathi and Shanthi, 2012c). The acute word is identified through a thresholding operation. The number of peaks and their amplitudes are found from the Fast Fourier Transform (FFT) of the acute word. The amplitudes and number of peaks are converted into genes and optimized using a Genetic Algorithm (GA). Separately, the amplitudes and number of peaks are also input to a Fuzzy Inference System (FIS) for optimization, to identify the position at which stress has to be given. This study compares the performance of (1) NN and SVM and (2) GA and FIS.
MATERIALS AND METHODS
Classification process
Classification through NN: The NN is used to separate the normal dataset from the abnormal dataset. Seven MFCC features, namely the mean, standard deviation, maximum amplitude value and its id, minimum amplitude value and its id, and MFCC length, were extracted, and the same set of seven features was extracted from the original word. These fourteen features are used to train the NN and SVM separately (Bharathi and Shanthi, 2011a; Bharathi and Shanthi, 2012c). The neural network has an input layer of fourteen nodes, one for each parameter of a word, hidden layers, and one output layer that indicates whether the input word is normal or abnormal.
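The fourteen-feature vector described above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code: the function names `summary_features` and `word_features` are hypothetical, the "id" is assumed to be the zero-based position of the extreme value, and the population standard deviation is an assumption because the text does not say which variant was used.

```python
import statistics


def summary_features(values):
    """Seven summary features of one sequence (hypothetical helper)."""
    if not values:
        raise ValueError("values must be non-empty")
    max_val = max(values)
    min_val = min(values)
    return [
        statistics.mean(values),
        statistics.pstdev(values),  # population std. dev. (an assumption)
        max_val,
        values.index(max_val),      # "id": position of the maximum
        min_val,
        values.index(min_val),      # "id": position of the minimum
        len(values),
    ]


def word_features(mfcc, original):
    """Concatenate the seven features of the MFCC sequence and of the original word."""
    return summary_features(mfcc) + summary_features(original)
```

Calling `word_features` on one MFCC sequence and the corresponding word signal yields the fourteen values that would feed the classifier input.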
Support Vector Machine (SVM): An SVM is one of a set of related supervised learning methods from statistics and computer science that analyze data and recognize patterns, and that are used for classification and regression analysis. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes it belongs to, making the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.
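To make the idea of a binary linear classifier concrete, the following Python sketch trains a linear SVM by sub-gradient descent on the regularized hinge loss. It is an illustrative stand-in, not the MATLAB SVM used in the study: the function names, the learning rate, the regularization constant and the label coding (+1 for abnormal, -1 for normal) are all assumptions.

```python
def train_linear_svm(samples, labels, learning_rate=0.1, lam=0.01, epochs=100):
    """Fit weights w and bias b of a linear SVM; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient of the L2 penalty shrinks the weights at every step.
            w = [wi * (1.0 - learning_rate * lam) for wi in w]
            if margin < 1.0:
                # Sample is inside the margin: move the boundary towards it.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


def predict(w, b, x):
    """Assign +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1
```

A trained model is used by calling `predict(w, b, features)` on the feature vector of a new word; the sign of the decision value gives the predicted class.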
Identifying the acute spot of aberration of abnormal children using GA and FIS
Genetic Algorithm (GA): In the optimization phase, a thresholding operation is performed on the identified abnormal word and the FFT is then computed for the acute word. Sudden changes in frequency appear in the FFT as maximum peaks (Bharathi and Shanthi, 2011b). The number of maximum peaks and their amplitudes are analyzed using the FFT. GA is used to identify the acute spot of aberration for abnormal children. Each peak is considered a gene. The chromosomes obtained from crossover and mutation are combined to fill the population pool. After the crossover process, the fitness function is applied to the child chromosomes and highly deviated chromosomes are replaced with new genes. The process is repeated until the maximum number of iterations is reached. The indices obtained from the genes of the best chromosomes represent the acute positions in the abnormal word (Bharathi and Shanthi, 2012e).
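The GA loop described above can be sketched in Python as follows. The text does not state the fitness function, chromosome length, pool size or mutation rate, so all of these are assumptions made for illustration: the hypothetical fitness simply sums the amplitudes of the distinct peaks a chromosome selects, and the less fit half of the pool is replaced in each iteration.

```python
import random


def genetic_peak_search(amplitudes, genes_per_chromosome=3, pool_size=10,
                        max_iterations=50, mutation_rate=0.2, seed=0):
    """Return sorted peak indices chosen by a small GA (illustrative only)."""
    rng = random.Random(seed)
    n_peaks = len(amplitudes)

    def fitness(chromosome):
        # Hypothetical fitness: total amplitude of the distinct peaks selected.
        return sum(amplitudes[i] for i in set(chromosome))

    # Each gene is the index of one FFT peak.
    pool = [[rng.randrange(n_peaks) for _ in range(genes_per_chromosome)]
            for _ in range(pool_size)]
    for _ in range(max_iterations):
        pool.sort(key=fitness, reverse=True)
        parents = pool[: pool_size // 2]      # keep the fitter half
        children = []
        while len(parents) + len(children) < pool_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genes_per_chromosome)  # single-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < mutation_rate:              # mutate one gene
                child[rng.randrange(genes_per_chromosome)] = rng.randrange(n_peaks)
            children.append(child)
        pool = parents + children                         # refill the pool
    best = max(pool, key=fitness)
    return sorted(set(best))
```

With a list of peak amplitudes as input, the returned indices play the role of the acute positions; a fixed `seed` makes the run repeatable.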
Fuzzy Inference System (FIS): Fuzzy knowledge-based systems are rule systems built on fuzzy logic and fuzzy set theory. A fuzzy rule system is a rule system in which some or all of the variables are linguistic variables. It is well known that an FIS can closely approximate any nonlinear input-output mapping by means of a series of if-then rules. The design of an FIS involves two major tasks, namely structure identification and parameter adjustment. The fuzzy inference system comprises three phases: Fuzzification, Rules Evaluation and Defuzzification. The process of fuzzy inference involves Membership Functions, Logical Operations and If-Then Rules. In this FIS, the peaks of the abnormal words are used to identify the aberration spot (Bharathi and Shanthi, 2012a).
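The three phases can be illustrated with a small Python sketch. It is not the rule base used in the study: the triangular membership functions, the four rules, the constant rule outputs (a zero-order Sugeno-style simplification) and the assumption that both inputs are normalized to the range 0 to 1 are all hypothetical choices made for the example.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with its maximum of 1 at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def aberration_score(amplitude, peak_count):
    """Score in [0, 1] from a normalized amplitude and normalized peak count."""
    a = min(max(amplitude, 0.0), 1.0)
    p = min(max(peak_count, 0.0), 1.0)
    # Fuzzification: degree to which each input is low/high or few/many.
    amp_low, amp_high = triangular(a, -1.0, 0.0, 1.0), triangular(a, 0.0, 1.0, 2.0)
    cnt_few, cnt_many = triangular(p, -1.0, 0.0, 1.0), triangular(p, 0.0, 1.0, 2.0)
    # Rule evaluation: AND is taken as min; each rule has a constant output.
    rules = [
        (min(amp_high, cnt_many), 1.0),  # IF high AND many THEN strong aberration
        (min(amp_high, cnt_few), 0.5),
        (min(amp_low, cnt_many), 0.5),
        (min(amp_low, cnt_few), 0.0),    # IF low AND few THEN no aberration
    ]
    # Defuzzification: weighted average of the rule outputs.
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.0
    return sum(strength * output for strength, output in rules) / total
```

Under these assumptions a peak with the largest amplitude in a region with many peaks scores 1.0, and the score falls towards 0.0 as either input decreases.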
Experimental results: A number of approaches exist, and a few of them are detailed below (Bharathi and Shanthi, 2012d):
• An automatic detection method that uses speech recognition technology to detect syllable repetition in read speech for the objective assessment of stuttered disfluencies. Several steps were involved in finding the number of repetitions in the speech samples using the MFCC feature extraction algorithm. The twelve-dimensional MFCC obtained for each syllable are used to compute the angle between them, which serves as the local distance and is represented in the form of a matrix. Using Dynamic Programming, the minimum-cost path through the matrix is found and passed to the decision logic, where Dynamic Time Warping (DTW) based score matching is performed
• Work on the intelligent processing of stuttered speech, which reports results on the automatic detection of speech disorder events based on both rough sets and artificial neural networks. The rough set-based algorithm and the neural network algorithm were applied to (a) automatic detection of stop-gaps, (b) discerning vowel prolongations and (c) detection of syllable repetitions. During the tests, the rough set system settings were selected as follows: the quantization value ranged from 5 to 20 and the neutral point value ranged from 0.6 to 0.9. Better scores were obtained using the rough set-based system
• An Optimal Curve Fitting method, in which the amplitude profile of the sampled speech data was fitted using a sum of sine functions with a confidence level. An amplitude correlation technique is applied between the original speech signal samples of normal and pathological subjects, and a correlation technique is also applied between the curve fit constant values for normal and pathological subjects. The curve fitting technique is applied to the collected sample space of children and the curve fitting constant values are obtained. The correlation coefficient between the normal and pathological curve fitting constant values is then calculated, and it is observed that as the correlation coefficient deviates from its expected value of 0.20, the severity of speech disability increases in pathological subjects in comparison with a normal child
In the present investigation, the scope of the proposed work, detailed below, is to overcome the shortcomings observed in the above existing methods.
Fig. 1: Signal for the word Dinosaurs (Normal Child1)
Fig. 2: Signal for the word Dinosaurs (Normal Child2)
Feature extraction: Initially, the words are extracted from both the normal and abnormal children and the MFCC features are then extracted from them. Subsequently, PCA is applied to reduce the dimensionality of the words. A few speech samples are sent as input to MFCC feature extraction (Bharathi and Shanthi, 2011c).
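The dimensionality reduction step can be illustrated with a short Python sketch that finds the first principal component by power iteration on the sample covariance matrix and projects the data onto it. This is a simplified stand-in for a full PCA routine, not the implementation used in the study; the function name and the iteration count are assumptions, and only one component is computed.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (direction, scores) of the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration; the all-ones start vector is an arbitrary choice and
    # would fail only if it were orthogonal to the leading direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Project every centered sample onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

Each row of `rows` would be the feature vector of one word; `scores` is then the one-dimensional representation that replaces it.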
Discriminant analysis using an SPSS model was performed on the fourteen extracted features and a classification accuracy of around 80% was obtained (Bharathi and Shanthi, 2012b).
Classification using NN and SVM: PCA is applied to the MFCC features to reduce the dimensionality of the words, and the reduced features are then input to the NN and SVM separately to identify the abnormal and the normal word.
Figure 4 shows the output generated by NN and Fig. 5 shows the regression result of the SVM Classifier.
Fig. 3: Signal for the word Dinosaurs (Abnormal Child)
Fig. 4: Generated ANN for classification
The SVM regression results for the recorded dataset of normal and abnormal words are shown in Fig. 5. The signal for the abnormal word Dinosaurs is shown in Fig. 3, and the signals for the normal word Dinosaurs from normal child 1 and normal child 2 are shown in Fig. 1 and 2, respectively. Table 1 shows the output of the SVM classifier for the identification of abnormal words. The outputs of the SVM and NN classifiers were manually verified to determine the classification accuracy. Sample outputs of the SVM and NN classifiers for a few words are given in Table 2 for the identification of abnormal and normal words. The SVM classifier achieved a classification accuracy of 90% and the NN achieved 60%.
The performance of the NN and SVM was also verified using the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function.
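As an illustration of how the four counts are obtained, the Python sketch below compares manually verified labels with classifier outputs. Treating "abnormal" as the positive class is an assumption made for the example, since the text does not state which class is taken as positive; the function name is hypothetical.

```python
def confusion_counts(actual, predicted, positive="abnormal"):
    """Return (TP, TN, FP, FN) for two equally long label sequences."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1   # positive word correctly identified
            else:
                fp += 1   # negative word wrongly flagged as positive
        else:
            if truth == positive:
                fn += 1   # positive word missed
            else:
                tn += 1   # negative word correctly identified
    return tp, tn, fp, fn
```

The same counting applies to either classifier: only the `predicted` sequence changes.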
Table 1: Sample output of the SVM classifier for the identification of abnormal words
Table 2: Output of the SVM classifier and ANN classifier for the identification of abnormal words
Fig. 5: Regression result for the SVM classification to identify the abnormal word
Sensitivity, also called the true positive rate, measures the proportion of actual positives that are correctly identified as such. Specificity measures the proportion of actual negatives that are correctly identified as such. Equations 1 to 7 are used for these statistical measures:
(1)
Table 3: Performance of Neural Network for identifying the abnormal words
Table 4: Performance of Support Vector Machine for identifying the abnormal words
(2)
(3)
(4)
(5)
(6)
(7)
Accuracy (Acc) is calculated as:
Acc = (TP+TN)/(TP+TN+FP+FN)
The Matthews Correlation Coefficient (MCC) is a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure that can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 a prediction no better than random and -1 total disagreement between prediction and observation.
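The measures discussed here can be computed from the four counts as in the Python sketch below. It uses the standard textbook definitions of accuracy, sensitivity, specificity and MCC, which may not correspond one-to-one with the numbered equations of the paper; the function name is hypothetical, and returning 0 when a denominator is zero is a common convention adopted here as an assumption.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion counts."""
    def ratio(numerator, denominator):
        # Convention (an assumption): an undefined ratio is reported as 0.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

For example, with TP = 6, TN = 3, FP = 1 and FN = 2 (made-up counts, not taken from the tables), accuracy, sensitivity and specificity are all 0.75, while the MCC is about 0.478, which shows how the MCC can give a more cautious picture than accuracy alone.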
Figure 6 compares the accuracy of NN and SVM for different words, using the accuracy values in Tables 3 and 4, which show the performance of NN and SVM, respectively. Four different groups of ten different words were used to calculate the accuracy and the other parameters reported in Tables 3 and 4.
Fig. 6: Accuracy comparisons of different words in ANN and SVM
Fig. 7: Identified acute spot of aberration for the word Dinosaurs using GA (Abnormal child)
Finding spot using GA and FIS: In the previous work, the system was tested with a database of 100 words recorded from two normal children and one abnormal child. The acute spot of aberration for the abnormal words was then identified using GA and FIS separately. The outputs obtained for the word Dinosaurs using GA and FIS for identifying the acute spot of aberration are shown in Fig. 7 and 8, respectively.
The performance of GA and FIS in finding the aberration spot is calculated from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).
The performance measures of GA and FIS for identifying the aberration spot are shown in Tables 5 and 6, respectively.
Fig. 8: Identified acute spot of aberration for the word Dinosaurs using FIS (Abnormal child)
Fig. 9: Accuracy comparison for different abnormal words with existing Genetic Algorithm and Fuzzy Inference System
For this performance measure, four sample words were taken, namely dinosaurs, flowers, daffodils and wild animals. The aberration spots were found for these words, the spots identified correctly and incorrectly were marked, and TP, TN, FP and FN were calculated accordingly. These values are shown in Tables 5 and 6 for GA and FIS, respectively.
The accuracy comparison of GA and FIS for the four abnormal words, based on Tables 5 and 6, is shown in Fig. 9. The accuracy and MCC values show that FIS performs better than GA.
Table 5: Performance measure for the Genetic Algorithm for identifying the aberration spot
Table 6: Performance measure for the Fuzzy Inference System
CONCLUSION AND FUTURE WORK
The proposed system was implemented in MATLAB (version 7.11). The dataset was generated with the aid of the Free Audio Editor from normal and abnormal female children aged 6-10 years. The 100 normal data items (words) were each recorded from two female children and the 100 abnormal data items from one female child; their normal frequency range is from 20 Hz to 4 kHz. The speech input was recorded at a sampling rate of 44.1 kHz. In this study, an effective system has been proposed to identify the abnormal word and also the spot at which the speech has to be improved (Bharathi and Shanthi, 2011b, 2012d). The performance of ANN and SVM was compared, and SVM was found to achieve 90% accuracy; the performance of FIS was found to be better than that of GA.