ABSTRACT
This study presents a digital speech processing algorithm based on the Continuous Wavelet Transform (CWT) and on the Bionic Wavelet Transform (BWT), applied to cochlear prostheses in order to properly stimulate the cochlea's nerve cells in profoundly or totally deaf patients. BWT is a time-frequency analysis method recently proposed in the biomedical engineering field; it introduces an active cochlear model into the CWT and has proven meaningful in cochlear implant research. A twenty-one-filter bank, based respectively on the Bionic and Continuous Wavelet Transforms and using a Munich-based function, was designed. The input signal can then be divided into twenty-one bands according to the ERB (Equivalent Rectangular Bandwidth) model, which is similar to the active auditory model. The performance of the strategy was evaluated using different input signals: pure and composite sounds and real speech signals extracted from the TIMIT database. A comparison of the two algorithms shows that BWT performs better than CWT for speech signal processing, especially for cochlear prostheses. The speech processing algorithm also allows the design of a graphical spectrogram that generates stimulating pulses. It is a useful simulation of the electrical stimulating pulses recommended for cochlear prostheses, especially for performing experiments and clinical set-up during stimulation tests and re-education. The software provided with this stimulation technique ensures flexibility in programming, ease of use and different safety features that could help meet each individual's needs.
How to cite this article
DOI: 10.3923/ajsr.2008.293.309
URL: https://scialert.net/abstract/?doi=ajsr.2008.293.309
INTRODUCTION
Communication via speech is one of the essential functions of human beings. Humans possess varied ways to retrieve information from the outside world or to communicate with each other and the three most important sources of information are speech, images and written text. For many purposes, speech stands out as the most efficient and convenient one.
Around 10% of the population in developed countries suffers from hearing impairment (Ay et al., 1997).
Auditory function has been successfully restored in profoundly to totally deaf patients by implanting wireless microstimulators that directly and electrically stimulate the residual auditory nerve fibers of the cochlea.
The induced cochlear stimulation can initiate nerve impulses (action potentials) that are transmitted through synapses to the auditory nerve center.
The biomedical apparatus that delivers this electrical stimulation is referred to as a cochlear implant prosthesis.
![]() | |
Fig. 1: | Basic architecture of cochlear prosthesis |
The standard design normally includes two main parts: an external unit, the sound analyzer, and an internal surgically implanted unit, the under-skin implant. The two are connected by an inductive transcutaneous link, as shown in Fig. 1.
Electrical stimulation of the auditory nerve cells induces a nerve impulse (nervous message), which is forwarded to the cerebral areas to be subjectively interpreted. Several speech processing algorithms, often referred to as stimulation algorithms, have been developed by research groups to extract the essential parameters for stimulating the cochlea (Hamida, 1998; Kallel et al., 2006).
Different stimulation strategies have been proposed during the past few years, including simple filtering modules distributed along the considered sound spectrum. These filters were analog at the beginning of this biomedical research and then became digital with the emergence of digital sound processors (Hamida, 1998). Other strategies were also digital and based on spectrum analysis, including FFT algorithms (Hamida, 1998; Kallel et al., 2006). As an important time-frequency analysis method, the Wavelet Transform (WT) has a multiresolution feature that is similar to that of the hearing perception procedure (Derbel et al., 2007a).
Our proposed stimulation strategy for cochlear prostheses is based on a processing approach built around the Continuous Wavelet Transform (CWT), which is an important time-frequency analysis method. It is characterized by a time-frequency resolution that is similar to that of the hearing perception procedure (Yang et al., 1992).
It is widely accepted today that the generation of Otoacoustic Emissions (OAEs) is related to the active mechanism of the cochlea in vivo and that this active mechanism is associated with the activity of the Outer Hair Cells (OHCs) (Long and Talmadge, 1997). The Bionic Wavelet Transform (BWT) is a time-frequency analysis method recently proposed in the biomedical engineering field. It is based on an active cochlear model introduced into the CWT. The term "bionic" reflects its roots in an active biological mechanism (Jun, 2001).
In this study we develop a stimulation strategy based on CWT and BWT algorithms for this type of hearing-aid system.
These algorithms were designed around a temporal approach using a wavelet transform based on the Morlet mother wavelet and distributed along the Munich critical bands. We then compared the performance of the two algorithms. In this processing system, either CWT or BWT based on the auditory model can be used.
Energy is calculated for the specified frequency bands corresponding to the stimulation channels of the cochlear prosthesis.
The extracted energies are then coded and transmitted to the internal part of the cochlear prosthesis to generate adequate stimuli at the cochlea's nerve sites (Hamida, 1998; Kallel et al., 2006).
Special attention was paid to the performance and efficiency of BWT-based speech processing. Hence, we used different input signals to validate our strategy and to show the advantages of the BWT over the CWT algorithm.
Offering multiple signal processing algorithms is highly valuable because it allows the apparatus to be adapted to various pathological cases and gives clinicians convenient choices during clinical adjustments (Hamida, 1998; Ghorbel et al., 2006).
CWT SPEECH PROCESSING FORMALISM
Mathematic Formulation of CWT
The Continuous Wavelet Transform (CWT) is used to design a digital filter bank for the speech processing strategies used in the external part of a cochlear prosthesis system (Rioul and Duhmel, 1992; Choi and Baraniuk, 2000).
The time-domain definition of the CWT of f(t) is given by Eq. 1:
![]() | (1) |
This analysis uses a family of functions ψa,b(t) based on a mother wavelet ψ (Yang et al., 1992). ψa,b(t) is given by the following equation:
![]() | (2) |
Where: | ||
a | = | Dilatation parameter |
b | = | Translation parameter |
![]() | = | Envelope of ψ(t) |
The equivalent frequency-domain definition of Eq. 1 is:
![]() | (3) |
Where, F(ω) and Ψ(ω) are the Fourier transforms of f(t) and ψ(t), respectively (Yang et al., 1992). |
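For readers without access to the equation images, the standard textbook forms of the continuous wavelet transform are recalled below. They are consistent with the definitions of a, b, F(ω) and Ψ(ω) given above, but the normalization factors are an assumption and may differ from those used in the authors' Eq. 1 to 3.

```latex
\begin{align*}
CWT_f(a,b) &= \int_{-\infty}^{+\infty} f(t)\,\psi_{a,b}^{*}(t)\,dt, \\
\psi_{a,b}(t) &= \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right), \\
CWT_f(a,b) &= \frac{\sqrt{a}}{2\pi}\int_{-\infty}^{+\infty} F(\omega)\,\Psi^{*}(a\omega)\,e^{j\omega b}\,d\omega .
\end{align*}
```

The third line follows from the first two by Parseval's relation, since the Fourier transform of ψa,b(t) is √a Ψ(aω) e^{-jωb}.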
According to the theory of Fourier transform, if the center frequency of the mother wavelet Ψ(ω) is ƒ0, then that of Ψ(aω) is ƒ0/a. Besides, in the analyzed result, different scales extract different frequencies from the original signal.
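The scale-to-frequency relation stated above can be checked numerically. The following is a minimal sketch, assuming a Morlet-type wavelet with a Gaussian envelope; the envelope width `sigma` and the helper names are illustrative choices, not taken from the paper.

```python
import numpy as np

FS = 16000.0   # sampling frequency in Hz (value used in the paper)
F0 = 8000.0    # center frequency of the mother wavelet in Hz (value used in the paper)

def dilated_morlet(t, a, f0=F0, sigma=1e-3):
    """Return psi(t / a) / sqrt(a) for a Morlet-type wavelet.

    sigma is the envelope width in seconds; it is an illustrative assumption.
    """
    u = t / a
    return np.exp(-0.5 * (u / sigma) ** 2) * np.exp(2j * np.pi * f0 * u) / np.sqrt(a)

def peak_frequency(a, n=4096):
    """Frequency (Hz) at which the spectrum of the dilated wavelet is maximal."""
    t = (np.arange(n) - n // 2) / FS
    spectrum = np.abs(np.fft.fft(dilated_morlet(t, a)))
    freqs = np.fft.fftfreq(n, d=1.0 / FS)
    return abs(freqs[np.argmax(spectrum)])

for a in (2, 4, 8):
    print(a, peak_frequency(a), F0 / a)
```

For each scale a, the measured spectral peak should coincide with ƒ0/a, which is how a set of scales is mapped onto a set of band center frequencies.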
Selection of the Mother Wavelet
The mother wavelets most commonly used in the CWT are the Morlet and Marr wavelets (Derbel et al., 2007b). Because its center frequency is easy to select and because of its exponential form, the Morlet wavelet is chosen as the mother wavelet to decompose the audible spectrum for the cochlear filter bank. The Morlet wavelet is defined as:
![]() | (4) |
Considering a constant area for the elementary tile in the time-frequency plane, we can prove that:
![]() | (5) |
The result of the derivation can be summarized as:
![]() | (6) |
Two approximations of the ERB (Equivalent Rectangular Bandwidth) model exist, known as the Cambridge and Munich critical bands (The Bionic Ear Institute Annual Report, 2000).
The Munich critical bandwidth, BM, is expressed by:
![]() | (7) |
and the Cambridge bandwidth, BC, by:
![]() | (8) |
Where, f0 is the center frequency of the band in kHz. |
A comparative study between these two functions showed that Munich critical bands are better (Derbel et al., 2007a).
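To give an idea of how the two bandwidth models behave, the sketch below evaluates two commonly cited approximations: the Zwicker and Terhardt critical bandwidth (often called the Munich model) and the Glasberg and Moore ERB (often called the Cambridge model). These formulas are an assumption on our part, since Eq. 7 and 8 are not reproduced in this text, and they may differ from the exact expressions used by the authors.

```python
def munich_bandwidth(f0_khz):
    """Zwicker-Terhardt critical bandwidth in Hz (assumed form of the Munich model)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * f0_khz ** 2) ** 0.69

def cambridge_bandwidth(f0_khz):
    """Glasberg-Moore ERB in Hz (assumed form of the Cambridge model)."""
    return 24.7 * (4.37 * f0_khz + 1.0)

# Compare the two models at a few center frequencies given in kHz.
for f0 in (0.1, 0.5, 1.0, 4.0, 8.0):
    print(f0, round(munich_bandwidth(f0), 1), round(cambridge_bandwidth(f0), 1))
```

Both functions take the center frequency in kHz, matching the convention stated above for f0, and return a bandwidth in Hz.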
Consequently the Munich based mother wavelet is given by:
![]() | (9) |
Stimulation Strategy Principle
The CWT is similar to the perception process of the auditory system. This is the main reason why our stimulation strategy is based on the CWT algorithm (Derbel et al., 2007b).
The architecture in Fig. 2 gives several details of the proposed stimulation strategy dedicated to cochlear prostheses. It comprises a main part containing the CWT algorithms and a second part that estimates the energy to be forwarded as the stimulation level to the internal part (Kallel et al., 2006).
Filter Banc Conception
The spectra of the Munich-based mother wavelet at the various ERB center frequencies are shown in Fig. 3.
In our case, the sampling frequency is 16000 Hz, which gives a bandwidth of 8000 Hz. We therefore consider a digital filter bank with 21 bands, so that 21 parameters are extracted from the input signal.
With the ERB critical bands as the spectrum-sharing model and according to Eq. 7, it can be found that ƒ0 is equal to 8000 Hz (Quateri, 2001).
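The filter-bank and energy-estimation stages described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the center frequencies are log-spaced between 100 and 7700 Hz (the range reported later in the paper) rather than derived from the ERB model, and the per-band bandwidth rule in `band_energies` is hypothetical.

```python
import numpy as np

FS = 16000.0   # sampling frequency in Hz (from the paper)
N_BANDS = 21   # number of bands (from the paper)

def center_frequencies(f_lo=100.0, f_hi=7700.0, n=N_BANDS):
    """Log-spaced center frequencies; a stand-in for the ERB-based spacing."""
    return np.geomspace(f_lo, f_hi, n)

def morlet_kernel(fc, bandwidth, fs=FS):
    """Complex Morlet-type band-pass kernel centered at fc (Hz)."""
    sigma_t = 1.0 / (2.0 * np.pi * bandwidth)   # envelope std in seconds
    half = int(np.ceil(4 * sigma_t * fs))       # truncate the envelope at 4 sigma
    t = np.arange(-half, half + 1) / fs
    kernel = np.exp(-0.5 * (t / sigma_t) ** 2) * np.exp(2j * np.pi * fc * t)
    return kernel / np.sum(np.abs(kernel))      # unit gain at the center frequency

def band_energies(signal, fs=FS):
    """Return (center frequencies, energy per band) for a real input signal."""
    fcs = center_frequencies()
    energies = np.empty(len(fcs))
    for i, fc in enumerate(fcs):
        bandwidth = 0.15 * fc + 25.0            # hypothetical bandwidth rule
        out = np.convolve(signal, morlet_kernel(fc, bandwidth, fs), mode="same")
        energies[i] = np.sum(np.abs(out) ** 2)
    return fcs, energies

# Usage: a 1 kHz tone lasting 64 ms should excite the band nearest 1 kHz most.
t = np.arange(1024) / FS
fcs, energies = band_energies(np.sin(2 * np.pi * 1000.0 * t))
print(fcs[np.argmax(energies)])
```

Each of the 21 energies would then be coded as the stimulation level of one channel; replacing `center_frequencies` and the bandwidth rule with ERB-derived values would bring the sketch closer to the strategy described here.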
![]() | |
Fig. 2: | Block architecture of the proposed stimulation strategy |
![]() | |
Fig. 3: | A twenty-one-band Munich-based filter bank |
Using the parameters given here, the final expression of the Munich-based Morlet wavelet used in this study becomes:
![]() | (10) |
Its time-domain waveform and amplitude spectrum are plotted in Fig. 4.
![]() | |
![]() | |
Fig. 4: | Munich-based Morlet wavelet used: (a) Time waveform and (b) amplitude spectrum |
It can be seen that, in the time domain, the mother wavelet is a high-frequency oscillation whose amplitude decreases as time goes from zero to infinity and that, in the frequency domain, as expected, it is a narrow band centered at 8 kHz.
BIONIC WAVELET TRANSFORM
Bionic Wavelet Transform (BWT) is a new time frequency method based on an auditory model (Yao and Zhang, 2001).
Human Auditory Front-End Periphery
The outer ear, middle ear and inner ear constitute the human peripheral auditory system. Sound travels through the outer ear into the ear canal and results in the vibrations of the eardrum that connects the outer ear to the middle ear. Some frequencies of the incoming sound are being attenuated more than others. The middle ear transforms the air vibrations from the outer ear into the vibrations in the fluid of the inner ear.
The inner ear performs most of the signal processing tasks of acoustic signals. Its organs include the cochlea, basilar membrane, the inner and outer hair cells and the auditory nerves. The major component of the inner ear is the cochlea that can be viewed as a spatial frequency analyzer (Quateri, 2001).
Above 800 Hz in humans, the transfer functions of the cochlear filters remain approximately invariant except for a translation or time shift, while the center frequency of the filters decreases from the base to the apex (Yang et al., 1992). The spatial axis of the cochlea is thus akin to the scale axis, and the outputs of the cochlea for each stimulus act like those of a standard wavelet transform.
However, this signal processing abstraction ignores several nonlinear phenomena that give the cochlea its high sensitivity and frequency selectivity. Among them is the so-called active cochlear mechanism (Quateri, 2001). The bionic wavelet transform developed hereafter originates from this active cochlear mechanism, which is qualitatively reflected in the modified Giguere's auditory model.
Modified Giguere's Auditory Model
Giguere and Woodland (1994) put separate auditory model stages together and constructed a computational model from the external ear to the inner hair cells. Giguere's model includes three major parts: the auditory canal, the middle ear and the cochlea (Giguere and Woodland, 1994; Long and Talmadge, 1997).
Recent research (Yao and Zhang, 2002) has adopted Giguere's model for the qualitative description of the active phenomenon of the cochlea. In Giguere's model, at one point of the basilar membrane (BM), the displacement of the basilar membrane d(x,t) in response to a tone is modeled qualitatively by the following equation:
![]() | (11) |
Where: | ||
x | = | Distance along the BM from the basal end |
t | = | Time |
d | = | Displacement of the BM |
![]() | = | First- and second-order derivatives of d with respect to t |
P | = | Pressure difference across the BM |
w0 | = | Characteristic frequency of the point, equal to ![]() |
The equivalent resistance is given by:
![]() | (12) |
Where: | ||
R(x) | = | Passive resistance corresponding to the acoustic resistance |
d1/2 | = | Saturation factor |
G1(x) | = | Active gain factor whose value is related to the activity of the corresponding outer hair cell group |
The second term in Eq. 12 represents the active resistance controlled by the OHCs.
Recent research results, however, have shown that nonlinear damping alone is not enough to describe the active mechanism of the cochlea and that nonlinear compliance is also necessary (Gelfand, 1998). Hubbard and Mountain (1995) pointed out that the output of the Inner Hair Cells (IHCs) is proportional to the velocity of the displacement of the basilar membrane. We therefore further introduce the nonlinear capacitance as follows:
![]() | (13) |
Where, G2(x) is the active factor related to the effect of the active mechanism on the compliance of a point on the basilar membrane. This makes the characteristic frequency w0 adaptive rather than fixed.
As Req and Ceq vary with the displacement of the basilar membrane, the previous signal processing abstraction of the cochlea as having a constant quality factor is no longer valid. Based on the preceding equations, the quality factor becomes:
![]() | (14) |
It is this adaptable new quality factor that motivates us to arrive at a new wavelet transform with adaptive mother wavelet according to not only the frequency changes of the target signal but also to the amplitude of it.
Organization of Bionic Wavelet Transform
Basically, the idea of the BWT is to make the envelope of the mother wavelet time varying according to the characteristics of the target signal by introducing a time varying T function.
To emulate the active control function of the hair cells of the auditory system, the new mother wavelet ψT(t) expression is given in the following way:
![]() | (15) |
The first T is the scaling factor that ensures the energy is the same for every mother wavelet, and the second T adjusts the envelope of the mother wavelet ψ(t) without changing its center frequency. Applying the CWT definition with this new adaptive mother wavelet gives the bionic wavelet transform:
![]() | (16) |
Taking T as constant over a short enough period, the Fourier transforms of ψ(t) and ψT(t) are, respectively:
![]() | (17) |
![]() | (18) |
Where, Ψ(ω) and ΨT(ω) represent the Fourier transforms of ψ(t) and ψT(t), respectively. |
To introduce the active mechanism of the auditory system into CWT, we replaced the constant quality factor Q0 by a variable QT:
![]() | (19) |
After substituting Req and Ceq into Qeq, we have:
![]() | (20) |
Comparing (19) with (20), we conclude:
![]() | (21) |
BWTf (a,b) | = | Coefficient of the BWT at time τ and scale a that maps the displacement of the basilar membrane d(x,t) |
G1 and G2 | = | Active factors |
BWTs | = | Maps the saturation constant factor d1/2 |
Δτ | = | A calculations step |
Obviously T is related to signal instantaneous amplitude and its first-order differential. The values of G1 and G2 and BWTs depend on the target signal properties and other parameter settings. We chose G1 = 0.87, G2 = 45 and BWTs = 0.8 in our experiments (Yao et al., 1999).
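The dependence of T on the instantaneous amplitude and on its first-order derivative can be illustrated with a short sketch. Since Eq. 21 is not reproduced in this text, the expression below is an assumption modeled on the form of the T-function described by Yao and Zhang (2001); only the constants G1, G2 and BWTs are taken from the paper, and the function name and the finite-difference derivative are hypothetical.

```python
G1 = 0.87      # active gain factor (value from the paper)
G2 = 45.0      # active compliance factor (value from the paper)
BWT_S = 0.8    # saturation constant (value from the paper)

def update_T(coeff, coeff_prev, dtau):
    """Assumed T-function update for one scale.

    coeff      : BWT coefficient at the current time step
    coeff_prev : BWT coefficient at the previous time step
    dtau       : calculation step in seconds
    """
    amplitude = abs(coeff)
    slope = abs(coeff - coeff_prev) / dtau      # finite-difference derivative
    amplitude_term = 1.0 / (1.0 - G1 * BWT_S / (BWT_S + amplitude))
    slope_term = 1.0 / (1.0 + G2 * slope)
    return amplitude_term * slope_term

# A silent input gives the largest T; a strong, steady input gives T close to 1.
print(update_T(0.0, 0.0, 1.0 / 16000.0))
print(update_T(1e6, 1e6, 1.0 / 16000.0))
```

Under this assumed form, weak signals widen the wavelet envelope (T > 1, better frequency resolution) while fast-changing signals shrink it (T < 1, better time resolution), which matches the adaptive behavior described in this section.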
IMPLEMENTATION AND PERFORMANCE EVALUATION
Here, we first describe the implementation details of the proposed bionic wavelet transform algorithm. We then use different input signals to validate our strategy and to show the advantages of the BWT over the CWT algorithm.
BWT Based Stimulation Strategy Implementation
It is clear from Eq. 15 that the centers of the analyzing window in the time and frequency domains are the same for CWT and BWT. The only difference lies in the envelope of the mother wavelet: the BWT mother function is adjusted by the parameter T. The factor 1/T is introduced for normalization.
Consequently the Bionic Munich based mother wavelet is given by:
![]() | (22) |
To obtain Eq. 18, T must be constant over the period Δτ. This approximation is reasonable if the signal and its first derivative are continuous and Δτ is small enough. It is clear from the preceding section that the bionic wavelet transform is nonlinear.
Figure 5 presents the BWT-based signal processing diagram. The BWT has almost the same form as the CWT, but with a feedback stage in which the T-function is recomputed in order to obtain the bionic mother wavelet.
![]() | |
Fig. 5: | Bionic wavelet transform parameterization technique |
Experimental Results
Validation of the Algorithm with Constructed Harmonic Signals
To show that BWT achieves a better trade-off between time and frequency resolution than the traditional CWT, we used two constructed harmonic signals, given by:
![]() |
Figures 6 and 7 show the time-frequency distributions and the power levels, respectively, for the CWT and BWT of the signal s1.
The time-frequency distributions and the power levels for the CWT and BWT of the signal s2 are presented in Fig. 8 and 9, respectively.
The signal s1 has two different frequency components and changes sharply from 2 to 4 kHz at time 0.032 sec.
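A test signal with the properties described for s1 can be generated as in the sketch below. The 2 kHz and 4 kHz components, the 0.032 sec transition and the 16 kHz sampling frequency come from the text; the total duration, the unit amplitude and the use of sine waves are assumptions, since the defining equation is not reproduced here.

```python
import numpy as np

FS = 16000.0       # sampling frequency in Hz (from the paper)
T_SWITCH = 0.032   # time of the frequency transition in seconds (from the paper)
DURATION = 0.064   # total duration in seconds (assumption)

def make_s1(fs=FS, duration=DURATION, t_switch=T_SWITCH):
    """Return (t, s1): a 2 kHz tone that switches sharply to 4 kHz at t_switch."""
    t = np.arange(int(round(duration * fs))) / fs
    s1 = np.where(t < t_switch,
                  np.sin(2 * np.pi * 2000.0 * t),
                  np.sin(2 * np.pi * 4000.0 * t))
    return t, s1

def dominant_frequency(x, fs=FS):
    """Frequency (Hz) of the largest peak in the magnitude spectrum of x."""
    spectrum = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), d=1.0 / fs)[np.argmax(spectrum)]

t, s1 = make_s1()
half = len(s1) // 2
print(dominant_frequency(s1[:half]), dominant_frequency(s1[half:]))
```

A global spectrum of s1 shows both components but not when each occurs, which is why a time-frequency analysis such as the CWT or BWT is needed to localize the transition.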
With the CWT, the time-frequency distribution in Fig. 6a shows activation only in the band containing the 4 kHz harmonic, whereas with the BWT, Fig. 6b clearly shows a sharp frequency transient at time 0.032 sec and activation of the two bands containing the 2 and 4 kHz harmonics.
In Fig. 7, the parameterization of the constructed harmonic s1 is rendered more correctly with BWT than with CWT: in (a), only the band containing the 4 kHz harmonic is indicated, whereas in (b), the bands containing both harmonics are activated.
As a second validation test, we take the signal s2, which combines a sharp frequency transient at time 0.032 sec with the sum of two harmonics in each part of the signal.
It is clear from the time-frequency distribution plots of the constructed harmonics in Fig. 6 and 8 that BWT achieves a better trade-off between time and frequency resolution than CWT. Twenty-one scales were used in our analysis. In Fig. 7a and 9a, not all the scales containing the constructed harmonics are plotted, whereas in Fig. 7b and 9b each band corresponding to a harmonic is activated. The corresponding center frequencies follow the ERB model (The Bionic Ear Institute Annual Report, 2000).
Fig. 6: | Time-frequency distribution of constructed harmonic s1 by (a) CWT and (b) BWT |
Fig. 7: | Different power level of constructed harmonic s1 by (a) CWT and (b) BWT |
![]() | |
![]() | |
Fig. 8: | Time-frequency distribution of constructed harmonic s2 by (a) CWT and (b) BWT |
Fig. 9: | Different power level of constructed harmonic s2 by (a) CWT and (b) BWT |
![]() | |
![]() | |
Fig. 10: | Results of (a) CWT and (b) BWT application to speech signal /aa/ in time-frequency distribution |
Parameterization of Real Speech Signals
Given the satisfactory results obtained for pure and composite sounds, we could validate our speech processing on real speech signals extracted from the TIMIT database. The temporal distribution and the spectrum show the real harmonics or formants characterizing the speech segment used for this test.
![]() | |
![]() | |
Fig. 11: | Parameters for speech signal /aa/ by (a) CWT and (b) BWT application |
![]() | |
![]() | |
Fig. 12: | The results of (a) CWT and (b) BWT application to speech signal /dx/ in time-frequency distribution |
![]() | |
![]() | |
Fig. 13: | Parameters for speech signal /dx/ by (a) CWT and (b) BWT application |
![]() | |
![]() | |
Fig. 14: | Results of (a) CWT and (b) BWT application to speech signal /eh/ in time-frequency distribution |
![]() | |
![]() | |
Fig. 15: | Parameters for speech signal /eh/ by (a) CWT and (b) BWT application |
![]() | |
![]() | |
Fig. 16: | Results of (a) CWT and (b) BWT application to speech signal /sh/ in time-frequency distribution |
![]() | |
![]() | |
Fig. 17: | Parameters for speech signal /sh/ by (a) CWT and (b) BWT application |
![]() | |
![]() | |
Fig. 18: | Results of (a) CWT and (b) BWT application to speech signal /epi/ in time-frequency distribution |
![]() | |
![]() | |
Fig. 19: | Parameters for speech signal /epi/ by (a) CWT and (b) BWT application |
The tests of BWT applied to speech signal processing for cochlear implants (CI) were conducted on five phonemes: /aa/, /dx/, /eh/, /sh/ and /epi/. The speech signals were sampled with the same equipment and the same configuration as in the preceding subsection. The total number of scales, for both BWT and CWT, is 21: the speech signal was decomposed into 21 channels with center frequencies ranging from 100 to 7700 Hz.
Figures 10, 12, 14, 16 and 18 show the time-frequency distributions for the CWT and BWT of the speech signals /aa/, /dx/, /eh/, /sh/ and /epi/, respectively.
Figures 11, 13, 15, 17 and 19 show the power levels for the CWT and BWT of the speech signals /aa/, /dx/, /eh/, /sh/ and /epi/, respectively.
DISCUSSION
In this study, we were interested in cochlear implant research, which is currently a developing field that brings together researchers from different disciplines. Researchers are looking for new, flexible techniques of stimulation and rehabilitation.
We propose a wavelet-transform-based speech processing algorithm for a stimulation strategy. First, we developed a CWT filter bank using the Morlet mother wavelet in order to generate 21 energy levels from the input signal. The center frequencies were chosen so as to share the spectrum according to the ERB critical bands. To improve our stimulation algorithm, we introduced the active mechanism of the auditory system into the CWT approach, which yields the BWT algorithm. It provides new and effective properties for speech signal processing. In fact, it is a bio-signal processing method based on the human bio-system.
Simulation results on the considered signals show that the BWT based on the auditory model achieves adjustable resolution in both time and frequency and, in addition, has high sensitivity and frequency selectivity.
Experimental results based on real speech signals show that BWT presents a good temporal signal distribution, because the sharp frequency transient is detected only by BWT. This indicates that BWT achieves a better trade-off between time and frequency resolution than CWT.
Furthermore, results based on speech signal parameterization show that, for the same signal data, more energy is retained by BWT than by CWT.
This programmable stimulation technique, based on the two approaches, CWT and BWT, includes different programming tools to maximize stimulation flexibility, thereby exploring the total hearing capacity of patients and meeting the needs of different pathological cases. This would allow clinicians to better manage the stimulation parameters in order to adapt the apparatus to each patient.
In addition, this strategy could be extended to use more stimulation channels and other speech processing tools, depending on the apparatus considered.
Finally, our strategy not only offers the efficiency and flexibility that other strategies do not provide, but also retains the performance of these existing strategies.
REFERENCES
- Ay, S.U., F.G. Zeng and B.J. Sheu, 1997. Hearing with bionic ears: Speech processing strategies for cochlear implants devices. Circuits Devices, Mag. IEEE, 13: 18-23.
- Derbel, A., F. Kallel, A.B. Hamida and M. Samet, 2007. Wavelet filtering based on mellin transform dedicated to cochlear prostheses. Conf. Proc. IEEE Eng. Med. Biol. Soc., 2007: 1900-1903.
- Ghorbel, M., M. Samet, A.A.B. Hamid and J. Thomas, 2006. An advanced and versatile CMOS current driver for multi-electrode cochlear implant microstimulator. J. Low Power Elect., 2: 442-455.
- Jun, Y. and Y.T. Zhang, 2001. Bionic wavelet transform: A new time-frequency method based on an auditory model. IEEE Trans. Biomed. Eng., 48: 856-863.
- Rioul, O. and P. Duhmel, 1992. Fast algorithm to discrete and continuous wavelet transforms. IEEE Trans. Signal Process., 38: 569-586.
- Yang, X., K. Wang and S.A. Shamma, 1992. Auditory representations of acoustic signals. IEEE Trans. Inform. Theor., 38: 824-839.
- Yao, J., Y.T. Zhang, F.S. Yang and D.T. Ye, 1999. Synthesis and decomposition of transient-evoked otoacoustic emissions based on an active auditory model. IEEE Trans. Biomed. Eng., 46: 1098-1106.
- Yao, J. and Y.T. Zhang, 2002. The application of bionic wavelet transform to speech signal processing in cochlear implants using neural network stimulations. IEEE Trans. Biomed. Eng., 49: 1299-1309.