
Research Article

Speech Enhancement Control Design Algorithm for Dual-Microphone Systems Using β-NMF in a Complex Environment

Dong-xia Wang, Mao-song Jiang, Fang-lin Niu, Yu-dong Cao, and Cheng-xu Zhou

School of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou, Liaoning 121001, China

Correspondence should be addressed to Dong-xia Wang; [email protected]

Received 15 April 2018; Revised 3 July 2018; Accepted 26 July 2018; Published 9 September 2018

Academic Editor: Junpei Zhong

Copyright © 2018 Dong-xia Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Single-microphone speech enhancement algorithms based on nonnegative matrix factorization can only exploit the temporal and spectral diversity of the received signal, so their noise suppression performance degrades rapidly in a complex environment. Microphone arrays offer spatial selectivity and high signal gain and are therefore suited to adverse noise conditions. In this paper, we present a new speech enhancement algorithm for two microphones based on nonnegative matrix factorization. The adopted method models the interchannel characteristics of each nonnegative matrix factorization basis, such as the amplitude ratios and the phase differences between channels. Experimental results confirm that the proposed algorithm is superior to other dual-microphone speech enhancement algorithms.

1. Introduction

For the sake of improving the quality and intelligibility of noisy signals, speech enhancement is widely applied in many fields including speech communication, speech coding, and speech recognition. In terms of the number of microphones, speech enhancement methods can be split into two classes: single microphone and microphone array.

In the past, many single-microphone speech enhancement algorithms have been presented, including statistical model methods, spectral subtraction, subspace decomposition, and other typical algorithms. These algorithms achieve good noise suppression under stationary conditions, but they lack a priori information about the clean speech and the noise, which limits their performance in a complex environment.

Recently, a matrix decomposition algorithm called nonnegative matrix factorization (NMF) [1] has been successfully used to solve a variety of problems in many fields. NMF is a powerful method for machine learning and hidden data discovery; its basic idea is that one nonnegative matrix is decomposed into the product of two nonnegative matrices without making any statistical hypothesis about the data. Compared with traditional matrix decomposition algorithms, it has a clear physical interpretation, requires little storage, and is simple and easy to implement. It has been widely used to effectively solve various problems including pattern clustering and classification tasks [2–5], source separation [6], and speech enhancement [7]. In speech applications, a priori information can be obtained with NMF from training data instead of the clean signal.

Currently, according to the different machine learning paradigms, single-channel NMF-based speech enhancement methods can be categorized into unsupervised and supervised learning algorithms [7]. Unsupervised methods are simple and easy to implement without any prior information on the speech or noise; their main difficulty is estimating the noise power spectral density (PSD) [8], especially in a complex environment.

For the supervised methods, selecting a proper model requires considering not only the characteristics of the speech and noise signals but also the estimation of the model parameters from training samples of those signals. One advantage of these methods is that the noise PSD can be estimated without resorting to other algorithms. Compared with the unsupervised methods in a complex environment, studies have shown that the supervised approach is an effective way to obtain better performance for the enhanced speech signals.

To address the mismatch between training data and testing data, supervised NMF-based speech enhancement algorithms have been proposed that incorporate prior information, including the temporal continuity [9] and the statistical distribution of the data [10]. More recently, aiming at improving general subspace constraints, an improved NMF algorithm was proposed by introducing additional terms into the objective function [11]. A framework for decreasing the computational complexity of NMF by using the extreme learning machine (ELM) was designed in [12]. ELM and its variants have been widely applied in many fields because of their good scalability and strong generalization performance [13]. With the recent development of human-computer interaction, higher requirements for speech recognition and computer vision in complex environments have emerged. In [14–16], control schemes for improving the convergence speed were developed to optimize system performance.

In [17], a speech enhancement method based on a regularized nonnegative matrix factorization algorithm was presented to solve the difficult problem of manual mode selection. In practical applications, however, speech signals have spatial characteristics (spatial diversity induced by reverberation) that a single-microphone system cannot capture: a single microphone can perform well in a speech enhancement system, but it only uses the temporal and spectral information of the signal and lacks spatial information.

The two-microphone system has attracted much attention for its small size and low computational cost, which is in line with the trend toward device miniaturization. An algorithm achieving dual-microphone speech enhancement using the coherence function was proposed in [18]. In [19], an improved method that combines the coherence function with a Kalman filter is used to obtain the enhanced speech signal. These algorithms are, in a sense, unsupervised methods. Therefore, we propose a novel β-NMF approach for dual-microphone speech enhancement, in which the interchannel characteristics of each NMF basis are modeled by exploiting the spatial diversity of the speech signals.

The paper is organized as follows: Section 2 reviews the objective function of standard NMF with the β divergence. Section 3 extends it to the dual-microphone system for the NMF basis. Section 4 presents the two-channel speech signal model and details the proposed speech enhancement framework. Section 5 presents the simulation results, and Section 6 concludes the paper.

2. Nonnegative Matrix Factorization with β Divergence

In a single-microphone system, let $y(t)$, $t \in \mathbb{R}$, be the signal observed at the microphone over a specific time duration. By applying the short-time Fourier transform (STFT) to $y(t)$, we obtain a complex matrix $Y = [y_{ij}] \in \mathbb{C}^{I \times J}$, where $i \in \{1, 2, \ldots, I\}$ indexes the frequency bins and $j \in \{1, 2, \ldots, J\}$ the time frames. In standard NMF, the magnitude spectrogram $Z = |Y|$, or equivalently $z_{ij} = |y_{ij}|$, is analyzed [1]. The NMF-based algorithm then seeks a locally optimal decomposition

$$Z \approx \hat{Z} = TV, \tag{1}$$

where $T = [t_{ik}] \in \mathbb{R}_{+}^{I \times K}$ is a basis matrix, $V = [v_{kj}] \in \mathbb{R}_{+}^{K \times J}$ is a coefficient matrix, and $K$ is the number of basis vectors.

To find two nonnegative matrices such that the difference between $Z$ and the product $TV$ is minimized, a measure function $D$ is defined and the optimal decomposition is obtained as

$$\arg\min_{T,V} D\bigl(Z \,\|\, \hat{Z}\bigr), \quad \text{s.t. } T \ge 0,\; V \ge 0, \tag{2}$$

where $D(Z \,\|\, \hat{Z})$ denotes the error divergence between the observed data $Z$ and the reconstructed data $\hat{Z}$. Different probability models can be derived from (2), and different types of cost functions are then obtained by maximum likelihood. Selecting an appropriate objective function is the key step in formulating an NMF algorithm. Here, the objective function is derived using a parametric divergence measure, namely the $\beta$ divergence [20]:

$$D_{\beta}(Z \,\|\, TV) = \sum_{ij} \left[ \frac{z_{ij}\bigl(z_{ij}^{\beta-1} - (TV)_{ij}^{\beta-1}\bigr)}{\beta(\beta-1)} + \frac{(TV)_{ij}^{\beta-1}\bigl((TV)_{ij} - z_{ij}\bigr)}{\beta} \right], \quad \beta \in \mathbb{R} \setminus \{0, 1\}, \tag{3}$$

where $\beta$ controls the reconstruction penalty. The selection of the parameter $\beta$ depends on the statistical distribution of the data and requires prior knowledge. When $\beta = 2$, (3) reduces to the squared Euclidean distance (ED); as $\beta \to 1$, it approaches the Kullback-Leibler (KL) divergence; and as $\beta \to 0$, it approaches the Itakura-Saito (IS) divergence.

$T$ and $V$ are updated with the multiplicative iterative rules described in [21]:

$$T \leftarrow T \otimes \frac{\bigl(Z \,/\, (TV)^{2-\beta}\bigr)V^{\top}}{(TV)^{\beta-1}V^{\top}}, \tag{4}$$

$$V \leftarrow V \otimes \frac{T^{\top}\bigl(Z \,/\, (TV)^{2-\beta}\bigr)}{T^{\top}(TV)^{\beta-1}}, \tag{5}$$

where $\otimes$ denotes element-wise multiplication, $/$ and the quotient line denote element-wise division, and the superscript $\top$ denotes the matrix transpose. $T$ and $V$ are typically initialized with positive random numbers.
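A minimal NumPy sketch of the multiplicative updates (4) and (5) follows; it is our own illustration, with the matrix shapes, iteration count, and `eps` floor chosen for the example rather than taken from the paper.

```python
import numpy as np

def beta_nmf(Z, K, beta=2.0, n_iter=50, eps=1e-12, seed=0):
    """Factorize a nonnegative I x J matrix Z as Z ~= T @ V using the
    multiplicative beta-NMF updates (4)-(5)."""
    rng = np.random.default_rng(seed)
    I, J = Z.shape
    T = rng.random((I, K)) + eps        # basis matrix, I x K
    V = rng.random((K, J)) + eps        # coefficient matrix, K x J
    for _ in range(n_iter):
        TV = T @ V + eps
        T *= ((Z / TV**(2 - beta)) @ V.T) / (TV**(beta - 1) @ V.T + eps)
        TV = T @ V + eps
        V *= (T.T @ (Z / TV**(2 - beta))) / (T.T @ TV**(beta - 1) + eps)
    return T, V
```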


3. The Dual-Microphone Model for NMF Basis with β Divergence

This section proposes an extension of the standard NMF. Compared with multichannel speech enhancement, dual-channel speech enhancement has advantages in many respects. Assume that $X^{(1)}$ and $X^{(2)}$ denote the observations of the 1st and 2nd microphones in the time-frequency domain, respectively. Following [22], an interchannel matrix $H$ is introduced to represent the spatial characteristics between the two channels, while both channels share the common nonnegative matrices $T$ and $V$ that model the multichannel observations.

3.1. Preprocessing and Modeling. When the standard NMF algorithm is used for speech enhancement, only the amplitude observations in the time-frequency domain are considered. The observation of the 1st channel is taken as a reference and preprocessed as

$$\tilde{X}^{(1)} = X^{(1)} \otimes \frac{X^{(1)*}}{\bigl|X^{(1)}\bigr|}, \tag{6}$$

where $*$ denotes the complex conjugate. In order to fully preserve the interchannel characteristics, the same normalization is applied to the 2nd channel:

$$\tilde{X}^{(2)} = X^{(2)} \otimes \frac{X^{(1)*}}{\bigl|X^{(1)}\bigr|}. \tag{7}$$

With this preprocessing, $\tilde{X}^{(1)} = |X^{(1)}|$ is a nonnegative matrix, whereas $\tilde{X}^{(2)}$ is in general a complex matrix carrying the interchannel amplitude ratios and phase differences. Hence, the first channel is modeled directly with (3), and the second channel is modeled by additionally introducing an interchannel matrix $H = [h_{ik}] \in \mathbb{C}^{I \times K}$ with $\sum_i h_{ik} = 1$, initialized randomly. The interchannel entry $h_{ik}$ carries the spatial information of the 2nd channel.
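A short sketch of the preprocessing in (6) and (7), assuming `X1` and `X2` are the complex STFT matrices of the two channels; the function and variable names are ours.

```python
import numpy as np

def preprocess_channels(X1, X2, eps=1e-12):
    """Normalize both channels by the conjugate phase of channel 1, as in (6)-(7).

    X1, X2: complex STFT matrices (I x J) of microphones 1 and 2.
    Returns X1_tilde (nonnegative, equal to |X1|) and X2_tilde (complex,
    carrying the interchannel amplitude ratio and phase difference).
    """
    ref = np.conj(X1) / (np.abs(X1) + eps)   # unit-modulus conjugate phase of channel 1
    X1_tilde = (X1 * ref).real               # equals |X1| up to numerical error
    X2_tilde = X2 * ref
    return X1_tilde, X2_tilde
```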

3.2. Maximum Likelihood Estimation and Its Cost Function. Using the dual-channel probabilistic model, the likelihood is written as

$$p\bigl(\tilde{X}^{(1)}, \tilde{X}^{(2)} \mid T, V, H\bigr) \propto p\bigl(\tilde{X}^{(1)} \mid T, V\bigr)\, p\bigl(\tilde{X}^{(2)} \mid T, V, H\bigr), \tag{8}$$

where the data are assumed to follow the corresponding probability distributions. Maximizing (8) with respect to $H$, $T$, and $V$ is equivalent to minimizing the negative log-likelihood, which, up to constant terms, reduces to a sum of two $\beta$ divergences:

$$-\log p\bigl(\tilde{X}^{(1)}, \tilde{X}^{(2)} \mid T, V, H\bigr) = -\log p\bigl(\tilde{X}^{(1)} \mid T, V\bigr) - \log p\bigl(\tilde{X}^{(2)} \mid T, V, H\bigr) \overset{c}{=} D_{\beta}^{(1)}\bigl(\tilde{X}^{(1)} \,\big\|\, TV\bigr) + D_{\beta}^{(2)}\bigl(\tilde{X}^{(2)} \,\big\|\, (H \otimes T)V\bigr), \tag{9}$$

where $\overset{c}{=}$ denotes equality up to irrelevant constant terms. The former term was explained in Section 2; the latter term is given by

$$D_{\beta}^{(2)}\bigl(\tilde{X}^{(2)} \,\big\|\, (H \otimes T)V\bigr) = \sum_{ij} \left[ \frac{\tilde{x}^{(2)}_{ij}\Bigl(\bigl(\tilde{x}^{(2)}_{ij}\bigr)^{\beta-1} - \bigl((H \otimes T)V\bigr)_{ij}^{\beta-1}\Bigr)}{\beta(\beta-1)} + \frac{\bigl((H \otimes T)V\bigr)_{ij}^{\beta-1}\Bigl(\bigl((H \otimes T)V\bigr)_{ij} - \tilde{x}^{(2)}_{ij}\Bigr)}{\beta} \right]. \tag{10}$$

The gradient of the cost function with respect to a variable $\alpha \in \{H, T, V\}$ (the superscript of the second-term cost function is omitted for convenience) is expressed as the difference of two positive terms $\nabla_{\alpha}^{+}D$ and $\nabla_{\alpha}^{-}D$:

$$\nabla_{\alpha}D = \nabla_{\alpha}^{+}D - \nabla_{\alpha}^{-}D. \tag{11}$$

The solution is then obtained by applying the general heuristic multiplicative update rule

$$\alpha \leftarrow \alpha \otimes \frac{\nabla_{\alpha}^{-}D}{\nabla_{\alpha}^{+}D}. \tag{12}$$

The derivatives of the second-term cost function (10) with respect to $T$, $V$, and $H$ are

$$\frac{\partial D}{\partial T} = -\Bigl[\tilde{X}^{(2)} \,/\, \bigl((H \otimes T)V\bigr)^{2-\beta}\Bigr] V^{\top}H^{\top}\mathbf{1}_{I \times K} + \bigl((H \otimes T)V\bigr)^{\beta-1} V^{\top}H^{\top}\mathbf{1}_{I \times K},$$

$$\frac{\partial D}{\partial V} = -(H \otimes T)^{\top}\Bigl[\tilde{X}^{(2)} \,/\, \bigl((H \otimes T)V\bigr)^{2-\beta}\Bigr] + (H \otimes T)^{\top}\bigl((H \otimes T)V\bigr)^{\beta-1},$$

$$\frac{\partial D}{\partial H} = -\Bigl[\tilde{X}^{(2)} \,/\, \bigl((H \otimes T)V\bigr)^{2-\beta}\Bigr] V^{\top}T^{\top}\mathbf{1}_{I \times K} + \bigl((H \otimes T)V\bigr)^{\beta-1} V^{\top}T^{\top}\mathbf{1}_{I \times K}. \tag{13}$$


Using the cost function (9) together with the update rule of [21], the interchannel matrix and the nonnegative matrices $T$ and $V$ are then estimated by the following multiplicative updating rules:

$$T \leftarrow T \otimes \frac{\bigl[\tilde{X}^{(1)} / (TV)^{2-\beta}\bigr]V^{\top} + \bigl[\tilde{X}^{(2)} / \bigl((H \otimes T)V\bigr)^{2-\beta}\bigr](HV)^{\top}\mathbf{1}_{I \times K}}{(TV)^{\beta-1}V^{\top} + \bigl((H \otimes T)V\bigr)^{\beta-1}(HV)^{\top}\mathbf{1}_{I \times K}}, \tag{14}$$

$$V \leftarrow V \otimes \frac{T^{\top}\bigl[\tilde{X}^{(1)} / (TV)^{2-\beta}\bigr] + (H \otimes T)^{\top}\bigl[\tilde{X}^{(2)} / \bigl((H \otimes T)V\bigr)^{2-\beta}\bigr]}{T^{\top}(TV)^{\beta-1} + (H \otimes T)^{\top}\bigl((H \otimes T)V\bigr)^{\beta-1}}, \tag{15}$$

$$H \leftarrow H \otimes \frac{\bigl[\tilde{X}^{(2)} / \bigl((H \otimes T)V\bigr)^{2-\beta}\bigr](TV)^{\top}\mathbf{1}_{I \times K}}{\bigl((H \otimes T)V\bigr)^{\beta-1}(TV)^{\top}\mathbf{1}_{I \times K}}, \tag{16}$$

where $\mathbf{1}_{I \times K}$ is an $I \times K$ matrix of ones. As (14), (15), and (16) show, the rules reduce to their single-channel counterparts (4) and (5) when only one microphone is used and the interchannel matrix $H$ is a matrix of ones.
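For orientation, here is a simplified NumPy sketch of one joint update sweep. It is not the paper's exact rule: it works entirely with magnitudes (using $|\tilde{X}^{(2)}|$ and a nonnegative weighting in place of the complex $H$), and it uses the plain element-wise gradient weighting rather than the $\mathbf{1}_{I \times K}$ terms of (13)-(16); the function and variable names are ours.

```python
import numpy as np

def dual_channel_beta_nmf(A1, A2, K, beta=2.0, n_iter=50, eps=1e-12, seed=0):
    """Joint beta-NMF of two channel magnitude spectrograms.

    A1 ~= T @ V and A2 ~= (Hm * T) @ V, with a nonnegative interchannel
    weighting Hm (I x K). Simplified magnitude-domain variant of the
    dual-channel updates, not the paper's exact (14)-(16).
    """
    rng = np.random.default_rng(seed)
    I, J = A1.shape
    T = rng.random((I, K)) + eps
    V = rng.random((K, J)) + eps
    Hm = rng.random((I, K)) + eps
    for _ in range(n_iter):
        Y1, Y2 = T @ V + eps, (Hm * T) @ V + eps
        # update T: contributions from both channels
        num = (A1 / Y1**(2 - beta)) @ V.T + Hm * ((A2 / Y2**(2 - beta)) @ V.T)
        den = Y1**(beta - 1) @ V.T + Hm * (Y2**(beta - 1) @ V.T) + eps
        T *= num / den
        Y1, Y2 = T @ V + eps, (Hm * T) @ V + eps
        # update V: contributions from both channels
        num = T.T @ (A1 / Y1**(2 - beta)) + (Hm * T).T @ (A2 / Y2**(2 - beta))
        den = T.T @ Y1**(beta - 1) + (Hm * T).T @ Y2**(beta - 1) + eps
        V *= num / den
        Y2 = (Hm * T) @ V + eps
        # update interchannel weights from channel 2 only
        Hm *= (T * ((A2 / Y2**(2 - beta)) @ V.T)) / (T * (Y2**(beta - 1) @ V.T) + eps)
    return T, V, Hm
```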

4. Proposed NMF-Based Speech Enhancement Algorithm

Assume that two microphones are set up in a complex environment and that the noise and target speech signals are spatially separated. Let $s(t)$ be the target speech; the noisy signal of the $m$th microphone, $y_m(t)$, is then

$$y_m(t) = a_m(t) * s(t) + n_m(t), \quad m = 1, 2, \tag{17}$$

where $*$ denotes convolution, $m$ is the microphone index, $t$ is the sample index, and $a_m(t)$ and $n_m(t)$ represent the room impulse response (reverberation) and the noise at the $m$th microphone, respectively. The block diagram of the proposed algorithm is shown in Figure 1; it mainly includes two parts: the training stage and the enhancement stage.

4.1. Training Stage. By applying the STFT, (17) can be represented in the frequency domain as

$$Y_m(i, j) = A_m(i, j)S(i, j) + N_m(i, j). \tag{18}$$

At the training stage, magnitude spectra of clean speech and of noise from the databases are used as the data matrices for β-NMF, and the multiplicative update rules (4) and (5) are applied to the corresponding training data separately to produce the basis matrices $T_S$ and $T_N$. The basis matrices are saved as a joint dictionary matrix $T = [T_S\; T_N]$, which serves as a priori information for the enhancement stage.
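A compact sketch of the training stage under these assumptions; it uses scikit-learn's multiplicative-update NMF as a stand-in for the update rules (4)-(5), and the component counts and iteration number are illustrative rather than the paper's exact settings.

```python
import numpy as np
from sklearn.decomposition import NMF

def train_joint_dictionary(S_mag, N_mag, K_s=30, K_n=30, beta=2.0, n_iter=50):
    """Learn speech and noise bases from magnitude spectrograms
    S_mag, N_mag (shape: frequency bins x frames) and stack them as T = [T_S T_N]."""
    nmf_s = NMF(n_components=K_s, solver="mu", beta_loss=beta,
                init="random", max_iter=n_iter, random_state=0)
    nmf_n = NMF(n_components=K_n, solver="mu", beta_loss=beta,
                init="random", max_iter=n_iter, random_state=0)
    T_s = nmf_s.fit_transform(S_mag)      # I x K_s speech basis
    T_n = nmf_n.fit_transform(N_mag)      # I x K_n noise basis
    return np.hstack([T_s, T_n])          # joint dictionary T = [T_S T_N]
```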

4.2. Enhancement Stage. The proposed enhancement stage consists of three parts: beamforming, signal gain estimation, and speech signal reconstruction, which are explained in the following subsections.

[Figure 1: The block diagram of the proposed algorithm. A beamforming system (delay-and-sum of $y_1(t)$ and $y_2(t)$ with weights $w_1$, $w_2$ and delays $\tau_1$, $\tau_2$) produces $y(t)$ and feeds a speech enhancement system in which the STFTs $Y_1(i,j)$ and $Y_2(i,j)$ are decomposed by NMF against the joint dictionary matrix $T = [T_S\; T_N]$ to obtain $H$ and $V$, the gains $G_1$, $G_2$, and $G$ are formed, and the enhanced signal $\hat{s}(t)$ is reconstructed by the inverse STFT using the noisy phase $\angle Y(i,j)$.]


4.2.1. Beamforming. Beamforming is one of the most popular techniques and the basis of microphone-array speech enhancement. The most common fixed beamformers are the delay-and-sum and superdirective beamformers. In this paper, the delay-and-sum beamformer is used:

$$y(t) = \sum_{i=1}^{m} w_i\, y_i(t - \tau_i), \tag{19}$$

where $w_i$ is the weight of the $i$th microphone and $\tau_i$ is the estimated time-delay compensation.
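A minimal sketch of the delay-and-sum beamformer in (19), assuming integer-sample delays; the default equal weights are an illustrative choice.

```python
import numpy as np

def delay_and_sum(channels, delays, weights=None):
    """Delay-and-sum beamformer: y(t) = sum_i w_i * y_i(t - tau_i).

    channels: list of 1-D signals (one per microphone)
    delays:   nonnegative integer delays in samples (time-delay compensation)
    """
    if weights is None:
        weights = [1.0 / len(channels)] * len(channels)
    out = np.zeros(max(len(c) for c in channels))
    for y_i, tau, w in zip(channels, delays, weights):
        out[tau: tau + len(y_i)] += w * y_i[: len(out) - tau]
    return out
```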

4.2.2. Signal Gain Estimation. First, the two noisy signals $y_1(t)$ and $y_2(t)$ (after delay compensation) are used as input to this stage, and their magnitude spectra $|Y_1|$ and $|Y_2|$ are obtained by the STFT. Next, they are factorized with the proposed NMF extension while keeping the joint dictionary matrix $T = [T_S\; T_N]$ from the training stage fixed, using the update rules (15) and (16). Accordingly, the magnitude spectra are approximately decomposed into an interchannel matrix $H = [H_S\; H_N]$ and a coefficient matrix $V = [V_S\; V_N]$.

(1) From these results, the gain function of the 1st (reference) channel is defined as

$$G_1 = (T_S V_S) \,\cdot/\, (TV). \tag{20}$$

(2) Using the interchannel matrix $H$, the gain function of the 2nd channel is

$$G_2 = \bigl((H_S \otimes T_S)V_S\bigr) \,\cdot/\, \bigl((H \otimes T)V\bigr). \tag{21}$$

(3) The final gain function $G$ is obtained from (20) and (21) as

$$G = G_1 \otimes G_2, \tag{22}$$

where $\cdot/$ denotes element-wise division.
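A sketch of the gain computation (20)-(22), assuming the speech/noise partitions of $T$, $H$, and $V$ are already available; the magnitudes of $H$ are used so the gain stays real, which is an assumption on our part, and the names and `eps` floor are ours.

```python
import numpy as np

def estimate_gain(T_s, T_n, H_s, H_n, V_s, V_n, eps=1e-12):
    """Wiener-like gains (20)-(22) from the dual-channel factorization.

    T_s, T_n: speech/noise bases (I x K_s, I x K_n)
    H_s, H_n: interchannel matrices for the speech/noise bases (same shapes)
    V_s, V_n: activation matrices (K_s x J, K_n x J)
    """
    TV = T_s @ V_s + T_n @ V_n                                     # channel-1 model
    HTV = (np.abs(H_s) * T_s) @ V_s + (np.abs(H_n) * T_n) @ V_n    # channel-2 model
    G1 = (T_s @ V_s) / (TV + eps)                                  # (20)
    G2 = ((np.abs(H_s) * T_s) @ V_s) / (HTV + eps)                 # (21)
    return G1 * G2                                                 # (22)
```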

4.2.3. NMF-Based Signal Reconstruction. This stage is similar to a Wiener filtering process: the gain function $G$ obtained from (22) acts as a Wiener filter. First, the magnitude spectrum $|Y|$ of the beamformer output $y(t)$ is obtained with the STFT, and the magnitude spectrum of the enhanced speech is then approximated by

$$\hat{S} = G \otimes |Y|. \tag{23}$$

Finally, the enhanced speech waveform $\hat{s}(t)$ is estimated by the inverse STFT, combining $\hat{S}$ with the noisy phase $\angle Y$ (see Figure 1).
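A sketch of the reconstruction step (23) built around SciPy's STFT/ISTFT, with the frame settings taken from Section 5 (512-sample Hamming window, 128-sample shift); the gain `G` is assumed to be given and to match the STFT grid (257 frequency bins by the number of frames).

```python
import numpy as np
from scipy.signal import stft, istft

def reconstruct(y, G, fs=8000, nperseg=512, hop=128):
    """Apply the Wiener-like gain G to the beamformer output y and invert, as in (23)."""
    _, _, Y = stft(y, fs=fs, window="hamming", nperseg=nperseg,
                   noverlap=nperseg - hop)
    S_hat = G * np.abs(Y)                                    # enhanced magnitude, eq. (23)
    _, s_hat = istft(S_hat * np.exp(1j * np.angle(Y)),       # reuse the noisy phase
                     fs=fs, window="hamming", nperseg=nperseg,
                     noverlap=nperseg - hop)
    return s_hat
```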

5. Experimental Results

In this section, experiments are performed to evaluate the methods with respect to quality and intelligibility. The proposed method is compared with the coherence-based speech enhancement algorithm of [18] and with the standard NMF. Performance is evaluated using the perceptual evaluation of speech quality (PESQ) [23], the source-to-distortion ratio (SDR) [24], and the segmental SNR (SSNR) as objective measures, where higher values indicate better results.
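PESQ and SDR are computed with the standard tools of [23, 24]. As a reference point, a common formulation of the segmental SNR (not necessarily the authors' exact implementation) is sketched below; the frame length and clipping range are conventional choices.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB, with each frame clipped to [lo, hi] as is conventional."""
    n = min(len(clean), len(enhanced))
    snrs = []
    for start in range(0, n - frame + 1, frame):
        s = clean[start:start + frame]
        e = s - enhanced[start:start + frame]            # enhancement error in this frame
        snr = 10 * np.log10((np.sum(s**2) + 1e-12) / (np.sum(e**2) + 1e-12))
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))
```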

5.1. Experimental Setup. The clean speech and the noise are taken from the TIMIT database [25] and the NOISEX database [26], respectively, and all signals are downsampled to 8 kHz. The training set for the clean speech contains 20 sentences (60 seconds) pronounced by 10 males and 10 females, and each test signal for the speech enhancement task is one sentence. Two background noises are selected: the Hfchannel and Factory1 noises. Training data and test data are disjoint. For the proposed framework, the window function, frame size, and frame shift are a Hamming window, 512 samples, and 128 samples, respectively. Following the usual guideline $K \le IJ/(I + J)$, the numbers of clean-speech and noise basis vectors are both set to $K = 30$, and the maximum number of iterations is 50. The two microphones, spaced 4 cm apart, pick up noisy speech signals generated by convolving the target and noise sources with a set of HRTFs measured inside a mildly reverberant room ($T_{60} \approx 220$ ms) of dimensions $4.3 \times 3.8 \times 2.3\ \mathrm{m}^3$ (length × width × height); the noise is added to the clean test speech to generate noisy signals at four signal-to-noise ratios (SNRs): $-10$, $-5$, $0$, and $5$ dB. The distance between the target source and the midpoint of the two microphones is 1.2 m. The direction of arrival (DOA) is chosen from $\theta \in \{0^{\circ}, 30^{\circ}, 60^{\circ}, 90^{\circ}, 120^{\circ}, 150^{\circ}, 180^{\circ}\}$. The squared Euclidean distance ($\beta = 2$) is used for simplicity.

Figure 2 shows the PESQ, SDR, and SSNR results as the DOA angle $\theta$ varies, with the input SNR set to 0 dB under the Factory1 noise condition. As can be seen, the DOA of the target source has little influence on the PESQ metric for these methods, but a large effect on the other metrics. In the following experiments, $\theta = 60^{\circ}$ is used for consistency and simplicity. Figure 2 also indicates that, when the source angle is set to $60^{\circ}$, the proposed method suppresses the background noise effectively while remaining comparable in perceptual quality.

5.2. Speech Quality and Intelligibility Evaluation. To investigate the achievable gain estimation performance, two background noises representing a complex environment are chosen: the Hfchannel and Factory1 noises.


[Figure 2: PESQ, SDR, and SSNR values of the enhanced speech for Factory1 noise at 0 dB input SNR, plotted against the angle θ (0°-180°) for the noisy signal, the coherence-based method, the standard NMF, and the proposed method: (a) PESQ, (b) SDR, (c) SSNR.]


[Figure 3: PESQ, SDR, and SSNR scores in Factory1 noise scenarios at input SNRs of −10, −5, 0, and 5 dB for the noisy signal, the coherence-based method, the standard NMF, and the proposed method: (a) PESQ, (b) SDR, (c) SSNR.]


[Figure 4: PESQ, SDR, and SSNR scores in Hfchannel noise scenarios at input SNRs of −10, −5, 0, and 5 dB for the noisy signal, the coherence-based method, the standard NMF, and the proposed method: (a) PESQ, (b) SDR, (c) SSNR.]


[Figure 5: PESQ, SDR, and SSNR values of the enhanced speech for Factory1 noise at 0 dB input SNR, plotted against the divergence parameter β (0.5-2.5) for angles θ = 0°, 30°, 60°, 90°, 120°, 150°, and 180°: (a) PESQ, (b) SDR, (c) SSNR.]


[Figure 6: PESQ, SDR, and SSNR scores for different divergence values (β = 0.5, β = 1 (KL), β = 2 (ED), β = 3) for Factory1 noise at different SNR levels: (a) PESQ, (b) SDR, (c) SSNR.]


Figures 3 and 4 compare the noisy signal with the signals enhanced by the different methods, with the parameter β set to 2.

From Figures 3 and 4, the proposed method yields higher PESQ, SDR, and SSNR scores than the coherence-based method [18] and the standard NMF algorithm [7] in almost all cases, which shows that the algorithm markedly improves both the quality and the intelligibility of the speech signals. The PESQ scores show that the method of [18] is stable and scarcely affected by the SNR, but its performance on the other metrics is lower, while the standard NMF tends to introduce more distortion. It can also be seen that the advantage of these algorithms becomes less evident as the SNR increases. Compared with the coherence-based method, the proposed NMF-based method still attains improvements in the objective measures.

Figure 5 shows the PESQ, SDR, and SSNR results as the incidence angle changes under different values of the parameter β. From Figure 5, the change in incidence angle has a significant influence on the performance of the proposed method. Judging by the SSNR values, the proposed method scores best at θ = 90°, but at the expense of speech quality and intelligibility, while θ = 120° offers a good compromise for the angle of incidence. Moreover, comparing the PESQ and SDR values for different β, the simulations show that β influences speech quality more than speech intelligibility under the same angle conditions.

[Figure 7: Spectrograms of (a) the clean speech signal, (b) the noisy signal obtained via the delay-and-sum, (c) the noisy speech enhanced by the coherence-based method, (d) the noisy speech enhanced by the standard NMF, (e) the noisy speech enhanced by the proposed method (β = 2), and (f) the noisy speech enhanced by the proposed method (β = 1), with the PESQ, SDR, and SSNR scores overlaid on each processed panel.]


For β = 1, the proposed method not only maintains its accuracy but also suppresses the background noise effectively without introducing much distortion.

Figure 6 shows the performance of the proposed algorithm for different divergence values and noises. The PESQ, SDR, and SSNR scores improve as the SNR increases under otherwise identical conditions. For the same SNR, the best results are obtained as β → 1, that is, when the divergence tends to the KL divergence. This observation can be interpreted as the proposed method with the KL divergence improving speech quality and intelligibility better than the other parameter settings. The results also indicate that, under the same conditions, the proposed method gives a clear improvement in PESQ, SDR, and SSNR scores, especially at low SNR. Hence, the proposed method improves both the aural quality and the intelligibility of noisy speech.

5.3. Signal Spectrogram. By comparing the intensity patterns of the speech spectrograms, the structural characteristics of the residual noise and the speech distortion can be observed. The spectrograms of the different signals are presented in Figure 7 for Factory1 noise at an input SNR of 0 dB. They show that the proposed NMF-based method exhibits less speech distortion and fewer noise residues in the restored spectrogram than the traditional coherence-based method and the standard NMF method.

In addition, the parameter β influences the SDR scores. In this paper, the method based on the KL divergence proves superior to the squared Euclidean distance in speech enhancement capability. Finally, the proposed speech enhancement framework based on KL-NMF provides a significant improvement in both quality and intelligibility, as confirmed by the higher evaluation scores.

6. Conclusions

We have proposed a dual-microphone speech enhancement framework based on β-NMF. The method extends single-microphone NMF-based speech enhancement by introducing an interchannel matrix into the cost function, which allows the interchannel characteristics of each NMF basis to be expressed using a priori information. The experimental results show that the presented method is effective in nonstationary and low-SNR conditions.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Scientific Public Welfare Research of Liaoning Province (20170056), the National Natural Science Foundation of China (no. 60901063), the Program for Liaoning Innovative Research Team in University under Grant LT2016006, and the Program for Distinguished Professor of Liaoning Province.

References

[1] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," Advances in Neural Information Processing Systems, vol. 13, pp. 556–562, 2001.

[2] W. S. Chen, Y. Zhao, B. Pan, and B. Chen, "Supervised kernel nonnegative matrix factorization for face recognition," Neurocomputing, vol. 205, pp. 165–181, 2016.

[3] M. Han and B. Liu, "Ensemble of extreme learning machine for remote sensing image classification," Neurocomputing, vol. 149, pp. 65–70, 2015.

[4] M. Babaee, S. Tsoukalas, G. Rigoll, and M. Datcu, "Immersive visualization of visual data using nonnegative matrix factorization," Neurocomputing, vol. 173, pp. 245–255, 2016.

[5] Z. Xia, X. Feng, J. Peng, J. Wu, and J. Fan, "A regularized optimization framework for tag completion and image retrieval," Neurocomputing, vol. 147, pp. 500–508, 2015.

[6] A. Lefèvre, F. Glineur, and P. A. Absil, "A convex formulation for informed source separation in the single channel setting," Neurocomputing, vol. 141, pp. 26–36, 2014.

[7] N. Mohammadiha, P. Smaragdis, and A. Leijon, "Supervised and unsupervised speech enhancement using nonnegative matrix factorization," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2140–2151, 2013.

[8] C. M. Nelke, N. Chatlani, C. Beaugeant, and P. Vary, "Single microphone wind noise PSD estimation using signal centroids," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7063–7067, Florence, Italy, May 2014.

[9] G. J. Mysore and P. Smaragdis, "A non-negative approach to semi-supervised separation of speech from noise with the use of temporal dynamics," in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 17–20, Prague, Czech Republic, May 2011.

[10] H. Chung, E. Plourde, and B. Champagne, "Regularized NMF based speech enhancement with spectral components modeled by Gaussian mixtures," in 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6, Reims, France, September 2014.

[11] Y. Liu, Y. Liao, L. Tang, F. Tang, and W. Liu, "General subspace constrained non-negative matrix factorization for data representation," Neurocomputing, vol. 173, pp. 224–232, 2016.

[12] Q. He, X. Jin, C. Du, F. Zhuang, and Z. Shi, "Clustering in extreme learning machine feature space," Neurocomputing, vol. 128, pp. 88–95, 2014.

[13] C. Yang, K. Huang, H. Cheng, Y. Li, and C.-Y. Su, "Haptic identification by ELM-controlled uncertain manipulator," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 8, pp. 2398–2409, 2017.

[14] C. Yang, Y. Jiang, W. He, J. Na, Z. Li, and B. Xu, "Adaptive parameter estimation and control design for robot manipulators with finite-time convergence," IEEE Transactions on Industrial Electronics, vol. 65, no. 10, pp. 8112–8123, 2018.

[15] L. Ma and D. P. Li, "Adaptive neural networks control using Barrier Lyapunov functions for DC motor system with time-varying state constraints," Complexity, vol. 2018, Article ID 5082401, 9 pages, 2018.

[16] S. M. Lu, D. P. Li, and Y. J. Liu, "Adaptive neural network control for uncertain time-varying state constrained robotics systems," IEEE Transactions on Systems, Man, and Cybernetics: Systems, no. 99, pp. 1–8, 2017.

[17] K. K. Sunnydayal and S. Cruces, "An iterative posterior NMF method for speech enhancement in the presence of additive Gaussian noise," Neurocomputing, vol. 230, pp. 312–315, 2017.

[18] N. Yousefian and P. C. Loizou, "A dual-microphone speech enhancement algorithm based on the coherence function," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 599–609, 2012.

[19] W. Nabi, N. Aloui, and A. Cherif, "Speech enhancement in dual-microphone mobile phones using Kalman filter," Applied Acoustics, vol. 109, pp. 1–4, 2016.

[20] C. Févotte and J. Idier, "Algorithms for nonnegative matrix factorization with the β-divergence," Neural Computation, vol. 23, no. 9, pp. 2421–2456, 2010.

[21] P. D. O'Grady and B. A. Pearlmutter, "Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint," Neurocomputing, vol. 72, no. 1-3, pp. 88–101, 2008.

[22] H. Sawada, H. Kameoka, S. Araki, and N. Ueda, "Formulations and algorithms for multichannel complex NMF," in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 229–232, Prague, Czech Republic, May 2011.

[23] Y. Hu and P. C. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229–238, 2008.

[24] E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, 2006.

[25] C. Lopes and F. Perdigao, Phone Recognition on TIMIT Database, Speech Technologies, 2011.

[26] A. Varga and H. J. M. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems," Speech Communication, vol. 12, no. 3, pp. 247–251, 1993.

