
ON BEDROSIAN CONDITION IN APPLICATION TO CHIRP SOUNDS

E. HERMANOWICZ 1) and M. ROJEWSKI 2)

Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology,

11/12 Narutowicza St., 80-952 Gdańsk, Poland

1) Multimedia Systems Department, e-mail [email protected]
2) Teleinformation Networks Department, e-mail [email protected]

ABSTRACT

It is assumed that for a certain class of sounds there exists a relationship between their pitch and their complex dynamic representation, CDR (real-valued log-envelope and instantaneous angular frequency). CDR processing appears to be a powerful means of pitch shifting for chirp-like sound signals that practically fulfil the assumption of the Bedrosian theorem. Scaling the CDR components can be used for entertainment as well as, e.g., in a generator of different melodies for calling signals, where the sound of each note in a melody is derived from a short recording of the voice of a chosen creature. To this end, the paper proposes the concept of a Bedrosian chirp, whose local instantaneous amplitude and instantaneous complex phasor spectra are separated. A quantitative measure to estimate and predict the quality of sound processing using CDR scaling is defined, and its utility is verified on the basis of experiments with real-life audio chirps.

Abbreviations

AM: amplitude modulation
AM·FM: representation of an arbitrarily modulated signal
DDS: direct digital synthesis
DFT: discrete Fourier transform
DTFT: discrete-time Fourier transform
DSP: digital signal processing
CDR: complex dynamic representation
FIR: finite impulse response
FM: frequency modulation
HF: Hilbert filter
HS: Hilbertian signal
HT: Hilbert transformer
IAE: instantaneous amplitude estimator
IFE: instantaneous frequency estimator
PA: phase accumulator
STFT: short-time Fourier transformation

Notation

$a[n]$: instantaneous amplitude
$c[n]$: real part of phasor $\gamma[n]$
$d[n]$: local measure of envelope and phasor STFT amplitude spectra overlap
$n$: number of a discrete-time sample
$x[n]$: real-valued discrete-time signal
$A(e^{j\omega})$: DTFT spectrum for envelope $a[n]$
$A[k,n]$: STFT amplitude spectrum for envelope $a[n]$
$\mathrm{Arg}(\cdot)$: principal argument
$D$: global measure of envelope and phasor STFT amplitude spectra overlap
$F$: frequency in Hz
$T$: sampling period in seconds
$U(e^{j\omega})$: DTFT spectrum for $u[n]$
$\gamma[n]$: complex phasor
$\lambda[n]$: instantaneous level (log-envelope)
$\varphi[n]$: instantaneous phase
$\kappa$: CDR rescaling coefficient
$\omega = 2\pi FT$: normalised angular frequency
$\omega[n]$: instantaneous angular frequency
$\Gamma(e^{j\omega})$: DTFT spectrum for phasor $\gamma[n]$
$\Gamma[k,n]$: STFT amplitude phasor spectrum for $\gamma[n]$

1. INTRODUCTION

This paper is a continuation of our previously published papers [1] and [2] devoted to sound processing using the complex dynamic representation. It is based on the Bedrosian theorem formulated in 1963 [3]; see also [4], [5], [6], [7]. We apply the Bedrosian theorem here to the processing of real-life, audible signals rather than synthesised ones. The theorem is used in a way generalised to all instants of time of the signal under processing. It is not treated in a binary manner (satisfied or not satisfied); instead, a quantitative measure of the degree of its fulfilment is proposed. The measure concerns the condition of non-overlapping spectra, namely the envelope spectrum and the phasor spectrum, in the AM·FM signal model.

2. THE BEDROSIAN THEOREM

The Bedrosian theorem states that the Hilbert transform of the product of two signals with non-overlapping spectra equals the product of the low-pass term and the Hilbert transform of the high-frequency term. In other words, only the high-frequency term is transformed.

Let $x[n]$, $n = 0, 1, 2, \ldots$, be a real-valued discrete-time sound signal having the form

$$x[n] = a[n]\cos\varphi[n] \qquad (1)$$


15th European Signal Processing Conference (EUSIPCO 2007), Poznan, Poland, September 3-7, 2007, copyright by EURASIP

where $a[n]$ is the instantaneous amplitude (real-valued envelope) and $\varphi[n]$ is the instantaneous phase of $x[n]$. This form covers all discrete-time signals having an arbitrary modulation pattern. It is an approach typical of, and adopted from, telecommunications. The right-hand side of (1) is the so-called AM·FM representation of the signal $x[n]$, where $a[n]$ is the AM factor and $\cos\varphi[n]$ is the FM factor. The complex signal

$$u[n] \triangleq x[n] + jH_T\{x[n]\} = a[n]\exp(j\varphi[n]) \qquad (2)$$

where $j^2 \triangleq -1$ and $H_T\{x[n]\}$ is the Hilbert transform of $x[n]$, we further call the Hilbertian signal (HS). Here the ideal $H_T$ stands for the discrete Hilbert transformer (HT), which is normally defined using its frequency response as given by

$$H_T(e^{j\omega}) \triangleq \begin{cases} -j, & 0 < \omega < \pi \\ \phantom{-}j, & -\pi < \omega < 0 \end{cases} \qquad (3)$$

where $\omega = 2\pi FT$ stands for the 'digital' (normalised) angular frequency, $F$ stands for the physical frequency in Hz, $T$ is the uniform sampling period and $j$ is the imaginary unit.
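As an illustration of (2) and (3), the following minimal sketch builds the Hilbertian signal of a short record by applying the frequency response (3) bin by bin in the DFT domain. It is not the authors' MATLAB implementation: the helper names `dft`, `idft` and `hilbertian_signal` are hypothetical, the direct $O(N^2)$ DFT is chosen only to keep the example dependency-free, and leaving the bins at $\omega = 0$ and $\omega = \pi$ at zero is an assumption, since (3) does not define them.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, adequate for short illustrative records."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse of dft()."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def hilbertian_signal(x):
    """Return u[n] = x[n] + j*H_T{x[n]}, with H_T applied per DFT bin as in (3)."""
    N = len(x)
    X = dft(x)
    Y = []
    for k in range(N):
        if 0 < k < N / 2:      # bins with 0 < omega < pi
            Y.append(-1j * X[k])
        elif k > N / 2:        # bins with -pi < omega < 0
            Y.append(1j * X[k])
        else:                  # omega = 0 or pi: left at zero (assumption)
            Y.append(0j)
    hx = idft(Y)               # H_T{x[n]}, real up to rounding for real x
    return [x[n] + 1j * hx[n].real for n in range(N)]

# Example: a pure cosine maps to a complex exponential of unit amplitude.
N = 64
x = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]
u = hilbertian_signal(x)
U = dft(u)   # spectrum of the HS; its negative-frequency bins vanish
```

For this test tone the result is $u[n] = \exp(j2\pi \cdot 5n/N)$, so the spectrum of the HS is zero for $-\pi < \omega < 0$.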

The instantaneous amplitude of $u[n]$ is $a[n] \triangleq |u[n]|$ and the instantaneous phase of $u[n]$ is $\varphi[n] \triangleq \arg\{u[n]\}$. We define the instantaneous angular frequency of $u[n]$ as the first backward difference of $\varphi[n]$ in the following way

$$\omega[n] \triangleq \begin{cases} \varphi[0], & n = 0 \\ \varphi[n] - \varphi[n-1], & n > 0 \end{cases} \qquad (4a)$$

Assuming that there exists a relationship between the pitch of a sound signal and its complex angular frequency, we can transpose (detune) the sound by rescaling its complex dynamic representation (CDR): the real-valued log-envelope (instantaneous level)

$$\lambda[n] \triangleq \ln a[n] \qquad (4b)$$

and the instantaneous angular frequency $\omega[n]$.
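The mapping of a Hilbertian signal into its CDR, i.e. (4a) and (4b), can be sketched as follows. The function name `cdr_map` and the synthetic test signal are hypothetical. One step goes beyond (4a) and is an assumption of this sketch: because $\arg\{\cdot\}$ returns principal values, each phase difference is wrapped back to $[-\pi, \pi)$ so that $2\pi$ jumps of the principal phase do not appear as frequency spikes.

```python
import cmath
import math

def cdr_map(u):
    """Map a Hilbertian signal u[n] to its CDR: level (4b) and angular frequency (4a)."""
    lam = [math.log(abs(v)) for v in u]          # lambda[n] = ln a[n], requires a[n] > 0
    phi = [cmath.phase(v) for v in u]            # principal instantaneous phase
    omega = [phi[0]]                             # omega[0] = phi[0]
    for n in range(1, len(u)):
        diff = phi[n] - phi[n - 1]               # first backward difference
        # Assumption: wrap to [-pi, pi) to undo jumps of the principal phase.
        diff = (diff + math.pi) % (2 * math.pi) - math.pi
        omega.append(diff)
    return lam, omega

# Synthetic HS: decaying envelope exp(-n/100) times a linear chirp phasor.
N = 200
w0, rate = 0.2, 0.002
phase = [w0 * n + 0.5 * rate * n * n for n in range(N)]
u = [math.exp(-n / 100) * cmath.exp(1j * phase[n]) for n in range(N)]
lam, omega = cdr_map(u)
```

For this signal the level decreases linearly, $\lambda[n] = -n/100$, and for $n > 0$ the instantaneous angular frequency grows linearly as $\omega[n] = w_0 + \mathrm{rate}\,(n - 0.5)$.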

3. THE CDR PROCESSING

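The CDR processing used in this paper and shown in Fig. 1 is based on the following mapping of the instantaneous level $\lambda[n]$ and instantaneous angular frequency $\omega[n]$:

$$\lambda[n],\ \omega[n] \;\Rightarrow\; \lambda_\kappa[n] = \kappa\lambda[n] + \lambda_0, \quad \omega_\kappa[n] = \kappa\omega[n] \qquad (5)$$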

Based on this relation we obtain a new CDR having the components $\lambda_\kappa[n]$ and $\omega_\kappa[n]$. This CDR modification, which constitutes the core processing block, results in pitch shifting of a given sound signal $x[n]$ with a pitch modification (scaling) factor $\kappa > 0$. As a result of the CDR scaling in accordance with (5), not only the pitch but also the instantaneous level $\lambda[n]$, responsible for the audibility of the sound, is changed. In order to counteract the latter, the maximal value of the instantaneous level change has to be compensated for by adding to $\kappa\lambda[n]$ a correction term $\lambda_0(\kappa)$, computed by using

$$\lambda_0(\kappa) \triangleq (1 - \kappa)\lambda_{\max}$$

where $\lambda_{\max}$ stands for the maximal level of the primary (input) signal.

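Figure 1– Pitch-shifting of $x[n]$ by $\kappa$ via CDR processing. Block diagram: $x[n]$ → CDR mapper, (2), (4a), (4b) → $\{\lambda[n], \omega[n]\}$ → CDR scaler (5), controlled by $\kappa$ → $\{\lambda_\kappa[n], \omega_\kappa[n]\}$ → CDR demapper (DDS) → $x_\kappa[n]$.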

Hence, in Fig. 1, the input signal $x[n]$ is first filtered by the complex Hilbert filter (HF). Next, the filtered signal is mapped into its CDR $\{\lambda, \omega\}$. The CDR components are extracted using the IAE (instantaneous amplitude estimator), $\ln(\cdot)$ and IFE (instantaneous frequency estimator) blocks. Then both CDR components are multiplied by the same coefficient $\kappa$, having a positive value as above, and the scaled instantaneous level undergoes the above-mentioned correction by $\lambda_0(\kappa)$. Finally, after this remapping, performed as shown in (5), the new CDR, $\{\lambda_\kappa, \omega_\kappa\}$, is demapped into the target pitch-shifted sound signal record $\{x_\kappa\}$ using the DDS (direct digital synthesis); see, for example, [1], [11].

The main processing block of the DDS is the PA (phase accumulator) [1]. Driven by $\omega_\kappa[n]$, the PA wraps the instantaneous phase to the interval $[-\pi, \pi)$ for all $n$ (principal phase wind).
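The scaler and demapper stages of Fig. 1 can be sketched as follows, under stated assumptions. The function names `cdr_scale` and `dds` are hypothetical; the correction term is taken as $(1-\kappa)\lambda_{\max}$ as described above; the phase accumulator starts from zero, which is consistent with $\omega[0] = \varphi[0]$ in (4a); and the output sample is formed as $\exp(\lambda_\kappa[n])\cos\varphi_\kappa[n]$, following the AM·FM model (1). The DDS designs of [1] and [11] may differ in detail.

```python
import math

def cdr_scale(lam, omega, kappa):
    """CDR scaler (5): scale both components by kappa and restore the maximal level."""
    lam_max = max(lam)
    lam0 = (1.0 - kappa) * lam_max               # correction term lambda_0(kappa)
    lam_k = [kappa * v + lam0 for v in lam]
    omega_k = [kappa * w for w in omega]
    return lam_k, omega_k

def dds(lam_k, omega_k):
    """CDR demapper: phase accumulator wrapped to [-pi, pi), then a*cos(phi)."""
    x_k, phases = [], []
    phi = 0.0
    for level, w in zip(lam_k, omega_k):
        phi = (phi + w + math.pi) % (2 * math.pi) - math.pi   # accumulate and wrap
        phases.append(phi)
        x_k.append(math.exp(level) * math.cos(phi))
    return x_k, phases

# Example CDR: decaying level, constant angular frequency 0.1 rad/sample.
lam = [math.log(0.5) - n / 50 for n in range(100)]
omega = [0.0] + [0.1] * 99
kappa = 2.0
lam_k, omega_k = cdr_scale(lam, omega, kappa)
x_k, phases = dds(lam_k, omega_k)
```

With $\kappa = 2$ the angular frequency doubles to 0.2 rad/sample, the level decays twice as fast, and the maximal level of the output equals that of the input.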

4. COMMENTS ON THE BEDROSIAN CONDITION

We can split the HS $u[n]$ into two factors: a real-valued envelope $a[n]$ and a complex phasor $\gamma[n] \triangleq \exp(j\varphi[n])$. Scaling the CDR, and thus pitch-shifting, requires fulfilment of the condition that stems from the Bedrosian theorem: the spectra of the real-valued envelope and of the complex phasor must be non-overlapping (thus separable). For such signals the following Bedrosian identity holds

$$H_T\{a[n]c[n]\} = a[n]H_T\{c[n]\} \qquad (6a)$$

where

$$c[n] \triangleq \mathrm{Re}\{\gamma[n]\} = \cos(\varphi[n]) \qquad (6b)$$

is real-valued. The above-mentioned identity refers to global spectra [5]. This means that the theorem concerns the whole signal, because it refers to the spectra of the complete envelope and phasor. This limits the range of possible applications of the CDR processing. However, for a much wider class of signals the local, instantaneous spectra are separable. We call such signals Bedrosian chirps (always locally Bedrosian). This class includes the songs of many species of birds and the voices of some species of mammals.
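The Bedrosian identity can be checked numerically with a small sketch. The signals, the helper names `hilbert_transform` and `identity_error`, and the DFT-based Hilbert transformer (with the bins at $\omega = 0$ and $\omega = \pi$ set to zero) are assumptions made for illustration only. The first case uses a low-pass envelope and a high-frequency carrier with disjoint spectra; the second deliberately places the carrier inside the band occupied by the envelope.

```python
import cmath
import math

def hilbert_transform(x):
    """H_T{x[n]} via the DFT, using the ideal frequency response per bin."""
    N = len(x)
    X = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
         for k in range(N)]
    Y = [(-1j if 0 < k < N / 2 else 1j if k > N / 2 else 0) * X[k] for k in range(N)]
    return [sum(Y[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def identity_error(a, c):
    """Largest deviation between H_T{a*c} and a*H_T{c} over the record."""
    lhs = hilbert_transform([ai * ci for ai, ci in zip(a, c)])
    hc = hilbert_transform(c)
    return max(abs(l - ai * h) for l, ai, h in zip(lhs, a, hc))

N = 64
a = [1 + 0.5 * math.cos(2 * math.pi * 2 * n / N) for n in range(N)]   # envelope, bins 0 and 2
c_high = [math.cos(2 * math.pi * 20 * n / N) for n in range(N)]       # carrier at bin 20
c_low = [math.cos(2 * math.pi * 1 * n / N) for n in range(N)]         # carrier at bin 1

err_separated = identity_error(a, c_high)   # spectra disjoint: identity holds
err_overlap = identity_error(a, c_low)      # spectra overlap: identity fails
```

In the separated case the error is at rounding level, while in the overlapping case the two sides differ by $0.5\sin(2\pi n/N)$, i.e. by up to 0.5.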

In the angular frequency domain, the product $u[n] = a[n]\gamma[n]$ corresponds to the circular convolution

$$U(e^{j\omega}) = A(e^{j\omega}) \otimes \Gamma(e^{j\omega}) \;\Leftrightarrow\; a[n]\gamma[n] \qquad (7)$$

of the spectra $A(e^{j\omega})$ and $\Gamma(e^{j\omega})$ of its factors $a[n]$ and $\gamma[n]$, respectively. Because $u[n]$ is the HS, the spectrum $U(e^{j\omega}) = 0$ for $-\pi < \omega < 0$. Thus, in general, the spectra $A(e^{j\omega})$ and $\Gamma(e^{j\omega})$ resulting from de-convolution of $U(e^{j\omega})$ overlap. In other words, these spectra at least partially cover each other. This means that the envelope $a[n]$ and phasor $\gamma[n]$ may be non-orthogonal, i.e., they may be correlated (linearly dependent). The product of spectra formulated in the following way

$$A(e^{j\omega})\,\Gamma^*(e^{j\omega}) \;\Leftrightarrow\; \{a[n]\} \ast \{\gamma^*[-n]\} \qquad (8)$$

where the asterisk $\ast$ stands for linear convolution, confirms this thesis. The right-hand side of (8) represents the cross-correlation sequence of the envelope and phasor. Separation of their spectra means that the product in (8) is zero. Thus there is no correlation between the phasor and the envelope.

5. THE BEDROSIAN CONDITION FOR CHIRPS

The Bedrosian condition for chirps [8], especially for audio chirps, otherwise called pitched sounds, can be formulated as follows. There exists an angular frequency $\omega_c \in (0, \pi)$ for which

$$A(e^{j\omega}) = 0 \quad \text{for } \omega \notin (-\omega_c, \omega_c) \qquad (9a)$$

and

$$C(e^{j\omega}) = 0 \quad \text{for } \omega \in [-\omega_c, \omega_c] \qquad (9b)$$

where $C(e^{j\omega})$ is the spectrum of $c[n]$ (6b). This can be used for processing sounds that in practice fulfil the conditions (9) and are thus amenable to CDR scaling aimed at pitch-shifting. This disposition means good quality of the results of pitch-shifting for, e.g., many species of palatable birds' songs. The instantaneous spectra (STFTs, short-time Fourier transforms, and spectrograms [9]) of Bedrosian chirps show visible separability. The ridges in the spectrograms are disjoint.

6. THE DEGREE OF ENVELOPE AND PHASOR SPECTRUM OVERLAP

Let $A[k,n]$ and $\Gamma[k,n]$ be the absolute values of the STFTs of the envelope $a[n]$ and phasor $\gamma[n]$, respectively, with $k = 0, 1, \ldots, K-1$ and $n = 0, 1, \ldots, N-1$, where $k$ stands for the bin (spectrum sample) number, $K$ is the number of bins and $N$ is the number of samples of the investigated record $u[n]$. We define the local (instantaneous) degree of overlap of the envelope $a[n]$ and phasor $\gamma[n]$ amplitude spectra in the following way

$$d[n] \triangleq \frac{\sum_{k=0}^{K-1} A[k,n]\,\Gamma^*[k,n]}{\sum_{k=0}^{K-1} A^2[k,n] + \sum_{k=0}^{K-1} \Gamma^{*2}[k,n]} \qquad (10)$$

where $d$ stands for the distance (metric). The global measure is defined here as given by

$$D \triangleq \frac{\sum_{n=0}^{N-1}\sum_{k=0}^{K-1} A[k,n]\,\Gamma^*[k,n]}{\sum_{n=0}^{N-1}\sum_{k=0}^{K-1} A^2[k,n] + \sum_{n=0}^{N-1}\sum_{k=0}^{K-1} \Gamma^{*2}[k,n]} \qquad (11)$$

This formula constitutes our definition of the global degree of overlap of the envelope and phasor amplitude spectra. It is the ratio of the mutual energy of the factors under summation to the average of the energies of each of the factors. The smaller $D$ is, the weaker the correlation between the envelope and phasor.

Having at our disposal a sufficiently long signal record, we can divide it into frames and examine it block by block, computing the $D$ values for consecutive frames. Generally these values belong to the interval $D \in \langle 0, 1 \rangle$. Values of $D$ close to 0 indicate strong separability of the envelope and phasor amplitude spectra. Consequently, one can expect good quality of the transposed sound. Conversely, values of $D$ close to 1 predict poor quality of the transposition result obtained by CDR scaling.

7. RESULTS OF EXPERIMENTS WITH EXEMPLARY AUDIO CHIRPS

We performed our experiments in the MATLAB environment. Figs. 2 and 3 show the time-frequency performance of two different audio chirps processed as described in Sect. 3. In Fig. 2 we present (from top to bottom): the spectrogram of a duck chirp, two spectrograms of its deconvolution into AM and FM factors, the spectrogram resulting from resynthesis of the original chirp based on the above-mentioned AM and FM factors and, finally, the spectrogram obtained by remapping the CDR after scaling by $\kappa$ (e.g. $\kappa = 10$). The order of spectrograms in Fig. 3 repeats that of Fig. 2, but it was a canary that voiced the chirp analysed in Fig. 3 (see Fig. 5). For the latter we observe in the respective spectrograms that the AM and FM factors for the canary practically do not overlap, as needed. Comparing the results, we see that these spectrograms differ in that for the canary they have typical ridges (except that for the AM factor). Moreover, the canary chirp from Fig. 3, having the smaller $D$ value (see Table 1), is better remapped after scaling, as expected, than the duck chirp from Fig. 2, for which the value of $D$ is much bigger. Fig. 4 presents the amplitude spectra of those chirps: the original and the one recovered from the respective AM and FM factors. Its role is to convince the reader of the invertibility of CDR processing. In order to compute the local and global measures (10) and (11), multiplication of the respective spectrogram values was done. Some global results obtained for exemplary sound records (chirps taken from available Internet databases) are presented in Table 1 in %.
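The local and global overlap measures (10) and (11) can be sketched in Python as follows. This is a hedged reading, not the authors' MATLAB code: the denominator is taken as the sum of the two spectral energies (if the intended normalisation is the average of the energies, every value doubles), the measures are evaluated per STFT frame rather than per sample, and the function names, the Hann window length, the DFT length and the hop size are all hypothetical choices for a short example.

```python
import cmath
import math

def stft_mag(x, win_len=16, K=32, hop=8):
    """Magnitude STFT with a Hann window; returns frames[m][k] (frame m, bin k)."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / win_len) for i in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + i] * w[i] for i in range(win_len)]
        frames.append([abs(sum(seg[i] * cmath.exp(-2j * math.pi * k * i / K)
                               for i in range(win_len))) for k in range(K)])
    return frames

def overlap_measures(A, G):
    """Local measure d per frame, as in (10), and global measure D, as in (11)."""
    d = []
    num_total = den_total = 0.0
    for a_row, g_row in zip(A, G):
        num = sum(p * q for p, q in zip(a_row, g_row))          # mutual energy
        den = sum(p * p for p in a_row) + sum(q * q for q in g_row)
        d.append(num / den if den > 0 else 0.0)
        num_total += num
        den_total += den
    D = num_total / den_total if den_total > 0 else 0.0
    return d, D

# Slowly varying envelope and a phasor centred at a quarter of the sampling rate.
N = 128
a = [1 + 0.5 * math.cos(2 * math.pi * n / 64) for n in range(N)]
gamma = [cmath.exp(2j * math.pi * 0.25 * n) for n in range(N)]
A = stft_mag(a)
G = stft_mag(gamma)
d, D = overlap_measures(A, G)          # well separated spectra: D close to 0
D_same = overlap_measures(A, A)[1]     # identical spectra: D = 0.5 in this reading
```

Under this reading the measure is 0 for disjoint amplitude spectra and reaches its maximum of 0.5 when the two amplitude spectra coincide.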


Figure 2– Spectrograms of the original duck.wav chirp, its AM and FM factors, and spectrograms of the chirp recovered from its AM and FM factors and remapped back after scaling by $\kappa > 0$.

Figure 3– Spectrograms of the original canary.wav chirp, its AM and FM factors, and spectrograms of the chirp recovered from its AM and FM factors and remapped back after scaling by $\kappa > 0$.


Figure 4– Spectra of the original and resynthesised duck.wav and canary.wav chirps, respectively, recovered from their AM and FM factors.

Figure 5– Analysed chirps: duck.wav and canary.wav.

Table 1. D – the global degree of envelope and phasor amplitude spectra overlap

| No of file | Name of file | D in % | Hanning window length | STFT length |
|---|---|---|---|---|
| 1 | Canary.wav | 0.97 | 32 | 256 |
| 2 | Nightingale.wav | 2.90 | 32 | 256 |
| 3 | Cookoo.wav | 6.78 | 32 | 256 |
| 4 | Barn_owl.wav | 25.43 | 32 | 256 |
| 5 | Duck.wav | 28.24 | 64 | 512 |
| 6 | Thank_you.wav | 28.88 | 128 | 512 |
| 7 | Sorry.wav | 24.96 | 128 | 512 |

8. CONCLUSIONS

In this paper we assumed that for a certain class of sounds there exists a relationship between their pitch and their complex dynamic representation, CDR (real-valued log-envelope and instantaneous angular frequency). CDR processing proved to be a powerful means of pitch shifting, especially for chirp-like sound signals that practically fulfil the assumption of the Bedrosian theorem. To this end, the concept of a Bedrosian chirp was proposed, whose local instantaneous amplitude and instantaneous complex phasor spectra are separated. A quantitative measure to estimate and predict the quality of sound processing using CDR scaling was defined, and its utility was verified on the basis of experiments with real-life audio chirps. Scaling the CDR components can be used for entertainment as well as, e.g., in a generator of different melodies for calling signals, where the sound of each note in a melody is derived from a short recording of the voice of a chosen creature.

REFERENCES

[1] E. Hermanowicz and M. Rojewski, "Sound processing using complex dynamic representation", in Proc. of 2005 European Signal Processing Conference EUSIPCO 2005, Antalya, Turkey, September 5-8, 2005.

[2] E. Hermanowicz and M. Rojewski, "Pitch shifter based on complex dynamic representation and direct digital synthesis", Bulletin of the Polish Academy of Sciences, Technical Sciences, vol. 54, No. 4, 2006.

[3] E. Bedrosian, "A product theorem for Hilbert transforms", Proc. of the IEEE, vol. 51, pp. 868-869, May 1963.

[4] S.L. Hahn, Hilbert Transforms in Signal Processing, Artech House, 1996.

[5] D. Vakman, Signals, Oscillations and Waves. A Modern Approach, Artech House, 1998.

[6] A.H. Nuttall, "On the quadrature approximation to the Hilbert transform of modulated signals" and E. Bedrosian reply, Proc. of the IEEE, vol. 54, pp. 1458-1459, October 1966.

[7] Y. Xu and D. Yan, "The Bedrosian identity for the Hilbert transform of product function", Proceedings of American Mathematical Society, vol. 134, No. 9, pp. 2719-2728, September 2006.

[8] P. Flandrin, ""Chirps" everywhere", CNRS – École Normale Supérieure de Lyon. http://perso.ens-lyon.fr/patrick.flandrin/Marseille02.pdf

[9] S.K. Mitra, Digital Signal Processing. A Computer-Based Approach, McGraw-Hill, International Edition 2001, Chapter 11.

[10] T.F. Quatieri, Discrete-Time Speech Signal Processing. Principles and Practice, Prentice Hall PTR, 2002.

[11] V.F. Kroupa (ed.), Direct Digital Frequency Synthesizers, IEEE Press, New York, 1999.
