Post on 18-Dec-2021
Or
1
Everything you wanted to know about
Glottal Waveforms and Laryngeal Excitation
Cycles but were afraid to ask
Converting Frequency Domains
β’ A common question: A particular digital frequency corresponds
to what analog frequency?
β’ Option 1: Work through the equations relating the various
transforms to find your answer
β’ FT: π π = βββ
π₯ π‘ πβπ2πππ‘ ππ‘
β’ DTFT: π π = Οπ=βββ π₯ π πβπππ
β’ DFT: π π = Οπ=0πβ1 π₯ π πβπ2πππ/π
β’ Replacing x(t) with samples at x[nT]
β’ Οπ=βββ π₯(ππ)πβπ2ππππ
β’ So π = 2πππ
β’ DFT/DTFT easily related, and then also DFT/FT
β’ π = 2ππ/π
β’ π = π/ππ
2
Converting Frequency Domains
β’ Option 2: Recall that these are all linear relationships, and use
ratios based on maximum frequency
β’ βMax frequencyβ of analog signal is 1/π β πΉπ
β’ DTFT, DFT have periods of 2π,π respectively
β’ Yields the relationships π
π=
π
2π=
π
πΉπ
β’ Allows to fluidly convert in either direction
β’ Always βgets the units rightβ (Hz, rads, normalized freq., etc.)
3
Always think: Ratios!
Composite Algorithms
β’ Algorithms composed of one or more steps/pieces
β’ Might involve decision points (βcontrol flowβ) or combinations
from different paths
β’ Allows for sophisticated algorithms but increases system
complexity substantially
β’ Flow graph helps manage system and convey operation to
others
4
Step A
Step B
Step C Step D
Step E
Implementing Composite Algorithms
β’ Always implement components as separate functions/classes
β’ Sometimes these will be in different libraries anyway
β’ Separation of concerns allows for better focus to the code,
generally resulting in better coding style and code that is much
easier to debug
β’ Creates code segments that can be portable among projects
and not tied to a specific implementation
5
A Note on Testing
β’ Good practice to validate components of your code first
β’ Writing each step as separate functions facilitates for good unit
testing
β’ Increases probability of success when putting everything together
β’ Types of testing
β’ Unit
β’ Integration
β’ System/End-to-End
β’ These testing categories get cover progressively larger aspects
of the system, but at higher cost
6
Unit Testing is your friend!
A Note on Testing
β’ How do you know if something is working?
β’ Test signals!
β’ What test signals do you use?
β’ They might be provided for you
β’ You can make them up
β’ You can use a repository of test data
β’ You can acquire them
β’ Classes of test data
β’ βEasyβ vs. βHardβ
β’ βNoisyβ vs. βCleanβ
β’ Different noise characteristics
β’ Corner cases
7
A Note on Testing
β’ A key is to have βhigh qualityβ data to establish βground truthβ
β’ Nothing is worse than getting misled about algorithm performance
due to incorrect data
β’ Allows for quantitative measures of performance if sufficiently large
set available
β’ Simple and clean data can facilitate establishing basic algorithm
functionality
β’ Always a good idea to at least try with a non-ideal test signal as
a sanity check
β’ Some algorithms can catastrophically break down when certain
assumptions are not met or noise is introduced!
8
Test Database / Benchmarks
β’ Huge value!
β’ Provides a suite of test cases
β’ Ground truth for evaluation of algorithm efficacy
β’ Common set to allows for apples-to-apples comparison among
algorithms
β’ βReal worldβ conditions
β’ Facilitates automatic tuning and evaluation of algorithms
β’ Prevalent in many areas of engineering
9
Pitch Detection
β’ This lab is a pitch detection implementation
β’ What exactly is pitch? Not actually strictly defined.
β’ Period of signal in time domain
β’ Fundamental frequency in spectral domain (caution!)
β’ Perceived frequency of signal by human ear
β’ Anatomically as distance between pulses/cycles of larynx
10
T
Basic Pitch Detection Algorithm
β’ Operate on a single frame of audio input
β’ Initial decision: voiced or unvoiced?
β’ If voiced, perform pitch analysis and report pitch frequency
11
Voiced
Decision
Pitch
AnalysisF0
βNo Signalβ
Y
N
Frame
Challenges of Pitch Detection
β’ Frame size selection
β’ Likely not an integer multiple of period length
β’ Probably need at least one period, maybe more
β’ Larger frame sizes = more latency
β’ Pitch values may not be integral values of sampling rate /
frequency bins
β’ May distort analysis sufficiently to pick improper bin
β’ Speech vs. Music
β’ Non-ideal environmental situation
β’ Multiple concurrent signals
β’ Noise
β’ Bad microphone quality
12
Categorization of Frames
β’ Three coarse categorization of frames
β’ Voiced sounds β generated by vibration of the vocal cords
β’ Unvoiced sounds β speech generated without vibration of the
vocal cords
β’ Silence / noise β no active speech
β’ We will only be concerned with classifying and measuring pitch
for Voiced frames
13
Voiced Decision
β’ Voiced signals tend to be louder and more sustained
β’ Unvoiced signals have more abrupt characteristics
β’ Over the span of a frame, we would therefore expect voiced
signals to have a higher energy than unvoiced/silent frames
β’ Calculate the energy of the frame, compare to a threshold for
voiced decision
14
Οπ₯ π 2Y
N
Frame > Es Voiced
Not Voiced
Pitch Calculation - Autocorrelation
β’ π π₯π₯ π =Οπ=0πβ1 π₯ π π₯[πβπ]
Οπ=0πβ1 π₯ π 2
β’ Provides an estimate of how self-similar a signal is given a
particular delay (lag) π
β’ When offset corresponds to period of signal, values coherently
combine to yield a large correlation value
15
Challenges of Harmonics
β’ Pitch estimation searches for a peak value in the autocorrelation
function
β’ Two main challenges
β’ Lag of 0 will generally have the largest overall value, but is certainly
not the value we are looking for
β’ Multiples of the pitch (higher order harmonics) will also yield local
peaks but must be rejected
β’ Solving these challenges is part of the Lab design objectives
16
Basic Pitch Detection Algorithm
17
Voiced
Decision
Autocorrelation
Frame
Peak Detection
Estimated
Pitch
Not
Voiced
Algorithmic Complexity
β’ βBig Oβ notation
β’ π π , π π2 , π logπ , π 1 , π(2π)
β’ Key property of complexity analysis is that the higher order term
dominates
β’ π π + π2 = π(π2)
β’ Always a good idea to know the complexity of the algorithms you
are using
β’ Know what you are getting yourself into!
β’ For a complicated composite algorithm, each step might have
different complexity
β’ Knowing complexity tells you where the dominant computation step
β’ Optimize there!
18
Efficient Cross-Correlation
β’ π π₯π₯ π = π₯ π β π₯ π is an π π2 operation
β’ What is complexity of energy calculation step?
β’ What is complexity of peak detection step?
β’ Calculate correlation using π π₯π₯ π = πΉβ1{πΉ π₯ πΉβ π₯ }
β’ πΉβ π₯ denotes complex conjugate
β’ Replace with an π πππππ algorithm
β’ What is the dominant component of our system now?
β’ Can actually perform joint voiced/unvoiced and pitch analysis
with just π π₯π₯
19
Postprocessing
β’ Improve overall pitch values by post-processing of the estimates
β’ Smoothing of pitch values
β’ Linear
β’ Non-linear (e.g. median)
β’ Can also apply as a pre-processing if we wish (with caution!)
β’ Consider evolution of pitch value over successive frames
β’ Attribute cost associated with changes in pitch and transition voiced
to/from unvoiced
β’ Can work in concert with multiple pitch estimates to pick βthe bestβ
20
Algorithm Selection
β’ Dozens of pitch detection algorithms out there
β’ How do you pick one? (Or decide to develop your own?)
β’ What is your problem domain?
β’ Are the algorithms targeted at this domain?
β’ Is it possible to quantify performance and compare?
β’ Training/test data
β’ Different aspects to compare: measurement accuracy,
computational speed, output latency
β’ What are your requirements?
β’ What is most important? What compromises can be made?
β’ Can an algorithm be modified to meet those requirements?
β’ The more specific you are, the more discriminating you can be
regarding algorithm selection
22
YIN Algorithm
β’ Distance Metric
β’ π π = Ο π₯ π β π₯ π + π 2
β’ Cumulative normalized distance
β’ πβ² π = π(π)/[1/π Οπ=1π π(π)]
β’ Absolute threshold
β’ πππππππ πβ² π πβ² π < πΈ}
β’ Parabolic Interpolation
β’ Expand πβ² π analytically and
find minimum
β’ Best Local Estimate
β’ Examine small frame location
offsets for better πβ² π
23
Algorithm Gross
Error %
Baseline Autocorrelation 10.0
Distance Metric 1.95
+ Cumulative Norm. 1.69
+ Absolute Thresh 0.78
+ Parabolic Interpolation 0.77
+ Best Local Estimate 0.50
βAd-hocβ Algorithm Development
β’ Identify weak points / scenarios in the algorithm
β’ Large set of training data is helpful
β’ Augment the algorithm
β’ Change existing parameters / steps
β’ Add extra steps
β’ Validate the changes
β’ Did performance improve in the desired areas?
β’ Did good performance persist in other areas (no regressions)?
β’ Emergent complex system
β’ Potentially very powerful
β’ Potentially very hard to analyze
β’ Potentially very hard to maintain
24
Assigned Lab
β’ Lead-in to the Final Project
β’ Forming up groups for both Assigned Lab and Final Project
β’ Explore a DSP algorithm from the literature
β’ Implementation in Python for this stage, NOT on tablet yet
β’ Proposal for Assigned Lab
β’ Overview of proposed algorithm, cite source(s)
β’ Plan for testing and validation
β’ Rough idea(s) for Final Project application
β’ Due March 1
β’ Assigned Lab Report upon completion, demo incorporated into
Final Proposal design review
25