Learning From Data, Lecture 19
A Peek At Unsupervised Learning
k-Means Clustering, Probability Density Estimation, Gaussian Mixture Models
M. Magdon-Ismail, CSCI 4100/6100
recap: Radial Basis Functions
Nonparametric RBF (no training; e.g. r = 0.05):

    g(x) = Σ_{n=1}^{N} ( αn(x) / Σ_{m=1}^{N} αm(x) ) · yn,    αn(x) = φ(||x − xn|| / r)    (bump on each xn)

Parametric k-RBF-Network (a linear model given the µj):

    h(x) = w0 + Σ_{j=1}^{k} wj · φ(||x − µj|| / r) = wᵗΦ(x)    (bump on each µj)

Choose the µj as centers of k clusters of the data.
[Figures: fits in the (x, y) plane with k = 4, r = 1/k (left) and k = 10, regularized (right).]
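The parametric k-RBF-network above is cheap to evaluate once the centers are fixed. A minimal sketch, assuming a Gaussian bump φ(z) = e^{−z²/2} and hand-picked illustrative weights (in practice w0 and w come from the linear fit given the µj):

```python
import numpy as np

def phi(z):
    # Gaussian bump (an assumption; any decaying radial profile works)
    return np.exp(-0.5 * z ** 2)

def rbf_network(x, w0, w, centers, r):
    # h(x) = w0 + sum_j w_j * phi(||x - mu_j|| / r)
    z = np.linalg.norm(x - centers, axis=1) / r
    return w0 + w @ phi(z)

# illustrative: two 1-d centers with hand-picked weights
centers = np.array([[0.0], [1.0]])
h = rbf_network(np.array([0.0]), w0=0.5, w=np.array([1.0, -1.0]),
                centers=centers, r=1.0)
```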
© AML Creator: Malik Magdon-Ismail (Unsupervised Learning, 23 slides)
Unsupervised Learning
• Preprocessor to organize the data for supervised learning:
  organize data for faster nearest neighbor search;
  determine centers for RBF bumps.
• Important to be able to organize the data to identify patterns:
  learn the patterns in data, e.g. the patterns in a language, before getting into a supervised setting.
  (amazon.com organizes books into categories.)
Clustering Digits
[Figures: left, the 21-NN rule with 10 classes (digits 0-9); right, a 10-clustering of the data. Axes: Average Intensity vs. Symmetry.]
Clustering
A cluster is a collection of points S.

A k-clustering is a partition of the data into k clusters S1, …, Sk:

    ∪_{j=1}^{k} Sj = D,    Si ∩ Sj = ∅ for i ≠ j.

Each cluster has a center µj.
How good is a clustering?
Points in a cluster should be similar (close to each other, and the center)
Error in cluster j:

    Ej = Σ_{xn∈Sj} ||xn − µj||².

k-Means Clustering Error:

    Ein(S1, …, Sk; µ1, …, µk) = Σ_{j=1}^{k} Ej = Σ_{n=1}^{N} ||xn − µ(xn)||²,

where µ(xn) is the center of the cluster to which xn belongs.
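The clustering error is straightforward to compute from the cluster assignments; a minimal sketch (the array names are illustrative, not from the lecture):

```python
import numpy as np

def kmeans_error(X, labels, centers):
    # E_in = sum_n ||x_n - mu(x_n)||^2, where mu(x_n) = center of x_n's cluster
    return np.sum((X - centers[labels]) ** 2)

# illustrative: two points sharing one center at their midpoint
X = np.array([[0.0, 0.0], [2.0, 0.0]])
E = kmeans_error(X, labels=np.array([0, 0]), centers=np.array([[1.0, 0.0]]))
```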
k-Means Clustering
You get to pick S1, …, Sk and µ1, …, µk to minimize Ein(S1, …, Sk; µ1, …, µk).

If the centers µj are known, picking the sets is easy: add to Sj all points closest to µj.

If the clusters Sj are known, picking the centers is easy: center µj is the centroid of cluster Sj,

    µj = (1/|Sj|) Σ_{xn∈Sj} xn.
Lloyd’s Algorithm for k-Means Clustering
    Ein(S1, …, Sk; µ1, …, µk) = Σ_{n=1}^{N} ||xn − µ(xn)||²

1: Initialize: pick well separated centers µj.
2: Update Sj to be all points closest to µj:
    Sj ← {xn : ||xn − µj|| ≤ ||xn − µℓ|| for ℓ = 1, …, k}.
3: Update µj to the centroid of Sj:
    µj ← (1/|Sj|) Σ_{xn∈Sj} xn.
4: Repeat steps 2 and 3 until Ein stops decreasing.
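Steps 1-4 above can be sketched directly; a minimal implementation, assuming the initial centers are well separated so that no cluster goes empty:

```python
import numpy as np

def lloyd(X, centers, max_iter=100):
    # Alternate the two easy steps; E_in never increases, so this converges.
    for _ in range(max_iter):
        # step 2: assign each point to its closest center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # step 3: move each center to its cluster's centroid
        # (assumes no cluster is empty, hence well separated initial centers)
        new_centers = np.array([X[labels == j].mean(axis=0)
                                for j in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels, centers = lloyd(X, centers=np.array([[0.0, 0.0], [10.0, 0.0]]))
```

Note that Lloyd's algorithm only reaches a local minimum of Ein, which is why the well separated initialization in step 1 matters.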
Application to k-RBF-Network
[Figures: a 10-center RBF-network (left) vs. a 300-center RBF-network (right).]

Choosing k: use knowledge of the problem (10 digits) or cross validation (CV).
Probability Density Estimation
P(x) measures how likely it is to generate inputs similar to x.

Estimating P(x) results in a 'softer/finer' representation than clustering:
clusters are regions of high probability.
Parzen Windows – RBF density estimation
Basic idea: put a bump of 'size' (volume) 1/N on each data point.

[Figure: the density estimate P̂(x) versus x; a sum of bumps over the data points.]

    P̂(x) = (1/(N r^d)) Σ_{i=1}^{N} φ(||x − xi|| / r),    φ(z) = (1/(2π)^{d/2}) e^{−z²/2}
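The Parzen estimate is a direct sum over the data; a minimal sketch assuming the Gaussian window, with illustrative names:

```python
import numpy as np

def parzen_density(x, X, r):
    # P_hat(x) = (1 / (N r^d)) * sum_i phi(||x - x_i|| / r)
    # with phi(z) = (2 pi)^(-d/2) exp(-z^2 / 2): each bump has volume 1/N
    N, d = X.shape
    z = np.linalg.norm(x - X, axis=1) / r
    return np.sum(np.exp(-0.5 * z ** 2)) / (N * r ** d * (2 * np.pi) ** (d / 2))

# sanity check: one data point at the origin, d = 1, r = 1, evaluated at 0
p = parzen_density(np.array([0.0]), np.array([[0.0]]), r=1.0)
```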
Digits Data
[Figures: the RBF density estimate (left) and its density contours (right); axes x vs. y.]
The Gaussian Mixture Model (GMM)
Instead of N bumps → k ≪ N bumps. (Similar to nonparametric RBF → parametric k-RBF-network.)
Instead of uniform spherical bumps → each bump has its own shape.

Bump centers: µ1, …, µk.    Bump shapes: Σ1, …, Σk.

Gaussian formula for the bump:

    N(x; µj, Σj) = (1/((2π)^{d/2} |Σj|^{1/2})) exp(−(1/2)(x − µj)ᵗ Σj⁻¹ (x − µj)).
GMM Density Estimate
    P̂(x) = Σ_{j=1}^{k} wj N(x; µj, Σj)    (sum of k weighted bumps),

with wj > 0 and Σ_{j=1}^{k} wj = 1.

You get to pick {wj, µj, Σj}_{j=1,…,k}.
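Evaluating P̂(x) for given parameters is a direct transcription of the two formulas above; a minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def gaussian(x, mu, Sigma):
    # N(x; mu, Sigma) = (2 pi)^(-d/2) |Sigma|^(-1/2) exp(-(x-mu)' Sigma^-1 (x-mu) / 2)
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)) / norm

def gmm_density(x, w, mus, Sigmas):
    # P_hat(x) = sum_j w_j N(x; mu_j, Sigma_j)
    return sum(wj * gaussian(x, mu, S) for wj, mu, S in zip(w, mus, Sigmas))

# one standard 1-d bump with weight 1 reduces to the Gaussian density
p = gmm_density(np.array([0.0]), w=[1.0],
                mus=[np.array([0.0])], Sigmas=[np.eye(1)])
```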
Maximum Likelihood Estimation
Pick {wj,µj,Σj}j=1,...,k to best explain the data.
Maximize the likelihood of the data given {wj,µj,Σj}j=1,...,k
(We saw this when we derived the cross entropy error for logistic regression)
Expectation-Maximization: The E-M Algorithm
A simple algorithm to get to a local maximum of the likelihood.

Partition the variables into two sets; given one set, you can estimate the other.

'Bootstrap' your way to a decent solution.

Lloyd's algorithm for k-means is an example, for 'hard' clustering.
Bump Memberships
Fraction of xn belonging to bump j (a 'hidden variable'): γnj.

    Nj = Σ_{n=1}^{N} γnj    ('number' of points in bump j)

    wj = Nj / N    (probability of bump j)

    µj = (1/Nj) Σ_{n=1}^{N} γnj xn    (centroid of bump j)

    Σj = (1/Nj) Σ_{n=1}^{N} γnj xn xnᵗ − µj µjᵗ    (covariance matrix of bump j)
Re-Estimating Bump Memberships
    γnj = wj N(xn; µj, Σj) / Σ_{ℓ=1}^{k} wℓ N(xn; µℓ, Σℓ)

γnj is the probability that xn came from bump j:
    probability of bump j: wj;
    probability density for xn given bump j: N(xn; µj, Σj).
E-M Algorithm
E-M Algorithm for GMMs:
1: Start with estimates for the bump membership γnj.
2: Estimate wj,µj,Σj given the bump memberships.
3: Update the bump memberships given wj, µj, Σj.
4: Iterate to step 2 until convergence.
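Steps 1-4 can be sketched as follows; this combines the parameter formulas and the membership update from the previous slides. The covariance uses the centered form, which is algebraically equal to the Σj formula given earlier; the Gaussian helper and variable names are illustrative:

```python
import numpy as np

def gauss(x, mu, Sigma):
    # N(x; mu, Sigma) for a single point x
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)) / norm

def em_gmm(X, gamma, n_iter=20):
    # gamma is N x k: gamma[n, j] = fraction of x_n belonging to bump j
    N, d = X.shape
    k = gamma.shape[1]
    for _ in range(n_iter):
        # step 2: estimate w_j, mu_j, Sigma_j from the memberships
        Nj = gamma.sum(axis=0)
        w = Nj / N
        mus = (gamma.T @ X) / Nj[:, None]
        Sigmas = [(gamma[:, j, None] * (X - mus[j])).T @ (X - mus[j]) / Nj[j]
                  for j in range(k)]
        # step 3: re-estimate memberships, gamma_nj ∝ w_j N(x_n; mu_j, Sigma_j)
        p = np.array([[w[j] * gauss(X[n], mus[j], Sigmas[j]) for j in range(k)]
                      for n in range(N)])
        gamma = p / p.sum(axis=1, keepdims=True)
    return w, mus, Sigmas, gamma

# two clearly separated 1-d clusters, hard initial memberships
X = np.array([[-5.0], [-4.0], [4.0], [5.0]])
gamma0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
w, mus, Sigmas, gamma = em_gmm(X, gamma0)
```

Like Lloyd's algorithm, this only reaches a local optimum, so the initial memberships matter.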
GMM on Digits Data
[Figures: the 10-center GMM density estimate (left) and its density contours (right); axes x vs. y.]