
Today

1. Probability

2. Random variables


Probability

We won’t spend that much time on it, but we need to start here.


Sample Spaces and Events

If we toss a coin twice then the sample space, or set of all possible outcomes or realizations ω, is Ω = {HH, HT, TH, TT}.

An event is a subset of this set; for example, the event that the first toss is heads is A = {HH, HT}.


Probability

We’ll assign a real number P(A) to each event A, called the probability of A. To qualify as a probability, P must satisfy three axioms:

1. P(A) ≥ 0 for every A

2. P(Ω) = 1

3. If A1, A2, . . . are disjoint then

P(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai). (1)

Note that frequentists and Bayesians agree on these.
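A minimal sketch (not from the slides) that checks the three axioms on the two-coin-toss sample space, assuming each of the four outcomes is equally likely:

```python
# Sketch: the two-coin-toss sample space with a uniform probability measure
# (the equal weighting is an illustrative assumption, not from the slides).
from itertools import product

omega = set(product("HT", repeat=2))      # Ω = {HH, HT, TH, TT}
P = lambda A: len(A) / len(omega)         # probability of an event A ⊆ Ω

assert all(P({w}) >= 0 for w in omega)    # axiom 1: P(A) ≥ 0
assert P(omega) == 1                      # axiom 2: P(Ω) = 1

A, B = {("H", "T")}, {("T", "H")}         # disjoint events
assert P(A | B) == P(A) + P(B)            # axiom 3 (finite case)

first_heads = {w for w in omega if w[0] == "H"}
print(P(first_heads))                     # 0.5
```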


Random Variables

This is where we start talking about data.


Random Variables

A random variable is a mapping, or function

X : Ω → R (2)

that assigns a real number X(ω) to each outcome ω.

For example, if Ω = {(x, y) : x² + y² ≤ 1} and our outcomes are samples (x, y) from the unit disk, then these are some random variables: X(ω) = x, Y(ω) = y, Z(ω) = x + y.
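A small sketch of the unit-disk example: outcomes ω are points drawn from the disk (here by rejection sampling, just one convenient choice), and X, Y, Z are functions of ω.

```python
# Sketch: random variables as functions of an outcome ω from the unit disk.
import numpy as np

rng = np.random.default_rng(0)

def sample_disk():
    # Rejection sampling from the unit disk (an illustrative choice).
    while True:
        x, y = rng.uniform(-1, 1, size=2)
        if x**2 + y**2 <= 1:
            return (x, y)

X = lambda w: w[0]
Y = lambda w: w[1]
Z = lambda w: w[0] + w[1]

w = sample_disk()
print(X(w), Y(w), Z(w))   # Z(ω) = X(ω) + Y(ω)
```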


Distribution Functions

Suppose X is a random variable, x a specific value of it (data).

Cumulative distribution function (CDF): the function F : R → [0, 1] (sometimes F_X) defined by F(x) = P(X ≤ x).

X is discrete if it takes countably many values x1, x2, . . ..

Probability (mass) function for X: f(x) = P(X = x).


Distribution Functions

X is continuous if there exists a function f such that f(x) ≥ 0 for all x, ∫_{−∞}^{∞} f(x) dx = 1, and for every a ≤ b,

P(a < X < b) = ∫_a^b f(x) dx. (3)

f is the probability density function (PDF).

We have that F(x) = ∫_{−∞}^x f(t) dt and f(x) = F′(x) wherever F is differentiable.


Discrete Distributions

Some examples of discrete distributions:

X is the outcome of a coin flip. P(X = 1) = p and P(X = 0) = 1 − p for some p ∈ [0, 1]. We say X ∼ Bernoulli(p). f(x) = p^x (1 − p)^(1−x) for x ∈ {0, 1}.

Binomial: the distribution of the number of outcomes of a given type (say, heads) in n independent coin flips.
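A quick sketch of these two mass functions; the Binomial formula with n flips is the standard one and is not spelled out on the slide:

```python
# Sketch: Bernoulli pmf from the slide, plus the standard Binomial(n, p) pmf.
from math import comb

def bernoulli_pmf(x, p):
    return p**x * (1 - p)**(1 - x)                 # x in {0, 1}

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)    # x in {0, ..., n}

p = 0.3
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))          # 0.3 0.7
print(sum(binomial_pmf(x, 10, p) for x in range(11)))    # ≈ 1.0
```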


Continuous Distributions

Some examples of continuous distributions:

Uniform: X ∼ Uniform(a, b) if f(x) = 1/(b − a) for x ∈ [a, b], 0 otherwise.

Gaussian: X ∼ N(µ, σ²) if f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)) for µ ∈ R, σ > 0. We call its PDF φ(x) and its CDF Φ(x).
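A sketch that writes out both densities and numerically checks that each integrates to 1 (the grid and the chosen parameters are arbitrary):

```python
# Sketch: Uniform(a, b) and N(µ, σ²) densities, checked to integrate to ~1.
import numpy as np

def uniform_pdf(x, a, b):
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

xs = np.linspace(-10.0, 10.0, 200_001)
print(np.trapz(uniform_pdf(xs, 0.0, 2.0), xs))    # ≈ 1
print(np.trapz(gaussian_pdf(xs, 0.0, 1.0), xs))   # ≈ 1
```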


Multivariate Distributions

Can define a distribution over a vector of random variables. We say this is a multivariate distribution.

Our dataset generally consists of samples from a multivariate distribution. Each of the columns is a random variable. We can also consider the whole vector of random variables as a random variable.


Expectation

The expected value, or mean, or first moment of X is

E(X) = EX = µ = ∫ x f(x) dx. (10)

Note that in the discrete case this means ∑_x x f(x).

E(∑_i ai Xi) = ∑_i ai E(Xi) (11)

for constants a1, a2, . . . , an. If the Xi are independent,

E(∏_i ai Xi) = ∏_i ai E(Xi). (12)
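A Monte Carlo sketch of (11) and (12), using two independent Gaussians and constants a1 = 2, a2 = −3 (all arbitrary choices for illustration):

```python
# Sketch: linearity of expectation, and the product rule under independence.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X1 = rng.normal(1.0, 1.0, n)       # E(X1) = 1
X2 = rng.normal(4.0, 2.0, n)       # E(X2) = 4, independent of X1
a1, a2 = 2.0, -3.0

print(np.mean(a1 * X1 + a2 * X2), a1 * 1.0 + a2 * 4.0)            # both ≈ -10
print(np.mean((a1 * X1) * (a2 * X2)), (a1 * 1.0) * (a2 * 4.0))    # both ≈ -24
```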


Variance

The kth moment of X is defined to be E(X^k), assuming that E(X^k) < ∞.

If X has mean µ, the variance of X is

σ² = V(X) = VX = E(X − µ)² = ∫ (x − µ)² f(x) dx (13)

and σ = sd(X) = √V(X).


Sample Statistics

If X1, . . . , XN are random variables then the sample mean is

X̄ = (1/N) ∑_{i=1}^N Xi (14)

and the sample variance is

S² = (1/(N − 1)) ∑_{i=1}^N (Xi − X̄)². (15)

If X1, . . . , XN are IID, then

E(X̄) = E(Xi) = µ, V(X̄) = σ²/N, E(S²) = σ². (16)
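A simulation sketch of (16): drawing many datasets of size N from a Gaussian (an arbitrary choice) and checking that the sample mean and sample variance behave as claimed:

```python
# Sketch: E(X̄) = µ, V(X̄) = σ²/N, and E(S²) = σ², checked by simulation.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 5.0, 2.0, 20, 200_000

X = rng.normal(mu, sigma, size=(trials, N))
xbar = X.mean(axis=1)              # sample mean of each dataset
s2 = X.var(axis=1, ddof=1)         # ddof=1 gives the 1/(N-1) divisor

print(xbar.mean(), mu)             # ≈ µ
print(xbar.var(), sigma**2 / N)    # ≈ σ²/N = 0.2
print(s2.mean(), sigma**2)         # ≈ σ² = 4
```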


Today

1. Some inequalities

2. Asymptotic theory

3. Point estimation


Some inequalities

A few very useful inequalities that will come up in many contexts – in particular, they lie at the heart of learning theory.


Standard Normal Distribution

We say that a random variable has a standard Normal distribution if µ = 0 and σ = 1, and we denote it by Z.

If X ∼ N(µ, σ2) then Z = (X − µ)/σ ∼ N(0, 1).

If Z ∼ N(0, 1) then X = µ + σZ ∼ N(µ, σ2).


Markov’s Inequality

Theorem (Markov’s inequality): Suppose X is a non-negative random variable and E(X) exists. Then for any t > 0,

P(X > t) ≤ E(X)/t. (1)
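A simulation sketch of the bound for a non-negative random variable; the Exponential(1) choice is arbitrary:

```python
# Sketch: Markov's inequality P(X > t) ≤ E(X)/t, checked on Exponential(1).
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)   # E(X) = 1, X ≥ 0

for t in [1.0, 2.0, 5.0]:
    print(t, (X > t).mean(), X.mean() / t)       # empirical P(X > t) vs bound
```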


Markov’s Inequality: Proof

Since X ≥ 0,

E(X) = ∫_0^∞ x f(x) dx (2)
     = ∫_0^t x f(x) dx + ∫_t^∞ x f(x) dx (3)
     ≥ ∫_t^∞ x f(x) dx (4)
     ≥ t ∫_t^∞ f(x) dx (5)
     = t P(X > t). (6)


Chebyshev’s Inequality

Theorem (Chebyshev’s inequality): If µ = E(X) and σ² = V(X), then

P(|X − µ| ≥ t) ≤ σ²/t² (7)

and

P(|X − µ|/σ ≥ u) ≤ 1/u² (8)

(or P(|Z| ≥ u) ≤ 1/u² if Z = (X − µ)/σ).

For example, P(|Z| > 2) ≤ 1/4 and P(|Z| > 3) ≤ 1/9.


Chebyshev’s Inequality: Proof

Using Markov’s inequality,

P(|X − µ| ≥ t) = P(|X − µ|² ≥ t²) (9)
               ≤ E(X − µ)²/t² (10)
               = σ²/t². (11)

The second part follows by setting t = uσ.


Chebyshev’s Inequality: Example

Suppose we test a classifier on a set of N new examples. Let Xi = 1 if the prediction is wrong and Xi = 0 if it is right; then X̄_N = (1/N) ∑_{i=1}^N Xi is the observed error rate. Each Xi may be regarded as a Bernoulli with unknown mean p; we would like to estimate this.

How likely is X̄_N to not be within ε of p?


Chebyshev’s Inequality: Example

We have that V(X̄_N) = V(X)/N = p(1 − p)/N and

P(|X̄_N − p| > ε) ≤ V(X̄_N)/ε² (12)
                 = p(1 − p)/(Nε²) (13)
                 ≤ 1/(4Nε²) (14)

since p(1 − p) ≤ 1/4 for all p.

For ε = .2 and N = 100 the bound is .0625.
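A simulation sketch of this example, with a true error rate p = 0.1 picked only for illustration:

```python
# Sketch: Chebyshev bound vs simulated P(|X̄_N - p| > ε) for the classifier example.
import numpy as np

rng = np.random.default_rng(0)
p, N, eps, trials = 0.1, 100, 0.2, 100_000

errors = rng.random(size=(trials, N)) < p     # Xi = 1 with probability p
xbar = errors.mean(axis=1)                    # observed error rate per trial

print((np.abs(xbar - p) > eps).mean())        # simulated probability (very small)
print(1 / (4 * N * eps**2))                   # Chebyshev bound: 0.0625
```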


Asymptotic theory

What happens as you get more data.


Convergence

Suppose X1, X2, . . . is a sequence of random variables and X is another random variable. FN is the CDF of XN and F is the CDF of X.

XN converges in probability to X, XN →p X, if for every ε > 0,

P(|XN − X| > ε) → 0 (19)

as N → ∞.


Convergence

XN converges in distribution to X, XN ⇝ X, if

lim_{N→∞} FN(t) = F(t) (20)

at all t for which F is continuous.


Convergence

XN converges in quadratic mean (or L2) to X, XN →qm X, if

E(XN − X)² → 0 (21)

as N → ∞.


Convergence

These are ordered in strength:

XN →qm X ⇒ XN →p X ⇒ XN ⇝ X. (22)

Special case: if P(X = c) = 1 for some c ∈ R, then XN ⇝ X ⇒ XN →p X. But in general none of the reverse implications hold.


(Weak) Law of Large Numbers

Theorem (WLLN): If X1, . . . , XN are IID and E(Xi) = µ, then X̄_N →p µ.

This says that the sample mean X̄_N approaches the true mean µ as N gets large.


WLLN: Proof

To make the proof simpler (though it’s not strictly necessary), assume the variance is finite (σ² < ∞). Then using Chebyshev’s inequality,

P(|X̄_N − µ| > ε) ≤ V(X̄_N)/ε² (23)
                 = σ²/(Nε²) (24)

which approaches 0 as N → ∞.
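A small sketch of the WLLN in action, using Uniform(0, 1) draws (true mean 0.5) just as an example:

```python
# Sketch: the sample mean of IID Uniform(0,1) draws approaches µ = 0.5 as N grows.
import numpy as np

rng = np.random.default_rng(0)
for N in [10, 100, 10_000, 1_000_000]:
    xbar = rng.uniform(0.0, 1.0, N).mean()
    print(N, xbar, abs(xbar - 0.5))   # the gap shrinks with N
```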


Central Limit Theorem

Theorem (CLT): If X1, . . . , XN are IID (with any distribution), with mean µ and variance σ², then

Z_N = (X̄_N − µ)/√V(X̄_N) = √N (X̄_N − µ)/σ ⇝ Z (25)

where Z ∼ N(0, 1). In other words,

lim_{N→∞} P(Z_N ≤ z) = Φ(z) = ∫_{−∞}^z (1/√(2π)) e^{−x²/2} dx. (26)


Central Limit Theorem

This says that probability statements about X̄_N can be approximated using a Normal distribution. This is written as

Z_N = √N (X̄_N − µ)/σ ≈ N(0, 1) (27)

or

X̄_N ≈ N(µ, σ²/N). (28)
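A simulation sketch of (26): Z_N built from skewed Exponential(1) samples (so µ = σ = 1, an arbitrary non-Normal choice) already has tail probabilities close to Φ(z) at N = 50:

```python
# Sketch: CLT — empirical P(Z_N ≤ z) vs the standard Normal CDF Φ(z).
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
mu = sigma = 1.0                   # mean and sd of Exponential(1)
N, trials = 50, 200_000

X = rng.exponential(scale=1.0, size=(trials, N))
ZN = np.sqrt(N) * (X.mean(axis=1) - mu) / sigma

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard Normal CDF
for z in [-1.0, 0.0, 1.0, 2.0]:
    print(z, (ZN <= z).mean(), Phi(z))         # empirical vs limiting value
```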


Acknowledgement: The slides are from Alexander Gray @ Georgia Tech.