RMBI 4210 Quantitative Methods for Risk Management

Prerequisite Concept Revision

Spring 2019, Wang Jingjing

1. Probability and Statistics

1.1 Distribution

Normal distribution (VaR, Topic 2)

$X \sim N(\mu, \sigma^2)$, where $\mu$ is the mean or expectation of the distribution, $\sigma$ is the standard deviation, and $\sigma^2$ is the variance.

The probability density function (PDF) of the normal distribution is

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

The cumulative distribution function (CDF) is

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$

Note: the standard normal distribution has $\mu = 0$ and $\sigma^2 = 1$.

Reference: https://statistics.laerd.com/statistical-guides/normal-distribution-calculations.php
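Because the normal distribution underlies parametric VaR in Topic 2, a minimal Python sketch may help (assuming scipy is available; the 2% daily volatility and the 1,000,000 position are illustrative assumptions, not values from these notes):

```python
from scipy.stats import norm

mu, sigma = 0.0, 0.02                        # assumed daily return mean and volatility

# PDF and CDF at x = 0.01, matching the formulas above
print(norm.pdf(0.01, loc=mu, scale=sigma))   # f_X(0.01) ~ 17.60
print(norm.cdf(0.01, loc=mu, scale=sigma))   # F_X(0.01) ~ 0.6915

# 99% one-day parametric VaR of a 1,000,000 position:
# the 1% return quantile is mu + sigma * Phi^{-1}(0.01)
z = norm.ppf(0.01)                           # ~ -2.3263
var_99 = -(mu + sigma * z) * 1_000_000
print(var_99)                                # ~ 46,527
```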

Poisson distribution (times of default, Topic 3)

A Poisson process describes the number of times an event occurs in an interval of time or space.

The average number of events in an interval is designated as $\lambda$, which is called the event rate or rate parameter. The probability of observing $k$ events in an interval is given by the equation

$$\Pr(X = k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}.$$

In the Poisson distribution, $\lambda = E[X] = \mathrm{Var}(X)$.


Example 1

Births in a hospital occur randomly at an average rate of 1.8 births per hour.

What is the probability of observing 4 births in a given hour at the hospital?

$$P(X = 4) = \frac{e^{-1.8}\,1.8^{4}}{4!} = 0.0723$$

What is the probability of observing more than or equal to 2 births in a given hour at the hospital?

$$P(X \ge 2) = P(X = 2) + P(X = 3) + \cdots = 1 - P(X = 0) - P(X = 1)$$

$$= 1 - \frac{e^{-1.8}\,1.8^{0}}{0!} - \frac{e^{-1.8}\,1.8^{1}}{1!} = 0.537$$
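Both answers can be checked directly with `scipy.stats.poisson`; a minimal sketch (with λ = 1.8 taken from the example):

```python
from scipy.stats import poisson

lam = 1.8                            # average births per hour

# P(X = 4) = e^{-lambda} * lambda^4 / 4!
print(poisson.pmf(4, lam))           # ~ 0.0723

# P(X >= 2) = 1 - P(X = 0) - P(X = 1)
print(1 - poisson.cdf(1, lam))       # ~ 0.537
```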

Beta Distribution (fitting of loss distribution, Topic 2)

$X \sim \mathrm{Beta}(\alpha, \beta)$ belongs to a family of continuous probability distributions defined on the interval [0, 1] and parametrized by two positive shape parameters, denoted by α and β, which appear as exponents of the random variable and control the shape of the distribution.

It is often used in Bayesian analysis (a statistical procedure which endeavors to estimate parameters of an underlying distribution based on the observed distribution) to describe initial knowledge concerning probability of success. For example, it may describe the probability that a space vehicle will successfully complete a specified mission.

The probability density function (PDF) of the beta distribution is

$$f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)},$$

where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ and $\Gamma(n) = (n-1)!$ for positive integers $n$ (the gamma function). The mean is $E[X] = \frac{\alpha}{\alpha+\beta}$ and the variance is

$$\mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

Note that $\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx = B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$.
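A quick sketch checking the mean and variance formulas against `scipy.stats.beta` (the shape parameters α = 2, β = 5 are an arbitrary illustrative choice):

```python
from scipy.stats import beta

a, b = 2.0, 5.0                                     # assumed shape parameters

mean_formula = a / (a + b)                          # alpha / (alpha + beta) = 2/7
var_formula = a * b / ((a + b) ** 2 * (a + b + 1))  # ~ 0.0255

print(mean_formula, beta.mean(a, b))                # both ~ 0.2857
print(var_formula, beta.var(a, b))                  # both ~ 0.0255
```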


1.2 Correlation and dependence (Topics 1-3)

In statistics, dependence is any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence.

Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price.

Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling; however, statistical dependence is not sufficient to demonstrate the presence of such a causal relationship (i.e., correlation does not imply causation).

Dependence refers to any situation in which random variables do not satisfy a mathematical condition of probabilistic independence.

There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation. The most common of these is the Pearson correlation coefficient (commonly called simply "the correlation coefficient"), which is sensitive only to a linear relationship between two variables. The correlation coefficient $\rho_{X,Y}$ between two random variables X and Y with expected values $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$ is defined as

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$

Important property:

Correlation ⇒ dependence, and independence ⇒ no correlation; neither converse holds in general, as the sketch below illustrates.
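A small numpy illustration of this one-way implication (simulated data; the construction Y = X² is a standard counterexample, not from the notes): X and X² are strongly dependent yet essentially uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# Linear relationship: sample correlation near its population value 2/sqrt(5)
y_lin = 2 * x + rng.standard_normal(100_000)
print(np.corrcoef(x, y_lin)[0, 1])    # ~ 0.894

# Dependent but uncorrelated: cov(X, X^2) = E[X^3] = 0 for X ~ N(0, 1)
y_sq = x ** 2
print(np.corrcoef(x, y_sq)[0, 1])     # ~ 0.0, despite full dependence
```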

1.3 Joint Distribution (Topics 1-3)

In the study of probability, given at least two random variables X, Y, ..., that are defined on a probability space, the joint probability distribution for X, Y, ... is a probability distribution that gives the probability that each of X, Y, ... falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution.


Based on the joint distribution, we can find two types of distribution: the marginal distribution, which gives the probabilities for any one of the variables with no reference to any specific ranges of values for the other variables, and the conditional probability distribution, which gives the probabilities for any subset of the variables conditional on particular values of the remaining variables.

Example 2 Draws from a box

Suppose each of two boxes contains twice as many red balls as blue balls, and no others, and suppose one ball is randomly selected from each box, with the two draws independent of each other. The probability of drawing a red ball from either of the boxes is 2/3, and the probability of drawing a blue ball is 1/3. We can present the joint probability distribution as the following table:

        A=Red            A=Blue           P(B)
B=Red   (2/3)(2/3)=4/9   (1/3)(2/3)=2/9   4/9+2/9=2/3
B=Blue  (2/3)(1/3)=2/9   (1/3)(1/3)=1/9   2/9+1/9=1/3
P(A)    4/9+2/9=2/3      2/9+1/9=1/3

Each of the four inner cells shows the probability of a particular combination of results from the two draws; these probabilities are the joint distribution. In any one cell the probability of a particular combination occurring is (since the draws are independent) the product of the probability of the specified result for A and the probability of the specified result for B. The probabilities in these four cells sum to 1, as is always true for probability distributions.

Moreover, the final row and the final column give the marginal probability distribution for A and the marginal probability distribution for B respectively. For example, for A the first of these cells gives the sum of the probabilities for A being red, regardless of which possibility for B in the column above the cell occurs, as 2/3. Thus the marginal probability distribution for A gives A's probabilities unconditional on B, in a margin of the table.
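The table can be reproduced with a short numpy sketch; since the draws are independent, the joint matrix is simply the outer product of the two marginals:

```python
import numpy as np

p_a = np.array([2/3, 1/3])    # P(A = red), P(A = blue)
p_b = np.array([2/3, 1/3])    # P(B = red), P(B = blue)

# Independence: cell (i, j) = P(B = i) * P(A = j)
joint = np.outer(p_b, p_a)
print(joint)                  # [[4/9, 2/9], [2/9, 1/9]]
print(joint.sum())            # 1.0, as for any probability distribution
print(joint.sum(axis=0))      # marginal of A: [2/3, 1/3]
print(joint.sum(axis=1))      # marginal of B: [2/3, 1/3]
```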

Example 3 Roll of a die

Consider the roll of a fair die and let A = 1 if the number is even (i.e. 2, 4, or 6) and A = 0 otherwise. Furthermore, let B = 1 if the number is prime (i.e. 2, 3, or 5) and B = 0 otherwise.


1 2 3 4 5 6

A 0 1 0 1 0 1

B 0 1 1 0 1 0

Then the joint distribution of A and B, expressed as a probability mass function, is

$$P(A = 0, B = 0) = P(\{1\}) = \frac{1}{6}, \qquad P(A = 1, B = 0) = P(\{4, 6\}) = \frac{2}{6}$$

$$P(A = 0, B = 1) = P(\{3, 5\}) = \frac{2}{6}, \qquad P(A = 1, B = 1) = P(\{2\}) = \frac{1}{6}$$

From the above examples, we can see that the joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function (in the case of continuous variables) or joint probability mass function (in the case of discrete variables).

In mathematics, we can express the joint probability mass function of two discrete random variables X, Y as

$$P(X = x, Y = y) = P(Y = y \mid X = x) \cdot P(X = x) = P(X = x \mid Y = y) \cdot P(Y = y)$$

$$\sum_i \sum_j P(X = x_i \text{ and } Y = y_j) = 1$$

where $P(Y = y \mid X = x)$ is the probability of $Y = y$ given that $X = x$, and $P(Y = y \mid X = x) = \frac{P(X = x \cap Y = y)}{P(X = x)}$.

The joint probability density function $f_{X,Y}(x, y)$ for two continuous random variables is defined as

$$f_{X,Y}(x, y) = f_{Y|X}(y \mid x)\, f_X(x) = f_{X|Y}(x \mid y)\, f_Y(y)$$

$$\int_x \int_y f_{X,Y}(x, y)\, dy\, dx = 1$$


where $f_{Y|X}(y \mid x)$ and $f_{X|Y}(x \mid y)$ are the conditional densities of $Y$ given $X = x$ and of $X$ given $Y = y$ respectively, and $f_X(x)$ and $f_Y(y)$ are the marginal densities of $X$ and $Y$ respectively.

Example 3 (continued)

Then the unconditional probability that A = 1 is 3/6 = 1/2 (since there are six possible rolls of the die, of which three are even), whereas the probability that A = 1 conditional on B = 1 is 1/3 (since there are three possible prime number rolls—2, 3, and 5—of which one is even).

The joint Gaussian distribution (or bivariate normal distribution)

The PDF of this special case is

$$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\!\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}\right]\right),$$

where $\rho$ is the correlation between X and Y, $\rho = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$.

Note that if X and Y are normally distributed and independent, this implies they are "jointly normally distributed", i.e. (X, Y) has a bivariate normal distribution. However, a pair of normally distributed, uncorrelated variables need not be independent (nor jointly normal); independence implies zero correlation, but not vice versa.
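A sketch evaluating this density and verifying the correlation by simulation with `scipy.stats.multivariate_normal` (the standard margins and ρ = 0.5 are illustrative assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

rho = 0.5                                  # assumed correlation
cov = np.array([[1.0, rho],
                [rho, 1.0]])               # sigma_X = sigma_Y = 1

bvn = multivariate_normal(mean=[0.0, 0.0], cov=cov)
print(bvn.pdf([0.0, 0.0]))                 # 1 / (2*pi*sqrt(1 - rho^2)) ~ 0.1838

# Empirical check of the correlation
xy = bvn.rvs(size=200_000, random_state=1)
print(np.corrcoef(xy.T)[0, 1])             # ~ 0.5
```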

1.4 Conditional Expectation and Conditional Independence (expected shortfall, extreme value theory, default intensities, etc.)

Conditional expectation is the expected value of a random variable given that a certain set of conditions is known to occur.

Example of last tutorial (Roll of a die)

The unconditional expectation of A is E[A] = (0+1+0+1+0+1) / 6 = 1/2. But the expectation of A conditional on B = 1 (i.e., conditional on the die roll being 2, 3, or 5) is E[A|B=1] = (1+0+0) / 3 = 1/3, and the expectation of A conditional on B = 0 (i.e., conditional on the die roll being 1, 4, or 6) is E[A|B=0] = (0+1+1) / 3 = 2/3. Likewise, the expectation of B conditional on A = 1 is E[B|A=1] = (1+0+0) / 3 = 1/3, and the expectation of B conditional on A = 0 is E[B|A=0] = (0+1+1) / 3 = 2/3.


In mathematics, let X and Y be continuous random variables; the conditional density of X given Y = y is defined as

$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$

and the conditional expectation of X given Y=y is

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X|Y}(x \mid y)\, dx$$

For the discrete case, the conditional distribution of X given an event B is

$$f(x \mid B) = P(X = x \mid B) = \frac{P(X = x, B)}{P(B)}$$

and the conditional expectation of X given B is

$$E[X \mid B] = \sum_x x\, f(x \mid B)$$

Example 1 Roll a die

Roll a die until we get a 6. Let Y be the total number of rolls and X the number of 1's we get. We compute E[X|Y = y]. The event Y = y means that there were y-1 rolls that were not a 6 and then the yth roll was a six. So given this event, X has a binomial distribution with n = y-1 trials and probability of success p = 1/5. So

$$E[X \mid Y = y] = np = \frac{1}{5}(y - 1)$$
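A Monte Carlo sketch of this result (the conditioning value y = 6 is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
target_y = 6                          # condition on Y = 6 rolls
samples = []

for _ in range(200_000):
    count_ones, rolls = 0, 0
    while True:
        r = rng.integers(1, 7)        # fair die, faces 1..6
        rolls += 1
        if r == 1:
            count_ones += 1
        if r == 6:
            break
    if rolls == target_y:             # keep only trials where Y = y
        samples.append(count_ones)

print(np.mean(samples))               # ~ (6 - 1) / 5 = 1.0
```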

Conditional independence: two events R and B are conditionally independent given a third event Y precisely if the occurrence of R and the occurrence of B are independent events in their conditional probability distribution given Y. In other words, R and B are conditionally independent given Y if and only if, given knowledge that Y occurs, knowledge of whether R occurs provides no information on the likelihood of B occurring, and knowledge of whether B occurs provides no information on the likelihood of R occurring.

$$\Pr(R \cap B \mid Y) = \Pr(R \mid Y)\,\Pr(B \mid Y)$$

Example 2

Let the two events be the probabilities of persons A and B getting home in time for dinner, and the third event is the fact that a snow storm hit the city. While both A and B have a lower probability of getting home in time for dinner, the lower probabilities will still be independent of each other. That is, the knowledge that A is late does not tell you whether B will be late. (They may be living in different neighborhoods, traveling different distances, and using different modes of transportation.) However, if you have information that they live in the same neighborhood, use the same transportation, and work at the same place, then the two events are NOT conditionally independent.

Example 3 Difference between independence and conditional independence

Suppose Norman and Martin each toss separate coins. Let A represent the variable "Norman's toss outcome", and B represent the variable "Martin's toss outcome". Both A and B have two possible values (Heads and Tails). It would be uncontroversial to assume that A and B are independent. Evidence about B will not change our belief in A.

Now suppose both Martin and Norman toss the same coin. Again let A represent the variable "Norman's toss outcome", and B represent the variable "Martin's toss outcome". Assume also that there is a possibility that the coin is biased towards heads, but we do not know this for certain. In this case A and B are not independent: observing that B is Heads causes us to increase our belief in A being Heads (in other words, P(A|B) > P(A) when a = Heads and b = Heads).

In the second case, the variables A and B are both dependent on a separate variable C, "the coin is biased towards Heads" (which has the values True or False). Although A and B are not independent, it turns out that once we know for certain the value of C then any evidence about B cannot change our belief about A. Specifically: P(A|C) = P(A|B,C)

In such a case we say that A and B are conditionally independent given C.
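The coin example can be made numeric. In this sketch the prior P(C) = 0.5 and the bias P(heads | biased) = 0.9 are assumed values; it shows P(A | B) > P(A) while P(A | B, C) = P(A | C):

```python
# C: "coin is biased towards heads"; A, B: Norman's and Martin's tosses = heads.
p_c = 0.5            # assumed prior probability the coin is biased
p_h_biased = 0.9     # assumed P(heads | biased)
p_h_fair = 0.5       # P(heads | fair)

# Marginally, A and B are dependent through the shared coin:
p_a = p_c * p_h_biased + (1 - p_c) * p_h_fair             # P(A) = 0.70
p_ab = p_c * p_h_biased**2 + (1 - p_c) * p_h_fair**2      # P(A and B) = 0.53
print(p_ab / p_a)    # P(A | B) ~ 0.757 > P(A) = 0.70

# Conditionally on C, they are independent: P(A | B, C) = P(A | C)
print(p_h_biased)    # P(A | C = biased) = P(A | B, C = biased) = 0.9
```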

2. Linear Regression (related to market risk and basis risk, Topic 2)

Regression analysis is a statistical process for estimating the relationships between a dependent variable and one or more independent (explanatory) variables.

If the independent variable(s) sufficiently explain the variation in the dependent variable through the regression function, the model can be used for prediction.

In linear regression, the relationships are modeled using linear regression functions whose unknown model parameters are estimated from the data.


Simple linear regression only models the relationship between a dependent variable y and an independent variable x. For modeling n data points, we denote:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \ldots, n$$

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon_i$ is the error term, normally distributed as $N(0, \sigma^2)$. The predicted/estimated value $\hat{y}_i$ is obtained from the fitted regression function $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, $i = 1, \ldots, n$.

Least squares estimation is the most commonly used method; it selects the line with the lowest total sum of squared estimation errors.

Minimize the sum of squared errors (SSE):

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

By solving the resulting system of equations, we obtain $\hat{\beta}_0$ and $\hat{\beta}_1$ as estimates of the parameters:

$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

Suppose that we know the correlation coefficient $\rho_{XY}$ between the two sets of data X and Y. Then

$$\hat{\beta}_1 = \frac{\mathrm{cov}(X, Y)}{\mathrm{Var}(X)} = \frac{\rho_{XY}\,\sigma_X \sigma_Y}{\sigma_X^2} = \rho_{XY}\,\frac{\sigma_Y}{\sigma_X}.$$
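A numpy sketch of these estimators on simulated data (the true parameters β₀ = 1, β₁ = 2 and the noise level are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)   # assumed true model

# Closed-form least-squares estimates from the formulas above
x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar
print(beta0, beta1)                           # ~ 1.0 and ~ 2.0

# Equivalent form: beta1 = rho_XY * sigma_Y / sigma_X
rho = np.corrcoef(x, y)[0, 1]
print(rho * y.std() / x.std())                # matches beta1
```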

Example 4

How strong is the linear relationship between temperatures in Celsius and temperatures in Fahrenheit? An estimated regression equation was fitted to n = 11 data points.


As $\rho = 1$ in this case, it is not surprising to see a perfect linear relation between them: Fahrenheit = 32 + 1.8 × Celsius.

Example 5

How strong is the linear relationship between the number of stories a building has and its height? One would think that as the number of stories increases, the height would increase, but not perfectly. Some statisticians compiled data on a set of n = 60 buildings reported in the 1994 World Almanac; the line plot and correlation output give $\rho = 0.951$.

Example 6

How strong is the linear relationship between the age of a driver and the distance the driver can see? If we had to guess, we might think that the relationship is negative: as age increases, the distance decreases. A research firm (Last Resource, Inc., Bellefonte, PA) collected data on a sample of n = 30 drivers; the line plot and correlation output give $\rho = -0.801$.

Example 7

How strong is the linear relationship between the height of a student and his or her grade point average? Data were collected on a random sample of n = 35 students in a statistics course at Penn State University, and the resulting fitted line plot and correlation output give $\rho = -0.053$.

Note one special case of linear regression: the CAPM (Capital Asset Pricing Model). (Refer to MATH4512 Topic 3, pp. 2-26.)


3. Linear Algebra - Decomposition (for principal component analysis, Topic 3)

3.1 Spectral decomposition (Eigendecomposition)

Definition An eigenvector of an n × n matrix A is a nonzero vector x such that Ax = 𝜆x for some scalar 𝜆. If such x exists, then 𝜆 is called an eigenvalue of A and x is called an eigenvector corresponding to 𝜆.

Definition Let $\lambda$ be an eigenvalue of A; then the solution space of $(A - \lambda I)x = 0$ is a subspace of $\mathbb{R}^n$, called the eigenspace of A corresponding to $\lambda$.

Theorem (The diagonalization theorem) An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors. More precisely, $A = PDP^{-1}$, with P an invertible matrix consisting of the eigenvectors as its columns and D a diagonal matrix consisting of the eigenvalues corresponding to the eigenvectors in P.

Example 1

Diagonalize the following matrix A, if possible.

Solving the characteristic equation

$$0 = \det(A - \lambda I) = -(\lambda - 1)(\lambda + 2)^2,$$

we obtain $\lambda = 1$ and $\lambda = -2$. For $\lambda = 1$, we solve the linear system $(A - I)\mathbf{x} = \mathbf{0}$ and get a basis vector $\mathbf{v}_1 = (1, -1, 1)^T$; for $\lambda = -2$, solving $(A + 2I)\mathbf{x} = \mathbf{0}$ gives two linearly independent solutions $\mathbf{v}_2 = (-1, 1, 0)^T$ and $\mathbf{v}_3 = (-1, 0, 1)^T$. Therefore P and D can be taken as

$$P = \begin{pmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{pmatrix}.$$

Note: algebraic multiplicity ≥ geometric multiplicity.
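A numpy check of the diagonalization: since the original matrix is not reproduced above, this sketch rebuilds A as PDP⁻¹ from the P and D just found, then confirms `numpy.linalg.eig` recovers the spectrum:

```python
import numpy as np

P = np.array([[ 1.0, -1.0, -1.0],
              [-1.0,  1.0,  0.0],
              [ 1.0,  0.0,  1.0]])
D = np.diag([1.0, -2.0, -2.0])

A = P @ D @ np.linalg.inv(P)          # reconstruct A = P D P^{-1}
eigvals, eigvecs = np.linalg.eig(A)
print(np.sort(eigvals.real))          # ~ [-2, -2, 1]

# A is diagonalizable: eigvecs @ diag(eigvals) @ eigvecs^{-1} recovers A
print(np.allclose(A, eigvecs @ np.diag(eigvals) @ np.linalg.inv(eigvecs)))
```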

3.2 Cholesky decomposition

If a matrix A is Hermitian (a complex square matrix that is equal to its own conjugate transpose, $A = A^*$) and positive definite (a symmetric n × n real matrix A is said to be positive definite if the scalar $z^T A z$ is positive for every non-zero column vector z of n real numbers), then it has a Cholesky decomposition of the form $A = LL^*$, where $L$ is a lower triangular matrix with real and positive diagonal entries and $L^*$ denotes the conjugate transpose of $L$.

The LDL decomposition is $A = LDL^*$, where $L$ is a lower unit triangular matrix and D is a diagonal matrix.

Example 2

Cholesky and LDL decompositions of a symmetric real matrix:
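A minimal numpy/scipy sketch of both factorizations; the 3 × 3 symmetric positive-definite matrix below is an assumed stand-in for the worked example:

```python
import numpy as np
from scipy.linalg import ldl

A = np.array([[  4.0,  12.0, -16.0],
              [ 12.0,  37.0, -43.0],
              [-16.0, -43.0,  98.0]])   # assumed SPD example matrix

# Cholesky: A = L L^T, L lower triangular with positive diagonal
L = np.linalg.cholesky(A)
print(L)                                # [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]
print(np.allclose(L @ L.T, A))          # True

# LDL: A = L D L^T, L unit lower triangular, D diagonal
Lu, Dd, perm = ldl(A, lower=True)
print(np.allclose(Lu @ Dd @ Lu.T, A))   # True
```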

The Cholesky-Banachiewicz and Cholesky-Crout Algorithms

The Cholesky decomposition is commonly used in the Monte Carlo method for simulating systems with multiple correlated variables. The correlation matrix is decomposed to give the lower-triangular L. For example, we can apply the Cholesky decomposition to a 2 × 2 correlation matrix to obtain two correlated normal variables $x_1$ and $x_2$ with a given correlation coefficient $\rho$. This relates two uncorrelated Gaussian random variables $z_1$ and $z_2$ to the two correlated normal variables $x_1$ and $x_2$ by letting $x_1 = z_1$ and $x_2 = \rho z_1 + \sqrt{1 - \rho^2}\, z_2$.
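A Monte Carlo sketch of exactly this construction (the target correlation ρ = 0.7 is an arbitrary illustrative value):

```python
import numpy as np

rho = 0.7                                   # assumed target correlation
corr = np.array([[1.0, rho],
                 [rho, 1.0]])

# Cholesky factor: [[1, 0], [rho, sqrt(1 - rho^2)]]
L = np.linalg.cholesky(corr)

rng = np.random.default_rng(4)
z = rng.standard_normal((2, 500_000))       # uncorrelated z1, z2
x = L @ z                                   # x1 = z1, x2 = rho*z1 + sqrt(1-rho^2)*z2

print(np.corrcoef(x)[0, 1])                 # ~ 0.7
```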

3.3 Singular value decomposition (SVD)

SVD is a factorization of a real or complex matrix. It is the generalization of the eigendecomposition of a positive semidefinite normal matrix ($z^* A z \ge 0$ and $A^* A = A A^*$, where * denotes the conjugate transpose) to any n × p matrix.

$$A_{n \times p} = U_{n \times n}\, S_{n \times p}\, V^T_{p \times p}$$

where $U^T U = I_{n \times n}$ and $V^T V = I_{p \times p}$ (i.e. U and V are orthogonal).

Example 3: http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm
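A numpy sketch of the factorization on an arbitrary 5 × 3 matrix (note that `numpy.linalg.svd` returns the singular values as a vector, which must be embedded in an n × p matrix to match the notation above):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))             # arbitrary n x p matrix

U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros((5, 3))
S[:3, :3] = np.diag(s)                      # place singular values on the diagonal

print(np.allclose(A, U @ S @ Vt))           # True: A = U S V^T
print(np.allclose(U.T @ U, np.eye(5)))      # U^T U = I (U orthogonal)
print(np.allclose(Vt @ Vt.T, np.eye(3)))    # V^T V = I (V orthogonal)
```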