
ACTS 4304

FORMULA SUMMARY

Lesson 1. Basic Probability

Probability Functions

F(x) = Pr(X ≤ x)

S(x) = 1 − F(x)

f(x) = dF(x)/dx

H(x) = −ln S(x)

h(x) = dH(x)/dx = f(x)/S(x)

Functions of random variables

Expected value: E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

n-th raw moment: µ'_n = E[X^n]

n-th central moment: µ_n = E[(X − µ)^n]

Variance: σ² = E[(X − µ)²] = E[X²] − µ²

Skewness: γ_1 = µ_3/σ³ = (µ'_3 − 3µ'_2 µ + 2µ³)/σ³

Kurtosis: γ_2 = µ_4/σ⁴ = (µ'_4 − 4µ'_3 µ + 6µ'_2 µ² − 3µ⁴)/σ⁴

Moment generating function: M(t) = E[e^{tX}]

Probability generating function: P(z) = E[z^X]

100p-th percentile

(1) The 100p-th percentile π is any point satisfying F_X(π−) ≤ p and F_X(π) ≥ p.

If F_X is continuous, it is the unique point satisfying F_X(π) = p.

If F_X is discrete, then π is the point that satisfies Pr(X ≤ π) ≥ p and Pr(X ≥ π) ≥ 1 − p.

(2) The median is the 50th percentile; the n-th quartile is the 25n-th percentile.

More concepts

(1) The standard deviation σ is the positive square root of the variance.
(2) The coefficient of variation is CV = σ/µ.
(3) The mode is the x which maximizes f(x).
(4) M_X^{(n)}(0) = E[X^n], where M^{(n)} is the n-th derivative.
(5) P_X^{(n)}(0)/n! = Pr(X = n).
(6) P_X^{(n)}(1) is the n-th factorial moment of X.
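As an illustration of the moment formulas above (not part of the original summary), here is a small Python sketch that computes the raw moments, variance, coefficient of variation, and skewness of an exponential distribution with an assumed mean θ = 10 by numerical integration; the exponential and the value of θ are hypothetical choices made only for this example.

```python
import numpy as np
from scipy import integrate

theta = 10.0                                  # assumed exponential mean, illustration only
f = lambda x: np.exp(-x / theta) / theta      # pdf of the exponential

def raw_moment(n):
    # n-th raw moment  mu'_n = E[X^n] = integral of x^n f(x) dx
    val, _ = integrate.quad(lambda x: x**n * f(x), 0, np.inf)
    return val

mu = raw_moment(1)
var = raw_moment(2) - mu**2                                     # sigma^2 = E[X^2] - mu^2
sigma = var**0.5
cv = sigma / mu                                                 # coefficient of variation
skew = (raw_moment(3) - 3*raw_moment(2)*mu + 2*mu**3) / sigma**3

print(mu, var, cv, skew)   # expect 10, 100, 1, 2 for an exponential
```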


Bayes’ Theorem

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)

f_X(x|y) = f_Y(y|x) f_X(x) / f_Y(y)

Law of total probability

If B_i is a set of exhaustive (in other words, Pr(∪_i B_i) = 1) and mutually exclusive (in other words, Pr(B_i ∩ B_j) = 0 for i ≠ j) events, then for any event A,

Pr(A) = ∑_i Pr(A ∩ B_i) = ∑_i Pr(B_i) Pr(A|B_i)

For continuous distributions,

Pr(A) = ∫ Pr(A|x) f(x) dx

Conditional Expectation and Variance Formulae

Double expectation: E_X[X] = E_Y[E_X[X|Y]]

Conditional variance: Var_X[X] = E_Y[Var_X[X|Y]] + Var_Y(E_X[X|Y])

Bernoulli Shortcut
If a random variable can only assume two values a and b with probabilities q and 1 − q respectively, then its variance is

Var(X) = (b − a)² q(1 − q)


Lesson 2. Parametric Distributions

Forms of probability density functions for common distributions

Distribution                Probability density function f(x)

Uniform                     c, x ∈ [d, u]
Beta                        c x^{a−1} (θ − x)^{b−1}, x ∈ [0, θ]
Exponential                 c e^{−x/θ}, x ≥ 0
Weibull                     c x^{τ−1} e^{−(x/θ)^τ}, x ≥ 0
Gamma                       c x^{α−1} e^{−x/θ}, x ≥ 0
Pareto                      c/(x + θ)^{α+1}, x ≥ 0
Single-parameter Pareto     c/x^{α+1}, x ≥ θ
Lognormal                   (c/x) e^{−(ln x − µ)²/2σ²}, x > 0

Summary of Parametric Distribution Concepts

• If X is a member of a scale family with scale parameter θ with value s, then cX is in the same family and has the same parameter values as X except that the scale parameter θ has value cs.
• All distributions in the tables are scale families with scale parameter θ except for the lognormal and the inverse Gaussian.
• If X is lognormal with parameters µ and σ, then cX is lognormal with parameters µ + ln c and σ.
• See the above table to learn the forms of commonly occurring distributions. Useful facts are:

Uniform on [d, u]: E[X] = (d + u)/2, Var(X) = (u − d)²/12
Uniform on [0, u]: E[X²] = u²/3
Gamma: Var(X) = αθ²

• If Y is single-parameter Pareto with parameters α and θ, then Y − θ is two-parameter Pareto with the same parameters α and θ.
• X is in the linear exponential family if its probability density function can be expressed as

f(x; θ) = p(x) e^{r(θ)x} / q(θ)


Lesson 3. Variance

For any random variables X and Y,

E[aX + bY] = aE[X] + bE[Y]

Var(aX + bY) = a² Var(X) + 2ab Cov(X, Y) + b² Var(Y)

For independent random variables X_1, X_2, ..., X_n,

Var(∑_{i=1}^n X_i) = ∑_{i=1}^n Var(X_i)

For independent identically distributed (i.i.d.) random variables X_1, X_2, ..., X_n,

Var(∑_{i=1}^n X_i) = n Var(X)

The sample mean: X̄ = (1/n) ∑_{i=1}^n X_i

The variance of the sample mean: Var(X̄) = (1/n) Var(X)

Double expectation: E_X[X] = E_Y[E_X[X|Y]]

Conditional variance: Var_X[X] = E_Y[Var_X[X|Y]] + Var_Y(E_X[X|Y])


Lesson 4. Mixtures and Splices

• If X is a mixture of n random variables with weights w_i such that ∑_{i=1}^n w_i = 1, then the following can be expressed as weighted averages:

Cumulative distribution function: F_X(x) = ∑_{i=1}^n w_i F_{X_i}(x)

Probability density function: f_X(x) = ∑_{i=1}^n w_i f_{X_i}(x)

k-th raw moment: E[X^k] = ∑_{i=1}^n w_i E[X_i^k]

• Conditional variance:

Var_X[X] = E_I[Var_X[X|I]] + Var_I(E_X[X|I])

• For a frailty model, given the hazard rate function for each individual,

h(x|Λ) = Λ a(x),

the survival function is of the form

S_X(x) = E_Λ[e^{−Λ A(x)}] = M_Λ(−A(x)),

where A(x) = ∫_0^x a(t) dt and M_Λ(x) is the moment generating function of the random variable Λ.

• Splices: For a spliced distribution, the probabilities of being in each splice must add up to 1.


Lesson 5. Property/Casualty Insurance Coverages

Let I(X) be the insurance amount for loss X, SI be the Sum Insured, and FV be the Full Value at the time of the loss. Then if the insurance equals at least α of the value of the house at the time of the loss, the insurance amount is calculated as:

I(X) = min(SI, (SI/(α · FV)) · X)

Typically α = 80%.
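A hedged numeric sketch of the coinsurance formula above; the sum insured, full value, and loss amount below are hypothetical.

```python
def insurance_payment(SI, FV, X, alpha=0.80):
    # I(X) = min(SI, SI/(alpha*FV) * X)
    return min(SI, SI / (alpha * FV) * X)

# hypothetical numbers: required insurance is 0.8 * 200000 = 160000,
# so only 120000/160000 = 75% of the loss is reimbursed (up to SI)
print(insurance_payment(SI=120_000, FV=200_000, X=40_000))   # 30000.0
```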


Lesson 6. Health Insurance

Types of Coverage Modifications:

1. Allowed charges
2. Deductibles
3. Coinsurance
4. Out-of-pocket limits
5. Maximum limits
6. Internal limits
7. Copays

Variations on Major Medical:

1. Comprehensive major medical coverage
2. Catastrophic medical
3. Short term medical
4. High risk pool plans

Dental Insurance: Types of Coverage:

I. Preventive (X-rays, cleaning)
II. Basic (fillings, extractions)
III. Prosthetic (inlays, crowns)
IV. Orthodontia


Lesson 7. Loss Reserving: Basic Methods

Let
R be the Total Reserve,
ELR be the Expected Loss Ratio,
EP be the Earned Premium,
PTD be the Amount of Claims Paid-to-Date,
f_ult be the ultimate development factor,
f_j be the link ratios from year j − 1 to year j.

Expected Loss Ratio Method:

R_LR = ELR · EP − PTD

Chain Ladder Method:

R_CL = f_ult · PTD − PTD = PTD (f_ult − 1)

Bornhuetter-Ferguson Method:

f_ult = ∏_j f_j

R_BF = EP · ELR · (1 − 1/f_ult)

Connection Between the Three Reserve Methods:

R_BF = (1 − 1/f_ult) R_LR + (1/f_ult) R_CL
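The following Python sketch, using made-up link ratios, premium, and paid-to-date figures, computes the three reserves for a single accident year and checks the identity connecting them.

```python
# Hypothetical inputs for a single accident year
ELR = 0.70                        # expected loss ratio
EP = 1_000_000                    # earned premium
PTD = 400_000                     # paid to date
link_ratios = [1.5, 1.2, 1.1]     # f_j, year j-1 to year j

f_ult = 1.0
for f in link_ratios:
    f_ult *= f                    # f_ult = product of the link ratios

R_LR = ELR * EP - PTD             # expected loss ratio method
R_CL = PTD * (f_ult - 1)          # chain ladder method
R_BF = EP * ELR * (1 - 1 / f_ult) # Bornhuetter-Ferguson method

# identity connecting the three methods
assert abs(R_BF - ((1 - 1/f_ult) * R_LR + (1/f_ult) * R_CL)) < 1e-6
print(R_LR, R_CL, R_BF)
```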


Lesson 8. Loss Reserving: Other Methods

Method of Projecting Frequency and Severity Separately:

1) List the Cumulative Payments Loss triangle.
2) List the Cumulative Closed Claims triangle.
3) Obtain the Cumulative Average Claim Size triangle by dividing the Cumulative Loss Payments by the corresponding Cumulative Closed Claims.
4) Using the volume-weighted average technique, calculate the link factors for the Average Severities.
5) Using the volume-weighted average technique, calculate the link factors for the Closed Claims.
6) Project Average Severities through Development Years.
7) Project Closed Claims through Development Years.
8) List the Ultimate Average Severities from the last DY column for each AY.
9) List the Ultimate Closed Claims from the last DY column for each AY.
10) Multiply the Ultimate Average Severities by the Ultimate Closed Claims to obtain the Ultimate Losses for each AY.
11) List the PTD claims from the Cumulative Payments Loss triangle, the same way as in the CL method.
12) Obtain the Reserve by subtracting the PTD from the Ultimate Losses for each AY.
13) The Total Reserve is the sum of the Reserves over all AYs.

Closure Method:

1) List the Cumulative Payments Loss Triangle.
2) Create the corresponding Incremental Payments Loss Triangle.
3) List the Cumulative Closed Claims Triangle.
4) Create the corresponding Incremental Closed Claims Triangle.
5) Working with the Incremental Payments and Closed Claims, create the Incremental Severity Triangle by dividing incremental claim counts into incremental payments.
6) Trend all average severity numbers to the last AY by multiplying by (1 + trend) raised to the corresponding powers, and then calculate their averages for each development year.
7) List these averages as projected incremental severities for the last AY, development years 1 to n − 1 (leave the AY n, DY 0 number the same), and detrend the averages back through accident years for each DY by dividing by (1 + trend) raised to the corresponding powers.
8) Working with the Ultimate Claim Counts and the Incremental Closed Claims, create a triangle showing the percentage of closed claims. The Ultimate Closed Claims were calculated by working with the cumulative closed claims triangle and projecting using the volume-weighted averages.
9) Using these percentages, complete the Projected Incremental Closed Claims triangle.
10) Multiplying the projected incremental severities obtained in Step 7 by the annual incremental closed claims obtained in Step 9, find the projected loss payments.
11) Add up the corresponding products by AY to obtain the reserve for each AY and then the total reserve.

Method of Discounted Loss Reserves:

1) List the Projected Cumulative Payments using the Chain Ladder method.
2) Calculate the Projected Incremental Payments by subtracting the projected cumulative payments in DY t − 1 from the projected cumulative payments in DY t. You will obtain a lower-hand triangle.
3) Discount the Projected Incremental Payments.
4) Add up the results by AY to obtain the reserve for each AY and then the total reserve.


Lesson 9. Ratemaking: Preliminary Calculations

The Method of Least Squares
Given the data (x_i, y_i), i = 1, ..., n, one can fit the "best fit" line y = mx + b through these data points, where

b = (∑_{i=1}^n y_i · ∑_{i=1}^n x_i² − ∑_{i=1}^n x_i y_i · ∑_{i=1}^n x_i) / (n ∑_{i=1}^n x_i² − (∑_{i=1}^n x_i)²)

m = (n ∑_{i=1}^n x_i y_i − ∑_{i=1}^n x_i ∑_{i=1}^n y_i) / (n ∑_{i=1}^n x_i² − (∑_{i=1}^n x_i)²)
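A minimal Python sketch of the least squares formulas on hypothetical (x_i, y_i) data, cross-checked against numpy's polyfit.

```python
import numpy as np

# hypothetical trend data: x = year index, y = observed value
x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([100, 108, 113, 124, 130], dtype=float)
n = len(x)

Sx, Sy, Sxx, Sxy = x.sum(), y.sum(), (x * x).sum(), (x * y).sum()
m = (n * Sxy - Sx * Sy) / (n * Sxx - Sx**2)          # slope
b = (Sy * Sxx - Sxy * Sx) / (n * Sxx - Sx**2)        # intercept

m_np, b_np = np.polyfit(x, y, 1)                     # same answer from numpy
print(m, b, m_np, b_np)
```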

Gross Rate Formulae
Let V be the expense ratio, the proportion of premium needed for expenses, contingencies, and profit. Let it include all expenses other than LAE as a percentage of the Gross Rate.

Let L be the loss cost, developed and trended and including LAE.

Let R be the permissible loss ratio: R = 1 − V. Then the gross rate G is:

G = L/(1 − V) = L/R

For separated fixed and variable expenses:

G = (L + F)/(1 − V) = (L + F)/R

If F is an amount fixed by state regulation and the regulation does not allow F to be grossed up by the loss ratio, then

G = L/R + F

Credibility Factor

Z = min(√(n/1082), 1)

Z = min(√(n/n_F), 1), where n_F is the number of expected claims needed for full credibility

Z = E/(E + K), where E is a measure of exposure and K a constant related to the distribution of the claims.

Buhlmann Credibility

Z = n/(n + k), where k = ν/a and

• µ, or EHM, is the expected value of the hypothetical mean, or the overall mean: µ = E[E[X|θ]].
• a, or VHM, is the variance of the hypothetical mean: a = Var(E[X|θ]).
• ν, or EPV, is the expected value of the process variance: ν = E[Var(X|θ)].


Lesson 10. Ratemaking: Rate Changes and Individual Risk Rating Plans

Overall Average Rate Change

The Loss Cost Method

Average Loss Cost = (Expected Losses, Trended and Developed) / (Number of Earned Exposures)

Average Gross Rate = (Average Loss Cost + F) / R,

where F is the fixed expense per policy and R is the permissible loss ratio.

Indicated Rate Change = (Average Gross Rate / Current Average Gross Rate) − 1

The Loss Ratio Method

Indicated Rate Change = (Effective Loss Ratio + Fixed Expense Ratio) / R − 1,

where

Effective Loss Ratio = (Expected Losses, Trended and Developed) / (Earned Premium at Current Rates)

Fixed Expense Ratio = (Fixed Expenses per Exposure) / (EP at Current Rates / Number of Earned Exposures)
                    = (Total Fixed Expenses) / (Earned Premium at Current Rates)

Updating Class Differentials

The Loss Ratio method:

Indicated Differential_i = Existing Differential_i · (R_i / R_base),

where the R's are the experience loss ratios.

The Loss Cost method:

Indicated Differential_i = Loss Cost_i / Loss Cost_base

Balancing Back

1. After changing differentials, the resulting loss cost will not balance back to the expected loss cost because the average of the differentials is not 1.
2. We thus must multiply the rates by a factor.
3. The numerator of the factor is the weighted average of the existing differentials.
4. The denominator of the factor is the weighted average of the proposed differentials.
5. The weights are the earned exposures.


Lesson 11. Policy Limits

All formulas assume Pr(X < 0) = 0.

E[X] = ∫_0^∞ S(x) dx

E[X ∧ u] = ∫_0^u x f(x) dx + u (1 − F(u)) = ∫_0^u S(x) dx

E[X^k] = ∫_0^∞ k x^{k−1} S(x) dx

E[(X ∧ u)^k] = ∫_0^u x^k f(x) dx + u^k (1 − F(u)) = ∫_0^u k x^{k−1} S(x) dx

For inflation, if Y = (1 + r)X, then

E[Y ∧ u] = (1 + r) E[X ∧ u/(1 + r)]
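A short Python sketch (assuming an exponential severity, chosen only because its limited expected value has the simple closed form θ(1 − e^{−u/θ})) that evaluates E[X ∧ u] = ∫_0^u S(x) dx numerically and compares it with that closed form.

```python
import numpy as np
from scipy import integrate

theta, u = 10.0, 25.0                   # hypothetical exponential mean and policy limit
S = lambda x: np.exp(-x / theta)        # survival function of the exponential

lim_ev, _ = integrate.quad(S, 0, u)     # E[X ^ u] = integral of S(x) from 0 to u
print(lim_ev, theta * (1 - np.exp(-u / theta)))   # closed form for the exponential
```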


Lesson 12. Deductibles

Payment per Loss:

F_{Y^L}(x) = F_X(x + d), if Y^L = (X − d)_+

E[(X − d)_+] = ∫_d^∞ (x − d) f(x) dx

E[(X − d)_+] = ∫_d^∞ S(x) dx

E[X] = E[X ∧ d] + E[(X − d)_+]

Payment per Payment:

F_{Y^P}(x) = (F_X(x + d) − F_X(d)) / (1 − F_X(d)), if Y^P = (X − d)_+ | X > d

S_{Y^P}(x) = S_X(x + d) / S_X(d), if Y^P = (X − d)_+ | X > d

e_X(d) = E[(X − d)_+] / S(d) = (E[X] − E[X ∧ d]) / S(d), the mean excess loss

e_X(d) = ∫_d^∞ (x − d) f(x) dx / S(d)

e_X(d) = ∫_d^∞ S(x) dx / S(d)

E[X] = E[X ∧ d] + e(d) (1 − F(d))

Mean excess loss for different distributions:

e_X(d) = θ for exponential

e_X(d) = (θ − d)/2, d < θ, for uniform on [0, θ]

e_X(d) = (θ − d)/(1 + b), d < θ, for beta with parameters a = 1, b, θ

e_X(d) = (θ + d)/(α − 1) for two-parameter Pareto

e_X(d) = d/(α − 1) for d ≥ θ, and e_X(d) = (α(θ − d) + d)/(α − 1) for d ≤ θ, for single-parameter Pareto

If Y^L, Y^P are the loss and payment random variables for a franchise deductible of d, and X^L, X^P are the loss and payment random variables for an ordinary deductible of d, then

E[Y^L] = E[X^L] + d S(d)

E[Y^P] = E[X^P] + d
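A brief Python sketch, assuming a two-parameter Pareto with hypothetical parameters, comparing the numerical mean excess loss e_X(d) = ∫_d^∞ S(x) dx / S(d) with the (θ + d)/(α − 1) formula above.

```python
import numpy as np
from scipy import integrate

alpha, theta, d = 3.0, 100.0, 50.0                 # hypothetical Pareto parameters and deductible
S = lambda x: (theta / (x + theta)) ** alpha       # Pareto survival function

num, _ = integrate.quad(S, d, np.inf)              # integral of S(x) from d to infinity
print(num / S(d), (theta + d) / (alpha - 1))       # both should be 75.0
```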


Lesson 13. Loss Elimination Ratio

The Loss Elimination Ratio is defined as the proportion of the expected loss which the insurer doesn't pay as a result of an ordinary deductible d:

LER(d) = E[X ∧ d] / E[X] = 1 − E[(X − d)_+] / E[X]

Loss Elimination Ratio for Certain Distributions:

LER(d) = 1 − e^{−d/θ} for an exponential

LER(d) = 1 − (θ/(d + θ))^{α−1} for a Pareto with α > 1

LER(d) = 1 − (θ/d)^{α−1}/α for a single-parameter Pareto with α > 1, d ≥ θ


Lesson 14. Increased Limits Factors and Increased Deductible Relativities

Increased Limits Factors
Let X be the random variable for severity. Then the increased limits factor for a policy limit of U with the basic policy limit of B is:

ILF(U) = E[X ∧ U] / E[X ∧ B]

In the presence of risk loads, the increased limits factor for policy limit U with base limit B is:

ILF(U) = (LAS(U) + Risk Load(U)) / (LAS(B) + Risk Load(B)),

where LAS is the limited average severity.

Deductible Relativities
The indicated deductible relativity is the ratio of the payment per loss with a deductible d to the payment per loss with the basic deductible b:

IDR(d) = E[(X − d)_+] / E[(X − b)_+] = (E[X] − E[X ∧ d]) / (E[X] − E[X ∧ b])

The following definition of the Loss Elimination Ratio is sometimes used:

LER(d) = (E[X ∧ d] − E[X ∧ b]) / (E[X] − E[X ∧ b])

It is the proportion of losses eliminated relative to the basic deductible. Then the indicated deductible relativity can be expressed through the LER as:

IDR(d) = 1 − LER(d)


Lesson 15. Reinsurance

The proportion of losses in the layer (a, b) is equal to:

(ILF_b − ILF_a) / ILF_∞

If all losses are below U, we can replace ILF_∞ with ILF_U to obtain:

(ILF_b − ILF_a) / ILF_U = ILF_b/ILF_U − ILF_a/ILF_U


Lesson 16. Risk Measures and Tail Weight

Value-at-Risk: VaR_p(X) = π_p = F_X^{−1}(p)

Tail-Value-at-Risk:

TVaR_p(X) = E[X | X > VaR_p(X)] = ∫_{VaR_p(X)}^∞ x f(x) dx / (1 − F(VaR_p(X)))

          = ∫_p^1 VaR_y(X) dy / (1 − p) = VaR_p(X) + e_X(VaR_p(X))

          = VaR_p(X) + (E[X] − E[X ∧ VaR_p(X)]) / (1 − p)

Value-at-Risk and Tail-Value-at-Risk measures for some distributions:

Distribution    VaR_p(X)                     TVaR_p(X)

Exponential     −θ ln(1 − p)                 θ (1 − ln(1 − p))
Pareto          θ ((1 − p)^{−1/α} − 1)       E[X] (1 + α((1 − p)^{−1/α} − 1))
Normal          µ + z_p σ                    µ + (σ/(1 − p)) · e^{−z_p²/2}/√(2π)
Lognormal       e^{µ + z_p σ}                E[X] · Φ(σ − z_p)/(1 − p)
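A short Python sketch evaluating the exponential and Pareto rows of the table for hypothetical parameters, cross-checked with TVaR_p(X) = VaR_p(X) + e_X(VaR_p(X)) using the mean excess loss formulas from Lesson 12.

```python
import numpy as np

p = 0.95

# Exponential with mean theta (hypothetical value)
theta = 10.0
var_exp = -theta * np.log(1 - p)           # VaR_p
tvar_exp = theta * (1 - np.log(1 - p))     # TVaR_p from the table
# cross-check: TVaR_p = VaR_p + e_X(VaR_p), and e_X(d) = theta for an exponential
print(var_exp, tvar_exp, var_exp + theta)

# Two-parameter Pareto(alpha, theta) with alpha > 1 (hypothetical values)
alpha, th = 3.0, 100.0
EX = th / (alpha - 1)                      # Pareto mean (alpha > 1)
var_par = th * ((1 - p) ** (-1 / alpha) - 1)
tvar_par = EX * (1 + alpha * ((1 - p) ** (-1 / alpha) - 1))
# cross-check: TVaR_p = VaR_p + (theta + VaR_p)/(alpha - 1), the Pareto mean excess loss
print(var_par, tvar_par, var_par + (th + var_par) / (alpha - 1))
```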


Lesson 17. Other Topics in Severity Coverage Modifications

Policy limit: the maximum amount that the coverage will pay. In the presence of a deductible or other modifications, perform the other modifications first, then apply the policy limit.

Maximum covered loss: the stipulated amount considered in calculating the payment. Apply this limit first, and then the deductible.

If u is the maximum covered loss and d the deductible, then Y^L = X ∧ u − X ∧ d.

Coinsurance of α is the portion of each loss reimbursed by insurance.

In the presence of the three modifications,

E[Y^L] = α (E[X ∧ u] − E[X ∧ d])

If r is the inflation factor,

E[Y^L] = α(1 + r) (E[X ∧ u/(1 + r)] − E[X ∧ d/(1 + r)])


Lesson 18. Bonuses

A typical bonus is a portion of the excess of r% of premiums over losses. If c is the portion of the excess, r is the loss ratio, P is earned premium, and X is losses, then

B = max(0, c(rP − X)) = crP − c min(rP, X) = crP − c (X ∧ rP)

For a two-parameter Pareto distribution with α = 2 and θ,

E[X ∧ d] = θd/(d + θ)


Lesson 19. Discrete Distributions

For the (a, b, 0) class of distributions,

p_k / p_{k−1} = a + b/k, where p_k = Pr(X = k)

Poisson:            p_n = e^{−λ} λ^n / n!;  mean λ;  variance λ;  a = 0;  b = λ
Binomial:           p_n = C(m, n) q^n (1 − q)^{m−n};  mean mq;  variance mq(1 − q);  a = −q/(1 − q);  b = (m + 1)q/(1 − q)
Negative binomial:  p_n = C(n + r − 1, n) (1/(1 + β))^r (β/(1 + β))^n;  mean rβ;  variance rβ(1 + β);  a = β/(1 + β);  b = (r − 1)β/(1 + β)
Geometric:          p_n = β^n / (1 + β)^{n+1};  mean β;  variance β(1 + β);  a = β/(1 + β);  b = 0

If N is a random variable having an (a, b, 0) distribution, then

E[N] = (a + b)/(1 − a)

Var(N) = (a + b)/(1 − a)²

If µ_{(j)} is a factorial moment of N:

µ_{(j)} = E[N(N − 1) ... (N − j + 1)]

then

µ_{(j)} = ((aj + b)/(1 − a)) µ_{(j−1)}

For the (a, b, 1) class of distributions, p_0 is arbitrary and

p_k / p_{k−1} = a + b/k for k = 2, 3, 4, ...

Zero-truncated distributions:

p_n^T = p_n / (1 − p_0), n > 0

Zero-modified distributions:

p_n^M = (1 − p_0^M) p_n^T

E[N] = cm

Var(N) = c(1 − c)m² + cv,

where

• c is 1 − p_0^M
• m is the mean of the corresponding zero-truncated distribution
• v is the variance of the corresponding zero-truncated distribution
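A small Python sketch of the (a, b, 0) recursion p_k = (a + b/k)p_{k−1} for a Poisson with a hypothetical λ, checked against the closed-form Poisson probabilities.

```python
import math

lam = 2.0                       # hypothetical Poisson mean
a, b = 0.0, lam                 # (a, b, 0) values for the Poisson (see table above)
p = [math.exp(-lam)]            # p_0
for k in range(1, 8):
    p.append((a + b / k) * p[-1])          # p_k = (a + b/k) p_{k-1}

exact = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(8)]
print(all(abs(x - y) < 1e-12 for x, y in zip(p, exact)))   # True
```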


Lesson 20. Poisson/Gamma

Assume that in a portfolio of insureds, loss frequency follows a Poisson distribution with parameter λ, but λ is not fixed and varies by insured. Suppose λ varies according to a gamma distribution over the portfolio of insureds. If the conditional loss frequency of each insured is Poisson with parameter λ, then the unconditional loss frequency for an insured picked at random (that is, when you are not told which insured it is) is negative binomial. The parameters of the negative binomial (r, β) are the same as the parameters of the gamma distribution (α, θ): r = α, β = θ.

For a gamma distribution with parameters (α, θ), the mean is αθ and the variance is αθ².
For a negative binomial distribution, the mean is rβ and the variance is rβ(1 + β).

If the Poisson parameter for one hour has a gamma distribution with parameters (α, θ), the Poisson parameter for k hours will have a gamma distribution with parameters (α, kθ).


Lesson 21. Frequency Exposure and Coverage Modifications

Let X be the severity, d a deductible, and v the probability of paying the claim.

Model               Original parameters        Exposure modification       Coverage modification
                    (exposure n_1,             (exposure n_2,              (exposure n_1,
                     Pr(X > d) = 1)             Pr(X > d) = 1)              Pr(X > d) = v)

Poisson             λ                          (n_2/n_1)λ                  vλ
Binomial¹           m, q                       (n_2/n_1)m, q               m, vq
Negative binomial   r, β                       (n_2/n_1)r, β               r, vβ

¹ Note that (n_2/n_1)m must be an integer for the exposure modification formula to work.

These adjustments work for (a, b, 1) distributions as well as (a, b, 0) distributions. For (a, b, 1) distributions, p_0^M = 1 − ∑_{k=1}^∞ p_k is adjusted as follows:

1 − p_0^{M*} = (1 − p_0^M) · (1 − p_0^*)/(1 − p_0),

where asterisks indicate distributions with revised parameters.


Lesson 22. Aggregate Loss Models: Compound Variance

For the collective risk model the aggregate losses are defined as:

S = ∑_{i=1}^N X_i,

where N is the number of claims and X_i is the size of each claim.

For the individual risk model the aggregate losses are defined as:

S = ∑_{i=1}^n X_i,

where n is the number of insureds in the group and X_i is the aggregate claims of each individual member.

For the collective risk model, we assume that aggregate losses have a compound distribution, with frequency being the primary distribution and severity being the secondary distribution.

E[S] = E[N] E[X]

Var(S) = E[N] Var(X) + Var(N) E[X]²

For a Poisson primary distribution:

Var(S) = λ E[X²]
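A short Python sketch of the compound variance formulas for a Poisson frequency and exponential severity (hypothetical parameters), with a rough Monte Carlo check.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, theta = 3.0, 10.0              # hypothetical Poisson mean and exponential severity mean

EX, EX2 = theta, 2 * theta**2       # exponential: E[X] = theta, E[X^2] = 2 theta^2
ES = lam * EX                       # E[S] = E[N] E[X]
VarS = lam * EX2                    # Poisson primary: Var(S) = lambda E[X^2]

# rough Monte Carlo check
N = rng.poisson(lam, size=50_000)
S = np.array([rng.exponential(theta, n).sum() for n in N])
print(ES, S.mean(), VarS, S.var())
```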


Lesson 23. Aggregate Loss Models: Approximating Distribution

The aggregate distribution may be approximated with a normal distribution:

F_S(s) = Pr(S ≤ s) = Pr((S − E[S])/σ_S ≤ (s − E[S])/σ_S) ≈ Φ((s − E[S])/σ_S)

If severity is discrete, then the aggregate loss distribution is discrete, and a continuity correction is required: if X assumes values a and b, but no value in between, all of the following statements are equivalent:

X > a,  X ≥ b,  X > c for any c ∈ (a, b)

X ≤ a,  X < b,  X < c for any c ∈ (a, b)

To evaluate probabilities, assume:

Pr(X > a) = Pr(X ≥ b) = Pr(X > (a + b)/2)

Pr(X ≤ a) = Pr(X < b) = Pr(X < (a + b)/2)

If severity has a continuous distribution, no continuity correction is made.


Lesson 24. Aggregate Loss Models: Severity Modifications


Lesson 25: Discrete Aggregate Loss Models: The Recursive Formula

Let

p_n = Pr(N = n) = f_N(n)

f_n = Pr(X = n) = f_X(n)

g_n = Pr(S = n) = f_S(n)

Then F_S(x) = ∑_{n ≤ x} g_n and

g_n = ∑_{k=0}^∞ p_k ∑_{i_1 + i_2 + ... + i_k = n} f_{i_1} f_{i_2} ... f_{i_k},

where the inner sum of the products ∏_{m=1}^k f_{i_m} is f^{*k}(n), the k-fold convolution of the f's.

If N belongs to the (a, b, 0) class, g_n can be calculated recursively:

g_0 = P_N(f_0), where P_N(z) is the probability generating function of the primary distribution

g_k = (1/(1 − af_0)) ∑_{j=1}^k (a + bj/k) f_j g_{k−j},  k = 1, 2, 3, ...

In particular, for a Poisson distribution, where a = 0, b = λ,

g_k = (λ/k) ∑_{j=1}^k j f_j g_{k−j},  k = 1, 2, 3, ...

If N belongs to the (a, b, 1) class, g_n can be calculated recursively as well:

g_0 = P_N(f_0), where P_N(z) is the probability generating function of the primary distribution

g_k = ((p_1 − (a + b)p_0) f_k)/(1 − af_0) + (1/(1 − af_0)) ∑_{j=1}^k (a + bj/k) f_j g_{k−j},  k = 1, 2, 3, ...
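A Python sketch of the Poisson case of the recursion above, with a hypothetical λ and a hypothetical discrete severity; the tail sum is printed as a sanity check that the probabilities add to roughly 1.

```python
import math

lam = 2.0                                   # hypothetical Poisson parameter
f = {1: 0.5, 2: 0.3, 3: 0.2}                # hypothetical discrete severity, f_0 = 0

def g_recursive(kmax):
    # Poisson case: g_k = (lambda/k) * sum_j j f_j g_{k-j}
    g = [math.exp(-lam * (1 - f.get(0, 0.0)))]   # g_0 = P_N(f_0) = exp(-lambda(1 - f_0))
    for k in range(1, kmax + 1):
        g.append(lam / k * sum(j * f.get(j, 0.0) * g[k - j] for j in range(1, k + 1)))
    return g

print(g_recursive(6))          # first aggregate probabilities g_0, ..., g_6
print(sum(g_recursive(80)))    # close to 1, since 80 is far in the tail here
```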


Lesson 26. Aggregate Losses: Aggregate Deductible

E[(S − d)_+] = E[S] − E[S ∧ d]

p_n = Pr(N = n) = f_N(n)

f_n = Pr(X = n) = f_X(n)

g_n = Pr(S = n) = f_S(n)

F_S(x) = ∑_{n ≤ x} g_n

Determine S_S(x) = 1 − F_S(x) and apply

E[S ∧ d] = ∫_0^d S_S(x) dx


Lesson 27. Aggregate Losses: Miscellaneous Topics

Coverage Modifications

If there is a per-policy deductible, the expected annual aggregate payment is either

E[S] = E[N] · E[(X − d)_+]

or

E[S] = E[N^P] · e(d),

where N^P is the expected number of payments per year and e(d) is the expected payment per payment.

Exact Calculation of Aggregate Loss Distribution

The distribution function of aggregate losses at x is the sum over n of the probabilities that the claim count equals n and the sum of n loss sizes is less than or equal to x.

(1) Normal Distribution of Severities. If n random variables X_i are independent and normally distributed with parameters µ and σ², their sum is normally distributed with parameters nµ and nσ².

(2) Exponential and Gamma (Erlang) Distribution of Severities. The sum of n exponential random variables with common mean θ has a gamma distribution with parameters α = n and θ. For an integer α the gamma distribution is also called an Erlang distribution. The probability that n events occur before time x is F_{S|N=n}(x), where S|N = n is Erlang(n, θ) and

F_{S|N=n}(x) = 1 − ∑_{j=0}^{n−1} e^{−x/θ}(x/θ)^j / j!

If S is a compound model with exponential severities,

F_S(x) = ∑_{n=0}^∞ p_n F_{S|N=n}(x)

(3) Negative Binomial/Exponential Compound Models. A compound model with negative binomial frequency with parameters r (an integer) and β, and exponential severities with parameter θ, is equivalent to a compound model with binomial frequency with parameters m = r and q = β/(1 + β) and exponential severities with parameter θ(1 + β).

(4) Compound Poisson Models. Suppose S_j are a set of compound Poisson models with Poisson parameters λ_j and severity random variables X_j. Then the sum S = ∑_{j=1}^n S_j is a compound Poisson model with Poisson parameter λ = ∑_{j=1}^n λ_j and severity having a weighted average, or mixture, distribution of the individual severities X_j. The weights are λ_j/λ.

Discretizing

The recursive method for calculating the aggregate distribution, as well as the direct convolution method, requires a discrete severity distribution. Usually the severity distribution is continuous. Thus, discretization is needed.

(1) Method of rounding. If h is the span,

f_{kh} = F((k + 0.5)h − 0) − F((k − 0.5)h − 0)

(2) Method of local moment matching. For the grid points x_k = x_0 + kh and masses m_0^k and m_1^k, solve the following system:

m_0^k + m_1^k = F((k + 1)h) − F(kh)

x_k m_0^k + x_{k+1} m_1^k = ∫_{kh}^{(k+1)h} x f(x) dx

Then

f_{kh} = m_0^k + m_1^{k−1}
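A brief Python sketch of the method of rounding, discretizing an exponential severity with hypothetical mean θ and span h onto the lattice 0, h, 2h, ...; the mass at 0 is taken as F(h/2), the usual convention for the first point.

```python
import numpy as np

theta, h, kmax = 10.0, 5.0, 200       # hypothetical severity mean, span, truncation point
F = lambda x: 1 - np.exp(-max(x, 0.0) / theta)   # exponential cdf (continuous, so F(x - 0) = F(x))

f0 = F(0.5 * h)                                            # mass at 0
fk = [F((k + 0.5) * h) - F((k - 0.5) * h) for k in range(1, kmax)]
probs = np.array([f0] + fk)

print(probs[:4], probs.sum())                       # first masses; total is close to 1
print((probs * h * np.arange(kmax)).sum(), theta)   # discretized mean is close to theta
```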


Lesson 29. Maximum Likelihood Estimators

Likelihood formulas

Data type                                     Likelihood factor

Discrete distribution, individual data        p_x
Continuous distribution, individual data      f(x)
Grouped data                                  F(c_j) − F(c_{j−1})
Individual data censored from above at u      1 − F(u) for censored observations
Individual data censored from below at d      F(d) for censored observations
Individual data truncated from above at u     f(x)/F(u)
Individual data truncated from below at d     f(x)/(1 − F(d))

Page 30: ACTS 4304 - University of Texas at Dallasnah103020/STAMI_AU... · Summary of Parametric Distribution Concepts If Xis a member of a scale family with scale parameter with value s,

30

Lesson 30. Maximum Likelihood Estimators: Special Techniques

Summary of maximum likelihood formulas
In this table, n is the number of uncensored observations, c is the number of censored observations, d_i is the truncation point for each observation (0 if untruncated), and x_i is the observation if uncensored or the censoring point if censored. The last column (CT?) indicates whether the estimator may be used for right-censored or left-truncated data.

Distribution                        Formula                                                                     CT?

Exponential                         θ = (1/n) ∑_{i=1}^{n+c} (x_i − d_i)                                         Yes
Lognormal                           µ = (1/n) ∑_{i=1}^n ln x_i,  σ = √((1/n) ∑_{i=1}^n (ln x_i)² − µ²)          No
Inverse exponential                 θ = n / ∑_{i=1}^n (1/x_i)                                                   No
Weibull, fixed τ                    θ = ((1/n) ∑_{i=1}^{n+c} (x_i^τ − d_i^τ))^{1/τ}                             Yes
Uniform [0, θ], individual data     θ = max x_i                                                                 No
Uniform [0, θ], grouped data        θ = c_j (n/n_j), where c_j is the upper bound of the highest finite
                                    interval and n_j is the number of observations below c_j                    No
Two-parameter Pareto, fixed θ       α = −n/K,  K = ∑_{i=1}^{n+c} ln(θ + d_i) − ∑_{i=1}^{n+c} ln(θ + x_i)        Yes
Single-parameter Pareto, fixed θ    α = −n/K,  K = ∑_{i=1}^{n+c} ln(max(θ, d_i)) − ∑_{i=1}^{n+c} ln x_i         Yes
Beta, fixed θ, b = 1                a = −n/K,  K = ∑_{i=1}^n ln x_i − n ln θ                                    No
Beta, fixed θ, a = 1                b = −n/K,  K = ∑_{i=1}^n ln(θ − x_i) − n ln θ                               No

Common likelihood functions, and their resulting estimates

When the likelihood function is:         Then the MLE is:

L(θ) = θ^{−a} e^{−b/θ}                   θ = b/a
L(θ) = θ^{a} e^{−bθ}                     θ = a/b
L(θ) = θ^{a} (1 − θ)^{b}                 θ = a/(a + b)
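A small Python sketch of the exponential row of the table with right-censored, left-truncated data (all values hypothetical): θ = (1/n) ∑(x_i − d_i) with n the number of uncensored observations; a grid search over the loglikelihood confirms the maximum.

```python
import numpy as np

# hypothetical data: (x, d, censored) = observed value, truncation point, right-censored flag
data = [(5.0, 0.0, False), (12.0, 0.0, False), (8.0, 3.0, False),
        (10.0, 0.0, True),  (15.0, 5.0, True)]

n = sum(not c for _, _, c in data)                      # number of uncensored observations
theta_hat = sum(x - d for x, d, _ in data) / n          # table formula for the exponential

def loglik(theta):
    # uncensored, truncated at d: ln f(x)/S(d) = -ln(theta) - (x - d)/theta
    # censored at x, truncated at d: ln S(x)/S(d) = -(x - d)/theta
    return sum((-np.log(theta) if not c else 0.0) - (x - d) / theta for x, d, c in data)

grid = np.linspace(theta_hat * 0.2, theta_hat * 3, 2001)
print(theta_hat, grid[np.argmax([loglik(t) for t in grid])])   # agree to grid precision
```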


Lesson 31. Variance of Maximum Likelihood Estimators

Fisher information matrix and Cramer-Rao asymptotic covariance matrix

The information matrix for a single parameter θ is:

I(θ) = −E_X[d²l/dθ²] = E_X[(dl/dθ)²]

The asymptotic variance is the inverse of the information matrix.

Asymptotic variance of MLE’s for common distributions

Let n be the sample size; Var denotes the asymptotic variance.

Distribution        Formula

Exponential         Var(θ) = θ²/n
Uniform [0, θ]      Var(θ) = nθ²/((n + 1)²(n + 2))
Weibull, fixed τ    Var(θ) = θ²/(nτ²)
Pareto, fixed θ     Var(α) = α²/n
Pareto, fixed α     Var(θ) = (α + 2)θ²/(nα)
Lognormal           Var(µ) = σ²/n,  Var(σ) = σ²/(2n),  Cov(µ, σ) = 0

Delta Method

The delta method is a method of estimating the variance of a function of a random variable from the variance of the random variable.

1. Delta Method Formula - One Variable

Var(g(X)) ≈ Var(X) (dg/dx)²

2. Delta Method Formula - Two Variables

Var(g(X, Y)) ≈ Var(X) (∂g/∂x)² + 2 Cov(X, Y) (∂g/∂x)(∂g/∂y) + Var(Y) (∂g/∂y)²

3. Delta Method Formula - General

Var(g(X)) ≈ (∂g)′ Σ (∂g),  where ∂g = (∂g/∂x_1, ∂g/∂x_2, ..., ∂g/∂x_n)′


Lesson 32. Fitting Discrete Distributions

1. For a Poisson with complete data, the method of moments and maximum likelihood estimators of λ are both x̄.
2. For a negative binomial with complete data:
   a. The method of moments estimators are β = (σ̂² − x̄)/x̄ and r = x̄²/(σ̂² − x̄).
   b. Maximum likelihood sets rβ = x̄. If one of them is known, the other one is set equal to x̄ divided by the known one.
3. For a binomial with complete data, the method of moments may not set m equal to an integer. Maximum likelihood proceeds by calculating a likelihood profile for each m ≥ max x_i. The maximum likelihood estimate of q given m is x̄/m. When the maximum likelihood for m + 1 is less than the one for m, the overall maximum is attained at m.
4. For modified (a, b, 1) distributions, p_0^M = n_0/n and the mean is set equal to the sample mean.
5. Fitting λ of a zero-modified Poisson requires numerical techniques.
6. Fitting q for a zero-modified binomial for fixed m requires solving a high-degree polynomial unless m ≤ 3.
7. Fitting β for a zero-modified negative binomial for fixed r requires numerical techniques except for special values of r, like 1.
8. If you are given a table with varying exposures and claims, and individual claims have a Poisson distribution with the same λ, the maximum likelihood estimate of λ is the sample mean, or the sum of all claims over the sum of all exposures.
9. To choose between (a, b, 0) distributions to fit to data, two methods are available:
   a. Compare the sample variance σ̂² to the sample mean x̄. Choose binomial if it is less, Poisson if equal, and negative binomial if greater.
   b. Calculate k n_k/n_{k−1} and observe the slope as a function of k. Choose binomial if negative, Poisson if zero, and negative binomial if positive.


Lesson 33. Hypothesis Tests: Graphic Comparison

These plots are constructed to assess how well the model fits the data.

1. D(x) plots

Let f_n be the empirical probability density function and F_n be the empirical distribution function. Then for a sample x_1 ≤ x_2 ≤ ... ≤ x_n:

F_n(x) = (number of x_j ≤ x)/n  and  F_n(x_j) = j/n

Let F∗ be the fitted distribution function:

F∗(x) = (F(x) − F(d))/(1 − F(d))

if the observed data are left-truncated at d. Note F∗(x) = F(x) for untruncated data.

Then the D(x) plot is the graph of the function

D(x) = F_n(x) − F∗(x)

2. p-p plots

Let F_n be the empirical distribution function: for a sample x_1 ≤ x_2 ≤ ... ≤ x_n,

F_n(x_j) = j/(n + 1)

Then the p-p plot is the graph linearly connecting the points

(F_n(x_j), F∗(x_j))

Note the difference in the definition of F_n(x_j) in a D(x) plot and a p-p plot.


Lesson 34. Hypothesis Tests: Kolmogorov-Smirnov

Let F∗ be the fitted distribution function:

F∗(x) = (F(x) − F(d))/(1 − F(d))

if the observed data are left-truncated at d, and

F∗(x) = F(x)/F(u)

if the observed data are right-truncated at u. Note F∗(x) = F(x) for untruncated data.

For a sample x_1 ≤ x_2 ≤ ... ≤ x_n the Kolmogorov-Smirnov statistic D is defined as D = max_j D_j, where

D_j = max(|F∗(x_j) − j/n|, |F∗(x_j) − (j − 1)/n|), if x_j ≠ x_{j+1}, and

D_j = max(|F∗(x_j) − (j − 1)/n|, |F∗(x_j) − (j + 1)/n|), if x_j = x_{j+1}
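A brief Python sketch computing D for a hypothetical sample with no ties against a fitted exponential with a hypothetical θ (untruncated data, so F∗ = F).

```python
import numpy as np

x = np.sort(np.array([1.2, 3.5, 4.1, 7.8, 9.0, 15.3]))   # hypothetical sample, no ties
theta = 7.0                                                # hypothetical fitted exponential mean
Fstar = 1 - np.exp(-x / theta)                             # fitted F*(x_j)

n = len(x)
j = np.arange(1, n + 1)
D_j = np.maximum(np.abs(Fstar - j / n), np.abs(Fstar - (j - 1) / n))
print(D_j.max())                                           # Kolmogorov-Smirnov statistic D
```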


Lesson 35. Hypothesis Tests: Chi-square

Chi-square Statistic
Suppose the data are divided into k groups and let n be the total number of observations. Let p_j be the probability that X is in the j-th group under the hypothesis, O_j be the number of observations in group j, and E_j = np_j be the expected number of observations in group j. Then the chi-square statistic is:

Q = ∑_{j=1}^k (O_j − E_j)²/E_j = ∑_{j=1}^k O_j²/E_j − n

Degrees of freedom
If a distribution with parameters is given, or is fitted by a formal approach like maximum likelihood but using a different set of data, then there are k − 1 degrees of freedom.

If r parameters are fitted from the data, then there are k − 1 − r degrees of freedom.

Approximation
The chi-square test assumes that the number of observations in each group is approximately normally distributed. To make this approximation work, each group should have at least 5 expected (not actual) observations.

Distribution
The chi-square statistic is a sum of the squares of independent standard normal random variables.

A chi-square random variable has a gamma distribution with parameters θ = 2 and α = d/2, where d is the number of degrees of freedom. If d = 2, then it is exponential.

If exposures and claims are given for several periods and each period is assumed to be independent, the chi-square statistic is:

Q = ∑_{j=1}^k (O_j − E_j)²/V_j,

where E_j is the fitted expected number and V_j is the fitted variance of observations in group j. The number of degrees of freedom in this case is k − r, where r is the number of parameters fitted from the data.


Comparison of the three methods of testing goodness of fit

Kolmogorov-Smirnov: should be used only for individual data; only for continuous fits; the critical value should be lowered if u < ∞; the critical value should be lowered if parameters are fitted; the critical value declines with larger sample size; no discretion; uniform weight on all parts of the distribution.

Anderson-Darling: should be used only for individual data; only for continuous fits; the critical value should be lowered if u < ∞; the critical value should be lowered if parameters are fitted; the critical value is independent of sample size; no discretion; higher weight on the tails of the distribution.

Chi-square: may be used for individual or grouped data; for continuous or discrete fits; no adjustment of the critical value is needed for u < ∞; the critical value is automatically adjusted if parameters are fitted; the critical value is independent of sample size; discretion in the grouping of data; higher weight on intervals with low fitted probability.


Lesson 36. Likelihood Ratio Test and Algorithm, Penalized Loglikelihood Tests

There are two types of methods for selecting a model: judgment-based and score-based. The highest value of the likelihood function at the maximum, or the likelihood ratio method, is one of the score-based methods.

A free parameter is one that is not specified, and that is therefore estimated by maximum likelihood. The number of free parameters to be estimated is denoted by r.

The number of degrees of freedom for the likelihood ratio test is the number of free parameters in the alternative model (the model of the alternative hypothesis) minus the number of free parameters in the base model (the model of the null hypothesis).

The Likelihood Ratio Test (LRT) accepts the alternative model if its loglikelihood exceeds the loglikelihood of the base model by at least one-half of the appropriate chi-square percentile (1 minus the significance level of the test) at the number of degrees of freedom for the test: the alternative model is accepted if

2(ln L_1 − ln L_0) > c, where Pr(X > c) = α

for X a chi-square random variable with the number of degrees of freedom for the test.

For every number of parameters, the model with the highest loglikelihood is selected.

The Schwarz Bayesian Criterion (SBC)/Bayesian Information Criterion (BIC) subtracts (r/2) ln n from the loglikelihood of the model (which is always negative). Then the score is:

ln L − (r/2) ln n

The model with the highest score is selected.

The Akaike Information Criterion (AIC) subtracts r from the loglikelihood of the model. Then the score is:

ln L − r

The model with the highest score is selected.
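A short Python sketch, on simulated data, comparing an exponential base model (r = 1) with a gamma alternative (r = 2) using the LRT, AIC, and SBC/BIC scores as defined above; the data and fits rely on scipy.stats and are purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=5.0, size=200)       # hypothetical sample

# base model: exponential (r = 1 free parameter)
loc0, scale0 = stats.expon.fit(data, floc=0)
ll0 = stats.expon.logpdf(data, loc0, scale0).sum()

# alternative model: gamma (r = 2 free parameters)
a1, loc1, scale1 = stats.gamma.fit(data, floc=0)
ll1 = stats.gamma.logpdf(data, a1, loc1, scale1).sum()

n, r0, r1 = len(data), 1, 2
lrt_stat = 2 * (ll1 - ll0)
crit = stats.chi2.ppf(0.95, df=r1 - r0)               # accept the gamma if lrt_stat > crit
aic0, aic1 = ll0 - r0, ll1 - r1                       # loglikelihood minus r
sbc0, sbc1 = ll0 - r0 / 2 * np.log(n), ll1 - r1 / 2 * np.log(n)
print(lrt_stat > crit, aic1 > aic0, sbc1 > sbc0)
```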


Lesson 38. Limited Fluctuation or Classical Credibility: Poisson Frequency

Let

• e_F be the exposure needed for full credibility
• µ and σ be the expected aggregate claims and the standard deviation per exposure
• y_p be the coefficient from the standard normal distribution for the desired confidence interval, y_p = Φ^{−1}((1 + p)/2)
• k be the maximum accepted fluctuation

Then

e_F = n_0 · CV², where n_0 = (y_p/k)²,

and CV = σ/µ is the coefficient of variation of the aggregate distribution.

If the claim frequency is Poisson with mean λ and µ_S, σ_S and CV_S are the mean, standard deviation and coefficient of variation of claim severity, then the credibility formulas can be summarized in the following table:

Experience expressed in    Number of claims    Claim size (severity)    Aggregate losses/Pure premium

Exposure units             n_0/λ               n_0 CV_S²/λ              n_0(1 + CV_S²)/λ
Number of claims           n_0                 n_0 CV_S²                n_0(1 + CV_S²)
Aggregate losses           n_0 µ_S             n_0 µ_S CV_S²            n_0 µ_S(1 + CV_S²)

The horizontal axis of the table fills in the ∗ in the statement "You want ∗ to be within k of expected, P of the time". The vertical axis of the table fills in the ∗ in the statement "How many ∗ are needed for full credibility?".

Also, note that

(1 + CV_S²)/λ = (µ_S² + σ_S²)/(λµ_S²) = σ²/µ²


Lesson 39. Limited Fluctuation or Classical Credibility: Non-Poisson Frequency

Using the same notation as in the previous lesson, with the additional notation that µ_f, σ_f and CV_f are the mean, standard deviation and coefficient of variation of claim frequency, the credibility formulas can be summarized in the following table:

Experience expressed in    Number of claims       Claim size (severity)     Aggregate losses/Pure premium

Exposure units             n_0 (σ_f²/µ_f²)        n_0 (σ_s²/(µ_s² µ_f))     n_0 (σ_f²/µ_f² + σ_s²/(µ_s² µ_f))
Number of claims           n_0 (σ_f²/µ_f)         n_0 (σ_s²/µ_s²)           n_0 (σ_f²/µ_f + σ_s²/µ_s²)
Aggregate losses           n_0 µ_s (σ_f²/µ_f)     n_0 (σ_s²/µ_s)            n_0 µ_s (σ_f²/µ_f + σ_s²/µ_s²)


Lesson 40. Limited Fluctuation or Classical Credibility: Partial Credibility

Let

• Z be the credibility factor
• M be the manual premium or the prior estimate of the total loss (pure premium)
• X be the observed total loss (pure premium)

Then the credibility premium P_C is:

P_C = ZX + (1 − Z)M = M + Z(X − M)

For n expected claims and n_F expected claims needed for full credibility,

Z = √(n/n_F)


Lesson 41. Bayesian Methods: Discrete Prior

Bayes’ Theorem:

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B),

where the left side is the posterior probability, B is the observations, and A is the prior distribution.

We answer two questions:

1. What is the probability that this risk belongs to some class?
2. What is the expected size of the next loss for this risk?

We construct a 4-line table to solve the first type of problem, with 2 additional lines for solving the second type of problem. The table has one column for each type of risk.

1. Prior probability that the risk is in each class.
2. The likelihood of the experience given the class.
3. The probability of being in the class and having the observed experience, or the joint probability: the product of the first two rows. Sum up the entries in the third row. Each entry of the third row is a numerator in the expression for the posterior probability of being in the class given the experience given by Bayes’ Theorem, while the sum is the denominator in this expression.
4. Posterior probability of being in each class given the experience: the quotient of the third row over its sum.
5. Expected value, given that the risk is in the class. Also known as the hypothetical means.
6. Expected size of the next loss for this risk, given the experience. Also known as the Bayesian premium: the product of the 4th and 5th rows. Sum up the entries of the 6th row.


Lesson 42. Bayesian Methods: Continuous Prior

If the prior distribution is continuous, Bayes’ Theorem becomes

π(θ|x_1, x_2, ..., x_n) = π(θ) f(x_1, x_2, ..., x_n|θ) / f(x_1, x_2, ..., x_n) = π(θ) f(x_1, x_2, ..., x_n|θ) / ∫ π(θ) f(x_1, x_2, ..., x_n|θ) dθ

Here

f(x_1, x_2, ..., x_n|θ) = ∏_{i=1}^n f(x_i|θ)

f(x_{n+1}|x_1, x_2, ..., x_n) = ∫ f(x_{n+1}|θ) π(θ|x_1, x_2, ..., x_n) dθ


Lesson 43. Bayesian Credibility: Poisson/Gamma

Suppose claim frequency is Poisson, with parameter λ varying by insured according to a gamma distribution with parameters α and θ: λ ∈ Γ(α, θ). Let γ = 1/θ. Suppose there are n exposures and x claims. Then the posterior distribution of λ is a gamma distribution with parameters α∗ = α + x and γ∗ = γ + n, θ∗ = 1/γ∗: (λ|N) ∈ Γ(α∗, θ∗).

The posterior mean is

P_C = α∗/γ∗ = (α + nx̄)/(γ + n) = (γ/(γ + n)) · (α/γ) + (n/(γ + n)) · x̄,

where Z = n/(γ + n) is the credibility factor and x̄ = x/n is the observed claim frequency per exposure.


Lesson 44. Bayesian Credibility: Normal/Normal

The normal distribution as a prior distribution is the conjugate prior of a model having a normal distribution with a fixed variance.

Suppose that the model has a normal distribution with mean θ and fixed variance ν. The prior hypothesis is that θ has a normal distribution with mean µ and variance a. Then the posterior distribution is also normal, with mean µ∗ and variance a∗, where

µ∗ = (νµ + nax̄)/(ν + na)  and  a∗ = νa/(ν + na)

Here n is the number of claims or person-years and x̄ is the sample mean.

If Z = na/(ν + na) is the credibility factor, then

µ∗ = Zx̄ + (1 − Z)µ = (na/(ν + na)) x̄ + (ν/(ν + na)) µ

The predictive distribution is also normal, with mean µ∗ and variance ν + a∗.


Lesson 45. Bayesian Credibility: Bernoulli/Beta

1. Bernoulli/Beta
If the prior distribution is beta with parameters a and b, and you observe n Bernoulli trials with k 1's (successes), then the posterior distribution is beta with parameters a∗ = a + k and b∗ = b + n − k. The posterior mean is

E[θ|x] = a∗/(a∗ + b∗)

If Z = n/(n + a + b) is the credibility factor, then

E[θ|x] = Z(k/n) + (1 − Z) a/(a + b) = (k/n)(n/(n + a + b)) + (a/(a + b))((a + b)/(n + a + b))

The predictive distribution for the next claim is also Bernoulli, with mean q = a∗/(a∗ + b∗).

2. Negative Binomial/Beta
If the model has a negative binomial distribution with

f_{X|p}(x|p) = C(r + x − 1, x) p^r (1 − p)^x, x = 0, 1, 2, ..., p = 1/(1 + β),

and the distribution of p is beta with parameters a, b and θ = 1, then if you have n observations x_1, ..., x_n with mean x̄, the posterior distribution is beta with parameters a∗ = a + nr and b∗ = b + nx̄.

The predictive mean is

E[x_{n+1}|x] = rb∗/(a∗ − 1)

If Z = nr/(nr + a − 1) is the credibility factor, then

E[x_{n+1}|x] = Zx̄ + (1 − Z) rb/(a − 1) = x̄ (nr/(nr + a − 1)) + ((a − 1)/(nr + a − 1)) · rb/(a − 1)


Lesson 46. Bayesian Credibility: Exponential/Inverse Gamma

1. Assume that claim size has an exponential distribution with mean θ: (X|θ) ∈ Exp(θ):

f(x|θ) = (1/θ) e^{−x/θ}

Assume that θ varies by insured according to an inverse gamma distribution with parameters α and β:

π(θ) = (β^α / Γ(α)) · e^{−β/θ} / θ^{α+1}

If n claims x_1, ..., x_n are observed, the parameters of the posterior inverse gamma distribution are α∗ = α + n and β∗ = β + nx̄.

The predictive mean is

E[θ|x] = β∗/(α∗ − 1)

If Z = n/(n + α − 1) is the credibility factor, then

E[θ|x] = Zx̄ + (1 − Z)µ = x̄ (n/(n + α − 1)) + ((α − 1)/(n + α − 1)) · β/(α − 1)

The predictive distribution is a two-parameter Pareto with the same parameters α, β.

2. If the claim size has an exponential distribution with mean 1/∆: (X|∆) ∈ Exp(1/∆):

f(x|∆) = ∆ e^{−x∆}

Assume that ∆ varies by insured according to a gamma distribution with parameters α and β; then θ = 1/∆ follows an inverse gamma distribution with parameters α and 1/β. The posterior for θ is inverse gamma with α∗ = α + n and β∗ = 1/β + nx̄, and the posterior for ∆ is gamma with (α∗, 1/β∗).

3. Assume that claim size has a gamma distribution with parameters η and θ: (X|θ) ∈ Gamma(η, θ):

f(x|θ) = (1/(Γ(η) θ^η)) x^{η−1} e^{−x/θ}

Assume that θ varies by insured according to an inverse gamma distribution with parameters α and β:

π(θ) = (β^α / Γ(α)) · e^{−β/θ} / θ^{α+1}

If n claims x_1, ..., x_n are observed, the parameters of the posterior inverse gamma distribution are α∗ = α + ηn and β∗ = β + nx̄.


Lesson 47. Buhlmann Credibility: Basics

The Buhlmann method is a linear approximation of the Bayesian method.

• µ, or EHM, is the expected value of the hypothetical mean, or the overall mean: µ = E[E[X|θ]].
• a, or VHM, is the variance of the hypothetical mean: a = Var(E[X|θ]).
• ν, or EPV, is the expected value of the process variance: ν = E[Var(X|θ)].
• Var(X) = a + ν
• Buhlmann's k: k = ν/a
• Buhlmann's credibility factor Z = n/(n + k), where n is the number of observations: the number of periods when studying frequency or aggregate losses, the number of claims when studying severity.
• P_C is the Buhlmann credibility expectation:

P_C = Zx̄ + (1 − Z)µ = µ + Z(x̄ − µ)
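A minimal Python sketch of the Buhlmann quantities for a hypothetical two-class discrete prior, computing µ, ν, a, k, Z, and the credibility premium.

```python
# Hypothetical example: two equally likely classes of risks.
# Class A: mean 2, process variance 4; Class B: mean 6, process variance 10.
priors = [0.5, 0.5]
hyp_means = [2.0, 6.0]
proc_vars = [4.0, 10.0]

mu = sum(p * m for p, m in zip(priors, hyp_means))               # EHM
nu = sum(p * v for p, v in zip(priors, proc_vars))               # EPV
a = sum(p * m**2 for p, m in zip(priors, hyp_means)) - mu**2     # VHM
k = nu / a

n, xbar = 3, 5.0                       # hypothetical: 3 observations averaging 5
Z = n / (n + k)
P_C = Z * xbar + (1 - Z) * mu          # Buhlmann credibility premium
print(mu, nu, a, k, Z, P_C)
```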


Lesson 48. Buhlmann Credibility: Discrete Prior

The Bayesian method calculates the true expected value. The Buhlmann method is only an approximation.


Lesson 49. Buhlmann Credibility: Continuous Prior

Buhlmann credibility with a continuous prior is no different in principle from Buhlmann credibility with a discrete prior. The task is to identify the hypothetical mean and process variance, then to calculate the mean and variance of the former (µ and a) and the mean of the latter (ν). From there, one can calculate k, Z, and the credibility premium. However, since the prior is continuous, the means and variances of the hypothetical mean and process variance may require integration rather than summation.


Lesson 50. Buhlmann-Straub Credibility

Generalizations of Buhlmann credibility. The Buhlmann credibility model assumes one exposure in every period.

Buhlmann-Straub: there are m_j exposures in period j.

Hewitt model: extension of the Buhlmann-Straub model.


Lesson 51. Exact Credibility

Priors, posteriors, predictives, and Buhlmann ν, a, and k for linear exponential model/conjugate prior pairs

Model: Poisson(λ)
  Prior: gamma with parameters α and γ = 1/θ
  Posterior: gamma with α∗ = α + nx̄, γ∗ = γ + n
  Predictive: negative binomial with r = α∗, β = 1/γ∗
  Buhlmann: ν = αθ, a = αθ², k = 1/θ

Model: Bernoulli(q)
  Prior: beta with parameters a and b
  Posterior: beta with a∗ = a + nx̄, b∗ = b + n(1 − x̄)
  Predictive: Bernoulli with q = a∗/(a∗ + b∗)
  Buhlmann: ν = ab/((a + b)(a + b + 1)), a = ab/((a + b)²(a + b + 1)), k = a + b

Model: Normal(θ, ν)
  Prior: normal with parameters µ and a
  Posterior: normal with µ∗ = (νµ + nax̄)/(ν + na), a∗ = aν/(na + ν)
  Predictive: normal with µ = µ∗, σ² = a∗ + ν
  Buhlmann: ν = ν, a = a, k = ν/a

Model: Exponential(θ)
  Prior: inverse gamma with parameters α and θ
  Posterior: inverse gamma with α∗ = α + n, θ∗ = θ + nx̄
  Predictive: Pareto with α = α∗, θ = θ∗
  Buhlmann: ν = θ²/((α − 1)(α − 2)), a = θ²/((α − 1)²(α − 2)), k = α − 1


Lesson 52. Buhlmann as Least Squares Estimate of Bayes

Let X_i be the observations, Y_i the Bayesian predictions, and Ŷ_i the Buhlmann predictions. Suppose we'd like to estimate Y_i by Ŷ_i, which is a linear function of X_i: Ŷ_i = α + βX_i, and we'd like to select α and β in such a way as to minimize the weighted least squares difference:

∑ p_i (Ŷ_i − Y_i)²

If p_i = Pr(X_i, Y_i), then

β = Cov(X, Y)/Var(X)

α = E[Y] − βE[X]

Moreover,

E[Y] = E[X]

Var(X) = ∑ p_i X_i² − E[X]²

Cov(X, Y) = ∑ p_i X_i Y_i − E[X]E[Y]

Also,

Cov(X_i, X_j) = Var(µ(Θ)) = a

Var(X_i) = E[ν(Θ)] + Var(µ(Θ)) = ν + a


Lesson 53. Empirical Bayes Non-Parametric Methods

Suppose there are r policyholder groups, and each one is followed for n_i years, where n_i may vary by group, i = 1, 2, ..., r. Experience is provided by year.

µ:
  Uniform exposures: x̄
  Non-uniform exposures: x̄ = ∑_i m_i X̄_i / ∑_i m_i

ν:
  Uniform exposures: (1/(r(n − 1))) ∑_{i=1}^r ∑_{j=1}^n (x_{ij} − x̄_i)²
  Non-uniform exposures: ∑_{i=1}^r ∑_{j=1}^{n_i} m_{ij}(x_{ij} − x̄_i)² / ∑_{i=1}^r (n_i − 1)

a:
  Uniform exposures: (1/(r − 1)) ∑_{i=1}^r (x̄_i − x̄)² − ν/n
  Non-uniform exposures: (m − ∑_{i=1}^r m_i²/m)^{−1} (∑_{i=1}^r m_i(x̄_i − x̄)² − ν(r − 1))

Z:
  Uniform exposures: Z = n/(n + k)
  Non-uniform exposures: Z_i = m_i/(m_i + k)

P_C^i:
  Uniform exposures: (1 − Z)µ + Z X̄_i
  Non-uniform exposures: (1 − Z_i)X̄ + Z_i X̄_i
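A small Python sketch of the uniform-exposures column for r = 3 hypothetical groups observed for n = 4 years each.

```python
import numpy as np

# hypothetical experience: r = 3 groups, n = 4 years each (uniform exposures)
x = np.array([[3., 5., 4., 4.],
              [8., 7., 9., 8.],
              [2., 3., 2., 1.]])
r, n = x.shape

xbar_i = x.mean(axis=1)
mu_hat = x.mean()                                            # overall mean
nu_hat = ((x - xbar_i[:, None])**2).sum() / (r * (n - 1))    # EPV estimate
a_hat = ((xbar_i - mu_hat)**2).sum() / (r - 1) - nu_hat / n  # VHM estimate
k_hat = nu_hat / a_hat
Z = n / (n + k_hat)

P_C = (1 - Z) * mu_hat + Z * xbar_i                          # credibility premium per group
print(mu_hat, nu_hat, a_hat, Z, P_C)
```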


Lesson 54. Empirical Bayes Semi-Parametric Methods

I Poisson Models

µ = ν = x̄;  a = s² − ν/n;  s² = ∑_i (X̄_i − X̄)²/(r − 1)

a estimated using empirical Bayes semi-parametric methods may be non-positive. In this case the method fails, and no credibility is given. For non-uniform exposures, use the formulae from Lesson 53 to estimate the values of x̄ and a.

II Non-Poisson Models

If the model is not Poisson, but there is a linear relationship between µ and ν, use the same technique as for a Poisson model. For example:

a) Negative binomial with fixed β:

E[N|r] = rβ, Var[N|r] = rβ(1 + β) ⇒ µ = x̄, ν = x̄(1 + β)

b) Gamma with fixed θ:

E[N|α] = αθ, Var[N|α] = αθ² ⇒ µ = x̄, ν = x̄θ

III Which Buhlmann method should be used

The following six Buhlmann methods have been discussed in preparation for this exam:

1. Buhlmann
2. Buhlmann-Straub
3. Empirical Bayes non-parametric with uniform exposures
4. Empirical Bayes non-parametric with non-uniform exposures
5. Empirical Bayes semi-parametric with uniform exposures
6. Empirical Bayes semi-parametric with non-uniform exposures

The first two methods can only be used if you have a model specifying risk classes with means and variances.

The second two methods must be used if all you have is data.

The last two methods should be used if, in addition to data, you have a hypothesis that each exposure has a Poisson (or some other specified) distribution.