3. Conjugate families of distributions


  • 3. Conjugate families of distributions

    Objective

    One problem in the implementation of Bayesian approaches is analytical tractability. For a likelihood function $l(\theta|\mathbf{x})$ and prior distribution $p(\theta)$, in order to calculate the posterior distribution, it is necessary to evaluate

    $$f(\mathbf{x}) = \int l(\theta|\mathbf{x})\,p(\theta)\,d\theta$$

    and to make predictions, we must evaluate

    $$f(y|\mathbf{x}) = \int f(y|\theta)\,p(\theta|\mathbf{x})\,d\theta.$$

    General approaches for evaluating such integrals numerically are studied in Chapter 6, but in this chapter, we will study the conditions under which such integrals may be evaluated analytically.

  • Recommended reading

    Wikipedia entry on conjugate priors.

    http://en.wikipedia.org/wiki/Conjugate_prior

    Bernardo, J.M. and Smith, A.F.M. (1994). Bayesian Theory, Section 5.2.

  • Coin tossing problems and beta priors

    It is possible to generalize what we have seen in Chapter 2 to other coin tossing situations.

    Suppose that we have defined a beta prior distribution $\theta \sim \mathcal{B}(\alpha, \beta)$ for $\theta = P(\text{head})$.

    Then if we observe a sample of coin toss data, whether the sampling mechanism is binomial, negative binomial or geometric, the likelihood function always takes the form

    $$l(\theta|\mathbf{x}) = c\,\theta^h(1-\theta)^t$$

    where $c$ is some constant that depends on the sampling distribution and $h$ and $t$ are the observed numbers of heads and tails respectively.

  • Applying Bayes' theorem, the posterior distribution is

    $$p(\theta|\mathbf{x}) \propto c\,\theta^h(1-\theta)^t\,\frac{1}{B(\alpha,\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1} \propto \theta^{\alpha+h-1}(1-\theta)^{\beta+t-1}$$

    which implies that $\theta|\mathbf{x} \sim \mathcal{B}(\alpha+h, \beta+t)$.

    Thus, using a beta prior guarantees that the posterior distribution is also beta. In this case, we say that the class of beta prior distributions is conjugate to the class of binomial (or geometric or negative binomial) likelihood functions.

  • Interpretation of the beta parameters

    The posterior distribution is $\mathcal{B}(\alpha+h, \beta+t)$, so that $\alpha$ ($\beta$) in the prior plays the role of the number of heads (tails) observed in the experiment. The information represented by the prior distribution can thus be viewed as equivalent to the information contained in an experiment where we observe $\alpha$ heads and $\beta$ tails.

    Furthermore, the posterior mean is given by $(\alpha+h)/(\alpha+\beta+h+t)$, which can be expressed in mixture form as

    $$E[\theta|\mathbf{x}] = w\,\frac{\alpha}{\alpha+\beta} + (1-w)\,\frac{h}{h+t} = w\,E[\theta] + (1-w)\,\hat{\theta}$$

    where $w = \frac{\alpha+\beta}{\alpha+\beta+h+t}$ is the relative weight of the number of equivalent coin tosses in the prior.
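
    The short Python sketch below illustrates this update and the weighted-average form of the posterior mean; the prior parameters and observed counts are assumed values chosen purely for illustration.

    # A minimal sketch of beta conjugate updating for coin tossing
    # (illustrative values; h and t are the observed heads and tails).
    from scipy import stats

    a, b = 2.0, 2.0          # prior Beta(alpha, beta)
    h, t = 9, 3              # observed heads and tails

    # Posterior is Beta(alpha + h, beta + t), whatever the (binomial,
    # negative binomial or geometric) sampling mechanism.
    post = stats.beta(a + h, b + t)

    # Posterior mean as the weighted average w*E[theta] + (1-w)*MLE.
    w = (a + b) / (a + b + h + t)
    prior_mean, mle = a / (a + b), h / (h + t)
    print(post.mean(), w * prior_mean + (1 - w) * mle)   # both 0.6875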

  • Limiting results

    Consider also what happens as we let $\alpha, \beta \to 0$. We can interpret this as representing a prior which contains information equivalent to zero coin tosses.

    In this case, the posterior distribution tends to $\mathcal{B}(h, t)$ and, for example, the posterior mean $E[\theta|\mathbf{x}] \to h/(h+t)$, the classical MLE.

    The limiting form of the prior distribution in this case is Haldane's (1931) prior,

    $$p(\theta) \propto \frac{1}{\theta(1-\theta)},$$

    which is an improper density. We analyze the use of improper densities further in Chapter 5.

  • Prediction and posterior distribution

    The form of the predictive and posterior distributions depends on the sampling distribution. We shall consider four cases:

    Bernoulli trials.

    Binomial data.

    Geometric data.

    Negative binomial data.

  • Bernoulli trials

    Given that the beta distribution is conjugate in coin tossing experiments, given a (Bernoulli or binomial, etc.) sampling distribution, $f(x|\theta)$, we only need to calculate the prior predictive density $f(x) = \int f(x|\theta)\,p(\theta)\,d\theta$ for a beta prior.

    Assume first that we have a Bernoulli trial with $P(X = 1|\theta) = \theta$ and prior distribution $\theta \sim \mathcal{B}(\alpha, \beta)$. Then:

    Theorem 3. If $X$ is a Bernoulli trial with parameter $\theta$ and $\theta \sim \mathcal{B}(\alpha, \beta)$, then:

    $$f(x) = \begin{cases} \frac{\alpha}{\alpha+\beta} & \text{if } x = 1 \\[4pt] \frac{\beta}{\alpha+\beta} & \text{if } x = 0 \end{cases}$$

    $$E[X] = \frac{\alpha}{\alpha+\beta}, \qquad V[X] = \frac{\alpha\beta}{(\alpha+\beta)^2}.$$

    Given a sample $\mathbf{x} = (x_1, \ldots, x_n)$ of Bernoulli data, the posterior distribution is $\theta|\mathbf{x} \sim \mathcal{B}\left(\alpha + \sum_{i=1}^n x_i,\ \beta + n - \sum_{i=1}^n x_i\right)$.

  • Proof

    $$P(X = 1) = \int_0^1 P(X = 1|\theta)\,p(\theta)\,d\theta = E[\theta] = \frac{\alpha}{\alpha+\beta}$$

    $$P(X = 0) = 1 - P(X = 1) = \frac{\beta}{\alpha+\beta}$$

    $$E[X] = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) = \frac{\alpha}{\alpha+\beta}$$

    $$V[X] = E[X^2] - E[X]^2 = \frac{\alpha}{\alpha+\beta} - \left(\frac{\alpha}{\alpha+\beta}\right)^2 = \frac{\alpha\beta}{(\alpha+\beta)^2}.$$

    The posterior distribution formula was demonstrated previously.

  • Binomial data

    Theorem 4. Suppose that $X|\theta \sim \mathcal{BI}(m, \theta)$ with $\theta \sim \mathcal{B}(\alpha, \beta)$. Then, the predictive density of $X$ is the beta-binomial density

    $$f(x) = \binom{m}{x}\frac{B(\alpha+x, \beta+m-x)}{B(\alpha, \beta)} \quad \text{for } x = 0, 1, \ldots, m$$

    $$E[X] = m\,\frac{\alpha}{\alpha+\beta}, \qquad V[X] = \frac{m\alpha\beta(m+\alpha+\beta)}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

    Given a sample, $\mathbf{x} = (x_1, \ldots, x_n)$, of binomial data, the posterior distribution is $\theta|\mathbf{x} \sim \mathcal{B}\left(\alpha + \sum_{i=1}^n x_i,\ \beta + mn - \sum_{i=1}^n x_i\right)$.

  • Proof

    $$f(x) = \int_0^1 f(x|\theta)\,p(\theta)\,d\theta$$
    $$= \int_0^1 \binom{m}{x}\theta^x(1-\theta)^{m-x}\,\frac{1}{B(\alpha,\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta \quad \text{for } x = 0, \ldots, m$$
    $$= \binom{m}{x}\frac{1}{B(\alpha,\beta)}\int_0^1 \theta^{\alpha+x-1}(1-\theta)^{\beta+m-x-1}\,d\theta$$
    $$= \binom{m}{x}\frac{B(\alpha+x, \beta+m-x)}{B(\alpha,\beta)}.$$

    $$E[X] = E[E[X|\theta]] = E[m\theta] \quad \text{remembering the mean of a binomial distribution}$$
    $$= m\,\frac{\alpha}{\alpha+\beta}.$$

    $$V[X] = E[V[X|\theta]] + V[E[X|\theta]] = E[m\theta(1-\theta)] + V[m\theta]$$
    $$= m\,\frac{B(\alpha+1, \beta+1)}{B(\alpha,\beta)} + m^2\,\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
    $$= m\,\frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)} + m^2\,\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
    $$= \frac{m\alpha\beta(m+\alpha+\beta)}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
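
    The beta-binomial predictive density of Theorem 4 can be evaluated directly; the sketch below does so for assumed values of $m$, $\alpha$ and $\beta$ and checks that the probabilities sum to one and give the predictive mean $m\alpha/(\alpha+\beta)$.

    # Beta-binomial prior predictive pmf for assumed (m, alpha, beta).
    from scipy.special import comb, betaln
    import numpy as np

    def beta_binomial_pmf(x, m, a, b):
        """f(x) = C(m, x) * B(a + x, b + m - x) / B(a, b) for x = 0, ..., m."""
        return comb(m, x) * np.exp(betaln(a + x, b + m - x) - betaln(a, b))

    m, a, b = 10, 2.0, 3.0
    pmf = np.array([beta_binomial_pmf(x, m, a, b) for x in range(m + 1)])
    print(pmf.sum())                       # ~1.0
    print((np.arange(m + 1) * pmf).sum())  # predictive mean m*a/(a+b) = 4.0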

  • Geometric data

    Theorem 5. Suppose that $\theta \sim \mathcal{B}(\alpha, \beta)$ and $X|\theta \sim \mathcal{GE}(\theta)$, that is

    $$P(X = x|\theta) = \theta(1-\theta)^x \quad \text{for } x = 0, 1, 2, \ldots$$

    Then, the predictive density of $X$ is

    $$f(x) = \frac{B(\alpha+1, \beta+x)}{B(\alpha, \beta)} \quad \text{for } x = 0, 1, 2, \ldots$$

    $$E[X] = \frac{\beta}{\alpha-1} \quad \text{for } \alpha > 1, \qquad V[X] = \frac{\alpha\beta(\alpha+\beta-1)}{(\alpha-1)^2(\alpha-2)} \quad \text{for } \alpha > 2.$$

    Given a sample, $\mathbf{x} = (x_1, \ldots, x_n)$, then $\theta|\mathbf{x} \sim \mathcal{B}\left(\alpha+n,\ \beta + \sum_{i=1}^n x_i\right)$.

    Proof. Exercise.

  • Negative binomial data

    Theorem 6. Suppose that $\theta \sim \mathcal{B}(\alpha, \beta)$ and $X|\theta \sim \mathcal{NB}(r, \theta)$, that is

    $$P(X = x|\theta) = \binom{x+r-1}{x}\theta^r(1-\theta)^x \quad \text{for } x = 0, 1, 2, \ldots$$

    Then:

    $$f(x) = \binom{x+r-1}{x}\frac{B(\alpha+r, \beta+x)}{B(\alpha, \beta)} \quad \text{for } x = 0, 1, \ldots$$

    $$E[X] = \frac{r\beta}{\alpha-1} \quad \text{for } \alpha > 1, \qquad V[X] = \frac{r\beta(r+\alpha-1)(\alpha+\beta-1)}{(\alpha-1)^2(\alpha-2)} \quad \text{for } \alpha > 2.$$

    Given a sample $\mathbf{x} = (x_1, \ldots, x_n)$ of negative binomial data, the posterior distribution is $\theta|\mathbf{x} \sim \mathcal{B}\left(\alpha+rn,\ \beta+\sum_{i=1}^n x_i\right)$.

    Proof. Exercise.

  • Conjugate priors

    The concept and formal definition of conjugate prior distributions comes from Raiffa and Schlaifer (1961).

    Definition 2. If $\mathcal{F}$ is a class of sampling distributions $f(x|\theta)$ and $\mathcal{P}$ is a class of prior distributions for $\theta$, $p(\theta)$, we say that $\mathcal{P}$ is conjugate to $\mathcal{F}$ if $p(\theta|x) \in \mathcal{P}$ for all $f(\cdot|\theta) \in \mathcal{F}$ and all $p(\theta) \in \mathcal{P}$.

  • The exponential-gamma system

    Suppose that $X|\lambda \sim \mathcal{E}(\lambda)$ is exponentially distributed and that we use a gamma prior distribution, $\lambda \sim \mathcal{G}(\alpha, \beta)$, that is

    $$p(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda} \quad \text{for } \lambda > 0.$$

    Given sample data $\mathbf{x}$, the posterior distribution is

    $$p(\lambda|\mathbf{x}) \propto p(\lambda)\,l(\lambda|\mathbf{x}) \propto \frac{\beta^\alpha}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}\prod_{i=1}^n \lambda e^{-\lambda x_i} \propto \lambda^{\alpha+n-1}e^{-(\beta+n\bar{x})\lambda}$$

    which is the nucleus of a gamma density, $\lambda|\mathbf{x} \sim \mathcal{G}(\alpha+n, \beta+n\bar{x})$, and thus the gamma prior is conjugate to the exponential sampling distribution.

  • Interpretation and limiting results

    The information represented by the prior distribution can be interpreted as being equivalent to the information contained in a sample of size $\alpha$ with sample mean $\beta/\alpha$.

    Letting $\alpha, \beta \to 0$, the posterior distribution approaches $\mathcal{G}(n, n\bar{x})$ and thus, for example, the posterior mean tends to $1/\bar{x}$, which is equal to the MLE in this experiment. However, the limiting prior distribution in this case is

    $$f(\lambda) \propto \frac{1}{\lambda}$$

    which is improper.

  • Prediction and posterior distribution

    Theorem 7. Let $X|\lambda \sim \mathcal{E}(\lambda)$ and $\lambda \sim \mathcal{G}(\alpha, \beta)$. Then the predictive density of $X$ is

    $$f(x) = \frac{\alpha\beta^\alpha}{(\beta+x)^{\alpha+1}} \quad \text{for } x > 0$$

    $$E[X] = \frac{\beta}{\alpha-1} \quad \text{for } \alpha > 1, \qquad V[X] = \frac{\alpha\beta^2}{(\alpha-1)^2(\alpha-2)} \quad \text{for } \alpha > 2.$$

    Given a sample $\mathbf{x} = (x_1, \ldots, x_n)$ of exponential data, the posterior distribution is $\lambda|\mathbf{x} \sim \mathcal{G}(\alpha+n, \beta+n\bar{x})$.

    Proof. Exercise.
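
    As a quick sanity check of Theorem 7, the sketch below simulates from the predictive distribution by first drawing $\lambda$ from the gamma prior and then $X|\lambda$ from the exponential; the parameter values are assumptions chosen only for illustration.

    # Simulate from the prior predictive of the exponential-gamma system and
    # compare the Monte Carlo mean with the closed form beta/(alpha - 1).
    import numpy as np

    rng = np.random.default_rng(1)
    a, b = 3.0, 2.0
    M = 500_000

    lam = rng.gamma(shape=a, scale=1 / b, size=M)   # lambda ~ Gamma(a, b)
    x = rng.exponential(scale=1 / lam)              # X | lambda ~ Exponential(lambda)

    print(x.mean(), b / (a - 1))                    # both approximately 1.0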

  • Exponential families

    The exponential distribution is the simplest example of an exponential family distribution. Exponential family sampling distributions are closely related to the existence of conjugate prior distributions.

    Definition 3. A probability density $f(x|\theta)$ where $\theta \in \mathbb{R}$ is said to belong to the one-parameter exponential family if it has form

    $$f(x|\theta) = C(\theta)h(x)\exp(\phi(\theta)s(x))$$

    for given functions $C(\cdot)$, $h(\cdot)$, $\phi(\cdot)$, $s(\cdot)$. If the support of $X$ is independent of $\theta$ then the family is said to be regular and otherwise it is irregular.

  • Examples of exponential family distributions

    Example 12. The binomial distribution,

    $$f(x|\theta) = \binom{m}{x}\theta^x(1-\theta)^{m-x} \quad \text{for } x = 0, 1, \ldots, m$$
    $$= (1-\theta)^m\binom{m}{x}\exp\left(x\log\frac{\theta}{1-\theta}\right),$$

    is a regular exponential family distribution.

    Example 13. The Poisson distribution,

    $$f(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda}\,\frac{1}{x!}\exp(x\log\lambda) \quad \text{for } x = 0, 1, 2, \ldots,$$

    is a regular exponential family distribution.

  • Irregular and non exponential family distributions

    Example 14. The uniform distribution,

    $$f(x|\theta) = \frac{1}{\theta} \quad \text{for } 0 < x < \theta,$$

    is an irregular exponential family distribution.

    Example 15. The Cauchy density,

    $$f(x|\theta) = \frac{1}{\pi\left(1+(x-\theta)^2\right)} \quad \text{for } -\infty < x < \infty,$$

    is not an exponential family distribution.

    The Student's t and Fisher's F distributions and the logistic distribution are other examples of non exponential family distributions.

  • Sufficient statistics and exponential family distributions

    Theorem 8. If $X|\theta$ is a one-parameter, regular exponential family distribution, then given a sample $\mathbf{x} = (x_1, \ldots, x_n)$, a sufficient statistic for $\theta$ is $t(\mathbf{x}) = \sum_{i=1}^n s(x_i)$.

    Proof. The likelihood function is

    $$l(\theta|\mathbf{x}) = \prod_{i=1}^n C(\theta)h(x_i)\exp(\phi(\theta)s(x_i))$$
    $$= \left(\prod_{i=1}^n h(x_i)\right)C(\theta)^n\exp(\phi(\theta)t(\mathbf{x}))$$
    $$= h(\mathbf{x})\,g(t, \theta)$$

    where $h(\mathbf{x}) = \prod_{i=1}^n h(x_i)$ and $g(t, \theta) = C(\theta)^n\exp(\phi(\theta)t(\mathbf{x}))$, and therefore $t$ is a sufficient statistic as in Theorem 1.

  • A conjugate prior to an exponential family distribution

    If $f(x|\theta)$ is an exponential family, with density as in Definition 3, then a conjugate prior distribution for $\theta$ exists.

    Theorem 9. The prior distribution $p(\theta) \propto C(\theta)^a\exp(\phi(\theta)b)$ is conjugate to the exponential family distribution likelihood.

    Proof. From Theorem 8, the likelihood function for a sample of size $n$ is

    $$l(\theta|\mathbf{x}) \propto C(\theta)^n\exp(\phi(\theta)t(\mathbf{x}))$$

    and given $p(\theta)$ as above, we have a posterior of the same form,

    $$p(\theta|\mathbf{x}) \propto C(\theta)^{a^\star}\exp(\phi(\theta)b^\star)$$

    where $a^\star = a + n$ and $b^\star = b + t(\mathbf{x})$.

  • The Poisson-gamma system

    Suppose that $X|\lambda \sim \mathcal{P}(\lambda)$. Then, from Example 13, we have the exponential family form

    $$f(x|\lambda) = e^{-\lambda}\,\frac{1}{x!}\exp(x\log\lambda)$$

    and from Theorem 9, a conjugate prior density for $\lambda$ has form

    $$p(\lambda) \propto \left(e^{-\lambda}\right)^a\exp(b\log\lambda)$$

    for some $a, b$. Thus, $p(\lambda) \propto \lambda^b e^{-a\lambda}$, which is the nucleus of a gamma density, $\lambda \sim \mathcal{G}(\alpha, \beta)$, where $\alpha = b+1$ and $\beta = a$.

    Thus, the gamma prior distribution is conjugate to the Poisson sampling distribution.

  • Derivation of the posterior distribution

    Now assume the prior distribution $\lambda \sim \mathcal{G}(\alpha, \beta)$ and suppose that we observe a sample of $n$ Poisson data. Then, we can derive the posterior distribution via Bayes' theorem as earlier:

    $$p(\lambda|\mathbf{x}) \propto \frac{\beta^\alpha}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}\prod_{i=1}^n\frac{\lambda^{x_i}e^{-\lambda}}{x_i!} \propto \lambda^{\alpha+n\bar{x}-1}e^{-(\beta+n)\lambda}$$

    $$\lambda|\mathbf{x} \sim \mathcal{G}(\alpha+n\bar{x}, \beta+n).$$

    Alternatively, we can use Theorem 9. The sufficient statistic is $t(\mathbf{x}) = \sum_{i=1}^n x_i$ and thus, from Theorem 9, we know immediately that

    $$\lambda|\mathbf{x} \sim \mathcal{G}(b+1+t(\mathbf{x}),\ a+n) \equiv \mathcal{G}(\alpha+n\bar{x}, \beta+n).$$

  • Interpretation and limiting results

    We might interpret the information in the prior as being equivalent to the information contained in a sample of size $\beta$ with sample mean $\alpha/\beta$.

    Letting $\alpha, \beta \to 0$, the posterior mean converges to the MLE, although the corresponding limiting prior, $p(\lambda) \propto \frac{1}{\lambda}$, is improper.

  • Prediction and posterior distribution

    Theorem 10. Let $X|\lambda \sim \mathcal{P}(\lambda)$ with $\lambda \sim \mathcal{G}(\alpha, \beta)$. Then

    $$X \sim \mathcal{NB}\left(\alpha, \frac{\beta}{\beta+1}\right),$$

    that is,

    $$f(x) = \frac{\Gamma(x+\alpha)}{x!\,\Gamma(\alpha)}\left(\frac{\beta}{\beta+1}\right)^\alpha\left(\frac{1}{\beta+1}\right)^x \quad \text{for } x = 0, 1, 2, \ldots$$

    $$E[X] = \frac{\alpha}{\beta}, \qquad V[X] = \frac{\alpha(\beta+1)}{\beta^2}.$$

    Given a sample $\mathbf{x} = (x_1, \ldots, x_n)$ of Poisson data, the posterior distribution is $\lambda|\mathbf{x} \sim \mathcal{G}(\alpha+n\bar{x}, \beta+n)$.

    Proof. Exercise.
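
    The sketch below illustrates the Poisson-gamma system of Theorem 10 with assumed prior parameters and a hypothetical data vector; it checks that scipy's negative binomial parameterization NB(alpha, beta/(beta+1)) reproduces the predictive mean alpha/beta.

    # Poisson-gamma updating and the negative binomial prior predictive.
    import numpy as np
    from scipy import stats

    a, b = 3.0, 2.0                            # prior Gamma(alpha, beta), rate beta
    x = np.array([2, 4, 1, 3, 5])              # hypothetical Poisson sample

    # Posterior: lambda | x ~ Gamma(alpha + sum(x), beta + n)
    post = stats.gamma(a + x.sum(), scale=1 / (b + len(x)))
    print(post.mean())                         # (alpha + sum x) / (beta + n)

    # Prior predictive: X ~ NB(alpha, beta/(beta+1))
    pred = stats.nbinom(a, b / (b + 1))
    print(pred.mean(), a / b)                  # both 1.5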

  • The uniform-Pareto system

    As noted in Example 14, the uniform distribution is not a regular exponential family distribution. However, we can still define a conjugate prior distribution.

    Let $X \sim \mathcal{U}(0, \theta)$. Then, given a sample of size $n$, the likelihood function is $l(\theta|\mathbf{x}) = \frac{1}{\theta^n}$ for $\theta > x_{\max}$, where $x_{\max}$ is the sample maximum.

    Consider a Pareto prior distribution, $\theta \sim \mathcal{PA}(\alpha, \beta)$, with density

    $$p(\theta) = \frac{\alpha\beta^\alpha}{\theta^{\alpha+1}} \quad \text{for } \theta > \beta > 0$$

    and mean $E[\theta] = \alpha\beta/(\alpha-1)$.

    It is clear that this prior distribution is conjugate and that

    $$\theta|\mathbf{x} \sim \mathcal{PA}(\alpha^\star, \beta^\star)$$

    where $\alpha^\star = \alpha+n$ and $\beta^\star = \max\{\beta, x_{\max}\}$.

  • Properties and limiting results

    The posterior mean in this case is

    $$E[\theta|\mathbf{x}] = \frac{(\alpha+n)\max\{\beta, x_{\max}\}}{\alpha+n-1}$$

    which cannot be represented in a general way as an average of the prior mean and the MLE ($x_{\max}$).

    We can interpret the information in the prior as being equivalent to a sample of size $\alpha$ where the maximum value is equal to $\beta$. When we let $\alpha, \beta \to 0$, then the posterior distribution approaches $\mathcal{PA}(n, x_{\max})$ and in this case, the posterior mean is

    $$E[\theta|\mathbf{x}] = \frac{n\,x_{\max}}{n-1} > x_{\max} = \hat{\theta}.$$

    This also corresponds to an improper limiting prior $p(\theta) \propto 1/\theta$.

    In this experiment, no prior distribution (except for a point mass at $x_{\max}$) will lead to a posterior distribution whose mean coincides with the MLE.

  • Prediction and posterior distribution

    Theorem 11. If $X|\theta \sim \mathcal{U}(0, \theta)$ and $\theta \sim \mathcal{PA}(\alpha, \beta)$, then:

    $$f(x) = \begin{cases} \frac{\alpha}{(\alpha+1)\beta} & \text{if } 0 < x < \beta \\[4pt] \frac{\alpha\beta^\alpha}{(\alpha+1)x^{\alpha+1}} & \text{if } x \geq \beta \end{cases}$$

    $$E[X] = \frac{\alpha\beta}{2(\alpha-1)} \quad \text{for } \alpha > 1, \qquad V[X] = \frac{\alpha\beta^2(\alpha^2-2\alpha+4)}{12(\alpha-1)^2(\alpha-2)} \quad \text{for } \alpha > 2.$$

    Given a sample of size $n$, then the posterior distribution of $\theta$ is $\theta|\mathbf{x} \sim \mathcal{PA}(\alpha+n, \max\{\beta, x_{\max}\})$.

    Proof. Exercise.
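
    The following minimal sketch carries out the uniform-Pareto update for a hypothetical data set (the prior parameters and observations are assumptions made purely for illustration).

    # Conjugate updating for U(0, theta) data with a Pareto(alpha, beta) prior.
    import numpy as np

    a, b = 2.0, 1.0                          # prior Pareto(alpha, beta)
    x = np.array([0.7, 2.3, 1.1, 3.8, 0.4])  # hypothetical U(0, theta) sample

    a_post = a + len(x)                      # alpha* = alpha + n
    b_post = max(b, x.max())                 # beta*  = max(beta, x_max)

    # Posterior mean (alpha* beta*)/(alpha* - 1); note it always exceeds x_max.
    print(a_post, b_post, a_post * b_post / (a_post - 1))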

  • Higher dimensional exponential family distributions

    Definition 4. A probability density $f(\mathbf{x}|\boldsymbol{\theta})$ where $\boldsymbol{\theta} \in \mathbb{R}^k$ is said to belong to the k-parameter exponential family if it has form

    $$f(\mathbf{x}|\boldsymbol{\theta}) = C(\boldsymbol{\theta})h(\mathbf{x})\exp\left(\sum_{j=1}^k \phi_j(\boldsymbol{\theta})s_j(\mathbf{x})\right)$$

    for given functions $C(\cdot)$, $h(\cdot)$, $\phi_j(\cdot)$, $s_j(\cdot)$. If the support of $\mathbf{X}$ is independent of $\boldsymbol{\theta}$ then the family is said to be regular and otherwise it is irregular.

    Example 16. The multinomial density is given by

    $$f(\mathbf{x}|\boldsymbol{\theta}) = \frac{m!}{\prod_{j=1}^k x_j!}\prod_{j=1}^k \theta_j^{x_j}$$

    where $x_j \in \{0, 1, \ldots, m\}$ for $j = 1, \ldots, k$, $\sum_{j=1}^k x_j = m$ and $\sum_{j=1}^k \theta_j = 1$.

  • We can write

    $$f(\mathbf{x}|\boldsymbol{\theta}) = \frac{m!}{\prod_{j=1}^k x_j!}\exp\left(\sum_{j=1}^k x_j\log\theta_j\right)$$
    $$= \frac{m!}{\prod_{j=1}^k x_j!}\exp\left(\left(m - \sum_{j=1}^{k-1}x_j\right)\log\theta_k + \sum_{j=1}^{k-1}x_j\log\theta_j\right)$$
    $$= \frac{m!}{\prod_{j=1}^k x_j!}\,\theta_k^m\exp\left(\sum_{j=1}^{k-1}x_j\log(\theta_j/\theta_k)\right)$$

    and thus, the multinomial distribution is a regular, $k-1$ dimensional exponential family distribution.

  • It is clear that we can generalize Theorem 8 to the k-dimensional case.

    Theorem 12. If $\mathbf{X}|\boldsymbol{\theta}$ is a k-parameter, regular exponential family distribution, then given a sample $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$, a sufficient statistic for $\boldsymbol{\theta}$ is $\mathbf{t}(\mathbf{x}) = \left(\sum_{i=1}^n s_1(\mathbf{x}_i), \ldots, \sum_{i=1}^n s_k(\mathbf{x}_i)\right)$.

    Proof. Using the same arguments as in Theorem 8, the result is easy to derive.

  • A conjugate prior to an exponential family sampling distribution

    We can also generalize Theorem 9. If $f(\mathbf{x}|\boldsymbol{\theta})$ is an exponential family, with density as in Definition 4, then a conjugate prior distribution for $\boldsymbol{\theta}$ exists.

    Theorem 13. The prior distribution $p(\boldsymbol{\theta}) \propto C(\boldsymbol{\theta})^a\exp\left(\sum_{j=1}^k \phi_j(\boldsymbol{\theta})b_j\right)$ is conjugate to the k-dimensional exponential family distribution likelihood.

    Proof. Exercise.

  • The multinomial-Dirichlet system

    Suppose that $\mathbf{X}|\boldsymbol{\theta} \sim \mathcal{MN}(m, \boldsymbol{\theta})$ is a k-dimensional multinomial sampling distribution. Then from Example 16, we can express the multinomial density as

    $$f(\mathbf{x}|\boldsymbol{\theta}) = \frac{m!}{\prod_{j=1}^k x_j!}\,\theta_k^m\exp\left(\sum_{j=1}^{k-1}x_j\log(\theta_j/\theta_k)\right)$$

    and therefore, a conjugate prior distribution is

    $$f(\boldsymbol{\theta}) \propto \left(\theta_k^m\right)^a\exp\left(\sum_{j=1}^{k-1}b_j\log(\theta_j/\theta_k)\right)$$

    for arbitrary $a, b_1, \ldots, b_{k-1}$,

    $$\propto \prod_{j=1}^{k-1}\theta_j^{b_j}\;\theta_k^{am-\sum_{j=1}^{k-1}b_j} \propto \prod_{j=1}^{k}\theta_j^{\alpha_j-1}$$

    where $\alpha_j = b_j+1$ for $j = 1, \ldots, k-1$ and $\alpha_k = am-\sum_{j=1}^{k-1}b_j+1$.

  • The Dirichlet distribution

    Definition 5. A random variable $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$ is said to have a Dirichlet distribution, $\boldsymbol{\theta} \sim \mathcal{D}(\alpha_1, \ldots, \alpha_k)$, if

    $$p(\boldsymbol{\theta}) = \frac{\Gamma\left(\sum_{j=1}^k\alpha_j\right)}{\prod_{j=1}^k\Gamma(\alpha_j)}\prod_{j=1}^k\theta_j^{\alpha_j-1} \quad \text{for } 0 < \theta_j < 1,\ \sum_{j=1}^k\theta_j = 1.$$

    The Dirichlet distribution may be thought of as a generalization of the beta distribution. In particular, the marginal distribution of any $\theta_j$ is beta, that is $\theta_j \sim \mathcal{B}(\alpha_j, \alpha_0 - \alpha_j)$, where $\alpha_0 = \sum_{j=1}^k\alpha_j$.

  • Also, the moments of the Dirichlet distribution are easily evaluated:

    $$E[\theta_j] = \frac{\alpha_j}{\alpha_0}, \qquad V[\theta_j] = \frac{\alpha_j(\alpha_0-\alpha_j)}{\alpha_0^2(\alpha_0+1)}, \qquad \mathrm{Cov}[\theta_i, \theta_j] = -\frac{\alpha_i\alpha_j}{\alpha_0^2(\alpha_0+1)}.$$

  • Prediction and posterior distribution

    Theorem 14. Let $\mathbf{X}|\boldsymbol{\theta} \sim \mathcal{MN}(m, \boldsymbol{\theta})$ and $\boldsymbol{\theta} \sim \mathcal{D}(\alpha_1, \ldots, \alpha_k)$. Then:

    $$P(\mathbf{X} = \mathbf{x}) = \frac{m!\,\Gamma(\alpha_0)}{\Gamma(m+\alpha_0)}\prod_{j=1}^k\frac{\Gamma(x_j+\alpha_j)}{x_j!\,\Gamma(\alpha_j)} \quad \text{for } x_j \geq 0,\ \sum_{j=1}^k x_j = m,$$

    where $\alpha_0 = \sum_{j=1}^k\alpha_j$. Also

    $$E[X_j] = m\,\frac{\alpha_j}{\alpha_0}, \qquad V[X_j] = m(\alpha_0+m)\,\frac{\alpha_j(\alpha_0-\alpha_j)}{\alpha_0^2(\alpha_0+1)}, \qquad \mathrm{Cov}[X_i, X_j] = -m(\alpha_0+m)\,\frac{\alpha_i\alpha_j}{\alpha_0^2(\alpha_0+1)}.$$

  • Given a sample, $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$, of multinomial data, then

    $$\boldsymbol{\theta}|\mathbf{x} \sim \mathcal{D}\left(\alpha_1 + \sum_{i=1}^n x_{1i},\ \ldots,\ \alpha_k + \sum_{i=1}^n x_{ki}\right).$$

    Proof. Exercise.
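
    The following short sketch carries out this multinomial-Dirichlet update for a hypothetical data set (the prior parameters and the counts are assumptions made only for illustration).

    # Dirichlet posterior for multinomial count data.
    import numpy as np

    alpha = np.array([1.0, 1.0, 1.0])        # Dirichlet prior parameters
    counts = np.array([[3, 5, 2],            # each row: one multinomial observation
                       [4, 4, 2]])

    alpha_post = alpha + counts.sum(axis=0)  # D(alpha_j + sum_i x_ji)
    print(alpha_post)                        # [ 8. 10.  5.]
    print(alpha_post / alpha_post.sum())     # posterior means E[theta_j | x]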

  • Canonical form exponential family distributions

    Suppose that we have a standard exponential family distribution

    $$f(\mathbf{x}|\boldsymbol{\theta}) = C(\boldsymbol{\theta})h(\mathbf{x})\exp\left(\sum_{j=1}^k \phi_j(\boldsymbol{\theta})s_j(\mathbf{x})\right).$$

    Define the transformed variables $Y_j = s_j(\mathbf{X})$ and transformed parameters $\phi_j = \phi_j(\boldsymbol{\theta})$ for $j = 1, \ldots, k$. Then the density of $\mathbf{y}|\boldsymbol{\phi}$ is of canonical form.

    Definition 6. The density

    $$f(\mathbf{y}|\boldsymbol{\phi}) = D(\mathbf{y})\exp\left(\sum_{j=1}^k y_j\phi_j - e(\boldsymbol{\phi})\right)$$

    is called the canonical form representation of the exponential family distribution.

  • Conjugate analysis

    Clearly, a conjugate prior for $\boldsymbol{\phi}$ takes the form

    $$p(\boldsymbol{\phi}) \propto \exp\left(\sum_{j=1}^k b_j\phi_j - a\,e(\boldsymbol{\phi})\right) \propto \exp\left(\sum_{j=1}^k aB_j\phi_j - a\,e(\boldsymbol{\phi})\right)$$

    where $B_j = \frac{b_j}{a}$ for $j = 1, \ldots, k$. We shall call this form the canonical prior for $\boldsymbol{\phi}$. If a sample of size $n$ is observed, then

    $$p(\boldsymbol{\phi}|\mathbf{y}) \propto \exp\left(\sum_{j=1}^k\frac{aB_j+n\bar{y}_j}{a+n}(a+n)\phi_j - (a+n)\,e(\boldsymbol{\phi})\right) \propto \exp\left(\left(\frac{a\mathbf{B}+n\bar{\mathbf{y}}}{a+n}\right)^T(a+n)\boldsymbol{\phi} - (a+n)\,e(\boldsymbol{\phi})\right).$$

    The updating process for conjugate models involves a simple weighted average representation.

  • The posterior mean as a weighted average

    Suppose that we observe a sample of $n$ data from a canonical exponential family distribution as in Definition 6 and that we use a canonical prior distribution

    $$p(\boldsymbol{\phi}) \propto \exp\left(\sum_{j=1}^k aB_j\phi_j - a\,e(\boldsymbol{\phi})\right) \propto \exp\left(a\mathbf{B}^T\boldsymbol{\phi} - a\,e(\boldsymbol{\phi})\right).$$

    Then we can demonstrate the following theorem.

    Theorem 15. $E[\nabla e(\boldsymbol{\phi})\,|\,\mathbf{y}] = \frac{a\mathbf{B}+n\bar{\mathbf{y}}}{a+n}$, where $[\nabla e(\boldsymbol{\phi})]_j = \frac{\partial}{\partial\phi_j}e(\boldsymbol{\phi})$.

    Proof. As we have shown that the prior is conjugate, it is enough to prove that, a priori, $E[\nabla e(\boldsymbol{\phi})] = \mathbf{B}$. However,

    $$a\left(\mathbf{B} - E[\nabla e(\boldsymbol{\phi})]\right) = \int a\left(\mathbf{B} - \nabla e(\boldsymbol{\phi})\right)p(\boldsymbol{\phi})\,d\boldsymbol{\phi} = \int \nabla p(\boldsymbol{\phi})\,d\boldsymbol{\phi} = \mathbf{0},$$

    since the prior density vanishes at the boundary of the parameter space, which gives the result.

  • Example 17. Let $X|\lambda \sim \mathcal{P}(\lambda)$. Then, from Example 13, we can write

    $$f(x|\lambda) = e^{-\lambda}\,\frac{1}{x!}\exp(x\log\lambda) = \frac{1}{x!}\exp(x\log\lambda - \lambda) = \frac{1}{x!}\exp\left(x\phi - e^{\phi}\right),$$

    where $\phi = \log\lambda$, in canonical exponential family form. We already know that the gamma prior density $\mathcal{G}(\alpha, \beta)$ is conjugate here. Thus,

    $$p(\lambda) \propto \lambda^{\alpha-1}e^{-\beta\lambda} \implies p(\phi) \propto \exp\left(\alpha\phi - \beta e^{\phi}\right)$$

    in canonical form. Now $\nabla e(\phi) = \frac{d}{d\phi}e^{\phi} = e^{\phi} = \lambda$. Thus, from Theorem 15, we have

    $$E[\lambda|\mathbf{x}] = \frac{\beta}{\beta+n}\,\frac{\alpha}{\beta} + \frac{n}{\beta+n}\,\bar{x},$$

    a weighted average of the prior mean and the MLE.

  • Example 18. Consider the Bernoulli trial $f(x|\theta) = \theta^x(1-\theta)^{1-x}$ for $x = 0, 1$. We have

    $$f(x|\theta) = (1-\theta)\left(\frac{\theta}{1-\theta}\right)^x = \exp\left(x\log\frac{\theta}{1-\theta} - \log\frac{1}{1-\theta}\right) = \exp\left(x\phi - \log(1+e^{\phi})\right)$$

    in canonical form, where $\phi = \log\frac{\theta}{1-\theta}$.

    Thus, the canonical prior for $\phi$ must take the form

    $$p(\phi) \propto \exp\left(aB\phi - a\log(1+e^{\phi})\right) \propto \left(1+e^{\phi}\right)^{-a}\exp(aB\phi).$$

  • This implies, using the standard change of variables formula, that

    $$p(\theta) \propto (1-\theta)^a\left(\frac{\theta}{1-\theta}\right)^{aB}\frac{1}{\theta(1-\theta)} \propto \theta^{aB-1}(1-\theta)^{a(1-B)-1}$$

    which we can recognize as a beta distribution, $\mathcal{B}(\alpha, \beta)$, with parameters $\alpha = aB$ and $\beta = a(1-B)$.

    Now $\nabla\log(1+e^{\phi}) = \frac{e^{\phi}}{1+e^{\phi}} = \theta$ and so, from Theorem 15, given a sample of $n$ Bernoulli trials, we have

    $$E[\theta|\mathbf{x}] = \frac{aB+n\bar{x}}{a+n} = w\,\frac{\alpha}{\alpha+\beta} + (1-w)\,\bar{x}$$

    where $w = \frac{\alpha+\beta}{\alpha+\beta+n}$.

    We previously derived this formula by standard Bayesian analysis earlier in this chapter.

  • Mixtures of conjugate priors

    Suppose that a simple conjugate prior distribution $p(\theta) \in \mathcal{P}$ does not represent our prior beliefs well. An alternative is to consider a mixture of conjugate prior distributions,

    $$p(\theta) = \sum_{i=1}^k w_i\,p_i(\theta),$$

    where $0 < w_i < 1$, $\sum_{i=1}^k w_i = 1$ and $p_i(\theta) \in \mathcal{P}$.

    Note that Dalal and Hall (1983) demonstrate that any prior density for an exponential family can be approximated arbitrarily closely by a mixture of conjugate distributions.

    It is easy to demonstrate that, given a conjugate prior mixture distribution, the posterior distribution is also a mixture of conjugate densities.

  • Proof. Using Bayes' theorem, we have

    $$p(\theta|\mathbf{x}) \propto f(\mathbf{x}|\theta)\sum_{i=1}^k w_i\,p_i(\theta)$$
    $$\propto \sum_{i=1}^k w_i\,p_i(\theta)f(\mathbf{x}|\theta)$$
    $$\propto \sum_{i=1}^k w_i\left(\int p_i(\theta)f(\mathbf{x}|\theta)\,d\theta\right)\frac{p_i(\theta)f(\mathbf{x}|\theta)}{\int p_i(\theta)f(\mathbf{x}|\theta)\,d\theta}$$
    $$\propto \sum_{i=1}^k w_i\left(\int p_i(\theta)f(\mathbf{x}|\theta)\,d\theta\right)p_i(\theta|\mathbf{x})$$

    where $p_i(\theta|\mathbf{x}) \in \mathcal{P}$ because it is assumed that $p_i(\theta)$ is conjugate, and therefore

    $$p(\theta|\mathbf{x}) = \sum_{i=1}^k w_i^\star\,p_i(\theta|\mathbf{x}) \quad \text{where} \quad w_i^\star = \frac{w_i\int p_i(\theta)f(\mathbf{x}|\theta)\,d\theta}{\sum_{j=1}^k w_j\int p_j(\theta)f(\mathbf{x}|\theta)\,d\theta}.$$

  • Example 19. In Example 11, suppose that we use a mixture prior distribution:

    $$p(\theta) = 0.25\,\mathcal{B}(1, 1) + 0.75\,\mathcal{B}(5, 5).$$

    Then, the posterior distribution is given by

    $$p(\theta|\mathbf{x}) \propto \left(0.25 \cdot 1 + 0.75\,\frac{1}{B(5,5)}\theta^{5-1}(1-\theta)^{5-1}\right)\theta^9(1-\theta)^3$$
    $$\propto 0.25\,\theta^{10-1}(1-\theta)^{4-1} + 0.75\,\frac{1}{B(5,5)}\theta^{14-1}(1-\theta)^{8-1}$$
    $$\propto 0.25\,B(10,4)\,\frac{1}{B(10,4)}\theta^{10-1}(1-\theta)^{4-1} + 0.75\,\frac{B(14,8)}{B(5,5)}\,\frac{1}{B(14,8)}\theta^{14-1}(1-\theta)^{8-1}$$
    $$= w^\star\,\frac{1}{B(10,4)}\theta^{10-1}(1-\theta)^{4-1} + (1-w^\star)\,\frac{1}{B(14,8)}\theta^{14-1}(1-\theta)^{8-1}$$

  • where $w^\star = \frac{0.25\,B(10,4)}{0.25\,B(10,4) + 0.75\,B(14,8)/B(5,5)} = 0.2315$.

    Thus, the posterior distribution is a mixture of $\mathcal{B}(10, 4)$ and $\mathcal{B}(14, 8)$ densities with weights 0.2315 and 0.7685 respectively.
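
    The mixture weight can be reproduced numerically with a few lines of Python; this is only a sketch using scipy's beta function, with the prior weights 0.25 and 0.75 taken from the example.

    # Posterior mixture weight w* for Example 19.
    from scipy.special import beta as B

    num = 0.25 * B(10, 4) / B(1, 1)              # component 1: B(1,1) prior, data 9 heads, 3 tails
    den = num + 0.75 * B(14, 8) / B(5, 5)        # component 2: B(5,5) prior
    w_star = num / den
    print(w_star)                                # ~0.2315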

  • Software for conjugate models

    Some general software for undertaking conjugate Bayesian analysis has been developed.

    First Bayes is a slightly dated but fairly complete package.

    The book on Bayesian computation by Albert (2009) gives a number of R routines for fitting conjugate and non-conjugate models, all contained in the R LearnBayes package.

    http://www.tonyohagan.co.uk/1b/
    http://cran.r-project.org/web/packages/LearnBayes/index.html

  • References

    Albert, J. (2009). Bayesian Computation with R. Berlin: Springer.

    Dalal, S.R. and Hall, W.J. (1983). Approximating priors by mixtures of natural conjugate priors. Journal of the Royal Statistical Society Series B, 45, 278-286.

    Haldane, J.B.S. (1931). A note on inverse probability. Proceedings of the Cambridge Philosophical Society, 28, 55-61.

    Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory. Cambridge MA: Harvard University Press.

  • Application I: Bayesian inference for Markovian queueing systems

    We will study inference for the M/M/1 queueing system, as developed by Armero and Bayarri (1994a).

  • The M/M/1 queueing system

    The M($\lambda$)/M($\mu$)/1 system is a queue with a single server and exponential inter-arrival and service times with rates $\lambda$ and $\mu$ respectively.

    Under this model, clients arrive at the checkout according to a Markov process with rate $\lambda$ and, if the server is free, they start an exponential service time process with rate $\mu$. If the server is busy, the client joins the queue of customers waiting to be served.

  • Stability and equilibrium distributions

    In practical analysis, it is of particular importance to study the conditions under which such a queueing system is stable, i.e. when the queue length does not go off to infinity as more customers arrive. It can be shown that the system is stable if the traffic intensity, defined by $\rho = \lambda/\mu$, is strictly less than 1.

    In the case that the system is stable, it is interesting to consider the equilibrium or stationary distribution of the queue size, client waiting times, etc.

    It can be shown that the equilibrium distribution of the number of clients in the system, $N$, is geometric,

    $$P(N = n|\rho) = (1-\rho)\rho^n \quad \text{for } n = 0, 1, 2, \ldots \quad \text{with mean } E[N|\rho] = \frac{\rho}{1-\rho}.$$

    Also, the limiting distribution of the time, $W$, spent in the system by a customer is exponential, $W|\lambda, \mu \sim \mathcal{E}(\mu-\lambda)$, with mean $E[W|\lambda, \mu] = \frac{1}{\mu-\lambda}$.

    The distributions of other variables of interest, such as the duration of a busy period, are also well known. See e.g. Gross and Harris (1985).
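
    As a tiny numerical illustration of these equilibrium formulas, the sketch below evaluates them for assumed rates $\lambda = 2$ and $\mu = 3$ (so $\rho = 2/3$ and the system is stable).

    # Equilibrium quantities of an M/M/1 queue for assumed rates.
    lam, mu = 2.0, 3.0
    rho = lam / mu

    p_n = [(1 - rho) * rho ** n for n in range(5)]   # P(N = n | rho), geometric
    mean_N = rho / (1 - rho)                         # E[N | rho] = 2 clients
    mean_W = 1 / (mu - lam)                          # E[W | lambda, mu] = 1 time unit
    print(p_n, mean_N, mean_W)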

  • Experiment and inference

    In real systems, the arrival and service rates will be unknown and we must use inferential techniques to estimate these parameters and the system characteristics. A first step is to find a reasonable way of collecting data.

    The simplest experiment is that of observing $n_a$ inter-arrival times and $n_s$ service times. In this case, the likelihood is

    $$l(\lambda, \mu|\mathbf{x}, \mathbf{y}) \propto \lambda^{n_a}e^{-\lambda t_a}\,\mu^{n_s}e^{-\mu t_s}$$

    where $t_a$ and $t_s$ are the sums of inter-arrival and service times respectively.

    If we use conjugate, independent gamma prior distributions,

    $$\lambda \sim \mathcal{G}(\alpha_a, \beta_a) \qquad \mu \sim \mathcal{G}(\alpha_s, \beta_s),$$

    then the posterior distributions are also gamma distributions:

    $$\lambda|\mathbf{x} \sim \mathcal{G}(\alpha_a+n_a, \beta_a+t_a) \qquad \mu|\mathbf{y} \sim \mathcal{G}(\alpha_s+n_s, \beta_s+t_s).$$

  • Estimation of the traffic intensity

    It is straightforward to estimate the mean of the traffic intensity $\rho$. We have

    $$E[\rho|\mathbf{x}, \mathbf{y}] = E\left[\frac{\lambda}{\mu}\,\Big|\,\mathbf{x}, \mathbf{y}\right] = E[\lambda|\mathbf{x}]\,E\left[\frac{1}{\mu}\,\Big|\,\mathbf{y}\right] = \frac{\alpha_a+n_a}{\beta_a+t_a}\cdot\frac{\beta_s+t_s}{\alpha_s+n_s-1}.$$

    We can also evaluate the distribution of $\rho$ by recalling that if $\psi \sim \mathcal{G}\left(\frac{a}{2}, \frac{b}{2}\right)$, then $b\psi \sim \chi^2_a$ is chi-square distributed.

  • The probability that the system is stable

    Thus, we have

    $$2(\beta_a+t_a)\lambda\,|\,\mathbf{x} \sim \chi^2_{2(\alpha_a+n_a)} \qquad 2(\beta_s+t_s)\mu\,|\,\mathbf{y} \sim \chi^2_{2(\alpha_s+n_s)}$$

    and therefore, recalling that the ratio of two independent $\chi^2$ variables, each divided by its degrees of freedom, is an F distributed variable, we have

    $$\frac{(\beta_a+t_a)(\alpha_s+n_s)}{(\alpha_a+n_a)(\beta_s+t_s)}\,\rho\,\Big|\,\mathbf{x}, \mathbf{y} \sim \mathcal{F}_{2(\alpha_a+n_a),\,2(\alpha_s+n_s)}.$$

    The posterior probability that the system is stable is

    $$p = P(\rho < 1|\mathbf{x}, \mathbf{y}) = P\left(\mathcal{F}_{2(\alpha_a+n_a),\,2(\alpha_s+n_s)} < \frac{(\beta_a+t_a)(\alpha_s+n_s)}{(\alpha_a+n_a)(\beta_s+t_s)}\right).$$

    To estimate the equilibrium queue size and waiting time distributions, we can use Monte Carlo sampling, conditioning on stability:

    a) generate $\lambda_i \sim \mathcal{G}(\alpha_a+n_a, \beta_a+t_a)$ and $\mu_i \sim \mathcal{G}(\alpha_s+n_s, \beta_s+t_s)$;
    b) set $\rho_i = \lambda_i/\mu_i$; if $\rho_i > 1$, go to a).

    Given the Monte Carlo sampled data, we can then estimate the queue size probabilities,

    $$P(N = n|\mathbf{x}, \mathbf{y}, \text{equilibrium}) \approx \frac{1}{M}\sum_{i=1}^M(1-\rho_i)\rho_i^n \quad \text{for } n = 0, 1, 2, \ldots,$$

    and the waiting time distribution,

    $$P(W \leq w|\mathbf{x}, \mathbf{y}, \text{equilibrium}) \approx 1 - \frac{1}{M}\sum_{i=1}^M e^{-(\mu_i-\lambda_i)w}.$$
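
    A rough sketch of this Monte Carlo scheme is given below; the posterior gamma parameters are hypothetical values standing in for $(\alpha_a+n_a, \beta_a+t_a)$ and $(\alpha_s+n_s, \beta_s+t_s)$.

    # Monte Carlo estimation of equilibrium queue characteristics, conditional
    # on stability (rho < 1), using draws from the two gamma posteriors.
    import numpy as np

    rng = np.random.default_rng(2)
    M = 200_000
    lam = rng.gamma(100, 1 / 120.0, M)           # lambda | x (assumed posterior)
    mu = rng.gamma(100, 1 / 80.0, M)             # mu | y (assumed posterior)
    rho = lam / mu

    keep = rho < 1                               # keep only stable draws
    lam, mu, rho = lam[keep], mu[keep], rho[keep]

    print((1 - rho).mean())                      # estimate of P(N = 0 | data, equilibrium)
    print(1 - np.exp(-(mu - lam) * 2.0).mean())  # estimate of P(W <= 2 | data, equilibrium)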

  • Estimating the mean of the equilibrium system size distribution

    It is easier to evaluate the predictive moments of $N$. We have

    $$E[N|\mathbf{x}, \mathbf{y}, \text{equilibrium}] = E\left[E[N|\rho]\,|\,\mathbf{x}, \mathbf{y}, \rho < 1\right] = E\left[\frac{\rho}{1-\rho}\,\Big|\,\mathbf{x}, \mathbf{y}, \rho < 1\right] = \frac{1}{p}\int_0^1\frac{\rho}{1-\rho}\,p(\rho|\mathbf{x}, \mathbf{y})\,d\rho = \infty$$

    and thus, these moments do not exist.

    This is a general characteristic of inference in queueing systems with Markovian inter-arrival or service processes. The same thing also happens with the equilibrium waiting time or busy period distributions. See Armero and Bayarri (1994a) or Wiper (1998).

    The problem stems from the use of (independent) prior distributions for $\lambda$ and $\mu$ with $p(\lambda = \mu) > 0$, and can be partially resolved by assuming a priori that $P(\lambda \geq \mu) = 0$. See Armero and Bayarri (1994b) or Ruggeri et al (1996).

  • Example

    Hall (1991) gives collected inter-arrival and service time data for 98 users of an automatic teller machine in Berkeley, USA. We shall assume here that inter-arrival and service times can be modeled as exponential variables.

    [Figure: histogram of the observed data (x against freq).]

    The sufficient statistics are $n_a = n_s = 98$, $t_a = 119.71$ and $t_s = 81.35$ minutes respectively. We will assume the improper prior distributions $p(\lambda) \propto 1/\lambda$ and $p(\mu) \propto 1/\mu$, which we have seen earlier as limiting cases of the conjugate gamma priors. Then $\lambda|\mathbf{x} \sim \mathcal{G}(98, 119.71)$ and $\mu|\mathbf{y} \sim \mathcal{G}(98, 81.35)$.

  • The posterior distribution of the traffic intensity

    The posterior expected value of $\rho$ is $E[\rho|\mathbf{x}, \mathbf{y}] \approx 0.69$ and the distribution function $F(\rho|\mathbf{x}, \mathbf{y})$ is illustrated below.

    [Figure: posterior distribution function F(rho|data) of the traffic intensity.]

    The posterior probability that the system is stable is 0.997.
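
    These two figures can be checked numerically from the stated gamma posteriors; the brief simulation below is only a sketch of that calculation.

    # Reproduce E[rho|data] ~ 0.69 and P(rho < 1|data) ~ 0.997 by simulation.
    import numpy as np

    rng = np.random.default_rng(0)
    M = 1_000_000
    lam = rng.gamma(98, 1 / 119.71, M)   # lambda | x ~ G(98, 119.71)
    mu = rng.gamma(98, 1 / 81.35, M)     # mu | y ~ G(98, 81.35)
    rho = lam / mu

    print(rho.mean())                    # ~0.69
    print((rho < 1).mean())              # ~0.997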

  • The posterior distribution of N

    We used a large Monte Carlo sample to estimate the posterior density of $N$. There is only a very small probability that there are more than 15 clients in the system.

    [Figure: estimated posterior probabilities P(N = n|data) for n = 0, ..., 16.]

  • The posterior distribution of W

    There is only a very small probability that a client spends over 5 minutes in the system.

    [Figure: estimated posterior distribution function P(W < w|data).]

  • Extensions

    Calculation of busy period and transient time distributions.

    Extension to other queueing systems:

    Markovian systems with various or unlimited numbers of servers, e.g. Armero and Bayarri (1997).

    Networks of queues, e.g. Armero and Bayarri (1999).

    Non-Markovian systems, see e.g. Wiper (1998), Ausín et al (2004).

  • References

    Armero, C. and Bayarri, M.J. (1994a). Bayesian prediction in M/M/1 queues. Queueing Systems, 15, 419-426.

    Armero, C. and Bayarri, M.J. (1994b). Prior assessments for prediction in queues. The Statistician, 43, 139-153.

    Armero, C. and Bayarri, M.J. (1997). Bayesian analysis of a queueing system with unlimited service. Journal of Statistical Planning and Inference, 58, 241-261.

    Armero, C. and Bayarri, M.J. (1999). Dealing with Uncertainties in Queues and Networks of Queues: A Bayesian Approach. In: Multivariate Analysis, Design of Experiments and Survey Sampling, Ed. S. Ghosh. New York: Marcel Dekker, pp. 579-608.

    Ausín, M.C., Wiper, M.P. and Lillo, R.E. (2004). Bayesian estimation for the M/G/1 queue using a phase type approximation. Journal of Statistical Planning and Inference, 118, 83-101.

    Gross, D. and Harris, C.M. (1985). Fundamentals of Queueing Theory (2nd ed.). New York: Wiley.

    Hall, R.W. (1991). Queueing Methods. New Jersey: Prentice Hall.

    Ruggeri, F., Wiper, M.P. and Rios Insua, D. (1996). Bayesian models for correlation in M/M/1 queues. Quaderno IAMI 96.8, CNR-IAMI, Milano.

    Wiper, M.P. (1998). Bayesian analysis of Er/M/1 and Er/M/c queues. Journal of Statistical Planning and Inference, 69, 65-79.