
  • Gravitational Wave Data Analysis

    Part 2

    Based on notes developed by J. Romano

  • Gaussianity

    A single random variable x is said to have a Gaussian probability distribution if

    p(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left[ -\frac{1}{2} \frac{(x - \mu_x)^2}{\sigma_x^2} \right]

    The parameters µ_x and σ_x² are the mean and variance of x.

    A set of random variables x_j, where j = 0, 1, ..., N − 1 (e.g., the discrete time samples x_j of a random process x), is said to have a multivariate Gaussian probability distribution if

    p(x_0, x_1, \ldots, x_{N-1}) = \frac{1}{(2\pi)^{N/2} \sqrt{\det C_x}} \exp\left[ -\frac{1}{2} \sum_{i,j=0}^{N-1} C^{-1}_{x\,ij} (x_i - \mu_{x_i})(x_j - \mu_{x_j}) \right]

    where C_xij is the covariance matrix, C_xij := ⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩.

    C_xij generalises the variance σ_x² for a single Gaussian-distributed random variable.

    If the discrete random process x_j has zero mean (i.e., ⟨x_j⟩ = 0), then the elements of the covariance matrix are just the discretised version of the correlation function C_x(t) = ⟨x(t′) x(t + t′)⟩ that we defined for a continuous random process x.
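    As a minimal numerical sketch (not part of the original notes), the multivariate Gaussian pdf above can be evaluated directly, here for a zero-mean discrete process with an assumed exponentially decaying correlation function:

    ```python
    import numpy as np
    from scipy.stats import multivariate_normal

    N = 8                                    # number of time samples x_j
    t = np.arange(N)

    # Assumed covariance matrix: C_x[i, j] = sigma^2 exp(-|t_i - t_j| / tau)
    sigma, tau = 1.0, 2.0
    Cx = sigma**2 * np.exp(-np.abs(t[:, None] - t[None, :]) / tau)

    mu = np.zeros(N)                         # zero-mean process, <x_j> = 0
    x = multivariate_normal(mean=mu, cov=Cx).rvs(random_state=0)

    # Evaluate p(x_0, ..., x_{N-1}) directly from the formula above ...
    Cinv = np.linalg.inv(Cx)
    quad = (x - mu) @ Cinv @ (x - mu)
    p_manual = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi)**N * np.linalg.det(Cx))

    # ... and compare against scipy's implementation; the two values agree
    p_scipy = multivariate_normal(mean=mu, cov=Cx).pdf(x)
    print(p_manual, p_scipy)
    ```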

  • Statistical inference

    The ultimate goal of science is to infer nature's state from observations. Since observations are incomplete and imprecise, often being contaminated by noise, our conclusions will always have some level of uncertainty associated with them. Statistical inference (or probability theory) is a way of quantifying and manipulating uncertainty.

  • There are (at least) two ways of interpreting probability:

    (i) degree of belief in a statement

    (ii) long-run relative occurrence of an event in a set of identical experiments

    Interpretation (ii) requires the experiment to be repeatable in principle. Hence, the question "what's the probability that it's going to rain today?" doesn't make sense with respect to this definition, unless you are willing to imagine an ensemble of identical worlds with the same initial weather conditions.

  • Interpretation (i) is more general than interpretation (ii) since it need not be associated with repeatable experiments.

    But both interpretations lead to the same algebra of probability. Namely, the probability of a statement X, such as "it will rain today" or "gravitational waves from a particular supernova were incident on our detector during the past hour", is given by a number p(X) between 0 and 1 which satisfies:

    (i) p(X = true) = 1, p(X = false) = 0, 0 < p(X = not sure) < 1

    (ii) p(X) + p(X̄) = 1, where X̄ means not X

    (iii) p(X, Y) = p(X|Y) p(Y), where p(X|Y) is the conditional probability of X given Y

    Property (ii) is called the sum rule; property (iii) is the product rule.

    Note that a joint probability distribution for X and Y can be converted into a marginalised distribution for X by integrating out the Y-dependence:

    p(X) = \int dY\, p(X, Y) = \int dY\, p(X|Y)\, p(Y)

    where the second equality follows from the product rule.
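    A small numerical check of the marginalisation rule, with assumed toy densities (not from the notes): take Y ~ N(0, 1) and X|Y ~ N(Y, 1), so that marginally X ~ N(0, 2).

    ```python
    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    def p_y(y):
        return norm.pdf(y, loc=0.0, scale=1.0)      # p(Y)

    def p_x_given_y(x, y):
        return norm.pdf(x, loc=y, scale=1.0)        # p(X|Y)

    x0 = 0.7
    # Integrate out the Y-dependence: p(X) = ∫ dY p(X|Y) p(Y)
    p_x_marg, _ = quad(lambda y: p_x_given_y(x0, y) * p_y(y), -np.inf, np.inf)

    # Analytically X ~ N(0, 2), so the two numbers should agree
    print(p_x_marg, norm.pdf(x0, loc=0.0, scale=np.sqrt(2.0)))
    ```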

  • Frequentist vs. Bayesian statistics

    Frequentist statistics

    (i) is a branch of statistical inference that takes Interpretation (ii) as its interpretation of probability.

    (ii) probabilities can only be assigned to random variables (e.g., the outcomes of repeated identical experiments) and not to hypotheses or parameters describing the state of nature, which have fixed but unknown values.

    (iii) one assumes that the measured data are drawn from some underlying probability distribution, which assumes the truth of a particular hypothesis or model.

  • (iv) one constructs a statistic (i.e., a function of the data that estimates a signal parameter or indicates how well the data fit a particular hypothesis). The statistic is a random variable, since the data from which it is constructed are random.

    (v) one calculates, analytically or via Monte Carlo simulations, the probability distribution of the statistic (the so-called sampling distribution).

    (vi) one needs to be clever in choosing a statistic, as statistics are not unique and have different properties (e.g., unbiased but large variance, biased but small variance, etc.).

  • (vii) the sampling distribution depends on data values that were not actually observed, which depend in general on how the experiment was carried out or might have been carried out! (This is related to the so-called stopping problem of frequentist statistics.)

    (viii) from the statistic and sampling distribution one calculates either confidence intervals for parameter estimation or p-values for rejecting null hypotheses. (More on these later.)

    (ix) claims to be more objective than Bayesian statistics since it does not require the introduction of a prior.
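    The frequentist recipe in items (iv)-(viii) can be sketched with a toy example (the choices of statistic and null hypothesis are illustrative assumptions, not from the notes): take the sample mean as the statistic, estimate its sampling distribution under the null hypothesis of zero-mean Gaussian noise by Monte Carlo, and read off a p-value.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N = 128
    data = rng.normal(loc=0.3, scale=1.0, size=N)    # toy "observed" data

    statistic = data.mean()                          # (iv) the chosen statistic

    # (v) Monte Carlo sampling distribution of the statistic under H0 (zero mean)
    null_stats = rng.normal(loc=0.0, scale=1.0, size=(20_000, N)).mean(axis=1)

    # (viii) one-sided p-value: probability under H0 of a statistic at least this large
    p_value = np.mean(null_stats >= statistic)
    print(statistic, p_value)
    ```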

  • Bayesian statistics

    (i) is a branch of statistical inference that takes Interpretation (i) as its interpretation of probability.

    (ii) probabilities can be assigned to hypotheses or parameters describing the state of nature, which have fixed but unknown values.

    (iii) one uses Bayes' theorem to update the degree of belief in a particular hypothesis, in light of the data that were actually measured.

  • (iv) the likelihood function contains all that the data have to say about the problem; it is what converts prior probabilities to posterior probabilities based on the observed data.

    (v) from the posterior distribution, one can construct probability intervals for parameter estimation or posterior odds ratios for comparing various hypotheses.

    (vi) is not necessarily subjective, as priors can be assigned in a consistent, objective fashion, based only on the information at hand (so-called least informative priors).

  • Bayes' theorem

    Bayes' theorem is a simple consequence of p(X, Y) = p(Y, X) and the product rule. It relates the conditional probabilities p(X|Y) and p(Y|X):

    p(X|Y) = \frac{p(Y|X)\, p(X)}{p(Y)}

    p(X) is called the prior probability of X; p(X|Y) is the posterior probability of X given Y; p(Y|X) is called the likelihood of X; and p(Y) is called the evidence.

    In a typical situation, X will correspond to a model or some hypothesis about the state of nature and Y will be the data that we collect from an experiment. Denoting the hypothesis by H and the data by D, we get

    p(H|D) = \frac{p(D|H)\, p(H)}{p(D)}

  • In words: "The probability of a hypothesis given the observed data is proportional to the probability of the data assuming the hypothesis is true, times the prior probability that the hypothesis is true, before taking into account the observed data."

    Bayes' theorem tells us how probabilities evolve in light of new data.
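    A minimal sketch (not part of the original notes) of applying Bayes' theorem to a discrete set of hypotheses, with the evidence p(D) obtained by summing over the alternatives:

    ```python
    def bayes_update(priors, likelihoods):
        """priors[H] = p(H); likelihoods[H] = p(D|H).  Returns p(H|D)."""
        evidence = sum(likelihoods[H] * priors[H] for H in priors)   # p(D)
        return {H: likelihoods[H] * priors[H] / evidence for H in priors}

    # Toy example: two hypotheses with equal priors but different likelihoods
    print(bayes_update(priors={"H1": 0.5, "H2": 0.5},
                       likelihoods={"H1": 0.8, "H2": 0.2}))
    # -> {'H1': 0.8, 'H2': 0.2}
    ```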

  • Bayes' theorem: Example

    The following example is borrowed from Graham Woan:

    A gravitational wave detector may have detected a gravitational wave burst from a Type II supernova. But since burst-like signals in a detector can also be produced by instrumental glitches (in fact, only 1 out of 10,000 bursts is really due to a supernova), the data are checked for glitches using an auxiliary veto channel test. From Monte Carlo simulations, one finds that the veto channel test confirms that the burst is due to a supernova 95% of the time if there really was a GW burst in the data, but falsely claims that the burst is due to a supernova 1% of the time when there was no GW burst in the data. It turns out that the measured burst passes the veto channel test.

    What is the probability that it's due to a supernova?

  • Notation:

    S = burst is due to a supernova
    G = burst is due to a glitch
    + = veto test says the burst is due to a supernova
    − = veto test says the burst is due to a glitch

    We would like to calculate p(S|+).

  • Information available:

    p(S) = 0.0001
    p(G) = 0.9999
    p(+|S) = 0.95
    p(+|G) = 0.01

    Application of Bayes' theorem,

    p(S|+) = \frac{p(+|S)\, p(S)}{p(+)}

    using

    p(+) = p(+|S)\, p(S) + p(+|G)\, p(G) \approx 0.01

    yields p(S|+) ≈ 0.01, so the burst is very unlikely to be associated with a supernova. Note that p(S|+) ≠ p(+|S).
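    The same numbers can be checked with a few lines of Python (a sketch, using only the values stated above):

    ```python
    p_S, p_G = 1e-4, 0.9999            # priors: supernova vs. glitch
    p_plus_S, p_plus_G = 0.95, 0.01    # veto test passes, given S or given G

    p_plus = p_plus_S * p_S + p_plus_G * p_G   # evidence p(+) ≈ 0.0101
    p_S_plus = p_plus_S * p_S / p_plus         # posterior p(S|+)
    print(p_plus, p_S_plus)                    # ≈ 0.0101, ≈ 0.0094
    ```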

  • Bayesian parameter estimation

    Parameter estimation in Bayesian statistics is done via the posterior distribution p(a|D). The posterior distribution tells you everything you need to know about a parameter, although you can reduce it to a few numbers if you like (e.g., mode, mean, standard deviation, etc.). A Bayesian confidence interval is simply defined in terms of the area under the posterior distribution between one parameter value and another.

  • If the posterior distribution depends on two parameters a and b, but you only really care about a, then you can obtain the posterior distribution for a by marginalizing over b:

    p(a|D) = \int db\, p(a, b|D) = \int db\, p(a|b, D)\, p(b|D)
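    As a sketch of marginalising over a nuisance parameter on a grid (the joint posterior here is an assumed toy bivariate Gaussian, not anything from the notes):

    ```python
    import numpy as np

    a = np.linspace(-5, 5, 401)
    b = np.linspace(-5, 5, 401)
    da, db = a[1] - a[0], b[1] - b[0]
    A, B = np.meshgrid(a, b, indexing="ij")

    # Assumed joint posterior p(a, b | D): correlated bivariate Gaussian (unnormalised)
    rho = 0.6
    post = np.exp(-0.5 * (A**2 - 2 * rho * A * B + B**2) / (1 - rho**2))
    post /= post.sum() * da * db               # normalise over (a, b)

    # Marginal posterior for a: integrate out b
    p_a = post.sum(axis=1) * db

    # Summaries one might quote: mode, mean, standard deviation
    mode = a[np.argmax(p_a)]
    mean = np.sum(a * p_a) * da
    std = np.sqrt(np.sum((a - mean) ** 2 * p_a) * da)
    print(mode, mean, std)
    ```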

  • Bayesian hypothesis testing

    It doesn't make sense to talk about a single hypothesis in Bayesian statistics without making reference to alternative hypotheses. This is because we need to specify the alternative hypotheses to calculate the denominator p(D) in Bayes' theorem:

    p(D) = \sum_i p(D|H_i)\, p(H_i)

    Comparison of two hypotheses is natural in the Bayesian framework:

    \frac{p(H_1|D)}{p(H_2|D)} = \frac{p(D|H_1)}{p(D|H_2)} \times \frac{p(H_1)}{p(H_2)}

  • In words: "Posterior odds equals the likelihood ratio times the prior odds."
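    A minimal sketch of this comparison (the numbers below are illustrative assumptions, not from the notes):

    ```python
    def posterior_odds(p_D_H1, p_D_H2, p_H1, p_H2):
        """Posterior odds = (likelihood ratio, i.e. the Bayes factor) x (prior odds)."""
        bayes_factor = p_D_H1 / p_D_H2       # p(D|H1) / p(D|H2)
        prior_odds = p_H1 / p_H2             # p(H1) / p(H2)
        return bayes_factor * prior_odds

    # Example: the data favour H1 by a factor of 20, but H1 was thought 10x less likely a priori
    print(posterior_odds(p_D_H1=0.02, p_D_H2=0.001, p_H1=1 / 11, p_H2=10 / 11))
    # -> 2.0: H1 is now twice as probable as H2
    ```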