
CIAO Workshop AAS 235/Honolulu 2020 Jan 3-4

Statistics for High-Energy Astronomy

Vinay Kashyap (CHASC/CXC/CfA)

Outline

Statistics is a mechanism to understand how much your data are telling you; do not blindly surrender scientific judgement to it.

1. Photon Counts and the Poisson distribution

2. Gaussian

  2.1 Likelihood and χ²

  2.2 Poisson vs Gaussian

  2.3 Error propagation

3. Fitting

  3.1 Best fit

    3.1.1 Error bars

  3.2 Goodness of fit

  3.3 cstat

4. CIAO/Sherpa


data summaries : statistics :: astrometry : astrophysics

1. Counts

❖ ACIS and HRC are photon counting detectors. Events are recorded as they arrive, usually sloooowly

❖ What does this imply?


Light curve of the steady source HZ 43, binned at 1 sec

Notice: asymmetry, scatter around the mean

Distribution of counts in the light curve binned at 1 sec

Notice:
❖ asymmetry
❖ positive integers
❖ distribution
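A minimal sketch (illustrative, not from the slides) of what a photon-counting detector produces for a steady source: simulate Poisson counts in 1 s bins and note the same features called out above. The count rate of 0.8 ct/s is an assumed value, not the HZ 43 rate.

import numpy as np

rng = np.random.default_rng(42)
rate, dt, nbins = 0.8, 1.0, 10000              # assumed count rate [ct/s], bin width [s], number of bins
counts = rng.poisson(rate * dt, size=nbins)    # integer counts per 1 s bin

print("mean     =", counts.mean())             # ≈ rate * dt
print("variance =", counts.var())              # ≈ rate * dt as well (Poisson)
print("skewed high:", ((counts - counts.mean())**3).mean() > 0)   # asymmetry toward larger counts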


1. Poisson Likelihood

❖ p(k|λ) = (1/k!) λᵏ e^(−λ)

❖ The probability of seeing k events when λ are expected

❖ e.g., λ = count rate × time interval ≡ r ⋅ Δt

❖ mean, µ = ∑k k p(k|λ) = λ

❖ variance, σ² = ⟨k²⟩ − ⟨k⟩² = λ


p(k|λ) for different λ
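A minimal sketch (illustrative) of the p(k|λ) curves for a few values of λ, using scipy.stats.poisson; it also confirms that the mean and the variance are both λ.

import numpy as np
from scipy.stats import poisson

k = np.arange(0, 40)
for lam in (1, 3, 10):                          # illustrative choices of λ
    pmf = poisson.pmf(k, mu=lam)
    mean = np.sum(k * pmf)
    var = np.sum((k - mean)**2 * pmf)
    print(f"λ={lam}: mean={mean:.2f}, variance={var:.2f}")   # both ≈ λ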

2. Gaussian

❖ A Gaussian distribution is convenient

❖ Symmetric, ubiquitous (because of the Central Limit Theorem), easy to handle uncertainties

❖ N(x;µ,σ²) = [1/(σ√(2π))] e^(−(x−µ)²/2σ²)


2.1 Gaussian likelihood

❖ Probability of obtaining observed data given the model

p(x|θ,σθ) dx = N(x; θ,σθ²) dx

❖ When you have several data points

p({xₖ}|{θᵢ}) = ∏ₖ (2πσₖ²)^(−1/2) exp[−(xₖ−µₖ)²/2σₖ²]

= (2π)^(−N/2) (∏ₖ σₖ⁻¹) exp[−∑ₖ (xₖ−µₖ)²/2σₖ²]

❖ log Likelihood ∝ −∑ₖ (xₖ−µₖ)² / 2σₖ²
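A minimal sketch (illustrative) of this log likelihood for a constant model level µ: up to an additive constant it is −½ ∑ₖ(xₖ−µ)²/σₖ², and it peaks at the weighted mean of the data. All values below are assumptions made up for the example.

import numpy as np

rng = np.random.default_rng(1)
sigma = np.full(100, 2.0)                 # assumed per-point Gaussian errors
x = rng.normal(5.0, sigma)                # data scattered about a true level of 5

def loglike(mu):
    return -0.5 * np.sum((x - mu)**2 / sigma**2)

mus = np.linspace(3, 7, 401)
best = mus[np.argmax([loglike(m) for m in mus])]
print("best-fit µ ≈", best)               # ≈ the sample mean, since the σₖ are equal
print("sample mean =", x.mean())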


2.2 Poisson → Gaussian

❖ The variance of the Poisson distribution equals its mean

❖ As λ↑

Pois(k|λ) → N(k;λ,(√λ)²)

❖ Convenient!
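A minimal sketch (illustrative) checking this limit numerically: as λ grows, the Poisson pmf approaches a Gaussian with mean λ and variance λ.

import numpy as np
from scipy.stats import poisson, norm

for lam in (2, 20, 200):                                     # illustrative λ values
    k = np.arange(max(0, int(lam - 5 * np.sqrt(lam))), int(lam + 5 * np.sqrt(lam)) + 1)
    diff = np.abs(poisson.pmf(k, mu=lam) - norm.pdf(k, loc=lam, scale=np.sqrt(lam)))
    print(f"λ={lam:3d}: max |Poisson − Gaussian| = {diff.max():.4f}")   # shrinks as λ increases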


2.3 Gaussian Error Propagation

❖ How to propagate uncertainty from one stage to another: if g = f(x) and σx is known, what is σg? Is it simply f(σx)?

❖ Simple case: if everything is distributed as a Gaussian, with well-defined means and standard deviations, then at the "best-fit" values aᵢ, with g = g(aᵢ),

σg² = ∑ₖ [g({aᵢ + δaᵢ}ₖ) − g({aᵢ})]² / N

(averaged over N realizations k of the perturbations δaᵢ); expand as a Taylor series to get

σg² = ∑ᵢ ∑ⱼ (∂g/∂aᵢ)(∂g/∂aⱼ) σaᵢaⱼ

or, ignoring correlations amongst the {aᵢ}, i.e., setting σaᵢaⱼ = σaᵢ² δᵢⱼ,

σg² ≈ ∑ᵢ (∂g/∂aᵢ)² σaᵢ²


2.3 Error Propagation: examples

g = C ⋅ a → σg = C ⋅ σa (uncertainties scale)

g = ln(a) → σg = σa/a (converts to fractional error)

g = 1/a → σg = (1/a²) σa ≡ (g/a) σa ⇒ σg/g = σa/a (fractional errors stay as they are)

g = a + b → σg² = σa² + σb² (errors add in quadrature)

In general, for g = g(aᵢ):

σg² = ∑ᵢ (∂g/∂aᵢ)² σaᵢ²
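A minimal sketch (illustrative) checking one of these rules by brute force: propagate a Gaussian error through g = ln(a) with Monte Carlo draws and compare with the σg = σa/a formula. The values of a and σa are assumptions made up for the example.

import numpy as np

rng = np.random.default_rng(7)
a, sigma_a = 50.0, 5.0                          # assumed best-fit value and its 1σ error
draws = rng.normal(a, sigma_a, size=200_000)    # Gaussian draws of a
g = np.log(draws)                               # propagate by brute force

print("Monte Carlo σg =", g.std())              # ≈ 0.10
print("formula σa/a   =", sigma_a / a)          # 0.10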


3.1 Fitting: Best fit

❖ The best fit is the one that maximizes the likelihood

❖ e.g., linear regression — yi = α + β xi + ε

solve by finding extremum of log likelihood

lnL ∝ −∑ₖ (yₖ − α − βxₖ)²

∂lnL/∂α = ∂lnL/∂β = 0

⇒ β̂ = Cov(x,y)/Var(x) ≡ ρ(x,y) √(Var(y)/Var(x)), and α̂ = ȳ − β̂ x̄

Notice the notation: bars (x̄) and hats (β̂) indicate sample averages and best-fit values; Greek letters denote model quantities, Roman letters data quantities
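A minimal sketch (illustrative) of the closed-form solution above, β̂ = Cov(x,y)/Var(x) and α̂ = ȳ − β̂ x̄, on simulated data with assumed true values α = 2 and β = 0.5.

import numpy as np

rng = np.random.default_rng(3)
alpha, beta = 2.0, 0.5                                 # assumed true intercept and slope
x = np.linspace(0, 10, 50)
y = alpha + beta * x + rng.normal(0, 0.3, x.size)      # Gaussian scatter ε

beta_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # Cov(x,y)/Var(x)
alpha_hat = y.mean() - beta_hat * x.mean()             # ȳ − β̂ x̄
print("β̂ =", round(beta_hat, 3), " α̂ =", round(alpha_hat, 3))   # ≈ 0.5 and 2.0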


3.1.1 Error Bars

❖ Covariance errors, aka curvature errors, aka the inverse of the Hessian

For a Gaussian, −∂²lnL/∂x² = 1/σ²; similarly, compute the curvature of lnL at the best fit and report its inverse as the variance

+ easy

– very approximate

❖ Δχ²

The difference from the best-fit χ² value is itself distributed as χ² with dof = 1 (for one parameter of interest), so look up percentiles of that distribution:

Δχ²=+1 ≡ 68% (1σ)

Δχ²=+2.71 ≡ 90% (1.6σ)

+ better than curvature

– gets complicated quickly if parameters are correlated
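A minimal sketch (illustrative) recovering the Δχ² thresholds quoted above from the χ² distribution with one degree of freedom, using scipy.

from scipy.stats import chi2

print("68.3% →", round(chi2.ppf(0.683, df=1), 2))   # ≈ 1.00, i.e. Δχ² = +1 is 1σ
print("90%   →", round(chi2.ppf(0.90,  df=1), 2))   # ≈ 2.71
print("95.4% →", round(chi2.ppf(0.954, df=1), 2))   # ≈ 4.00, i.e. 2σ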


3.2 Fitting: Goodness of fit

❖ How good is the model as a description of your data?

❖ How can you tell when you do have a “good” fit?

❖ Recall the log likelihood: twice its negative (−2 lnL) is called the chi-square,

❖ χ² = ∑ₖ (xₖ−µₖ)² / σₖ²

❖ and its distribution describes the probability of getting data (xₖ, yₖ) that match the model 'this well' when summed over several bins

❖ When the observed χ² ≈ dof ± √(2·dof), the model is doing an excellent job of matching the data. The farther it is from this range, the less likely it is that the model is a good description of the data

❖ But always use your judgement, because this is a probabilistic rule!

❖ Watch out for how σ² is defined (model variance is better)
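A minimal sketch (illustrative) of the rule of thumb above: a χ² from a well-fitting model scatters around dof with a width of about √(2·dof), and scipy gives the tail probability of any observed value. The dof and χ² values here are made up for the example.

import numpy as np
from scipy.stats import chi2

dof = 50
print("expected range ≈", dof, "±", round(np.sqrt(2 * dof), 1))     # 50 ± 10

for chisq_obs in (48.0, 75.0, 120.0):
    pval = chi2.sf(chisq_obs, df=dof)        # P(χ² ≥ observed | model is correct)
    print(f"χ² = {chisq_obs}: tail probability = {pval:.3g}")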


3.3 Fitting: cstat

❖ Poisson log likelihood: −lnΓ(k+1) + k⋅lnλ − λ

❖ Apply Stirling's approximation, lnΓ(k+1) ≈ k·ln k − k

❖ ln(Poisson likelihood) ≈ k⋅(lnλ − ln k) + (k − λ)

❖ Just as χ² is -2lnLikelihood,

❖ cstat = 2 ∑i (Mi - Di + Di ⋅ (lnDi – lnMi))

❖ where Di are observed counts, and Mi are model predicted counts in bin i

❖ Watch out: only asymptotically χ², not quite the Poisson likelihood, 0s are thrown away, background must be explicitly modeled

❖ Less biased at low counts than χ², asymptotically χ²-distributed, and a rudimentary goodness-of-fit measure exists (Kaastra 2017, A&A 605, A51)

[AnetaS] https://cxc.cfa.harvard.edu/ciao/workshop/jan20/cstat_vs_chisq_SimsNotebook.ipynb

[AnetaS] https://cxc.cfa.harvard.edu/ciao/workshop/jan20/data_for_cstat_vs_chisq_SimsNotebook.tar.gz
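A minimal sketch (illustrative) of the cstat expression above for arrays of observed counts Dᵢ and model-predicted counts Mᵢ; here the Dᵢ·ln Dᵢ term is taken as 0 in empty bins, one common convention for the zero-count caveat noted above. The data and model values are made up.

import numpy as np

def cstat(D, M):
    """cstat = 2 Σᵢ [Mᵢ − Dᵢ + Dᵢ (ln Dᵢ − ln Mᵢ)], for model counts Mᵢ > 0."""
    D = np.asarray(D, dtype=float)
    M = np.asarray(M, dtype=float)
    term = np.zeros_like(M)
    pos = D > 0
    term[pos] = D[pos] * (np.log(D[pos]) - np.log(M[pos]))   # Dᵢ ln(Dᵢ/Mᵢ), with 0·ln 0 := 0
    return 2.0 * np.sum(M - D + term)

D = np.array([3, 0, 5, 2, 7])                 # illustrative observed counts
M = np.array([2.5, 0.4, 5.5, 2.0, 6.0])       # illustrative model predictions
print("cstat =", cstat(D, M))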


Handbook of X-ray Astronomy (Arnaud, Smith, Siemiginowska): https://doi.org/10.1017/CBO9781139034234.008


4. Statistical Tools in CIAO/Sherpa

❖ fit: non-linear minimization fitting

❖ projection/conf/covar: uncertainty intervals and error bars

❖ bootstrap/sample_flux: bootstrap with replacement / parametric bootstrap, to get parameter draws / model fluxes

❖ resample_data: to get bootstrap distribution of model parameter draws when data errors are asymmetric

❖ get_draws: MCMC engine pyBLoCXS (Bayesian Low-Counts X-ray Spectral analysis; van Dyk et al. 2001, ApJ 548, 224)

❖ calc_mlr, calc_ftest: model comparison via LRT/F-test

❖ plot_pvalue, plot_pvalue_results: to do posterior predictive p-value checks (Protassov et al. 2002, ApJ 571, 545)

❖ glvary: light curve modeling (Gregory & Loredo 1992, ApJ 398, 146)

❖ celldetect/wavdetect/vtpdetect/mkvtpbkg: source detection in images

❖ aprates: Bayesian aperture photometry (Primini & Kashyap 2014, ApJ 796, 24)

❖ the Python interpreter in Sherpa gives access to Python libraries, and can be used to call packages and libraries in R, which are written by statisticians for statisticians
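A minimal sketch (illustrative, not from the slides) of how a few of these pieces fit together in a Sherpa session: load a spectrum, fit a model with the cstat statistic of Section 3.3, and get confidence-interval error bars. The file name "src.pha" is a hypothetical placeholder.

from sherpa.astro.ui import *

load_pha("src.pha")            # hypothetical PHA spectrum with associated responses
set_source(powlaw1d.pl)        # a simple power-law source model
set_stat("cstat")              # Poisson-based fit statistic
fit()                          # non-linear minimization → best-fit parameters
conf()                         # confidence intervals (error bars) on the parameters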
