Statistics 101
olivier-teytaud
A little bit of statistics
P( waow | news ) = ?
Posterior probability
● In case of independent items,
● P( Observations | Θ) = product of
P( Observation1 | Θ)
x P( Observation2 | Θ)
x …
x P( ObservationZ | Θ)
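The product rule above can be sketched for the simplest case. The sketch assumes 0/1 observations, where each one is a success with probability Θ. The function name `likelihood` is hypothetical.

```python
def likelihood(theta, observations):
    """P(observations | theta) for independent 0/1 observations.

    Assumption for illustration: each observation is a success (1) with
    probability theta and a failure (0) with probability 1 - theta.
    Independence lets us multiply the individual probabilities.
    """
    result = 1.0
    for obs in observations:
        result *= theta if obs == 1 else 1.0 - theta
    return result


print(likelihood(0.5, [1, 0, 1, 1]))   # 0.0625 = 0.5**4
print(likelihood(0.75, [1, 0, 1, 1]))  # 0.10546875 = 0.75**3 * 0.25
```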
Bayes theorem
● Bayes :
P( Θ | observations) P(observations)
= P( observations | Θ) P(Θ)
● So :
P( Θ | observations) = P(observations | Θ)
x P(Θ) / P(observations)
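Here is a small discrete example of Bayes theorem. It assumes only two candidate values of Θ, a fair coin (Θ = 0.5) and a biased one (Θ = 0.75), with equal prior mass. P(observations) is the sum of P(observations | Θ) P(Θ) over the candidates, which is what makes the posterior sum to 1. The names are hypothetical.

```python
def posterior(prior, likelihoods):
    """Bayes: P(theta | obs) = P(obs | theta) * P(theta) / P(obs).

    prior and likelihoods are dicts keyed by the same candidate values of theta;
    P(obs) is the sum of P(obs | theta) * P(theta) over all candidates.
    """
    joint = {t: likelihoods[t] * prior[t] for t in prior}
    p_obs = sum(joint.values())
    return {t: j / p_obs for t, j in joint.items()}


# Hypothetical data: observations 1, 0, 1, 1 (independent).
prior = {0.5: 0.5, 0.75: 0.5}
likelihoods = {0.5: 0.5 ** 4, 0.75: 0.75 ** 3 * 0.25}
post = posterior(prior, likelihoods)
print(post)  # fair coin: 16/43, biased coin: 27/43
```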
So, by independent items + Bayes :
● P( Θ | observations ) is proportional to
P(Θ) x P( obs1 | Θ) x … x P(obsZ | Θ)
● Definitions :
– MAP (maximum a posteriori) : find Θ* such that P(Θ*|observations) is max
– BPE (Bayesian posterior expectation) : find ΘE = expectation of (Θ|observations)
– Maximum likelihood : MAP with P(Θ) uniform, i.e. maximize P(observations | Θ)
– There are other possible tools
– ErrorEstimate = Expect. (Θ – estimator)², given the observations
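The three estimators and the error estimate can be computed on a grid. This sketch assumes Θ is a success probability in (0, 1), that the observations are independent 0/1 outcomes, and that the prior density is proportional to Θ(1 − Θ). All of these choices are illustrative, not from the slides.

```python
# Assumptions: theta in (0, 1), independent 0/1 observations,
# prior density proportional to theta * (1 - theta). Grid of midpoints.
N_GRID = 1000
grid = [(k + 0.5) / N_GRID for k in range(N_GRID)]
obs = [1, 0, 1, 1]


def prior(t):
    return t * (1.0 - t)  # unnormalized prior density


def likelihood(t):
    p = 1.0
    for o in obs:
        p *= t if o == 1 else 1.0 - t
    return p


# P(theta | obs) is proportional to P(theta) * P(obs | theta); normalize on the grid.
unnorm = [prior(t) * likelihood(t) for t in grid]
z = sum(unnorm)
post = [u / z for u in unnorm]

theta_map = max(grid, key=lambda t: prior(t) * likelihood(t))  # MAP
theta_ml = max(grid, key=likelihood)                           # uniform prior
theta_bpe = sum(t * p for t, p in zip(grid, post))             # posterior mean
# ErrorEstimate of the BPE: posterior expectation of (theta - estimator)^2
error_bpe = sum((t - theta_bpe) ** 2 * p for t, p in zip(grid, post))
```

With these assumptions the posterior is a Beta(5, 3) distribution, so MAP ≈ 2/3, BPE = 5/8, ML = 3/4 and the BPE error estimate is the posterior variance 15/576. The grid values land close to these.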
log-likelihood
● Instead of probas, use log-probas.
● Because :
– Products become sums ==> no underflow, so more precise on a computer for very small probabilities
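The underflow problem is easy to reproduce. This sketch assumes 2000 independent observations, each with probability 0.5 under some Θ. The product falls below the smallest positive double, while the sum of logs stays finite.

```python
import math

probs = [0.5] * 2000  # hypothetical per-observation probabilities

product = 1.0
for p in probs:
    product *= p  # 0.5**2000 is far below the smallest positive double

log_likelihood = sum(math.log(p) for p in probs)

print(product)         # 0.0 (underflow)
print(log_likelihood)  # about -1386.29, i.e. 2000 * log(0.5)
```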
Finding the MAP (or other estimates)
● Dimension 1 :
– Golden Search (unimodal)
– Grid Search (multimodal, slow)
– Robust search (compromise)
– Newton-Raphson (unimodal; precise, but expensive computations)
● Large dimension :
– Jacobi algorithm
– Or Gauss-Seidel, or Newton, or NEWUOA, or ...
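For the 1D case, here is a minimal sketch of golden section search for the maximum of a unimodal function on [lo, hi]. The fixed iteration count and the name `golden_max` are hypothetical choices. Each iteration keeps one of the two interior points, so it costs one new function evaluation.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618


def golden_max(f, lo, hi, iterations=60):
    """Golden section search for the maximum of a unimodal f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iterations):
        if fc >= fd:
            # maximum lies in [a, d]; the old c becomes the new d
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # maximum lies in [c, b]; the old d becomes the new c
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


print(golden_max(lambda x: -(x - 1.3) ** 2, -10.0, 10.0))  # close to 1.3
```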
Jacobi algorithm for maximizing in dimension D>1
● x=clever initialization, if possible
● While ( ||x' – x|| > epsilon )
– x'=current x
– For each parameter x(i), optimize it
● by a 1D algorithm
● with just a few iterates
Jacobi = great when the objective function
– can be restricted to 1 parameter
– and then be much faster
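The Jacobi loop above can be sketched under a few assumptions. Every parameter lies in a box [lo, hi], and golden section search is the 1D algorithm. Within one sweep, each coordinate is optimized with the others frozen at x' (their values from the previous sweep), as in Jacobi. Gauss-Seidel would instead use updated coordinates immediately. For simplicity the 1D search runs 60 iterations rather than "just a few iterates". The names are hypothetical.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio


def golden_max(f, lo, hi, iterations=60):
    """Golden section search for the maximum of a unimodal f on [lo, hi]."""
    a, b = lo, hi
    c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iterations):
        if fc >= fd:
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def jacobi_max(f, x0, lo, hi, epsilon=1e-6, max_sweeps=1000):
    """Maximize f(x) for x in [lo, hi]^D, one coordinate at a time (Jacobi style)."""
    x = list(x0)  # x = initialization
    for _ in range(max_sweeps):
        x_prev = list(x)  # x' = current x
        for i in range(len(x)):
            # restriction of f to coordinate i; the others stay at x'
            def f_i(t, i=i):
                y = list(x_prev)
                y[i] = t
                return f(y)
            x[i] = golden_max(f_i, lo, hi)
        if math.dist(x, x_prev) <= epsilon:  # stop when ||x' - x|| <= epsilon
            break
    return x


def f(y):  # hypothetical concave objective, maximum at (1.6, -2.4)
    return -(y[0] - 1) ** 2 - (y[1] + 2) ** 2 - 0.5 * y[0] * y[1]


print(jacobi_max(f, [0.0, 0.0], -10.0, 10.0))
```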
Jacobi algorithm for maximizing in dimension D>1 (variant)
● Same loop as in the Jacobi algorithm above, but optimize each parameter x(i) with :
– one iteration of robust search,
– without decreasing the interval if the optimum is close to the current bounds.
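This variant can also be sketched, but it rests on an assumption. The exact "robust search" step is not given here, so its reading below is hypothetical. For coordinate i, f is evaluated at x_i − w_i, x_i and x_i + w_i, with the other coordinates at their previous-sweep values. If an end point is best, the optimum is close to the bounds, so x_i moves there and w_i is not decreased. Otherwise w_i is halved. Because x may legitimately stay put while the interval shrinks, this sketch stops on the interval widths rather than on ||x' − x||.

```python
def jacobi_robust_max(f, x0, width0=1.0, epsilon=1e-6, max_sweeps=100000):
    """Jacobi-style maximization with one cheap (ASSUMED) robust-search step per coordinate."""
    x = list(x0)
    w = [width0] * len(x)  # current interval half-width per coordinate
    for _ in range(max_sweeps):
        x_prev = list(x)  # x' = current x
        for i in range(len(x)):
            candidates = [x_prev[i], x_prev[i] - w[i], x_prev[i] + w[i]]
            values = []
            for t in candidates:
                y = list(x_prev)
                y[i] = t
                values.append(f(y))
            best = max(range(3), key=lambda k: values[k])  # ties favour the centre
            if best == 0:
                w[i] /= 2.0  # optimum inside the interval: shrink it
            else:
                x[i] = candidates[best]  # optimum near a bound: move, keep the width
        if max(w) <= epsilon:
            break
    return x


def f(y):  # hypothetical concave objective, maximum at (1.6, -2.4)
    return -(y[0] - 1) ** 2 - (y[1] + 2) ** 2 - 0.5 * y[0] * y[1]


print(jacobi_robust_max(f, [0.0, 0.0]))
```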
Possible use
● Computing students' abilities, given item parameters
● Computing item parameters, given student abilities
● Computing both item parameters and student abilities (need plenty of data)
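For the first use (abilities given item parameters), here is a sketch of the MAP for one student under several assumptions:

- the 3PL model P(correct | Θ) = c + (1 − c) / (1 + exp(−a(Θ − b))),
- a Gaussian N(mu, sigma²) prior,
- answers that are independent given Θ.

It maximizes log P(Θ) plus the sum of log-likelihoods with golden section search. With c > 0 the log-posterior is not guaranteed to be unimodal, so the golden search is itself an assumption. The names are hypothetical.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio


def golden_max(f, lo, hi, iterations=60):
    """Golden section search for the maximum of a unimodal f on [lo, hi]."""
    a, b = lo, hi
    c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iterations):
        if fc >= fd:
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def p_3pl(theta, a, b, c):
    """3PL probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_posterior(theta, items, responses, mu=0.0, sigma=1.0):
    """log P(theta) + sum_j log P(response_j | theta), up to an additive constant."""
    lp = -0.5 * ((theta - mu) / sigma) ** 2  # Gaussian prior
    for (a, b, c), r in zip(items, responses):
        p = p_3pl(theta, a, b, c)
        lp += math.log(p) if r == 1 else math.log(1.0 - p)
    return lp


def map_ability(items, responses, lo=-6.0, hi=6.0):
    """MAP ability of one student; items is a list of (a, b, c), responses are 0/1."""
    return golden_max(lambda t: log_posterior(t, items, responses), lo, hi)


items = [(1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]  # hypothetical item parameters
print(map_ability(items, [1, 0]))  # symmetric pattern: close to 0
```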
Priors
● How to know P(Θ) ?
● Keep in mind that difficulties and abilities are translation invariant (shifting all of them by the same constant gives the same model) :
– ==> so you need a reference
– ==> possibly reference = average Θ = 0
● If you have a big database and trust your model (3PL ?), you can use Jacobi+MAP.
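The translation invariance can be checked numerically. The sketch assumes a 2PL model (3PL with c = 0) and made-up abilities and difficulties. The success probability depends on Θ and b only through Θ − b, so the same argument holds for 3PL. Fixing the reference "average Θ = 0" means subtracting the mean ability from every ability and every difficulty.

```python
import math


def p_2pl(theta, a, b):
    """Probability of success; depends on theta and b only through theta - b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


abilities = [0.3, 1.2, -0.4]  # hypothetical values
difficulties = [-0.5, 0.8]    # hypothetical values
a = 1.0


def all_probs(thetas, bs):
    return [p_2pl(t, a, b) for t in thetas for b in bs]


# Shifting every ability and every difficulty by the same constant
# leaves every success probability, hence the likelihood, unchanged.
shift = 2.0
shifted = all_probs([t + shift for t in abilities], [b + shift for b in difficulties])

# Reference: re-center so that the average ability is 0.
mean_theta = sum(abilities) / len(abilities)
abilities_ref = [t - mean_theta for t in abilities]
difficulties_ref = [b - mean_theta for b in difficulties]
```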
What if you don't like Jacobi's result ?
● Too slow ? (initialization, epsilon larger, better 1D algorithm, better implementation...)
● Epsilon too large ?
● Maybe you use MAP whereas you want BPE ?
==> If you get convergence and don't like the result, it's not because of Jacobi, it's because of the criterion.
● Maybe not enough data ?
Initializing IRT parameters ?
● Rough approximations for IRT parameters :
– Abilities (Θ)
– Item parameters (a,b,c in 3PL models)
● Priors can be very convenient for that.
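Find Θ with quantiles !
1. Rank students per performance.
2. Cumulative distribution of abilities.
3. Projections : worst student ==> quantile 1/(N+1), medium student ==> middle quantile, best student ==> quantile N/(N+1).
[Figure: cumulative distribution over ABILITIES, with the worst, medium and best students projected onto the ability axis]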
Equation version for approximating abilities Θ
● If you have a prior (e.g. Gaussian), then a simple solution :
– Rank students per score on the test
– For student i over N (i = 1 for the best score), Θ initialized at the prior's quantile 1 – i/(N+1)
E.g. with a Gaussian prior (mu, sigma),
ability(i) = mu + sigma*norminv(1 - i/(N+1))
with norminv the inverse of the standard Gaussian CDF, e.g. as in http://www.wilmott.com/messageview.cfm?catid=10&threadid=38771
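The quantile initialization can be sketched as follows. Python's standard `statistics.NormalDist().inv_cdf` is used as norminv, in place of the linked implementation. The sketch assumes a Gaussian prior and breaks score ties arbitrarily (by position), which is a simplification. The name `init_abilities` is hypothetical.

```python
from statistics import NormalDist


def init_abilities(scores, mu=0.0, sigma=1.0):
    """Initial abilities from test scores and a Gaussian prior N(mu, sigma^2).

    Students are ranked by score, best first (i = 1 is the best); student i gets
    ability mu + sigma * norminv(1 - i/(N+1)). Ties are broken by position.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda s: scores[s], reverse=True)
    norminv = NormalDist().inv_cdf  # inverse CDF of the standard Gaussian
    abilities = [0.0] * n
    for i, student in enumerate(order, start=1):
        abilities[student] = mu + sigma * norminv(1.0 - i / (n + 1))
    return abilities


print(init_abilities([3, 10, 7]))  # about [-0.674, 0.674, 0.0]
```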
Equation version for approximating item parameters
Much harder !
There are formulas based on correlation. It's a very rough approximation.
How to estimate b if c=0 ?
Simple solution :
– Assume a=1 (discrimination)
– Use the curve, or approximate
b = 4.8 x (1/2 - proba(success))
– If you know students' abilities, it's much easier
And for difficulty of items ? Use the curve or the approximation...
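The rule of thumb b = 4.8 x (1/2 − proba(success)) can be sketched as below. It assumes a = 1 and c = 0, and estimates proba(success) as the observed fraction of correct answers to the item. The function names and the 0/1 response-matrix layout are hypothetical.

```python
def approx_difficulty(responses):
    """Rough difficulty b of one item (a = 1, c = 0 assumed).

    responses: 0/1 answers of all students to this item.
    """
    p_success = sum(responses) / len(responses)
    return 4.8 * (0.5 - p_success)


def approx_difficulties(matrix):
    """One rough difficulty per item; matrix[s][j] is student s's 0/1 answer to item j."""
    return [approx_difficulty([row[j] for row in matrix]) for j in range(len(matrix[0]))]


print(approx_difficulties([[1, 0], [1, 0], [0, 0], [1, 1]]))  # easy item < 0 < hard item
```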
Codes
● IRT in R : there are packages, it's free, and R is a widely supported language for statistics.
● IRT in Octave : we started our implementation, but it is still very preliminary :
– No support for missing data (the main strength of IRT), though this would be easy to add
– No user-friendly interface to data
● Others ? I did not check.
● ==> Cross-validation for comparing ?
How to get the percentile from the ability
● The percentile is norm-cdf( (Θ* – mu)/sigma ), with Θ* the estimated ability and mu, sigma the parameters of the Gaussian prior (some languages have normcdf built in).
● Slow/precise implementation of norm-cdf: http://stackoverflow.com/questions/2328258/cumulative-normal-distribution-function-in-c-c
● Fast implementation of norm-cdf: http://finance.bi.no/~bernt/gcc_prog/recipes/recipes/node23.html
● Maybe a fast exp too, if you want to save time :-)
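In Python, the percentile can be sketched with the standard library. `statistics.NormalDist().cdf` serves as norm-cdf, and the same value follows from `math.erf` via Φ(z) = (1 + erf(z/√2))/2. A Gaussian prior is assumed, and the result is a fraction (multiply by 100 for a percentage). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def percentile(theta, mu=0.0, sigma=1.0):
    """norm-cdf((theta - mu) / sigma), as a fraction in (0, 1)."""
    return NormalDist().cdf((theta - mu) / sigma)


def percentile_erf(theta, mu=0.0, sigma=1.0):
    """Same value via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    z = (theta - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(percentile(1.0))  # about 0.841: one sigma above the mean
```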