Envelopes and tensor linear regression · 2016. 5. 9.



Envelopes and tensor linear regression
Xin (Henry) Zhang; Department of Statistics, Florida State University

supported by FSU-CRC FYAP program in Summer 2015

1. Envelopes in multivariate linear model

Multivariate linear model of $Y_i \in \mathbb{R}^r$ on $X_i \in \mathbb{R}^p$:

$$Y_i = \alpha + \beta X_i + \varepsilon_i, \quad i = 1, \dots, n, \qquad (1)$$

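where the $\varepsilon_i$ are i.i.d. errors with mean 0 and covariance $\Sigma > 0$, independent of $X_i$.

Goal: efficient estimation of $\beta \in \mathbb{R}^{r \times p}$ and $\Sigma \in \mathbb{R}^{r \times r}$.

Response envelope model: Suppose there is a subspace $\mathcal{E} \subseteq \mathbb{R}^r$, and let $P_{\mathcal{E}}$ and $Q_{\mathcal{E}} = I_r - P_{\mathcal{E}}$ denote the projections onto $\mathcal{E}$ and $\mathcal{E}^{\perp}$, such that

$$Q_{\mathcal{E}} Y \mid X \sim Q_{\mathcal{E}} Y, \qquad Q_{\mathcal{E}} Y \perp\!\!\!\perp P_{\mathcal{E}} Y \mid X. \qquad (2)$$

• $P_{\mathcal{E}} Y$: material part
• $Q_{\mathcal{E}} Y$: immaterial part

Equivalently:

$$\mathrm{span}(\beta) \subseteq \mathcal{E}, \qquad \Sigma = P_{\mathcal{E}} \Sigma P_{\mathcal{E}} + Q_{\mathcal{E}} \Sigma Q_{\mathcal{E}}. \qquad (3)$$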

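The envelope is then the smallest such subspace $\mathcal{E}$.

Parameters in the envelope regression:

$$\beta = \Gamma\theta, \qquad \Sigma = \Gamma \Omega \Gamma^T + \Gamma_0 \Omega_0 \Gamma_0^T, \qquad (4)$$

where $\Gamma \in \mathbb{R}^{r \times u}$ is a semi-orthogonal basis for the envelope $\mathcal{E}_{\Sigma}(\beta)$, $\Gamma_0 \in \mathbb{R}^{r \times (r-u)}$ is the orthogonal completion of $\Gamma$, $\theta \in \mathbb{R}^{u \times p}$, $\Omega \in \mathbb{R}^{u \times u}$ and $\Omega_0 \in \mathbb{R}^{(r-u) \times (r-u)}$.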
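The link between the parameterization (4) and the conditions in (3) can be checked numerically. The following is a minimal sketch, assuming small illustrative dimensions and randomly generated parameters (none of these values come from the poster); `theta` is taken as a u × p matrix so that β is r × p.

```python
import numpy as np

rng = np.random.default_rng(0)
r, u, p = 5, 2, 3  # illustrative sizes, not from the poster

# Split a random orthogonal matrix into an envelope basis and its completion.
Q, _ = np.linalg.qr(rng.normal(size=(r, r)))
Gamma, Gamma0 = Q[:, :u], Q[:, u:]

def random_spd(d):
    """Random symmetric positive definite d x d matrix (assumed test input)."""
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)

theta = rng.normal(size=(u, p))
Omega, Omega0 = random_spd(u), random_spd(r - u)

# Envelope parameterization (4).
beta = Gamma @ theta
Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T

# Projections onto the envelope and its orthogonal complement.
P = Gamma @ Gamma.T
Qp = np.eye(r) - P

# Conditions (3): span(beta) lies in the envelope, and Sigma has no cross term.
in_envelope = np.allclose(P @ beta, beta)
decomposes = np.allclose(Sigma, P @ Sigma @ P + Qp @ Sigma @ Qp)
```

Both flags come out true for any choice of `theta`, `Omega` and `Omega0`, which is the sense in which (4) encodes (3).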

2. An example: Cattle data from Kenward (1987)

• Compare two treatments for the control of the parasite
• 30 cows were randomly assigned to each treatment
• Weights were measured at weeks 2, 4, ..., 18, 19

The model is $Y_i = \alpha + \beta X_i + \varepsilon_i$, where $Y_i \in \mathbb{R}^{10}$ is the weight profile of each cow and $X_i \in \{0, 1\}$ indicates the treatment group.
• Standard estimation: $\hat\beta_{\mathrm{OLS}} = \bar{Y}_1 - \bar{Y}_0$.
• Envelope estimation: $\hat\beta_{\mathrm{Env}} = \hat\Gamma\hat\theta$, estimated by maximizing the likelihood function.
• Comparing the two methods: the bootstrap standard error of each regression coefficient $\hat\beta_{\mathrm{Env},k}$ is 2.6 to 5.9 times smaller than that of $\hat\beta_{\mathrm{OLS},k}$ for $k = 1, \dots, 10$.
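With a single binary predictor, the least squares slope for each response coordinate is the difference of the two group means, which is why the standard estimator has the closed form quoted above. A small sketch on simulated data (a hypothetical stand-in for the cattle data, with assumed effect sizes and noise level) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the cattle data: 30 animals per group, 10 time points.
n_per_group, r = 30, 10
X = np.repeat([0.0, 1.0], n_per_group)      # binary treatment indicator
beta_true = np.linspace(0.0, 5.0, r)        # assumed treatment effect per time point
noise = rng.normal(scale=10.0, size=(2 * n_per_group, r))
Y = 250.0 + np.outer(X, beta_true) + noise

# Least squares with an intercept: design matrix [1, X].
design = np.column_stack([np.ones_like(X), X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
beta_ols = coef[1]                          # slope row, one entry per response

# Difference of group means.
beta_diff = Y[X == 1].mean(axis=0) - Y[X == 0].mean(axis=0)
```

The two vectors `beta_ols` and `beta_diff` agree up to floating point error.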

[Figure 1 plot: horizontal axis "Weight on week 12", vertical axis "Weight on week 14"; annotated directions $\Gamma_0^T Y$ and the estimated envelope $\Gamma^T Y$.]

Figure 1: Visualize the working mechanism of envelope regression: a simpler regression problem of the bivariate response $Y = (Y_6, Y_7)$ on the binary predictor $X$ of the cattle data.

3. Envelope models and methods for tensor regression

Motivations:
1. Data in the form of tensors (multidimensional arrays) are increasingly common in both scientific and business applications, especially in brain imaging analysis.
2. The envelope method is a new and fast-evolving tool for dimension reduction and for improving efficiency in multivariate parameter estimation. Substantial gains are achievable by incorporating envelope methods into classical regression problems such as OLS, PLS, RRR, GLM, etc.
3. We propose a parsimonious tensor envelope regression of a tensor-valued response on a scalar- or vector-valued predictor. It models all voxels of the tensor response jointly, while accounting for the inherent structural information among the voxels. Efficiency gain is achieved with improved interpretation.

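Some tensor notations:
• A multidimensional array $\mathbf{A} \in \mathbb{R}^{r_1 \times \cdots \times r_m}$ is called an $m$-th order tensor.
• Mode-$k$ matricization turns a tensor $\mathbf{A}$ into a matrix $\mathbf{A}_{(k)} \in \mathbb{R}^{r_k \times (\prod_{j \neq k} r_j)}$.
• The mode-$k$ product of a tensor $\mathbf{A}$ and a matrix $\mathbf{B} \in \mathbb{R}^{d \times r_k}$ is defined as $\mathbf{A} \times_k \mathbf{B} \in \mathbb{R}^{r_1 \times \cdots \times r_{k-1} \times d \times r_{k+1} \times \cdots \times r_m}$.
• We write $\mathbf{A} = [\![\mathbf{C}; \mathbf{B}^{(1)}, \dots, \mathbf{B}^{(m)}]\!]$ for the Tucker decomposition, which is defined as $\mathbf{A} = \mathbf{C} \times_1 \mathbf{B}^{(1)} \times_2 \cdots \times_m \mathbf{B}^{(m)}$, where $\mathbf{C} \in \mathbb{R}^{d_1 \times \cdots \times d_m}$ is the core tensor and $\mathbf{B}^{(k)} \in \mathbb{R}^{r_k \times d_k}$, $k = 1, \dots, m$, are factor matrices.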
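The three operations in this notation list can be sketched in a few lines of NumPy. The helper names `unfold`, `mode_product` and `tucker` are illustrative choices, modes are 0-indexed in the code, and the column ordering of the unfolding (earliest remaining mode varying fastest) is one common convention assumed here.

```python
import numpy as np

def unfold(A, k):
    """Mode-k matricization: rows index mode k, columns index the remaining modes."""
    # Fortran order makes the earliest remaining mode vary fastest.
    return np.reshape(np.moveaxis(A, k, 0), (A.shape[k], -1), order="F")

def mode_product(A, B, k):
    """Mode-k product A x_k B for a matrix B of shape (d, A.shape[k])."""
    out = np.tensordot(B, A, axes=(1, k))   # contracted mode ends up first
    return np.moveaxis(out, 0, k)           # move it back to position k

def tucker(C, factors):
    """Tucker product [[C; B1, ..., Bm]] = C x_1 B1 x_2 ... x_m Bm."""
    A = C
    for k, Bk in enumerate(factors):
        A = mode_product(A, Bk, k)
    return A

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4, 5))
B = rng.normal(size=(2, 4))
AB = mode_product(A, B, 1)                  # second mode shrinks from 4 to 2
```

A useful sanity check is the identity that the mode-k unfolding of `mode_product(A, B, k)` equals `B @ unfold(A, k)`.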

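Tensor response regression
• $\mathbf{Y}_i \in \mathbb{R}^{r_1 \times \cdots \times r_m}$ is a tensor-valued response and $X_i \in \mathbb{R}^p$ a vector-valued predictor, for $i = 1, \dots, n$ i.i.d. samples.
• $\boldsymbol{\varepsilon}_i \in \mathbb{R}^{r_1 \times \cdots \times r_m}$ is an error tensor with mean 0 and covariance $\mathrm{cov}\{\mathrm{vec}(\boldsymbol{\varepsilon})\} = \Sigma$, a square matrix with $\prod_{k=1}^m r_k$ rows and columns.
• We assume a separable Kronecker covariance structure: $\Sigma = \Sigma_m \otimes \cdots \otimes \Sigma_1$.
• Tensor linear model:

$$\mathbf{Y}_i = \mathbf{B} \times_{(m+1)} X_i + \boldsymbol{\varepsilon}_i, \quad i = 1, \dots, n. \qquad (5)$$

• Vectorized model: $\mathrm{vec}(\mathbf{Y}_i) = \mathbf{B}_{(m+1)}^T X_i + \mathrm{vec}(\boldsymbol{\varepsilon}_i)$.
• Goal: estimating $\mathbf{B} \in \mathbb{R}^{r_1 \times \cdots \times r_m \times p}$. For example, a standard approach is to regress each element of $\mathbf{Y}$ on $X$ one at a time.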
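The vectorized model makes the element-by-element approach concrete: each column of the unfolded coefficient matrix is an ordinary regression of one response element on X. The sketch below simulates from the tensor linear model and recovers the coefficient tensor this way; the dimensions, noise level and independent noise (rather than a Kronecker covariance) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2
dims = (4, 3)                               # r_1 x r_2 response (m = 2), illustrative
B = rng.normal(size=dims + (p,))            # last mode indexes the predictors
X = rng.normal(size=(n, p))

# Tensor linear model: contract the last mode of B with each X_i, then add noise.
signal = np.einsum("abp,np->nab", B, X)
Y = signal + 0.1 * rng.normal(size=signal.shape)

# Column-major vec of each Y_i, stacked as rows: shape (n, r_1 * r_2).
vecY = Y.reshape(n, -1, order="F")

# Unfolding of B along its last mode, B_(m+1): shape (p, r_1 * r_2).
B_mat = np.moveaxis(B, -1, 0).reshape(p, -1, order="F")

# Element-wise OLS: one least squares fit per column of vecY, done in one call.
B_mat_hat, *_ = np.linalg.lstsq(X, vecY, rcond=None)

# Fold the estimate back into a tensor with the same layout as B.
B_hat = np.moveaxis(B_mat_hat.reshape((p,) + dims, order="F"), 0, -1)
```

This estimator ignores the tensor structure and the correlation among elements, which is the inefficiency the envelope approach targets.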

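Tensor envelope: $\mathcal{T}_{\Sigma}(\mathbf{B}) = \mathcal{E}_{\Sigma_m}(\mathbf{B}_{(m)}) \otimes \cdots \otimes \mathcal{E}_{\Sigma_1}(\mathbf{B}_{(1)})$ is the intersection of all reducing subspaces $\mathcal{E}$ of $\Sigma = \Sigma_m \otimes \cdots \otimes \Sigma_1$ that contain $\mathrm{span}(\mathbf{B}_{(m+1)}^T)$ and can be written as $\mathcal{E} = \mathcal{E}_m \otimes \cdots \otimes \mathcal{E}_1$, where $\mathcal{E}_k \subseteq \mathbb{R}^{r_k}$, $k = 1, \dots, m$.

Tensor envelope parameterization:
• Let $(\Gamma_k, \Gamma_{0k}) \in \mathbb{R}^{r_k \times r_k}$ be an orthogonal matrix such that $\mathrm{span}(\Gamma_k) = \mathcal{E}_{\Sigma_k}(\mathbf{B}_{(k)})$, $\Gamma_k \in \mathbb{R}^{r_k \times u_k}$.
• Regression coefficient tensor: $\mathbf{B} = [\![\Theta; \Gamma_1, \dots, \Gamma_m, I_p]\!]$ for some $\Theta \in \mathbb{R}^{u_1 \times \cdots \times u_m \times p}$.
• Covariance matrices: $\Sigma_k = \Gamma_k \Omega_k \Gamma_k^T + \Gamma_{0k} \Omega_{0k} \Gamma_{0k}^T$, $k = 1, \dots, m$.
• The total number of parameters is reduced by $p \left( \prod_{k=1}^m r_k - \prod_{k=1}^m u_k \right)$: $\Theta$, with $p \prod_k u_k$ entries, replaces the $p \prod_k r_k$ entries of an unconstrained $\mathbf{B}$, while $(\Gamma_k, \Omega_k, \Omega_{0k})$ carry as many free parameters as $\Sigma_k$.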
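To make the parameterization concrete, the sketch below assembles a coefficient tensor from a small core Θ and semi-orthogonal bases via mode products, then counts the saving in coefficients when the full tensor is replaced by the core. The bases are random and the small dimensions are assumptions for illustration; the last call plugs in the dimensions and BIC-selected envelope dimensions reported for the ADHD analysis in Section 5.2.

```python
import numpy as np

def mode_product(A, M, k):
    """Mode-k product (0-indexed k) of tensor A with matrix M of shape (d, A.shape[k])."""
    return np.moveaxis(np.tensordot(M, A, axes=(1, k)), 0, k)

def semi_orthogonal(r, u, rng):
    """Random r x u matrix with orthonormal columns (illustrative envelope basis)."""
    Q, _ = np.linalg.qr(rng.normal(size=(r, u)))
    return Q

rng = np.random.default_rng(2)
r, u, p = (6, 5, 4), (2, 3, 1), 3           # small illustrative dimensions
Gammas = [semi_orthogonal(rk, uk, rng) for rk, uk in zip(r, u)]
Theta = rng.normal(size=u + (p,))

# B = [[Theta; Gamma_1, ..., Gamma_m, I_p]]: the predictor mode is left untouched.
B = Theta
for k, G in enumerate(Gammas):
    B = mode_product(B, G, k)

def coefficient_reduction(r, u, p):
    """Entries of an unconstrained coefficient tensor minus entries of the core."""
    return p * (int(np.prod(r)) - int(np.prod(u)))

adhd_reduction = coefficient_reduction((30, 36, 30), (9, 10, 2), 3)
```

For the ADHD dimensions this gives 3 × (32,400 − 180) = 96,660 fewer coefficients than the 97,200 of the unconstrained tensor.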

4. Estimation

1. Initialize $\mathbf{B}^{(0)}$ and $\Sigma^{(0)} = \Sigma_m^{(0)} \otimes \cdots \otimes \Sigma_1^{(0)}$ from standard methods.
2. [Numerical Grassmannian optimization] Estimate the envelope bases $\{\Gamma_k\}_{k=1}^m$ based on $\mathbf{B}^{(0)}$ and $\Sigma^{(0)}$. The 1D envelope algorithm (Cook and Zhang 2014) is used to obtain stable and $\sqrt{n}$-consistent envelope basis estimates.
3. [Analytical solutions] Estimate the other parameters $\Theta$, $\{\Omega_k\}_{k=1}^m$ and $\{\Omega_{0k}\}_{k=1}^m$ based on $\{\Gamma_k\}_{k=1}^m$.
4. [Analytical solutions] Obtain $\mathbf{B}$ and $\Sigma$ from the envelope parameterization.
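The poster does not spell out the analytical formulas in steps 3 and 4. As a rough illustration only, the sketch below assumes the bases are already available (random here, standing in for the output of step 2) and that the coefficient update amounts to projecting an initial estimate onto them mode by mode. This is an assumption for illustration, not the authors' estimator, and the Ω updates are omitted.

```python
import numpy as np

def mode_product(A, M, k):
    """Mode-k product (0-indexed k) of tensor A with matrix M of shape (d, A.shape[k])."""
    return np.moveaxis(np.tensordot(M, A, axes=(1, k)), 0, k)

def core_from_bases(B0, Gammas):
    """Assumed step 3: Theta = B0 x_1 Gamma_1^T ... x_m Gamma_m^T."""
    Theta = B0
    for k, G in enumerate(Gammas):
        Theta = mode_product(Theta, G.T, k)
    return Theta

def rebuild(Theta, Gammas):
    """Step 4: B = [[Theta; Gamma_1, ..., Gamma_m, I_p]]."""
    B = Theta
    for k, G in enumerate(Gammas):
        B = mode_product(B, G, k)
    return B

rng = np.random.default_rng(4)
r, u, p = (5, 4), (2, 2), 3                 # illustrative dimensions
# Random semi-orthogonal bases standing in for the step 2 estimates.
Gammas = [np.linalg.qr(rng.normal(size=(rk, uk)))[0] for rk, uk in zip(r, u)]
B0 = rng.normal(size=r + (p,))              # stand-in for the initial estimate

Theta_hat = core_from_bases(B0, Gammas)
B_env = rebuild(Theta_hat, Gammas)
```

Under this assumption the result is the initial estimate with each response mode projected onto span(Γ_k), so applying the two steps again leaves `B_env` unchanged.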

5. Some numerical results

Figure 2: Comparison with OLS: The true and estimated regression coefficient tensors under various signal shapes and signal-to-noise ratios (SNR).

5.1 Simulations
To visualize the regression coefficient tensor $\mathbf{B}$ and its estimators, we consider the following matrix-valued (order-2 tensor) response regression model,

$$\mathbf{Y}_i = \mathbf{B} X_i + \sigma \cdot \boldsymbol{\varepsilon}_i, \quad i = 1, \dots, n,$$

• $X_i$ is either 0 or 1
• $\boldsymbol{\varepsilon}_i$ follows a matrix normal distribution whose covariances satisfy $\|\Sigma_1\|_F = \|\Sigma_2\|_F = 1$
• $\sigma > 0$ controls the signal-to-noise ratio (SNR)
• $\mathbf{Y}_i$, $\boldsymbol{\varepsilon}_i$ and $\mathbf{B}$ all have the same dimension 64 × 64
• Sample size is small: $n = 20$
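A data set of this form can be generated as follows. This is a sketch under stated assumptions: the AR(1)-type covariances, the square signal shape, the balanced split of X and the value of `sigma` are illustrative choices, since the poster does not list them.

```python
import numpy as np

rng = np.random.default_rng(3)
r1 = r2 = 64
n, sigma = 20, 1.0                          # sigma is an illustrative noise level

def ar1_cov(r, rho):
    """AR(1)-type covariance scaled to unit Frobenius norm (an assumed choice)."""
    idx = np.arange(r)
    S = rho ** np.abs(idx[:, None] - idx[None, :])
    return S / np.linalg.norm(S, "fro")

Sigma1, Sigma2 = ar1_cov(r1, 0.5), ar1_cov(r2, 0.5)
L1, L2 = np.linalg.cholesky(Sigma1), np.linalg.cholesky(Sigma2)

# Assumed signal shape: a square block of ones in the middle of the image.
B = np.zeros((r1, r2))
B[24:40, 24:40] = 1.0

X = np.repeat([0.0, 1.0], n // 2)
# Matrix normal noise: L1 Z L2^T has row covariance Sigma1, column covariance Sigma2.
Z = rng.normal(size=(n, r1, r2))
eps = L1 @ Z @ L2.T
Y = X[:, None, None] * B + sigma * eps

# OLS without intercept reduces to the mean response in the X = 1 group.
B_ols = np.einsum("n,nab->ab", X, Y) / np.sum(X ** 2)
```

With only ten observations contributing to `B_ols`, the element-wise estimate is noisy, which is the setting in which Figure 2 compares it with the envelope estimator.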

5.2 ADHD data analysis
• 285 combined ADHD subjects and 491 normal controls
• Comparing the two groups after adjusting for age and sex (i.e. number of predictors $p = 3$)
• MRI images downsized from 256 × 198 × 256 to 30 × 36 × 30
• $\mathbf{B}$ has dimension 30 × 36 × 30 × 3 ⇒ 97,200 coefficients

Figure 3: ADHD Coefficients. Top row: $u_1 = u_2 = 10$ and $u_3$ varies as $\{1, 2, 10, 20\}$, where $u_1 = u_2 = u_3 = 10$ is selected by BIC if we force the three dimensions to be the same. Bottom row: $(u_1, u_2, u_3)$ varies as $\{(8, 9, 1), (9, 10, 2), (10, 11, 3), (30, 30, 36)\ \text{(OLS)}\}$, where $(u_1, u_2, u_3) = (9, 10, 2)$ is selected by BIC.

Figure 4: ADHD P-value maps. Red regions represent p < 0.05.

6. Key References

COOK, R.D. AND ZHANG, X. (2015), Foundations for envelope models and methods, J. of Amer. Stat. Assoc., 110, 599–611.
COOK, R.D. AND ZHANG, X. (2016), Algorithms for envelope estimation, J. of Comput. Graph. Stat., In press.
LI, L. AND ZHANG, X. (2015), Parsimonious Tensor Response Regression, arXiv preprint arXiv:1501.07815