Jurnal Matematika Vol. 2 No. 1, December 2011. ISSN: 1693-1394

Model-Check Based on Residual Partial Sums Process of Heteroscedastic Spatial Linear Regression Models

Wayan Somayasa

Jurusan Matematika FMIPA Universitas Haluoleo Kendari 93232

e-mail: [email protected]

Abstract: It is common in practice to evaluate the correctness of an assumed linear regression model by a model-check method in which the residuals of the observations are investigated. In the asymptotic context, instead of observing the vector of residuals directly, one investigates their partial sums. In this paper we derive a functional central limit theorem for a sequence of residual partial sums processes when the observations come from heteroscedastic spatial linear regression models. Under a mild condition it is shown that the limit process is a function of the Brownian sheet. Several examples of the limit process are also discussed. The limit theorem is then applied to establish an asymptotic Kolmogorov type test for the adequacy of the fitted model. The critical regions of the test for finite sample sizes are constructed by Monte Carlo simulation.

Keywords: heteroscedastic linear regression model, least squares residual, partial sums process, Brownian sheet, asymptotic model-check.

1. Introduction

Let us consider an experiment performed under n × n experimental conditions taken from a regular lattice given by

$$\Xi_n := \{(\ell/n, k/n) : 1 \le \ell, k \le n\}, \quad n \in \mathbb{N}.$$

Without loss of generality we consider the unit square $I := [0,1] \times [0,1]$ as the experimental region instead of an arbitrary compact subset of $\mathbb{R}^2$. For convenience we take the observations carried out in $\Xi_n$ row-wise, initializing at the point $(1/n, 1/n)$, and put them together in an $n \times n$ matrix $Y^{(n)} := (Y_{\ell k})_{\ell,k=1}^{n} \in \mathbb{R}^{n\times n}$, where the observation at the point $(\ell/n, k/n)$ is denoted by $Y_{\ell k}$, $1 \le \ell, k \le n$. Consequently, we have a sequence of observable random matrices $(Y^{(n)})_{n\ge 1} \subset \mathbb{R}^{n\times n}$. As usual we furnish the vector space $\mathbb{R}^{n\times n}$ with the Euclidean inner product

$$\langle A, B\rangle_{\mathbb{R}^{n\times n}} := \operatorname{trace}(A^\top B), \quad A, B \in \mathbb{R}^{n\times n}.$$

Let $f_1, \dots, f_p : I \to \mathbb{R}$ be known, real-valued regression functions defined on $I$. For a real-valued function $f$ defined on $I$, let $f^{(n)} := (f(\ell/n, k/n))_{\ell,k=1}^{n} \in \mathbb{R}^{n\times n}$. Our aim is to construct an asymptotic test procedure for the hypothesis

$$H_0 : Y^{(n)} = \sum_{i=1}^{p} \beta_i f_i^{(n)} + E_n \quad \text{vs.} \quad H_1 : Y^{(n)} = g^{(n)} + E_n, \qquad (1)$$

where $\beta := (\beta_1, \dots, \beta_p)^\top \in \mathbb{R}^p$ is a vector of unknown parameters, $E_n := (\varepsilon_{\ell k})_{\ell,k=1}^{n}$ is an $n \times n$ random matrix whose components are independent, real-valued random variables $\varepsilon_{\ell k}$, $1 \le \ell, k \le n$, defined on a common probability space $(\Omega, \mathcal{F}, P)$, having mean $0$ and variance $\sigma_{\ell k}^2$, $0 < \sigma_{\ell k}^2 < \infty$, $1 \le \ell, k \le n$, and $g : I \to \mathbb{R}$ is the unknown true regression function. Thus, under the null hypothesis we consider a heteroscedastic linear model, while under the alternative we assume a non-parametric heteroscedastic regression model. It is worth mentioning that under $H_0$ and $H_1$ we need not assume any specific distribution for the random errors $\varepsilon_{\ell k}$, $1 \le \ell, k \le n$. Under the assumption that $f_1^{(n)}, \dots, f_p^{(n)}$ are linearly independent in $\mathbb{R}^{n\times n}$, the corresponding matrix of least squares residuals of the observations under $H_0$ is given by

$$R_n := (r_{\ell k})_{\ell,k=1}^{n} = E_n - \sum_{i=1}^{p} \frac{\langle f_i^{(n)}, E_n\rangle_{\mathbb{R}^{n\times n}}}{\langle f_i^{(n)}, f_i^{(n)}\rangle_{\mathbb{R}^{n\times n}}}\, f_i^{(n)}.$$
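To make the projection concrete, here is a minimal numerical sketch (Python with NumPy; the function name and the general least squares form are our own illustration, not the paper's notation). It computes $R_n$ by projecting the vectorized observation matrix onto the span of the vectorized regressor matrices, which agrees with the displayed formula when the $f_i^{(n)}$ are orthogonal in $\mathbb{R}^{n\times n}$.

```python
import numpy as np

def residual_matrix(Y, regressors):
    """Least squares residual matrix R_n of the observations Y (n x n),
    given a list of n x n regressor matrices f_i^{(n)}.
    Uses the general projection Y - X (X^T X)^{-1} X^T Y on vec(Y)."""
    n = Y.shape[0]
    X = np.column_stack([f.ravel() for f in regressors])     # columns: vec(f_i^{(n)})
    beta_hat = np.linalg.lstsq(X, Y.ravel(), rcond=None)[0]  # least squares fit
    return Y - (X @ beta_hat).reshape(n, n)

# Example: constant model f_1 = 1 on a 5 x 5 lattice.
rng = np.random.default_rng(1)
Y = 2.0 + rng.normal(size=(5, 5))            # beta = 2 plus noise
R = residual_matrix(Y, [np.ones((5, 5))])
print(abs(R.sum()) < 1e-10)                  # residuals sum to zero: True
```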

Recently, for a fixed $n \ge 1$, MacNeill and Jandhyala [7] and Xie and MacNeill [11] defined an operator $T_n : \mathbb{R}^{n\times n} \to C(I)$, given by

$$T_n(A)(z_1, z_2) := \sum_{k=1}^{[nz_2]} \sum_{\ell=1}^{[nz_1]} a_{\ell k} + (nz_1 - [nz_1]) \sum_{k=1}^{[nz_2]} a_{[nz_1]+1,\,k} + (nz_2 - [nz_2]) \sum_{\ell=1}^{[nz_1]} a_{\ell,\,[nz_2]+1} + (nz_1 - [nz_1])(nz_2 - [nz_2])\, a_{[nz_1]+1,\,[nz_2]+1}, \quad (z_1, z_2) \in I,$$
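For illustration, the following sketch (Python/NumPy; the function name is ours) evaluates $T_n(A)(z_1, z_2)$ directly from the definition, including the two linear interpolation terms and the bilinear corner term.

```python
import numpy as np

def partial_sums(A, z1, z2):
    """Evaluate the partial sums functional T_n(A)(z1, z2) for an n x n
    matrix A with A[l-1, k-1] = a_{lk} and (z1, z2) in [0, 1]^2."""
    n = A.shape[0]
    i, j = int(n * z1), int(n * z2)      # [n z1], [n z2]
    u, v = n * z1 - i, n * z2 - j        # fractional parts
    val = A[:i, :j].sum()                # sum over l <= [n z1], k <= [n z2]
    if i < n:
        val += u * A[i, :j].sum()        # (n z1 - [n z1]) * sum_k a_{[n z1]+1, k}
    if j < n:
        val += v * A[:i, j].sum()        # (n z2 - [n z2]) * sum_l a_{l, [n z2]+1}
    if i < n and j < n:
        val += u * v * A[i, j]           # bilinear corner term
    return val
```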

for every $A = (a_{\ell k})_{\ell,k=1}^{n}$, where $[t] := \max\{m \in \mathbb{N} : m \le t\}$, $t \in \mathbb{R}$, and $T_n(A)(t,s) = 0$ if $t = 0$ or $s = 0$. Here $C(I)$ is the space of continuous functions on $I$, furnished with the supremum norm. By the operator $T_n$, the matrix of least squares residuals is carried into a stochastic process $\{T_n(R_n)(t,s) : (t,s) \in I\}$ having sample paths in $C(I)$; we call this process the residual partial sums process. It is common in practice to test (1) by investigating a functional of the residual partial sums process, such as the Kolmogorov type statistic $K_n := \max_{1\le \ell,k\le n} T_n(R_n)(\ell/n, k/n)$. To establish this test we therefore need the limit process of the sequence $\{T_n(R_n)(t,s) : (t,s) \in I\}_{n\ge 1}$ under $H_0$ as well as under $H_1$. In MacNeill and Jandhyala [7] and Xie and MacNeill [11] the limit process of this sequence was derived explicitly under the assumption of homoscedasticity, i.e. $\sigma_{\ell k}^2 = \sigma^2$ for $1 \le \ell, k \le n$. It was shown there that, when the regression functions are continuously differentiable, the limit process is a complicated function of the Brownian sheet. In Somayasa [10] the limit process of such a sequence was also derived, by generalizing the approach of Bischoff [4] from one dimension to higher dimensions. In contrast to the results of MacNeill and Jandhyala [7] and Xie and MacNeill [11], Somayasa [10] obtained the structure of the limit process as a projection of the Brownian sheet onto its reproducing kernel Hilbert space. In this paper we establish the limit process for the heteroscedastic linear regression model defined above, see Section 2. In Section 3 we discuss examples of the limit process corresponding to polynomial models. In Section 4 we construct the critical region of the Kolmogorov type test.

2. Residual partial sums limit process

In the sequel we characterize the heteroscedasticity of the regression model by a function $h : I \to \mathbb{R}_{>0}$ such that $\sigma_{\ell k}^2 = h(\ell/n, k/n)$, $1 \le \ell, k \le n$, $n \in \mathbb{N}$, where $h$ is assumed to be of bounded variation in the sense of Vitali, see Clarkson and Adams [5].

Definition 2.1. A stochastic process $\{B_h(t,s) : (t,s) \in I\}$ is called an $h$-Brownian sheet on $C(I)$ if:

1. $B_h(t,s) = 0$ almost surely (a.s.) if $t = 0$ or $s = 0$.

2. For every rectangle $[t_1,t_2] \times [s_1,s_2] \subseteq I$, $0 \le t_1 \le t_2 \le 1$, $0 \le s_1 \le s_2 \le 1$,
$$\Delta_{[t_1,t_2]\times[s_1,s_2]} B_h \sim \mathcal{N}\Big(0, \int_{[t_1,t_2]\times[s_1,s_2]} h \, d\lambda_I\Big),$$
where $\Delta_{[t_1,t_2]\times[s_1,s_2]} B_h := B_h(t_2,s_2) - B_h(t_1,s_2) - B_h(t_2,s_1) + B_h(t_1,s_1)$ and $\lambda_I$ is the Lebesgue measure on $I$. The random variable $\Delta_{[t_1,t_2]\times[s_1,s_2]} B_h$ is called the increment of $B_h$ over $[t_1,t_2] \times [s_1,s_2]$.

3. For any two rectangles $I_1 \subseteq I$ and $I_2 \subseteq I$ with $I_1 \cap I_2 = \emptyset$, $\Delta_{I_1} B_h$ and $\Delta_{I_2} B_h$ are mutually independent.

We refer the reader to MacNeill et al. [8] for the existence of such a process. In case $h$ is a constant function, $B_h$ is the Brownian sheet, whose existence has been studied by Yeh [12], Kuelbs [6], and Park [9]. As a consequence of Definition 2.1, the covariance function of $B_h$ is given by
$$K_{B_h}(t_1,s_1; t_2,s_2) := \operatorname{Cov}\big(B_h(t_1,s_1), B_h(t_2,s_2)\big) = \int_{[0,\,t_1\wedge t_2]\times[0,\,s_1\wedge s_2]} h \, d\lambda_I,$$
$(t_1,s_1), (t_2,s_2) \in I$, where $x \wedge y$ stands for the minimum of $x$ and $y$.

Theorem 2.1. Let $(E_n)_{n\ge 1}$, $E_n := (\varepsilon_{\ell k})_{\ell,k=1}^{n}$, be a sequence of $n \times n$ random matrices such that the $\varepsilon_{\ell k}$ are mutually independent with $E(\varepsilon_{\ell k}) = 0$ and $\operatorname{Var}(\varepsilon_{\ell k}) = h(\ell/n, k/n)$, $1 \le \ell, k \le n$, $n \ge 1$. Then $\frac{1}{n} T_n(E_n) \xrightarrow{\mathcal{D}} B_h$, as $n \to \infty$, in $C(I)$. Here $\xrightarrow{\mathcal{D}}$ stands for convergence in distribution (weak convergence), see Billingsley [2], p. 23.

Proof. See MacNeill et al. [8].
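Theorem 2.1 also suggests a simple way to generate approximate sample paths of $B_h$: evaluate $(1/n) T_n(E_n)$ at the lattice points, i.e. take scaled double cumulative sums of independent $\mathcal{N}(0, h(\ell/n, k/n))$ errors. A sketch (Python/NumPy; the function name and the choice $h(t,s) = ts$ are illustrative):

```python
import numpy as np

def h_brownian_sheet(n, h, rng):
    """Approximate B_h at the lattice {(l/n, k/n)} by the scaled partial
    sums (1/n) T_n(E_n) of Theorem 2.1."""
    grid = np.arange(1, n + 1) / n
    tt, ss = np.meshgrid(grid, grid, indexing="ij")
    E = rng.normal(0.0, np.sqrt(h(tt, ss)))        # Var(eps_lk) = h(l/n, k/n)
    return np.cumsum(np.cumsum(E, axis=0), axis=1) / n

# Sanity check: Var(B_h(1,1)) = int_I h dlambda_I = 1/4 for h(t,s) = ts.
rng = np.random.default_rng(0)
vals = [h_brownian_sheet(100, lambda t, s: t * s, rng)[-1, -1] for _ in range(2000)]
print(np.var(vals))   # approximately 0.25
```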

Theorem 2.2. Let $f_1, \dots, f_p$ be continuous and of bounded variation in the sense of Hardy (Clarkson and Adams [5]) on $I$. If $f_1, \dots, f_p$ are linearly independent in $L_2(\lambda_I)$, the Hilbert space of square-integrable functions on $I$ with respect to $\lambda_I$, then

$$\frac{1}{n} T_n(R_n) \xrightarrow{\mathcal{D}} B_{h,\tilde f}, \quad \text{as } n \to \infty, \text{ in } C(I),$$

where, with $\tilde f := (f_1, \dots, f_p)^\top$,

$$B_{h,\tilde f}(t,s) := B_h(t,s) - \Big(\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I\Big)\, W^{-1} \int_I^{(R)} \tilde f \, dB_h, \quad (t,s) \in I,$$

$$\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I := \Big(\int_{[0,t]\times[0,s]} f_1 \, d\lambda_I, \dots, \int_{[0,t]\times[0,s]} f_p \, d\lambda_I\Big), \qquad \int_I^{(R)} \tilde f \, dB_h := \Big(\int_I^{(R)} f_1 \, dB_h, \dots, \int_I^{(R)} f_p \, dB_h\Big)^\top.$$

Here $W := \big(\int_I f_i f_j \, d\lambda_I\big)_{i,j=1}^{p} \in \mathbb{R}^{p\times p}$ is invertible. Furthermore, $B_{h,\tilde f}$ is a process with covariance function

$$K_{B_{h,\tilde f}}(t,s; t',s') := \operatorname{Cov}\big(B_{h,\tilde f}(t,s), B_{h,\tilde f}(t',s')\big)$$
$$= \int_{[0,\,t\wedge t']\times[0,\,s\wedge s']} h \, d\lambda_I - \Big(\int_{[0,t']\times[0,s']} \tilde f^\top \, d\lambda_I\Big) W^{-1} \int_{[0,t]\times[0,s]} \tilde f\, h \, d\lambda_I - \Big(\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I\Big) W^{-1} \int_{[0,t']\times[0,s']} \tilde f\, h \, d\lambda_I + \Big(\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I\Big) W^{-1} \Big(\int_I f_i f_j\, h \, d\lambda_I\Big)_{i,j=1}^{p} W^{-1} \Big(\int_{[0,t']\times[0,s']} \tilde f \, d\lambda_I\Big).$$
Here and in the sequel $\int^{(R)}$ denotes the Riemann-Stieltjes integral, see Young [13] and Somayasa [10], p. 115.

Proof. The proof of Theorem 2.2 in Bischoff [3] and the result of Bischoff [4] can be extended to higher-dimensional experimental regions.

3. Examples

In this section we discuss several examples of the residual partial sums limit process for the constant, first-order and second-order regression models.

3.1. Constant regression model

As the simplest case we consider the constant model $Y^{(n)} = \beta f_1^{(n)} + E_n$, where $\beta$ is an unknown parameter and $f_1(t,s) = 1$ for $(t,s) \in I$. The residual partial sums limit process of this model is given by

$$B_{h,\tilde f_0}(t,s) := B_h(t,s) - ts\, B_h(1,1), \quad (t,s) \in I,$$

which is the standard Brownian bridge when $h$ is constant, see e.g. MacNeill and Jandhyala [7] and Somayasa [10], p. 20.
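This can be read off directly from Theorem 2.2; a short check (our own computation, using $p = 1$ and $f_1 \equiv 1$, so that $W = 1$):

```latex
% For f_1 \equiv 1:  W = \int_I f_1^2 \, d\lambda_I = 1, and
\int_{[0,t]\times[0,s]} f_1 \, d\lambda_I = ts,
\qquad
\int_I^{(R)} f_1 \, dB_h = B_h(1,1),
% so Theorem 2.2 gives
B_{h,\tilde f_0}(t,s) = B_h(t,s) - ts \, W^{-1} B_h(1,1) = B_h(t,s) - ts\, B_h(1,1).
```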

3.2. First-order regression model

Let us consider the first-order regression model
$$Y^{(n)} = \beta_1 f_1^{(n)} + \beta_2 f_2^{(n)} + \beta_3 f_3^{(n)} + E_n,$$
where $\beta_1$, $\beta_2$ and $\beta_3$ are unknown parameters and $f_1(t,s) = 1$, $f_2(t,s) = t$, $f_3(t,s) = s$, for $(t,s) \in I$. Associated with this model we have
$$W = \begin{pmatrix} 1 & 1/2 & 1/2 \\ 1/2 & 1/3 & 1/4 \\ 1/2 & 1/4 & 1/3 \end{pmatrix}, \qquad W^{-1} = \begin{pmatrix} 7 & -6 & -6 \\ -6 & 12 & 0 \\ -6 & 0 & 12 \end{pmatrix},$$
and the residual partial sums limit process
$$B_{h,\tilde f_1}(t,s) := B_h(t,s) - y_1(t,s)\, B_h(1,1) - y_2(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(u,1)\, du\Big] - y_3(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(1,v)\, dv\Big],$$
$(t,s) \in I$, where $y_1(t,s) := 7ts - 3t^2 s - 3ts^2$, $y_2(t,s) := -6ts + 6t^2 s$ and $y_3(t,s) := -6ts + 6ts^2$.
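The $3 \times 3$ inverse above can be verified numerically; a minimal sketch (Python/NumPy, our own check):

```python
import numpy as np

# W_ij = int_I f_i f_j dlambda_I for f = (1, t, s).
W1 = np.array([[1.0, 1/2, 1/2],
               [1/2, 1/3, 1/4],
               [1/2, 1/4, 1/3]])
print(np.round(np.linalg.inv(W1)).astype(int))
# [[ 7 -6 -6]
#  [-6 12  0]
#  [-6  0 12]]
```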



3.3. Second-order regression model

Next we consider the second-order regression model
$$Y^{(n)} = \beta_1 f_1^{(n)} + \beta_2 f_2^{(n)} + \beta_3 f_3^{(n)} + \beta_4 f_4^{(n)} + \beta_5 f_5^{(n)} + \beta_6 f_6^{(n)} + E_n,$$
where $\beta_1, \dots, \beta_6$ are unknown parameters and $f_1(t,s) = 1$, $f_2(t,s) = t$, $f_3(t,s) = s$, $f_4(t,s) = t^2$, $f_5(t,s) = ts$, $f_6(t,s) = s^2$, for $(t,s) \in I$. Accordingly, the matrices $W$ and $W^{-1}$ are given by

$$W = \begin{pmatrix} 1 & 1/2 & 1/2 & 1/3 & 1/4 & 1/3 \\ 1/2 & 1/3 & 1/4 & 1/4 & 1/6 & 1/6 \\ 1/2 & 1/4 & 1/3 & 1/6 & 1/6 & 1/4 \\ 1/3 & 1/4 & 1/6 & 1/5 & 1/8 & 1/9 \\ 1/4 & 1/6 & 1/6 & 1/8 & 1/9 & 1/8 \\ 1/3 & 1/6 & 1/4 & 1/9 & 1/8 & 1/5 \end{pmatrix}, \qquad W^{-1} = \begin{pmatrix} 26 & -54 & -54 & 30 & 36 & 30 \\ -54 & 228 & 36 & -180 & -72 & 0 \\ -54 & 36 & 228 & 0 & -72 & -180 \\ 30 & -180 & 0 & 180 & 0 & 0 \\ 36 & -72 & -72 & 0 & 144 & 0 \\ 30 & 0 & -180 & 0 & 0 & 180 \end{pmatrix}.$$
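The entries of $W$ are the moments $\int_I t^a s^b \, d\lambda_I = 1/((a+1)(b+1))$, so the printed inverse can be checked numerically; a sketch (Python/NumPy, our own check):

```python
import numpy as np

# Exponent pairs (a, b) of the monomial basis f = (1, t, s, t^2, ts, s^2).
powers = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
# W_ij = int_I t^(a_i + a_j) s^(b_i + b_j) = 1 / ((a_i + a_j + 1)(b_i + b_j + 1)).
W = np.array([[1.0 / ((a1 + a2 + 1) * (b1 + b2 + 1)) for (a2, b2) in powers]
              for (a1, b1) in powers])
print(np.round(np.linalg.inv(W)).astype(int))  # reproduces the integer matrix W^{-1}
```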

Let $y_1, \dots, y_6 : I \to \mathbb{R}$ be functions on $I$ defined by
$$y_1(t,s) := 26ts - 27t^2 s - 27ts^2 + 10t^3 s + 9t^2 s^2 + 10ts^3,$$
$$y_2(t,s) := -54ts + 114t^2 s + 18ts^2 - 60t^3 s - 18t^2 s^2,$$
$$y_3(t,s) := -54ts + 18t^2 s + 114ts^2 - 18t^2 s^2 - 60ts^3,$$
$$y_4(t,s) := 30ts - 90t^2 s + 60t^3 s,$$
$$y_5(t,s) := 36ts - 36t^2 s - 36ts^2 + 36t^2 s^2,$$
$$y_6(t,s) := 30ts - 90ts^2 + 60ts^3.$$
The residual partial sums limit process of this model can be expressed as
$$B_{h,\tilde f_2}(t,s) := B_h(t,s) - y_1(t,s)\, B_h(1,1) - y_2(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(u,1)\, du\Big] - y_3(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(1,v)\, dv\Big] - y_4(t,s)\Big[B_h(1,1) - \int_{[0,1]} 2u\, B_h(u,1)\, du\Big] - y_5(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(u,1)\, du - \int_{[0,1]} B_h(1,v)\, dv + \int_I B_h(u,v)\, du\, dv\Big] - y_6(t,s)\Big[B_h(1,1) - \int_{[0,1]} 2v\, B_h(1,v)\, dv\Big], \quad (t,s) \in I.$$
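The $y_i$ are exactly the coordinates of $\big(\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I\big) W^{-1}$ from Theorem 2.2, which gives a quick numerical consistency check; a self-contained sketch (Python/NumPy, our own verification, here evaluated at one arbitrary point):

```python
import numpy as np

# y(t,s) = A(t,s)^T W^{-1}, with A_i(t,s) = int_{[0,t]x[0,s]} f_i dlambda_I
# for the basis f = (1, t, s, t^2, ts, s^2).
powers = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
W = np.array([[1.0 / ((a1 + a2 + 1) * (b1 + b2 + 1)) for (a2, b2) in powers]
              for (a1, b1) in powers])
t, s = 0.3, 0.7
A = np.array([t**(a + 1) * s**(b + 1) / ((a + 1) * (b + 1)) for a, b in powers])
y = A @ np.linalg.inv(W)
y2 = -54*t*s + 114*t**2*s + 18*t*s**2 - 60*t**3*s - 18*t**2*s**2
print(np.isclose(y[1], y2))  # True: matches y_2 as given above
```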

4. Kolmogorov type test

The Kolmogorov type test for the hypotheses (1) is a test based on the statistic
$$K_{n,f} := \max_{0\le \ell,k\le n} \frac{1}{n} \sum_{i=0}^{\ell} \sum_{j=0}^{k} r_{ij},$$
where we put $r_{ij} = 0$ if $i = 0$ or $j = 0$. We note that by the properties of the partial sums it holds that
$$K_{n,f} = \sup_{0\le t,s\le 1} \frac{1}{n}\, T_n(R_n)(t,s).$$
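At the lattice points the partial sums are plain double cumulative sums, so $K_{n,f}$ can be computed in a few lines; a sketch (Python/NumPy, hypothetical function name):

```python
import numpy as np

def kolmogorov_statistic(R):
    """K_{n,f} = max_{0 <= l,k <= n} (1/n) * sum_{i <= l, j <= k} r_ij
    for an n x n residual matrix R; boundary terms (l = 0 or k = 0) are 0."""
    n = R.shape[0]
    S = np.cumsum(np.cumsum(R, axis=0), axis=1)   # S[l-1, k-1] = partial sum
    return max(S.max() / n, 0.0)                  # include the zero boundary
```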

Theorem 4.1. For fixed $\alpha \in (0,1)$, let $\tilde c_\alpha$ be the $\alpha$-quantile of $\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s)$, i.e. a constant such that $P\{\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s) \le \tilde c_\alpha\} = \alpha$. Then an asymptotically size-$\alpha$ test based on $K_{n,f}$ is given by:

reject $H_0$ if and only if $K_{n,f} \ge \tilde c_{1-\alpha}$.

Proof. Let $\mathcal{X} \subseteq \mathbb{R}^{n\times n}$ be the sample space of the model. We define a sequence of non-randomized tests $(\delta_n)_{n\ge 1}$, $\delta_n : \mathcal{X} \to \{0,1\}$, such that for $Y_n \in \mathcal{X}$,
$$\delta_n(Y_n) := 1_{\big\{Y_n :\ \sup_{0\le t,s\le 1} \frac{1}{n}\, T_n\big(Y_n - X_n(X_n^\top X_n)^{-1} X_n^\top Y_n\big)(t,s)\ \ge\ \tilde c_{1-\alpha}\big\}},$$

where $1_A$ is the indicator function of $A$ and $X_n$ denotes the design matrix of the model under $H_0$. Then, by Theorem 2.2 and the continuity of the supremum functional, we have
$$\lim_{n\to\infty} E_0(\delta_n) = \lim_{n\to\infty} P\Big\{\sup_{0\le t,s\le 1} \frac{1}{n}\, T_n\big(E_n - X_n(X_n^\top X_n)^{-1} X_n^\top E_n\big)(t,s) \ge \tilde c_{1-\alpha}\Big\} = P\Big\{\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s) \ge \tilde c_{1-\alpha}\Big\} = \alpha,$$
where $E_0$ is the expectation operator under $H_0$. The proof is complete because $\lim_{n\to\infty} E_0(\delta_n) = \alpha$ holds uniformly under $H_0$. □

Since the quantile $\tilde c_{1-\alpha}$ of $\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s)$ cannot be calculated analytically, we approximate the finite-sample quantile of $K_{n,f}$ by Monte Carlo simulation, according to the following algorithm.

Step 1: Fix $n_0 \in \mathbb{N}$.

Step 2: Generate $M$ i.i.d. pseudo-random matrices $E_{n_0}^{(j)} := (\varepsilon_{\ell k}^{(j)})_{\ell,k=1}^{n_0}$, $j = 1, \dots, M$, with independent components generated from $\mathcal{N}(0, h(\ell/n_0, k/n_0))$, $1 \le \ell, k \le n_0$.

Step 3: Calculate the matrix of residuals $R_{n_0}^{(j)}$ by the equation
$$R_{n_0}^{(j)} = E_{n_0}^{(j)} - \sum_{i=1}^{p} \frac{\langle f_i^{(n_0)}, E_{n_0}^{(j)}\rangle_{\mathbb{R}^{n_0\times n_0}}}{\langle f_i^{(n_0)}, f_i^{(n_0)}\rangle_{\mathbb{R}^{n_0\times n_0}}}\, f_i^{(n_0)}.$$

Step 4: Calculate the statistic $K_{n_0,f}^{(j)} := \max_{0\le \ell,k\le n_0} \frac{1}{n_0}\, T_{n_0}(R_{n_0}^{(j)})(\ell/n_0, k/n_0)$.

Step 5: Calculate the simulated $(1-\alpha)$-quantile of $\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s)$: let $K_{n_0,f}^{(M,j)}$ be the $j$-th smallest observation, i.e. $K_{n_0,f}^{(M,1)} \le \dots \le K_{n_0,f}^{(M,j)} \le K_{n_0,f}^{(M,j+1)} \le \dots \le K_{n_0,f}^{(M,M)}$; then the simulated $(1-\alpha)$-quantile is given by
$$\tilde c_{1-\alpha} = \begin{cases} K_{n_0,f}^{(M,\, M(1-\alpha))}, & \text{if } M(1-\alpha) \in \mathbb{N}, \\[2pt] K_{n_0,f}^{(M,\, [M(1-\alpha)]+1)}, & \text{otherwise}, \end{cases}$$
where $[M(1-\alpha)] = \max\{k \in \mathbb{N} : k \le M(1-\alpha)\}$.
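A self-contained sketch of Steps 1-5 (in Python/NumPy rather than the paper's R; the names, the choice $h(t,s) = ts$ and the constant-model basis are illustrative, and the basis list can be extended to the first- and second-order models):

```python
import numpy as np

def simulated_quantile(n0=30, M=100_000, alpha=0.05, h=lambda t, s: t * s, seed=0):
    """Monte Carlo approximation of the (1 - alpha)-quantile of K_{n0,f}."""
    rng = np.random.default_rng(seed)
    grid = np.arange(1, n0 + 1) / n0
    tt, ss = np.meshgrid(grid, grid, indexing="ij")
    basis = [np.ones_like(tt)]                        # constant model; append tt, ss, ... for larger models
    X = np.column_stack([f.ravel() for f in basis])   # columns: vec(f_i^{(n0)})
    P = X @ np.linalg.solve(X.T @ X, X.T)             # projection onto span{vec(f_i^{(n0)})}
    sd = np.sqrt(h(tt, ss))                           # heteroscedastic std deviations
    stats = np.empty(M)
    for j in range(M):                                # Steps 2-4
        E = rng.normal(0.0, sd)
        R = (E.ravel() - P @ E.ravel()).reshape(n0, n0)
        S = np.cumsum(np.cumsum(R, axis=0), axis=1)
        stats[j] = max(S.max() / n0, 0.0)             # K_{n0,f}^{(j)}
    stats.sort()                                      # Step 5: order statistics
    m = M * (1 - alpha)
    idx = int(m) - 1 if float(m).is_integer() else int(np.floor(m))
    return stats[idx]

# For comparison with Table 1 below (constant model, h(t,s) = ts): c_0.95 is about 0.69.
print(simulated_quantile())
```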

The simulation results, obtained using the statistical software package R 2.0.1, are presented in Table 1 for $\alpha$ = 0.005, 0.010, 0.025, 0.050, 0.100, 0.150, 0.200, 0.250, 0.350 and 0.500, with sample size $n_0 = 30$ and $M = 10^6$ replications.

Models       | $\tilde c_{0.500}$ | $\tilde c_{0.650}$ | $\tilde c_{0.750}$ | $\tilde c_{0.800}$ | $\tilde c_{0.850}$
Constant     | 0.3740 | 0.4409 | 0.4839 | 0.5139 | 0.5200
First order  | 0.3691 | 0.3963 | 0.4101 | 0.4179 | 0.4306
Second order | 0.3295 | 0.3674 | 0.3873 | 0.3982 | 0.4233

Models       | $\tilde c_{0.900}$ | $\tilde c_{0.950}$ | $\tilde c_{0.975}$ | $\tilde c_{0.990}$ | $\tilde c_{0.995}$
Constant     | 0.6342 | 0.6857 | 0.7381 | 0.8209 | 0.8540
First order  | 0.4631 | 0.4798 | 0.5107 | 0.5425 | 0.5538
Second order | 0.4380 | 0.4479 | 0.4867 | 0.4976 | 0.4977

Table 1. The simulated quantiles $\tilde c_{1-\alpha}$ for $h(t,s) = ts$, $(t,s) \in I$.

References

[1] Alexander, K.S. and Pyke, R. (1986). A uniform central limit theorem for set-indexed partial-sum processes with finite variance. Annals of Probability, 14, 582-597.

[2] Billingsley, P. (1968). Convergence of Probability Measures. John Wiley & Sons, Inc., New York.

[3] Bischoff, W. (1998). A functional central limit theorem for regression models. Ann. Stat., 26 (4), 1398-1410.

[4] Bischoff, W. (2002). The structure of residual partial sums limit processes of linear regression models. Theory of Stochastic Processes, 2 (24), 23-28.

[5] Clarkson, J.A. and Adams, C.R. (1933). On definitions of bounded variation for functions of two variables. Transactions of the American Mathematical Society, 35 (4), 824-854.

[6] Kuelbs, J. (1968). The invariance principle for a lattice of random variables. Ann. Math. Stat., 39 (2), 382-389.

[7] MacNeill, I.B. and Jandhyala, V.K. (1993). Change point methods for spatial data. In: Multivariate Environmental Statistics, eds. G.P. Patil and C.R. Rao. Elsevier Science Publishers B.V., 298-306.

[8] MacNeill, I.B., Mao, Y. and Xie, L. (1994). Modeling heteroscedastic age-period-cohort cancer data. The Canadian Journal of Statistics, 22 (4), 529-539.

[9] Park, W.J. (1971). Weak convergence of probability measures on the function space. J. of Multivariate Analysis, 1, 433-444.

[10] Somayasa, W. (2007). Model-checks based on least squares residual partial sums processes. Ph.D. Thesis, Faculty of Mathematics, Karlsruhe Institute of Technology.

[11] Xie, L. and MacNeill, I.B. (2006). Spatial residual processes and boundary detection. South African Statist. J., 40 (1), 33-53.

[12] Yeh, J. (1960). Wiener measure in a space of functions of two variables. Trans. Amer. Math. Soc., 95, 433-450.

[13] Young, W.H. (1917). On multiple integration by parts and the second theorem of the mean. Proc. London Math. Soc., Series 2, 16, 273-293.